CN116568822A - Methods and systems for improving the signal-to-noise ratio of DNA methylation partition assays - Google Patents

Info

Publication number
CN116568822A
Authority
CN
China
Prior art keywords
nucleic acid
acid molecules
methylation
partition
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180080053.7A
Other languages
Chinese (zh)
Inventor
Andrew Kennedy
William J. Greenleaf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guardant Health Inc
Original Assignee
Guardant Health Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guardant Health Inc
Priority claimed from PCT/US2021/071648 (WO2022073011A1)

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

In one aspect, the present disclosure provides a method for determining methylation status, the method comprising: providing a biological sample of nucleic acid molecules; partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme; enriching at least a subset of the nucleic acid molecules in more than one partition group for a genomic region of interest, wherein at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

Description

Methods and systems for improving the signal-to-noise ratio of DNA methylation partitioning assays

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/086,000, filed on September 30, 2020, and U.S. Provisional Patent Application No. 63/105,183, filed on October 23, 2020, each of which is incorporated herein by reference in its entirety for all purposes.

Field of the Invention

The present disclosure provides compositions and methods for analyzing nucleic acids such as DNA (for example, cell-free DNA). In some embodiments, the cell-free DNA is from a subject having or suspected of having cancer, and/or the cell-free DNA includes DNA from cancer cells. In some embodiments, the DNA is partitioned into more than one partition group based on the methylation state of the nucleic acid molecules, and at least a subset of at least one partition group is digested with at least one methylation-sensitive restriction endonuclease.

Background

Current methods for cancer diagnostic assays of cell-free nucleic acids (e.g., cell-free DNA or cell-free RNA) can focus on detecting tumor-associated somatic variants, including single nucleotide variants (SNVs), copy number variants (CNVs), fusions, and insertions or deletions (indels), all of which are mainstream targets for liquid biopsy. There is growing evidence that non-sequence modifications, such as methylation status and fragmentomic signals in cell-free DNA, can provide information about the source of the cell-free DNA and the level of disease. Non-sequence modifications of cell-free DNA, when combined with somatic mutation calling, can yield a more comprehensive assessment of tumor status than either approach alone.

However, because cell-free DNA is present at low concentrations and is heterogeneous, it has been challenging to develop accurate and sensitive methods for analyzing liquid biopsy material that provide detailed information about nucleobase modifications. Isolating and processing fractions of cell-free DNA that can be used for further analysis in liquid biopsy procedures is an important part of these methods. Accordingly, there is a need for improved methods and compositions for analyzing cell-free DNA (e.g., in liquid biopsies).

Overview

The present disclosure aims to meet the need for improved analysis of cell-free DNA and/or to provide other benefits. The present disclosure provides methods, compositions, and systems for analyzing nucleic acids. Accordingly, the following exemplary embodiments are provided. Embodiment 1 is a method for analyzing nucleic acid molecules in a biological sample, the method comprising:

a) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules, wherein the biological sample comprises methylated nucleic acid molecules and unmethylated nucleic acid molecules;

b) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; and

c) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

Embodiment 2 is a method for determining the methylation state of nucleic acid molecules, the method comprising:

a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules include methylated nucleic acid molecules and unmethylated nucleic acid molecules;

b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules;

c) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease;

d) enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and

e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

Embodiment 3 is a method for analyzing nucleic acid molecules in a biological sample, the method comprising:

a) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules, wherein the biological sample comprises methylated nucleic acid molecules and unmethylated nucleic acid molecules, and the more than one partition groups include a first partition group and a second partition group, wherein methylated nucleic acid molecules are over-represented in the first partition group relative to the second partition group;

b) digesting at least a subset of the first partition group of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; and

c) capturing a first target region set comprising epigenetic target regions from at least a portion of the first partition group, and capturing a second target region set comprising epigenetic target regions from at least a portion of the second partition group.

Embodiment 4 is the method according to embodiment 3, wherein capturing the first target region set comprises contacting the DNA in the first partition group with a first target-specific probe set, and capturing the second target region set comprises contacting the DNA in the second partition group with a second target-specific probe set.

Embodiment 5 is the method according to embodiment 3 or 4, further comprising determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups or target region sets.

Embodiment 6 is the method according to any one of the above embodiments, wherein the genomic regions of interest, the first target region set, and/or the second target region set comprise sequence-variable target regions.

Embodiment 7 is the method according to any one of the above embodiments, further comprising, before the digestion step, attaching one or more adapters to at least one end of at least a portion of the nucleic acid molecules in the more than one partition groups.

Embodiment 8 is a method for determining the methylation state of nucleic acid molecules, the method comprising:

a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules include methylated nucleic acid molecules and unmethylated nucleic acid molecules;

b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules;

c) attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition groups;

d) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease;

e) enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and

f) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

Embodiment 9 is the method according to embodiment 7 or 8, wherein adapters are attached to both ends of at least a portion of the nucleic acid molecules in the more than one partition groups.

Embodiment 10 is the method according to embodiment 1, further comprising, before c), enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups.

Embodiment 11 is the method according to any one of the preceding embodiments, further comprising detecting the presence or absence of cancer in the biological sample.

Embodiment 12 is the method according to any one of the above embodiments, further comprising determining the level of cancer in the biological sample.

Embodiment 13 is the method according to any one of the above embodiments, wherein determining the methylation state comprises sequencing at least a subset of the digested nucleic acid molecules.

Embodiment 14 is the method according to any one of embodiments 7-13, wherein the one or more adapters comprise at least one tag.

Embodiment 15 is the method according to any one of the above embodiments, wherein the methylation-sensitive restriction endonuclease selectively digests nucleic acid molecules that are unmethylated at the recognition site of the methylation-sensitive restriction endonuclease.

Embodiment 16 is the method according to any one of the above embodiments, wherein at least a portion of the nucleic acid molecules are amplified and/or sequenced after the digestion step, and the nucleic acid molecules digested by the methylation-sensitive restriction endonuclease are not amplified and/or are not sequenced.

Embodiment 17 is the method according to any one of the above embodiments, comprising digesting at least a subset of one or more of the more than one partition groups with at least two methylation-sensitive restriction endonucleases.

Embodiment 18 is the method according to embodiment 17, wherein the at least two methylation-sensitive restriction endonucleases consist of two methylation-sensitive restriction endonucleases.

Embodiment 19 is the method according to embodiment 17 or 18, wherein the methylation-sensitive restriction endonucleases comprise or consist of BstUI and HpaII.

Embodiment 20 is the method according to embodiment 17 or 18, wherein the methylation-sensitive restriction endonucleases comprise or consist of HhaI and AccII.

Embodiment 21 is the method according to embodiment 17 or 18, wherein the at least two methylation-sensitive restriction endonucleases comprise or consist of three methylation-sensitive restriction endonucleases.

Embodiment 22 is the method according to embodiment 17 or 21, wherein the methylation-sensitive restriction endonucleases comprise or consist of BstUI, HpaII, and Hin6I.

Embodiment 23 is the method according to any one of the above embodiments, wherein the methylation-sensitive restriction endonuclease is selected from the group consisting of: AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI.

Embodiment 24 is the method according to any one of embodiments 7-23, wherein the one or more adapters are resistant to digestion by the methylation-sensitive restriction endonuclease.

Embodiment 25 is the method according to embodiment 24, wherein the one or more resistant adapters comprise one or more methylated nucleotides, optionally wherein the methylated nucleotides include 5-methylcytosine and/or 5-hydroxymethylcytosine.

Embodiment 26 is the method according to embodiment 24, wherein the one or more resistant adapters comprise one or more nucleotide analogs that are resistant to the methylation-sensitive restriction endonuclease.

Embodiment 27 is the method according to embodiment 24, wherein the one or more resistant adapters comprise a nucleotide sequence that is not recognized by the methylation-sensitive restriction endonuclease.

Embodiment 28 is the method according to any one of embodiments 14-27, wherein the tag comprises a molecular barcode.

Embodiment 29 is the method according to embodiment 28, wherein the molecular barcodes attached to the nucleic acid molecules in a first partition group of the more than one partition groups differ from the molecular barcodes attached to the nucleic acid molecules in a second partition group of the more than one partition groups.

Embodiment 30 is the method according to any one of embodiments 1-29, wherein a first partition group of the more than one partition groups and a second partition group of the more than one partition groups are differentially tagged.

Embodiment 31 is the method according to embodiment 30, wherein a first partition tag is attached to the nucleic acid molecules in the first partition group, and a second partition tag is attached to the nucleic acid molecules in the second partition group.

Embodiment 32 is the method according to any one of the above embodiments, wherein the methylated nucleic acid molecules comprise 5-methylcytosine and/or 5-hydroxymethylcytosine.

Embodiment 33 is the method according to any one of embodiments 13-32, wherein the sequencing is performed by a next-generation sequencer.

Embodiment 34 is the method according to any one of the preceding embodiments, wherein the biological sample is selected from the group consisting of: a DNA sample, an RNA sample, a polynucleotide sample, a cell-free DNA sample, and a cell-free RNA sample.

Embodiment 35 is the method according to any one of the preceding embodiments, wherein the biological sample is a cell-free DNA sample.

Embodiment 36 is the method according to embodiment 35, wherein the amount of cell-free DNA is between 1 ng and 500 ng.

Embodiment 37 is the method according to any one of the preceding embodiments, wherein the partitioning comprises partitioning the nucleic acid molecules based on their differing binding affinities for a binding agent that preferentially binds nucleic acid molecules comprising methylated nucleotides.

Embodiment 38 is the method according to embodiment 37, wherein the binding agent is a methyl-binding domain (MBD) protein.

Embodiment 39 is the method according to embodiment 37, wherein the binding agent is an antibody specific for one or more methylated nucleotide bases.

Embodiment 40 is the method according to any one of embodiments 2-39, wherein the genomic regions of interest or the epigenetic target regions comprise differentially methylated regions for cancer detection.

Embodiment 41 is the method according to any one of embodiments 13-40, further comprising amplifying at least a portion of the nucleic acid molecules prior to sequencing.

Embodiment 42 is the method according to embodiment 41, wherein the primers used in the amplification comprise at least one sample index.

Embodiment 43 is the method according to any one of the above embodiments, wherein the one or more genetic loci include more than one genetic locus.

Embodiment 44 is the method according to embodiment 43, wherein the more than one genetic loci comprise one or more genomic regions.
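The differential tagging of embodiments 28-31 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read begins with a fixed-length partition tag; the tag sequences, the tag length, and the function name `split_by_partition_tag` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical partition tags; the disclosure does not specify tag sequences.
PARTITION_TAGS = {"ACGT": "hyper", "TGCA": "hypo"}


def split_by_partition_tag(reads, tag_length=4):
    """Group reads by the partition tag at their 5' end and strip the tag."""
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        # Reads whose tag is not recognized are kept apart rather than guessed.
        groups[PARTITION_TAGS.get(tag, "unassigned")].append(insert)
    return dict(groups)


reads = ["ACGTCCGGAATT", "TGCAGGCCTTAA", "ACGTTTTTCCCC", "NNNNACGTACGT"]
by_partition = split_by_partition_tag(reads)
```

Because the tag identifies the partition group of origin, molecules from different partition groups could in principle be pooled for later steps and separated again computationally.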

In any of the foregoing embodiments, epigenetic target regions can be captured from one or more, or each, of the partition groups. Any of the methods can also include quantifying the captured epigenetic target regions, for example by sequencing or quantitative PCR. In some embodiments, the method includes capturing a first target region set comprising epigenetic target regions from at least a portion of the first partition group, and capturing a second target region set comprising epigenetic target regions from at least a portion of the second partition group. The first and second target region sets can be the same or different.

The epigenetic target regions may include a hypermethylated variable target region set, for example, comprising regions that have a higher degree of methylation in at least one type of tissue than in cell-free DNA from healthy subjects. Any of the methods may also include determining the presence, absence, or likelihood of cancer based at least in part on the sequence or amount of the regions in the hypermethylated variable target region set. Any of the methods may also include quantifying tumor DNA in the sample based at least in part on the sequence or amount of the regions in the hypermethylated variable target region set.

The epigenetic target regions may include a hypomethylated variable target region set, for example, comprising regions that have a lower degree of methylation in at least one type of tissue than in cell-free DNA from healthy subjects. Any of the methods may also include determining the presence, absence, or likelihood of cancer based at least in part on the sequence or amount of the regions in the hypomethylated variable target region set. Any of the methods may also include quantifying tumor DNA in the sample based at least in part on the sequence or amount of the regions in the hypomethylated variable target region set.
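One way to picture how the amount of hypermethylated or hypomethylated variable target regions might be compared against healthy cell-free DNA is sketched below. The per-region molecule counts, the baseline fractions, the `min_delta` cutoff, and both function names are hypothetical values and names chosen for illustration; the disclosure does not prescribe this calculation.

```python
def hyper_fraction(counts):
    """Fraction of a region's molecules recovered in the hypermethylated partition."""
    total = counts["hyper"] + counts["hypo"]
    return counts["hyper"] / total if total else 0.0


def flag_regions(sample, baseline, min_delta=0.2):
    """Label regions whose hyper fraction departs from the healthy baseline.

    min_delta is an illustrative cutoff, not a value from the disclosure.
    """
    flagged = {}
    for region, counts in sample.items():
        delta = hyper_fraction(counts) - baseline[region]
        if abs(delta) >= min_delta:
            flagged[region] = "hypermethylated" if delta > 0 else "hypomethylated"
    return flagged


# Hypothetical molecule counts per region and partition group.
sample = {
    "region_A": {"hyper": 40, "hypo": 60},
    "region_B": {"hyper": 50, "hypo": 50},
    "region_C": {"hyper": 10, "hypo": 90},
}
# Hypothetical hyper fractions observed in healthy cell-free DNA.
baseline = {"region_A": 0.05, "region_B": 0.9, "region_C": 0.1}
flagged = flag_regions(sample, baseline)
```

Here `region_A` is flagged as hypermethylated, `region_B` as hypomethylated, and `region_C` matches its baseline and is left unflagged.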

In any of the foregoing embodiments, sequence-variable target regions can be captured from one or more, or each, of the partition groups. Any of the methods can also include quantifying the captured epigenetic target regions, for example by sequencing or quantitative PCR. The DNA molecules corresponding to the sequence-variable target region set can be sequenced to a greater depth than the DNA molecules corresponding to the epigenetic target region set.

In any of the foregoing embodiments, capturing a target region set may include contacting the DNA to be captured with a target-specific probe set, thereby forming complexes of target-specific probes and DNA. Capturing may also include separating the complexes from DNA that is not bound to target-specific probes, thereby providing captured DNA.
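The capture step can be modeled computationally as an interval overlap test, which may help when reasoning about which fragments a probe set would retain. This is a simplified sketch: real hybridization depends on sequence complementarity and reaction conditions, and the coordinates and function names below are hypothetical.

```python
def overlaps(fragment, probe):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return (
        fragment[0] == probe[0]
        and fragment[1] < probe[2]
        and probe[1] < fragment[2]
    )


def capture(fragments, probes):
    """Split fragments into those bound by any probe and those left unbound."""
    captured = [f for f in fragments if any(overlaps(f, p) for p in probes)]
    unbound = [f for f in fragments if f not in captured]
    return captured, unbound


# Hypothetical probe and fragment coordinates.
probes = [("chr1", 100, 220)]
fragments = [("chr1", 50, 210), ("chr1", 220, 380), ("chr2", 100, 260)]
captured, unbound = capture(fragments, probes)
```

Only the first fragment overlaps the probe; the second starts exactly where the probe ends, and the third lies on a different chromosome.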

In any of the foregoing embodiments, the DNA may be amplified before the sequencing step, or the DNA may be amplified before the capture step.

In any of the foregoing embodiments, the DNA may include DNA obtained from a body fluid, optionally wherein the body fluid is plasma, urine, lymph, or spinal fluid. For example, the DNA may include cell-free DNA (cfDNA) obtained from a test subject.

In any of the foregoing embodiments, the methylation-sensitive restriction endonuclease can cleave unmethylated CpG sequences. In any of the foregoing embodiments, the methylation-sensitive restriction endonuclease can be one or more of the following: AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI.
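The selectivity of methylation-sensitive digestion can be sketched in code. The example below uses the recognition sequences of three of the listed enzymes (HpaII: CCGG, BstUI: CGCG, HhaI: GCGC) and assumes, as a simplification, that any methylated position within a recognition site blocks cleavage at that site. The function names and the example sequence are hypothetical.

```python
# Recognition sequences for three of the listed enzymes.
RECOGNITION_SITES = {"HpaII": "CCGG", "BstUI": "CGCG", "HhaI": "GCGC"}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of site, overlaps included."""
    width = len(site)
    return [i for i in range(len(sequence) - width + 1) if sequence[i:i + width] == site]


def is_digested(sequence, methylated_positions, enzymes):
    """True if at least one recognition site carries no methylated position.

    Simplifying assumption: methylation anywhere inside a site blocks cleavage.
    """
    for enzyme in enzymes:
        site = RECOGNITION_SITES[enzyme]
        for start in find_sites(sequence, site):
            site_positions = set(range(start, start + len(site)))
            if not (site_positions & set(methylated_positions)):
                return True
    return False


# Hypothetical molecule: HpaII site at position 2, BstUI site at position 8.
seq = "AACCGGTTCGCGAA"
unmethylated_cut = is_digested(seq, set(), ["HpaII"])
methylated_protected = is_digested(seq, {3}, ["HpaII"])
second_enzyme_cuts = is_digested(seq, {3}, ["HpaII", "BstUI"])
fully_protected = is_digested(seq, {3, 8, 10}, ["HpaII", "BstUI"])
no_site = is_digested("AATTAATT", set(), ["HpaII"])
```

The third case shows why combining enzymes, as in embodiments 17-22, can be useful: a molecule protected at one enzyme's site may still be cut at another's. The last case shows a limit of the approach: a molecule with no recognition site is never cut, whatever its methylation state.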

In any of the foregoing embodiments, the method may further include determining the likelihood that the subject has cancer. For example, the sequencing may generate more than one sequence read, and the method may further include mapping the more than one sequence reads to one or more reference sequences to generate mapped sequence reads, and processing the mapped sequence reads corresponding to the sequence-variable target region set and the epigenetic target region set to determine the likelihood that the subject has cancer.

In any of the foregoing embodiments, the test subject may have been previously diagnosed with cancer and have received one or more previous cancer treatments, optionally wherein the cfDNA is obtained at one or more preselected time points after the one or more previous cancer treatments, and the captured set of cfDNA molecules is sequenced, thereby producing a set of sequence information. Such a method may also include using the set of sequence information to detect the presence or absence of DNA originating or derived from tumor cells at the preselected time points. Such a method may also include determining a cancer recurrence score that indicates the presence or absence of DNA originating or derived from tumor cells of the test subject. The method optionally also includes determining a cancer recurrence status based on the cancer recurrence score, wherein the cancer recurrence status of the test subject is determined to be at risk of cancer recurrence when the cancer recurrence score is at or above a predetermined threshold, or at lower risk of cancer recurrence when the cancer recurrence score is below the predetermined threshold. Such a method may further include comparing the test subject's cancer recurrence score to a predetermined cancer recurrence threshold, wherein the test subject is classified as a candidate for subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold, or as a non-candidate for subsequent cancer treatment when the cancer recurrence score is below the cancer recurrence threshold.
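The two threshold rules in the preceding paragraph can be written out as a short sketch. The function names and the numeric scores are hypothetical, and how a cancer recurrence score is actually computed is not addressed here. Note that the text defines recurrence status for a score "at or above" the threshold, but defines treatment candidacy only for scores strictly above or below it, so the sketch leaves the equal case undecided.

```python
def recurrence_status(score, threshold):
    """Status rule from the text: at or above the threshold means at risk."""
    if score >= threshold:
        return "at risk of cancer recurrence"
    return "lower risk of cancer recurrence"


def treatment_candidacy(score, threshold):
    """Candidacy rule from the text; a score equal to the threshold is left unspecified."""
    if score > threshold:
        return "candidate"
    if score < threshold:
        return "non-candidate"
    return None  # The source does not say how to classify a score equal to the threshold.


# Hypothetical scores compared against a hypothetical threshold of 0.5.
status_high = recurrence_status(0.8, 0.5)
status_equal = recurrence_status(0.5, 0.5)
status_low = recurrence_status(0.2, 0.5)
candidacy_equal = treatment_candidacy(0.5, 0.5)
```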

In another aspect, the present disclosure provides a system comprising a controller that comprises, or is capable of accessing, a computer-readable medium comprising non-transitory computer-executable instructions which, when executed by at least one electronic processor, perform a method comprising: (a) partitioning at least a subset of the nucleic acid molecules in a biological sample into more than one partition group based on the methylation state of the nucleic acid molecules, wherein the biological sample comprises methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; and (c) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups. In some embodiments, the method further comprises, prior to (c), enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups. In some embodiments, the method further comprises, prior to (b), attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition groups. In some embodiments, the method further comprises, prior to determining the methylation state, enriching at least a portion of the nucleic acid molecules in the more than one partition groups, wherein the at least a portion of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups.

In another aspect, the present disclosure provides a system comprising a controller that comprises, or is capable of accessing, a computer-readable medium comprising non-transitory computer-executable instructions which, when executed by at least one electronic processor, perform a method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules include methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; (c) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; (d) enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and (e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups. In some embodiments, the method further comprises, prior to (b), attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition groups.

In another aspect, the present disclosure provides a system comprising a controller comprising, or capable of accessing, a computer-readable medium comprising non-transitory computer-executable instructions which, when executed by at least one electronic processor, perform a method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; (c) attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition group; (d) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme; (e) enriching at least a subset of the nucleic acid molecules in the more than one partition group for a genomic region of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and (f) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

In another aspect, the present disclosure provides a method for determining the methylation state of nucleic acid molecules, the method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; (c) attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition group; (d) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme; (e) enriching at least a subset of the nucleic acid molecules in the more than one partition group for a genomic region of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and (f) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

In another aspect, the present disclosure provides a method for determining the methylation state of nucleic acid molecules, the method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; (c) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme; (d) enriching at least a subset of the nucleic acid molecules in the more than one partition group for a genomic region of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and (e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups. In some embodiments, the method further comprises, prior to (b), attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition group.

In some embodiments, the method further comprises detecting the presence or absence of cancer in the biological sample. In some embodiments, the method further comprises determining the level of cancer in the biological sample, for example by determining the level of DNA from cancer cells in the biological sample. In some embodiments, determining the methylation state comprises sequencing at least a subset of the digested nucleic acid molecules. In some embodiments, the sequencing is performed by a next-generation sequencer. In some embodiments, the one or more adapters comprise at least one tag. In some embodiments, the adapters are resistant to digestion by the methylation-sensitive restriction enzyme. In some embodiments, the adapters comprise one or more methylated nucleotides (e.g., nucleotides comprising a methylated base). In some embodiments, the adapters comprise one or more nucleotide analogs that are resistant to the methylation-sensitive restriction enzyme (e.g., nucleotide analogs with linkage modifications, such as phosphorothioates). In some embodiments, the adapters comprise a nucleotide sequence that is not recognized by the methylation-sensitive restriction enzyme. In some embodiments, the adapters do not comprise any sequence recognized by a methylation-sensitive restriction enzyme used in the method. In some embodiments, the tag comprises a molecular barcode. In some embodiments, the molecular barcodes attached to the nucleic acid molecules in a first partition group differ from the molecular barcodes attached to the nucleic acid molecules in a second partition group. In some embodiments, the first partition group is tagged differentially from the second partition group. In some embodiments, a first partition tag is attached to the nucleic acid molecules in the first partition group, and a second partition tag is attached to the nucleic acid molecules in the second partition group.
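The differential tagging described above implies that, after sequencing, reads can be assigned back to their partition group from the tag alone. The following is a minimal sketch of that assignment, not the disclosed implementation. The tag sequences, the group labels, the assumption that the tag sits at the start of each read, and the function name `group_by_partition` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical partition tags; the text does not give actual tag sequences.
PARTITION_TAGS = {"ACGT": "first_partition", "TGCA": "second_partition"}


def group_by_partition(reads, tags=PARTITION_TAGS):
    """Group reads by the partition tag assumed to start each read.

    Reads whose prefix matches no known tag are placed in "unassigned".
    The tag itself is trimmed from the stored read.
    """
    tag_len = len(next(iter(tags)))  # assumes all tags share one length
    groups = defaultdict(list)
    for read in reads:
        label = tags.get(read[:tag_len], "unassigned")
        groups[label].append(read[tag_len:])
    return dict(groups)
```

For example, `group_by_partition(["ACGTAAAA", "TGCACCCC"])` places the first read in `first_partition` and the second in `second_partition`.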

In some embodiments, the method comprises digesting at least a subset of one or more of the more than one partition groups with at least two methylation-sensitive restriction enzymes (MSREs). As used herein, a reference to two (or more) MSREs means the use of two (or more) different MSREs having different properties (e.g., different recognition sequences). In some embodiments, the at least two methylation-sensitive restriction enzymes consist of two methylation-sensitive restriction enzymes. In some embodiments, the two methylation-sensitive restriction enzymes comprise BstUI and HpaII. In some embodiments, the two methylation-sensitive restriction enzymes comprise HhaI and AccII. In some embodiments, the at least two methylation-sensitive restriction enzymes comprise three methylation-sensitive restriction enzymes. In some embodiments, the three methylation-sensitive restriction enzymes comprise BstUI, HpaII and Hin6I. In some embodiments, the methylation-sensitive restriction enzyme is selected from the group consisting of: AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI and SnaBI. In some embodiments, the at least one MSRE selectively digests unmethylated nucleic acid molecules. In some embodiments, the at least one MSRE selectively digests methylated nucleic acid molecules.
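To illustrate why enzymes with different recognition sequences are combined, the sketch below scans a sequence for the sites of the three-enzyme combination named above and predicts whether a molecule would be cut by MSREs that digest only unmethylated sites. The recognition sequences (BstUI: CGCG, HpaII: CCGG, Hin6I: GCGC) are standard enzyme properties rather than values stated in this text. The function names and the protection rule (a site counts as protected if any CpG cytosine within it is methylated) are simplifying assumptions.

```python
# Recognition sequences (5'->3', all palindromic) for three of the named MSREs.
RECOGNITION_SITES = {"BstUI": "CGCG", "HpaII": "CCGG", "Hin6I": "GCGC"}


def find_sites(sequence, enzymes=("BstUI", "HpaII", "Hin6I")):
    """Return {enzyme: [0-based start positions of its recognition site]}."""
    sequence = sequence.upper()
    hits = {}
    for enzyme in enzymes:
        site = RECOGNITION_SITES[enzyme]
        hits[enzyme] = [
            i for i in range(len(sequence) - len(site) + 1)
            if sequence.startswith(site, i)
        ]
    return hits


def is_cut(sequence, methylated_positions, enzymes=("BstUI", "HpaII", "Hin6I")):
    """Predict cleavage by MSREs that digest only unmethylated sites.

    methylated_positions: set of 0-based indices of methylated cytosines.
    Simplification: a site is protected if any CpG cytosine in it is methylated.
    """
    sequence = sequence.upper()
    for enzyme, starts in find_sites(sequence, enzymes).items():
        site_len = len(RECOGNITION_SITES[enzyme])
        for start in starts:
            cpg_cytosines = [
                j for j in range(start, start + site_len - 1)
                if sequence[j:j + 2] == "CG"
            ]
            if not any(j in methylated_positions for j in cpg_cytosines):
                return True  # at least one unprotected site is cleaved
    return False
```

Under these assumptions, a molecule carrying an unmethylated CCGG site is predicted to be cut, while the same molecule with the internal cytosine methylated is not. Adding enzymes with other recognition sequences increases the number of molecules that contain at least one informative site.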

In some embodiments, the methylated nucleotides comprise 5-methylcytosine and/or 5-hydroxymethylcytosine. In some embodiments, the biological sample is selected from the group consisting of: a DNA sample, an RNA sample, a polynucleotide sample, a cell-free DNA sample, and a cell-free RNA sample. In some embodiments, the biological sample is a cell-free DNA sample. In some embodiments, the cell-free DNA is between 1 ng and 500 ng.

In some embodiments, partitioning comprises partitioning the nucleic acid molecules based on their differential binding affinity for a binding agent that preferentially binds nucleic acid molecules comprising methylated nucleotides (e.g., nucleotides comprising a methylated base). In some embodiments, the binding agent is a methyl binding domain (MBD) protein. In some embodiments, the binding agent is an antibody specific for one or more methylated nucleotide bases. In some embodiments, the genomic region of interest comprises differentially methylated regions for cancer detection.

In some embodiments, the method further comprises amplifying at least a portion of the nucleic acid molecules prior to sequencing (e.g., after the digestion step, or after the enrichment or capture step). In some embodiments, the primers used in the amplification comprise at least one sample index. In some embodiments, nucleic acid molecules digested by the MSRE are not amplified. In some such embodiments, substantially all nucleic acid molecules in the sample other than those digested by the MSRE are amplified.

In some embodiments, the one or more genetic loci comprise more than one genetic locus. In some embodiments, the more than one genetic locus comprises one or more genomic regions.

In some embodiments, the method comprises digesting at least a subset of one or more of the more than one partition groups with at least two methylation-sensitive restriction enzymes. In some embodiments, the at least two methylation-sensitive restriction enzymes consist of two methylation-sensitive restriction enzymes. In some embodiments, the two methylation-sensitive restriction enzymes comprise BstUI and HpaII. In some embodiments, the two methylation-sensitive restriction enzymes comprise HhaI and AccII. In some embodiments, the at least two methylation-sensitive restriction enzymes comprise three methylation-sensitive restriction enzymes. In some embodiments, the three methylation-sensitive restriction enzymes comprise BstUI, HpaII and Hin6I. In some embodiments, the methylation-sensitive restriction enzyme is selected from the group consisting of: AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI and SnaBI. In some embodiments, the at least one MSRE selectively digests unmethylated nucleic acid molecules. In some embodiments, the at least one MSRE selectively digests methylated nucleic acid molecules.

In some embodiments of the various aspects of the invention, the results of the systems and/or methods disclosed herein are used as input to generate a report. The report may be in paper or electronic format. For example, information on the presence or absence of cancer, as determined by the methods or systems disclosed herein, can be displayed in such a report. Alternatively or additionally, the report may include information relating to the epigenetic rates of epigenetic features, such as whether they are above or below an adjusted epigenetic rate threshold. The methods or systems disclosed herein may further comprise a step of communicating the report to a third party, such as the subject from whom the sample was derived or a health care practitioner.

The various steps of the methods disclosed herein, or the steps carried out by the systems disclosed herein, may be carried out at the same time or at different times, and/or in the same geographical location or in different geographical locations, e.g., countries. The various steps of the methods disclosed herein can be performed by the same person or by different people.

Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modification in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate certain embodiments and, together with the written description, serve to explain certain principles of the methods, computer-readable media, and systems disclosed herein. The description provided herein is better understood when read in conjunction with the accompanying drawings, which are included by way of example and not by way of limitation. It will be understood that like reference numerals identify like components throughout the drawings, unless the context indicates otherwise. It will also be understood that some or all of the figures may be schematic representations for purposes of illustration and do not necessarily depict the actual relative sizes or locations of the elements shown.

FIG. 1 is a schematic showing a methylation-sensitive restriction enzyme (MSRE) digesting/cleaving DNA when the restriction enzyme (RE) recognition site contains unmethylated nucleotides (top), and the MSRE not cleaving DNA when the RE recognition site contains methylated nucleotides (bottom). FIG. 1 thus illustrates one type of MSRE, which selectively digests recognition sites comprising unmethylated nucleotides and generally does not digest recognition sites comprising methylated nucleotides.

FIG. 2 is a flow chart representation of a method for determining the methylation state of nucleic acid molecules in a polynucleotide sample obtained from a subject, according to an embodiment of the present disclosure.

FIG. 3 is a flow chart representation of a method for detecting the presence or absence of cancer in a subject, according to an embodiment of the present disclosure.

FIG. 4 is a schematic diagram of a method for detecting the presence or absence of cancer in a subject, according to certain embodiments of the present disclosure.

FIG. 5 is a schematic diagram of an example of a system suitable for use with some embodiments of the present disclosure.

FIG. 6 shows the molecule counts in three partitions, with and without MSRE treatment, in normal samples and in diluted CRC samples.

FIG. 7 shows CpG methylation quantification results obtained from three samples from subjects with early-stage colorectal cancer ("early CRC") and from three healthy subjects ("normal"), as described in Example 3. For the early CRC plots, MAF indicates the mutant allele fraction.

FIGS. 8A-8D show counts of positive and negative control molecules having FspEI palindromic sites under the indicated enzyme and buffer conditions, as described in Example 4. FIGS. 8A and 8C correspond to a first donor, and FIGS. 8B and 8D correspond to a second donor. For readability, the data points are spread along the horizontal axis.

FIGS. 9A-9D show digestion efficiency and positive control molecule counts, as described in Example 4.

FIGS. 10A-10J show hypomethylated variable target region ("low VTR") molecule counts (FIGS. 10A-10E) or low VTR/negative control molecule ratios (FIGS. 10F-10J) under the indicated conditions, as described in Example 5. For readability, the data points are spread along the horizontal axis. Triangles, circles, plus signs, and squares indicate that the source of the normal cfDNA was the first, second, third, or fourth of four healthy donors, respectively.

DEFINITIONS

In order for the present disclosure to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and other terms may be set forth throughout the specification. If a definition of a term set forth below is inconsistent with a definition in an application or patent that is incorporated by reference, the definition set forth in this application should be used to understand the meaning of the term.

As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, a reference to "a method" includes one or more methods and/or steps of the type described herein and/or which will become apparent upon reading this disclosure, and so forth.

It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. Further, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In describing and claiming the methods, computer-readable media, and systems, the following terminology, and grammatical variants thereof, will be used in accordance with the definitions set forth below.

About: As used herein, "about" or "approximately," as applied to one or more values or elements of interest, refers to a value or element that is similar to a stated reference value or element. In certain embodiments, the term "about" or "approximately" refers to a range of values or elements that falls within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value or element, unless otherwise stated or otherwise evident from the context (except where such a number would exceed 100% of a possible value or element).

Adapter: As used herein, "adapter" refers to a short nucleic acid (e.g., less than about 500 nucleotides, less than about 100 nucleotides, or less than about 50 nucleotides in length) that is typically at least partially double-stranded and that is attached to either or both ends of a given sample nucleic acid molecule (i.e., two adapters are attached to the two ends of a nucleic acid, one adapter at each end). Adapters can include nucleic acid primer binding sites that permit amplification of a nucleic acid molecule flanked by adapters at both ends, and/or sequencing primer binding sites, including primer binding sites for sequencing applications such as various next-generation sequencing (NGS) applications. Adapters can also include binding sites for capture probes, such as oligonucleotides attached to a flow cell support. Adapters can also include a nucleic acid tag as described herein. Nucleic acid tags are typically positioned relative to the amplification primer binding sites and sequencing primer binding sites such that the nucleic acid tag is included in the amplicons and sequence reads of a given nucleic acid molecule. Adapters of the same or different sequences can be linked to the respective ends of a nucleic acid molecule. In some embodiments, adapters of the same sequence, except that the nucleic acid tag differs, are linked to the respective ends of the nucleic acid molecule. In some embodiments, the adapter is a Y-shaped adapter in which one end is blunt-ended or tailed as described herein, for joining to a nucleic acid molecule that is also blunt-ended or tailed with one or more complementary nucleotides, and the other end of the Y-shaped adapter comprises non-complementary sequences that do not hybridize to form a double strand. In still other example embodiments, the adapter is a bell-shaped adapter that includes a blunt or tailed end for joining to a nucleic acid molecule to be analyzed. Other examples of adapters include T-tailed and C-tailed adapters.

Amplification: As used herein, "amplify" or "amplification" in the context of nucleic acids refers to the production of multiple copies of a polynucleotide, or a portion of the polynucleotide, typically starting from a small amount of the polynucleotide (e.g., a single polynucleotide molecule), wherein the amplification products or amplicons are generally detectable. Amplification of polynucleotides encompasses a variety of chemical and enzymatic processes. Amplification includes, but is not limited to, the polymerase chain reaction (PCR).

Barcode: As used herein, "barcode" in the context of nucleic acids refers to a nucleic acid molecule comprising a sequence that can serve as an identifier. For example, a barcode can serve as an identifier of a molecule (i.e., a molecular barcode), an identifier of a sample (i.e., a sample barcode), or an identifier of a partition (i.e., a partition barcode). During next-generation sequencing (NGS) library preparation, individual "barcode" sequences are typically added to each DNA fragment so that each read can be identified and sorted prior to final data analysis.

Cancer type: As used herein, "cancer type" refers to a type or subtype of cancer defined, e.g., by histopathology. Cancer type can be defined by any conventional criterion, such as on the basis of occurrence in a given tissue (e.g., blood cancers, central nervous system (CNS) cancers, brain cancers, lung cancers (small cell and non-small cell), skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, breast cancers, prostate cancers, ovarian cancers, lung cancers, intestinal cancers, soft tissue cancers, neuroendocrine cancers, gastroesophageal cancers, head and neck cancers, gynecological cancers, colorectal cancers, urothelial cancers, solid state cancers, heterogeneous cancers, homogeneous cancers), unknown primary origin, and the like, and/or of the same cell lineage (e.g., carcinoma, sarcoma, lymphoma, cholangiocarcinoma, leukemia, mesothelioma, melanoma, or glioblastoma), and/or cancers exhibiting cancer markers such as, but not limited to, Her2, CA15-3, CA19-9, CA-125, CEA, AFP, PSA, HCG, hormone receptor, and NMP-22. Cancers can also be classified by stage (e.g., stage 1, stage 2, stage 3, or stage 4) and by whether they are of primary or secondary origin.

Captured set: As used herein, a "captured set" of nucleic acids refers to nucleic acids that have undergone capture.

Capture: As used herein, "capturing" or "enriching" one or more target nucleic acids refers to preferentially isolating or separating the one or more target nucleic acids from non-target nucleic acids.

Cell-free nucleic acid: As used herein, "cell-free nucleic acid" refers to nucleic acids that are not contained within or otherwise bound to a cell or, in some embodiments, to nucleic acids remaining in a sample following the removal of intact cells. Cell-free nucleic acids can include, for example, all non-encapsulated nucleic acids sourced from a bodily fluid (e.g., blood, plasma, serum, urine, cerebrospinal fluid (CSF), etc.) from a subject. Cell-free nucleic acids include DNA (cfDNA), RNA (cfRNA), and hybrids thereof, including genomic DNA, mitochondrial DNA, circulating DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), and/or fragments of any of these. Cell-free nucleic acids can be double-stranded, single-stranded, or a hybrid thereof. A cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes (e.g., cellular necrosis, apoptosis, or the like). Some cell-free nucleic acids are released into bodily fluid from cancer cells, e.g., circulating tumor DNA (ctDNA). Others are released from healthy cells. ctDNA can be non-encapsulated, tumor-derived fragmented DNA. A cell-free nucleic acid can have one or more epigenetic modifications; for example, a cell-free nucleic acid can be acetylated, 5-methylated, and/or hydroxymethylated.

Cellular nucleic acid: As used herein, "cellular nucleic acid" refers to nucleic acids that are located within the one or more cells in which they were produced, at least at the time a sample is taken or collected from a subject, even if those nucleic acids are subsequently removed (e.g., via cell lysis) as part of a given analytical process.

Corresponding to a target region set: As used herein, "corresponding to a target region set" means that a nucleic acid, such as cfDNA, originated from a locus in the target region set or specifically binds one or more probes for the target region set.

Coverage: As used herein, the terms "coverage," "total molecule count," and "total allele count" are used interchangeably. They refer to the total number of DNA molecules at a particular genomic position in a given sample.
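As a minimal sketch of this definition, coverage at a position can be computed by counting the molecules whose genomic span includes that position. The representation of each molecule as a half-open `(start, end)` interval in 0-based coordinates, and the function name `coverage_at`, are assumptions made for illustration.

```python
def coverage_at(position, molecules):
    """Count molecules whose half-open [start, end) interval covers position.

    molecules: iterable of (start, end) tuples in 0-based coordinates,
    one tuple per unique DNA molecule (a hypothetical representation).
    """
    return sum(1 for start, end in molecules if start <= position < end)
```

For example, with molecules spanning `(100, 260)`, `(140, 300)`, and `(200, 350)`, two molecules cover position 150.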

脱氧核糖核酸或核糖核酸:如本文使用的,“脱氧核糖核酸”或“DNA”是指在糖部分的2’-位置处具有氢基团的天然或修饰的核苷酸。DNA通常包括包含以下四种类型的核苷酸碱基的核苷酸链:腺嘌呤(A)、胸腺嘧啶(T)、胞嘧啶(C)和鸟嘌呤(G)。如本文使用的,“核糖核酸”或“RNA”是指在糖部分的2’-位置处具有羟基基团的天然或修饰的核苷酸。RNA通常包括包含以下四种类型的核苷酸碱基的核苷酸链:A、尿嘧啶(U)、G和C。如本文使用的,术语“核苷酸”是指天然核苷酸或修饰的核苷酸。某些核苷酸对以互补方式彼此特异性结合(被称为互补碱基配对)。在DNA中,腺嘌呤(A)与胸腺嘧啶(T)配对并且胞嘧啶(C)与鸟嘌呤(G)配对。在RNA中,腺嘌呤(A)与尿嘧啶(U)配对并且胞嘧啶(C)与鸟嘌呤(G)配对。当第一条核酸链结合由与第一条链中的那些核苷酸互补的核苷酸构成的第二条核酸链时,两条链结合形成双链。如本文使用的,“测序数据”、“核酸测序信息”、“序列信息”、“核酸序列”、“核苷酸序列”、“基因组序列”、“序列读段”或“测序读段”表示指示核酸诸如DNA或RNA的分子(例如,全基因组、全转录组、外显子组、寡核苷酸、多核苷酸,或片段)中核苷酸碱基(例如,腺嘌呤、鸟嘌呤、胞嘧啶、和胸腺嘧啶或尿嘧啶)的顺序和身份的任何信息或数据。应当理解,本教导设想了使用所有可用的各种技术(technique)、平台或技术(technology)获得的序列信息,包括但不限于:毛细管电泳、微阵列、基于连接的系统、基于聚合酶的系统、基于杂交的系统、直接或间接的核苷酸鉴定系统、焦磷酸测序、基于离子或pH的检测系统以及基于电子信号的系统(electronic signature-based system)。Deoxyribonucleic acid or ribonucleic acid: As used herein, "deoxyribonucleic acid" or "DNA" refers to a natural or modified nucleotide having a hydrogen group at the 2'-position of the sugar portion. DNA generally includes a nucleotide chain containing the following four types of nucleotide bases: adenine (A), thymine (T), cytosine (C) and guanine (G). As used herein, "ribonucleic acid" or "RNA" refers to a natural or modified nucleotide having a hydroxyl group at the 2'-position of the sugar portion. RNA generally includes a nucleotide chain containing the following four types of nucleotide bases: A, uracil (U), G and C. As used herein, the term "nucleotide" refers to a natural nucleotide or a modified nucleotide. Certain nucleotide pairs specifically bind to each other in a complementary manner (referred to as complementary base pairing). In DNA, adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G). In RNA, adenine (A) pairs with uracil (U) and cytosine (C) pairs with guanine (G). When the first nucleic acid chain is combined with a second nucleic acid chain consisting of nucleotides complementary to those in the first chain, the two chains combine to form a double strand. 
As used herein, "sequencing data", "nucleic acid sequencing information", "sequence information", "nucleic acid sequence", "nucleotide sequence", "genomic sequence", "sequence reads" or "sequencing reads" refer to any information or data indicating the order and identity of nucleotide bases (e.g., adenine, guanine, cytosine, and thymine or uracil) in a molecule (e.g., a whole genome, a whole transcriptome, an exome, an oligonucleotide, a polynucleotide, or a fragment) of a nucleic acid such as DNA or RNA. It should be understood that the present teachings contemplate sequence information obtained using all available varieties of techniques, platforms or technologies, including but not limited to: capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion-based or pH-based detection systems, and electronic signature-based systems.
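The complementary base-pairing rules stated in this definition (A with T in DNA, A with U in RNA, C with G in both) can be illustrated with a short sketch. The function names `complementary_strand` and `forms_duplex` are hypothetical helpers introduced here for illustration only; they are not part of the disclosure.

```python
# Pairing rules as stated in the definition above.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complementary_strand(sequence, rna=False):
    """Return the complementary strand, written 5'->3' (the reverse complement)."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    return "".join(pairs[base] for base in reversed(sequence.upper()))


def forms_duplex(strand_a, strand_b):
    """True if two DNA strands, both written 5'->3', are fully complementary."""
    return complementary_strand(strand_a) == strand_b.upper()
```

This sketch only handles the four canonical bases of each nucleic acid type; modified nucleotides, which the definition also covers, would need additional entries in the pairing tables.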

消化效率:如本文使用的,“消化效率”或“切割效率”是指限制性内切酶消化的效率。消化效率可以基于用限制性内切酶消化时观察到的对照分子的数目和在不存在限制性内切酶消化的情况下观察到的对照分子的数目来计算。MSRE消化效率可通过以下计算:效率=1-(阴性对照分子数[MSRE]/阴性对照分子数[模拟])。MDRE(优先裂解甲基化DNA的MSRE,也称为甲基化依赖性限制性内切酶)消化效率可通过以下计算:效率=1-(阳性对照分子数[MDRE]/阳性对照分子数[模拟])。Digestion efficiency: As used herein, "digestion efficiency" or "cleavage efficiency" refers to the efficiency of restriction endonuclease digestion. Digestion efficiency can be calculated from the number of control molecules observed after digestion with the restriction endonuclease and the number of control molecules observed in the absence of restriction endonuclease digestion (mock digestion). MSRE digestion efficiency can be calculated as: efficiency = 1 - (number of negative control molecules [MSRE] / number of negative control molecules [mock]). MDRE (an MSRE that preferentially cleaves methylated DNA, also known as a methylation-dependent restriction endonuclease) digestion efficiency can be calculated as: efficiency = 1 - (number of positive control molecules [MDRE] / number of positive control molecules [mock]).
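The two efficiency formulas above share one form: one minus the ratio of control molecules counted after enzyme treatment to control molecules counted in the no-enzyme control reaction. A minimal sketch follows; the function name and the molecule counts are illustrative assumptions, not values from the disclosure.

```python
def digestion_efficiency(digested_count, undigested_control_count):
    """efficiency = 1 - (control molecules after enzyme / control molecules without enzyme)."""
    if undigested_control_count <= 0:
        raise ValueError("the no-enzyme control count must be positive")
    return 1.0 - digested_count / undigested_control_count


# Hypothetical counts: unmethylated (negative) controls for an MSRE,
# methylated (positive) controls for an MDRE.
msre_efficiency = digestion_efficiency(digested_count=25, undigested_control_count=1000)
mdre_efficiency = digestion_efficiency(digested_count=100, undigested_control_count=1000)
```

With these assumed counts the MSRE reaction removes 97.5% of the unmethylated control molecules and the MDRE reaction removes 90% of the methylated control molecules.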

DNA序列:如本文使用的,“DNA序列”或“序列”是指“原始序列读段”和/或“共有序列”。原始序列读段是DNA测序仪的输出,并且通常包括相同亲本分子的冗余序列,例如扩增后。“共有序列”是源自意图代表原始亲本分子的序列的亲本分子的冗余序列的序列。共有序列包括单个位置处的碱基同一性。在一些实施方案中,共有序列可以代表特定基因组位置处的单个核苷酸碱基。在一些实施方案中,共有序列可以代表多于一个基因组位置处的一串核苷酸碱基。共有序列可以通过投票(voting)(其中每种主要核苷酸,例如,在序列中的特定碱基位置处最常观察到的核苷酸是共有核苷酸)或其他方法(诸如与参考基因组进行比较)来产生。共有序列可以通过用独特或非独特的分子标签对原始亲本分子加标签来产生,这允许通过追溯标签和/或使用序列读段内部信息来追溯子序列(例如,扩增后)。加标签或进行条形码化的实例,以及标签或条形码的使用,在例如美国专利公布第2015/0368708号、第2015/0299812号、第2016/0040229号和第2016/0046986号中提供,其中每一项通过引用整体并入本文。DNA sequence: As used herein, "DNA sequence" or "sequence" refers to "raw sequence reads" and/or "consensus sequences". Raw sequence reads are the output of a DNA sequencer and typically include redundant sequences of the same parent molecule, for example after amplification. A "consensus sequence" is a sequence derived from the redundant sequences of a parent molecule and intended to represent the sequence of the original parent molecule. A consensus sequence includes the base identity at individual positions. In some embodiments, a consensus sequence may represent a single nucleotide base at a specific genomic position. In some embodiments, a consensus sequence may represent a string of nucleotide bases at more than one genomic position. Consensus sequences can be generated by voting (in which the majority nucleotide, e.g., the nucleotide most frequently observed at a given base position in the sequence, is the consensus nucleotide) or by other methods (such as comparison with a reference genome). Consensus sequences can be generated by tagging the original parent molecules with unique or non-unique molecular tags, which allows progeny sequences (e.g., after amplification) to be traced by tracking the tags and/or by using information internal to the sequence reads. Examples of tagging or barcoding, and of uses of tags or barcodes, are provided in, for example, U.S. Patent Publication Nos. 2015/0368708, 2015/0299812, 2016/0040229, and 2016/0046986, each of which is incorporated herein by reference in its entirety.
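The voting approach described above can be sketched as a per-position majority vote over the redundant reads of one parent molecule. The sketch assumes the reads are already grouped by molecular tag and are aligned to equal length; the function names and the tie-breaking behavior are assumptions made for illustration, not the method of the cited publications.

```python
from collections import Counter, defaultdict


def consensus_by_voting(reads):
    """Return the most frequently observed base at each position of equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    if any(len(read) != len(reads[0]) for read in reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*reads):
        # On a tie, Counter.most_common keeps the base seen first in the column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def consensus_per_family(tagged_reads):
    """Group (tag, read) pairs by tag and build one consensus per tag family."""
    families = defaultdict(list)
    for tag, read in tagged_reads:
        families[tag].append(read)
    return {tag: consensus_by_voting(reads) for tag, reads in families.items()}
```

For example, three reads `ACGT`, `ACGT` and `ACTT` carrying the same tag collapse to the consensus `ACGT`, because the single `T` at the third position is outvoted.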

富集的样品:如本文使用的,“富集的样品”是指已经针对特定感兴趣区域富集的样品。可以通过扩增感兴趣的区域或通过使用能够与感兴趣的核酸分子杂交的单链DNA/RNA探针或双链DNA探针(例如,探针,Agilent Technologies)来富集样品。在一些实施方案中,富集的样品是指被富集的经处理的样品的亚组或部分,其中被富集的经处理的样品的亚组或部分包含来自无细胞多核苷酸或多核苷酸样品的核酸分子。Enriched sample: As used herein, "enriched sample" refers to a sample that has been enriched for a particular region of interest. A sample can be enriched by amplifying the region of interest or by using single-stranded DNA/RNA probes or double-stranded DNA probes that can hybridize to nucleic acid molecules of interest (e.g., probes, Agilent Technologies). In some embodiments, the enriched sample refers to a subset or portion of the processed sample that is enriched, wherein the subset or portion of the processed sample that is enriched comprises nucleic acid molecules from the cell-free polynucleotide or polynucleotide sample.

表观遗传表征:如本文使用的,“表观遗传表征”是指能够用于分析DNA分子表观遗传特征的该DNA分子的任何可直接观察的量度。例如,如果表观遗传特征是甲基化,则DNA分子的表观遗传表征可以指但不限于对DNA分子的分区、DNA分子中CpG残基数目和DNA分子的位置(或偏移)。例如,如果表观遗传特征是片段组信号,则表观遗传表征可以是但不限于cfDNA分子的长度、cfDNA分子的位置(或偏移)——cfDNA分子的起始和/或终止位置。Epigenetic characterization: As used herein, "epigenetic characterization" refers to any directly observable measure of a DNA molecule that can be used to analyze the epigenetic characteristics of a DNA molecule. For example, if the epigenetic characteristic is methylation, the epigenetic characterization of the DNA molecule may refer to, but is not limited to, the partitioning of the DNA molecule, the number of CpG residues in the DNA molecule, and the position (or offset) of the DNA molecule. For example, if the epigenetic characteristic is a fragment group signal, the epigenetic characterization may be, but is not limited to, the length of the cfDNA molecule, the position (or offset) of the cfDNA molecule-the starting and/or ending position of the cfDNA molecule.

表观遗传特征:如本文使用的,“表观遗传特征”是指可以表现出核酸的非序列修饰且还包括染色质修饰的任何参数。这些修饰不会改变DNA的序列。表观遗传特征可以包括但不限于甲基化状态;片段组信号;核小体、CTCF蛋白、转录起始位点、调控蛋白和任何其他可以与DNA结合的蛋白的位置/分布。Epigenetic signature: As used herein, "epigenetic signature" refers to any parameter that can manifest non-sequence modifications of nucleic acids and also includes chromatin modifications. These modifications do not change the sequence of the DNA. Epigenetic signatures can include, but are not limited to, methylation status; fragment group signal; location/distribution of nucleosomes, CTCF protein, transcription start site, regulatory proteins, and any other protein that can bind to DNA.

表观遗传靶区组:如本文使用的,“表观遗传靶区组”是指在赘生性细胞(例如,肿瘤细胞和癌细胞)和非肿瘤细胞(例如,免疫细胞、来自肿瘤微环境的细胞)中可以表现出非序列修饰的靶区组。这些修饰不会改变DNA的序列。非序列修饰改变的实例包括,但不限于,甲基化的改变(增加或减少)、核小体分布、CTCF结合、转录起始位点、调控蛋白结合区和任何其他可能与DNA结合的蛋白的改变。对于本发明的目的,易于发生与赘生物、肿瘤或癌症相关的聚焦扩增和/或基因融合的基因座也可以被包括在表观遗传靶区组中,因为通过测序检测拷贝数的改变或检测映射到参考基因组中多于一个基因座的融合序列趋向于比检测核苷酸取代、插入或缺失更类似于检测以上讨论的示例性表观遗传改变,例如,在以下方面:聚焦扩增和/或基因融合可以以相对浅的测序深度检测到,因为它们的检测不依赖于一个或几个单独位置处的碱基判定(call)的准确度。例如,表观遗传靶区组可以包括用于分析片段长度或片段终点位置分布的靶区组。在一些实施方案中,表观遗传靶区组包括一个或更多个基因组区域,其中这些区中的cfDNA分子的表观遗传状态(例如,甲基化状态)在癌症中不变,但是它们在血液中的存在/量指示来自某些组织(例如,癌症来源)的cfDNA向循环中的增加的异常的呈现。术语“表观遗传学”和“表观基因组学”在本文可互换使用。Epigenetic target set: As used herein, "epigenetic target set" refers to a set of target regions that can exhibit non-sequence modifications in neoplastic cells (e.g., tumor cells and cancer cells) and non-tumor cells (e.g., immune cells, cells from the tumor microenvironment). These modifications do not change the sequence of the DNA. Examples of non-sequence modification changes include, but are not limited to, changes in methylation (increase or decrease), nucleosome distribution, CTCF binding, transcription start site, regulatory protein binding region, and any other protein that may bind to DNA. For the purposes of the present invention, loci that are prone to focused amplification and/or gene fusion associated with neoplasms, tumors or cancers can also be included in the epigenetic target set, because detection of copy number changes by sequencing or detection of fusion sequences mapped to more than one locus in a reference genome tends to be more similar to detection of the exemplary epigenetic changes discussed above than detection of nucleotide substitutions, insertions or deletions, for example, in the following aspects: focused amplification and/or gene fusions can be detected with relatively shallow sequencing depth because their detection does not rely on the accuracy of base calls at one or a few individual positions. 
For example, the epigenetic target set may include a set of target regions for analyzing the distribution of fragment lengths or fragment endpoint positions. In some embodiments, the epigenetic target set includes one or more genomic regions in which the epigenetic state (e.g., methylation state) of cfDNA molecules is unchanged in cancer, but in which the presence or amount of those molecules in blood indicates abnormally increased release of cfDNA from certain tissues (e.g., the tissue of cancer origin) into the circulation. The terms "epigenetics" and "epigenomics" are used interchangeably herein.

片段组信号:如本文使用的,“片段组信号”是指特定基因组区域处的cfDNA片段尺寸和cfDNA片段位置的分布。片段组信号可以包括但不限于cfDNA片段长度、cfDNA分子的起始和/或终止位置(片段的尺寸覆盖范围)。片段组信号还可以包括DNA分子终点出现在基因组位置(在特定位置或特定位置周围的感兴趣区域)的频率。片段组信号还可以包括DNA分子的核小体定位。在一些实施方案中,片段组信号包括DNA分子的终点信息,但不必包括DNA分子的长度参数。Fragment group signal: As used herein, "fragment group signal" refers to the distribution of cfDNA fragment sizes and cfDNA fragment positions at specific genomic regions. Fragment group signals may include, but are not limited to, cfDNA fragment lengths and the starting and/or ending positions of cfDNA molecules (size coverage of fragments). Fragment group signals may also include the frequency at which DNA molecule endpoints occur at genomic locations (at a specific position or in a region of interest around a specific position). Fragment group signals may also include nucleosome positioning of DNA molecules. In some embodiments, the fragment group signal includes endpoint information of the DNA molecule, but does not necessarily include length parameters of the DNA molecule.
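The quantities named in this definition (fragment lengths, start and end positions, and how often endpoints fall at each position) can be tabulated from mapped fragment coordinates. The sketch below assumes each cfDNA fragment is given as an inclusive `(start, end)` pair on a single chromosome; the function name, the coordinate convention and the example coordinates are all illustrative assumptions.

```python
from collections import Counter


def fragment_signal(fragments, region_start, region_end):
    """Tabulate length and endpoint distributions for fragments overlapping a region.

    fragments: iterable of (start, end) positions, both inclusive.
    """
    length_counts = Counter()
    start_counts = Counter()
    end_counts = Counter()
    for start, end in fragments:
        # Skip fragments that do not overlap the region of interest.
        if end < region_start or start > region_end:
            continue
        length_counts[end - start + 1] += 1
        start_counts[start] += 1
        end_counts[end] += 1
    return {
        "length_counts": length_counts,
        "start_counts": start_counts,
        "end_counts": end_counts,
    }


# Hypothetical fragment coordinates.
fragments = [(100, 266), (100, 250), (120, 286), (500, 660)]
signal = fragment_signal(fragments, region_start=90, region_end=300)
```

Nucleosome positioning, which the definition also mentions, would have to be inferred from these distributions by a separate analysis and is not attempted here.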

基因组区域:如本文使用的,“基因组区域”是指基因组例如染色体、染色体臂、基因或外显子的任何区域(例如碱基对位置范围)。基因组区域可以是连续或不连续的区域。“遗传基因座”(或“基因座”)可以是基因组区域的一部分或全部(例如,基因、基因的一部分或基因的单个核苷酸)。在一些实施方案中,基因组区域的尺寸包括高达染色体/染色体臂或拓扑相关结构域(TAD)的长度。在一些实施方案中,基因组区域的尺寸可能受限于该区域的生物活性(例如,转录单位或调控单位)。Genomic region: As used herein, "genomic region" refers to any region (e.g., base pair position range) of a genome, such as a chromosome, a chromosome arm, a gene, or an exon. A genomic region can be a continuous or discontinuous region. A "genetic locus" (or "locus") can be part or all of a genomic region (e.g., a gene, a portion of a gene, or a single nucleotide of a gene). In some embodiments, the size of a genomic region includes a length of up to a chromosome/chromosome arm or a topologically associated domain (TAD). In some embodiments, the size of a genomic region may be limited to the biological activity of the region (e.g., a transcription unit or a regulatory unit).

高甲基化:如本文使用的,“高甲基化”是指相对于核酸分子群体(例如,样品)内的其他核酸分子,核酸分子的增加的甲基化水平或程度。在一些实施方案中,高甲基化是指相对于来自非肿瘤样品中特定基因组区域的核酸分子的甲基化程度,来自肿瘤样品中同一基因组区域的核酸分子的增加的甲基化水平或程度。在一些实施方案中,高甲基化DNA可以包括包含至少1个甲基化残基、至少2个甲基化残基、至少3个甲基化残基、至少5个甲基化残基、至少10个甲基化残基、至少20个甲基化残基、至少25个甲基化残基或至少30个甲基化残基的DNA分子。Hypermethylation: As used herein, "hypermethylation" refers to an increased level or degree of methylation of a nucleic acid molecule relative to other nucleic acid molecules within a nucleic acid molecule population (e.g., a sample). In some embodiments, hypermethylation refers to an increased level or degree of methylation of a nucleic acid molecule from a specific genomic region in a tumor sample relative to the degree of methylation of a nucleic acid molecule from the same genomic region in a non-tumor sample. In some embodiments, hypermethylated DNA may include a DNA molecule comprising at least 1 methylated residue, at least 2 methylated residues, at least 3 methylated residues, at least 5 methylated residues, at least 10 methylated residues, at least 20 methylated residues, at least 25 methylated residues, or at least 30 methylated residues.

低甲基化:如本文使用的,“低甲基化”是指相对于核酸分子群体(例如,样品)内的其他核酸分子,核酸分子的降低的甲基化水平或程度。在一些实施方案中,低甲基化DNA包括未甲基化的DNA分子。在一些实施方案中,低甲基化是指相对于来自非肿瘤样品中特定基因组区域的核酸分子的甲基化程度,来自肿瘤样品中同一基因组区域的核酸分子的降低的甲基化水平或程度。在一些实施方案中,低甲基化DNA可以包括包含0个甲基化残基、至多1个甲基化残基、至多2个甲基化残基、至多3个甲基化残基、至多4个甲基化残基或至多5个甲基化残基的DNA分子。Hypomethylation: As used herein, "hypomethylation" refers to a reduced methylation level or degree of a nucleic acid molecule relative to other nucleic acid molecules within a nucleic acid molecule population (e.g., a sample). In some embodiments, hypomethylated DNA includes unmethylated DNA molecules. In some embodiments, hypomethylation refers to a reduced methylation level or degree of a nucleic acid molecule from the same genomic region in a tumor sample relative to the degree of methylation of a nucleic acid molecule from a specific genomic region in a non-tumor sample. In some embodiments, hypomethylated DNA can include a DNA molecule comprising 0 methylated residues, at most 1 methylated residue, at most 2 methylated residues, at most 3 methylated residues, at most 4 methylated residues, or at most 5 methylated residues.

甲基化:如本文使用的,“甲基化”或“DNA甲基化”可以指在CpG位点(胞嘧啶-磷酸-鸟嘌呤位点——即,按核酸序列的5’->3’方向胞嘧啶之后为鸟嘌呤)处的胞嘧啶存在甲基基团。在一些实施方案中,DNA甲基化包括向腺嘌呤添加甲基,诸如在N6-甲基腺嘌呤中。在一些实施方案中,DNA甲基化是5-甲基化(对胞嘧啶6碳环的第5个碳的修饰)。在一些实施方案中,5-甲基化包括将甲基基团添加到胞嘧啶的5C位置以产生5-甲基胞嘧啶(m5c)。在一些实施方案中,甲基化包括m5c的衍生物。m5c的衍生物包括但不限于5-羟甲基胞嘧啶(5-hmC)、5-甲酰基胞嘧啶(5-fC)和5-羧基胞嘧啶(5-caC)。在一些实施方案中,DNA甲基化是3C甲基化(胞嘧啶6-碳环的第3个碳的修饰)。在一些实施方案中,3C甲基化包括将甲基基团添加到胞嘧啶的3C位置以产生3-甲基胞嘧啶(3mC)。甲基化也可以发生在非CpG位点处,例如,甲基化可以发生在CpA、CpT或CpC位点处。DNA甲基化可以改变甲基化DNA区域的活性。例如,当启动子区域中的DNA被甲基化时,基因的转录可能被阻遏。DNA甲基化对于正常发育是至关重要的,并且甲基化的异常可以破坏表观遗传调控。表观遗传调控的破坏,例如阻遏,可以引起疾病,诸如癌症。DNA中的启动子甲基化可以指示癌症。Methylation: As used herein, "methylation" or "DNA methylation" may refer to the presence of a methyl group on a cytosine at a CpG site (cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in the 5'->3' direction of a nucleic acid sequence). In some embodiments, DNA methylation comprises adding a methyl group to an adenine, such as in N6-methyladenine. In some embodiments, DNA methylation is 5-methylation (modification of the 5th carbon of the 6-carbon ring of cytosine). In some embodiments, 5-methylation comprises adding a methyl group to the 5C position of cytosine to produce 5-methylcytosine (m5c). In some embodiments, methylation comprises derivatives of m5c. Derivatives of m5c include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the 3rd carbon of the 6-carbon ring of cytosine). In some embodiments, 3C methylation includes adding a methyl group to the 3C position of cytosine to produce 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites; for example, methylation can occur at CpA, CpT or CpC sites. DNA methylation can alter the activity of the methylated DNA region. For example, when the DNA in a promoter region is methylated, transcription of the gene may be repressed. 
DNA methylation is essential for normal development, and aberrant methylation can disrupt epigenetic regulation. Disruption of epigenetic regulation, such as repression, can cause disease, such as cancer. Promoter methylation in DNA can be indicative of cancer.
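Since a CpG site is defined above as a cytosine followed by a guanine in the 5'->3' direction, locating CpG sites in a sequence is a simple scan for the dinucleotide `CG`. The sketch below uses 0-based positions and represents methylation as a set of cytosine positions; both choices, and the function names, are assumptions made for illustration.

```python
def cpg_positions(sequence):
    """Return 0-based positions of cytosines that are followed by guanine (CpG sites)."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylated_cpg_count(sequence, methylated_positions):
    """Count CpG cytosines whose position is in the set of methylated positions."""
    return sum(1 for i in cpg_positions(sequence) if i in methylated_positions)
```

Methylated cytosines outside a CpG context (CpA, CpT or CpC) are deliberately ignored by `methylated_cpg_count`; counting them would require a separate scan.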

甲基化敏感性限制性内切酶(MSRE):如本文使用的,“甲基化敏感性限制性内切酶”或“MSRE”是指对DNA的甲基化状态(例如胞嘧啶甲基化)敏感的限制性内切酶,即,核苷酸碱基中甲基基团的存在或不存在改变酶裂解靶DNA的速率。在一些实施方案中,如果识别序列处的特定核苷酸碱基是甲基化的,则甲基化敏感性限制性内切酶不裂解DNA。例如,HpaII是识别序列为“CCGG”的甲基化敏感限制性内切酶,并且如果该识别序列中的第二个胞嘧啶是甲基化的,则HpaII不裂解DNA。在一些实施方案中,如果识别序列处的特定核苷酸碱基是甲基化的,则甲基化敏感性限制性内切酶裂解DNA。例如,SgeI是识别序列为“5mCNNG(N)9”的甲基化敏感性限制性内切酶,并且如果识别序列中的胞嘧啶是甲基化的(5mC),则SgeI裂解DNA。作为另一实例,FspEI是识别序列为“C5mC(N)12”的甲基化敏感性限制性内切酶,并且如果识别序列中的胞嘧啶是甲基化的(5mC),则FspEI裂解DNA。图1是当限制性内切酶(RE)识别位点包含未甲基化的核苷酸时,甲基化敏感性限制性内切酶(MSRE)消化/裂解DNA的示意图(上图);以及当限制性内切酶(RE)识别位点(虚线框)在影响MSRE活性的位置处包含甲基化核苷酸时,甲基化敏感性限制性内切酶(MSRE)不裂解DNA的示意图(下图)。在一些实施方案中,相对于相同识别位点的未甲基化形式,MSRE对甲基化识别位点的酶活性高至少10倍、20倍、50倍或100倍。在一些实施方案中,相对于相同识别位点的甲基化形式,MSRE对未甲基化识别位点的酶活性高至少10倍、20倍、50倍或100倍。Methylation-sensitive restriction endonuclease (MSRE): As used herein, "methylation-sensitive restriction endonuclease" or "MSRE" refers to a restriction endonuclease that is sensitive to the methylation state of DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base changes the rate at which the enzyme cleaves the target DNA. In some embodiments, if a specific nucleotide base at the recognition sequence is methylated, the methylation-sensitive restriction endonuclease does not cleave DNA. For example, HpaII is a methylation-sensitive restriction endonuclease with a recognition sequence of "CCGG", and if the second cytosine in the recognition sequence is methylated, HpaII does not cleave DNA. In some embodiments, if a specific nucleotide base at the recognition sequence is methylated, the methylation-sensitive restriction endonuclease cleaves DNA. For example, SgeI is a methylation-sensitive restriction endonuclease with a recognition sequence of "5mCNNG(N)9", and if the cytosine in the recognition sequence is methylated (5mC), SgeI cleaves DNA. 
As another example, FspEI is a methylation-sensitive restriction endonuclease with a recognition sequence of "C5mC(N)12", and if the cytosine in the recognition sequence is methylated (5mC), FspEI cleaves DNA. Figure 1 is a schematic diagram (upper panel) of a methylation-sensitive restriction endonuclease (MSRE) digesting/cleaving DNA when the restriction endonuclease (RE) recognition site contains unmethylated nucleotides, and a schematic diagram (lower panel) of an MSRE not cleaving DNA when the RE recognition site (dashed box) contains a methylated nucleotide at a position that affects MSRE activity. In some embodiments, the enzymatic activity of the MSRE toward a methylated recognition site is at least 10-fold, 20-fold, 50-fold, or 100-fold higher than toward the unmethylated form of the same recognition site. In some embodiments, the enzymatic activity of the MSRE toward an unmethylated recognition site is at least 10-fold, 20-fold, 50-fold, or 100-fold higher than toward the methylated form of the same recognition site.
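The HpaII example in this definition (recognition sequence CCGG, no cleavage when the second cytosine is methylated) can be modeled in a few lines. This is a deliberately simplified sketch: it treats cleavage as all-or-nothing, whereas the definition describes methylation as changing the cleavage rate, and it considers only one strand. The function name and the set-based representation of methylated positions are assumptions introduced for illustration.

```python
HPAII_SITE = "CCGG"


def hpaii_cleavable_sites(sequence, methylated_cytosines):
    """Return 0-based start positions of CCGG sites not blocked by methylation.

    A site starting at position p is treated as blocked when the second
    cytosine of the site (position p + 1) is in methylated_cytosines.
    """
    seq = sequence.upper()
    sites = []
    start = seq.find(HPAII_SITE)
    while start != -1:
        if (start + 1) not in methylated_cytosines:
            sites.append(start)
        start = seq.find(HPAII_SITE, start + 1)
    return sites
```

For the sequence `AACCGGTTCCGGA`, which has sites at positions 2 and 8, methylating the cytosine at position 9 leaves only the site at position 2 cleavable. An enzyme that requires methylation for cleavage, as in the SgeI and FspEI examples, would invert the test.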

甲基化状态:如本文使用的,“甲基化状态”可以指核酸分子中特定基因组位置处DNA碱基(例如胞嘧啶)上甲基基团的存在或不存在。它也可以指核酸序列中的甲基化程度(例如,高度甲基化、低甲基化、中等甲基化或未甲基化的核酸分子)。甲基化状态也可以指特定核酸分子中甲基化核苷酸的数目。Methylation state: As used herein, "methylation state" can refer to the presence or absence of a methyl group on a DNA base (e.g., cytosine) at a particular genomic position in a nucleic acid molecule. It can also refer to the degree of methylation in a nucleic acid sequence (e.g., a highly methylated, hypomethylated, moderately methylated, or unmethylated nucleic acid molecule). Methylation state can also refer to the number of methylated nucleotides in a particular nucleic acid molecule.

突变:如本文使用的,“突变”是指从已知的参考序列的变异,并且包括突变诸如,例如,单核苷酸变异(SNV)和插入/缺失。突变可以是种系突变或体细胞突变。在一些实施方案中,用于比较目的的参考序列是提供测试样品的受试者的物种的野生型基因组序列,通常是人类基因组。Mutation: As used herein, "mutation" refers to a variation from a known reference sequence, and includes mutations such as, for example, single nucleotide variations (SNVs) and insertions/deletions. Mutations can be germline mutations or somatic mutations. In some embodiments, the reference sequence for comparison purposes is the wild-type genomic sequence of the species of the subject providing the test sample, typically the human genome.

赘生物:如本文使用的,术语“赘生物”和“肿瘤”可互换使用。它们是指受试者的细胞的异常生长。赘生物或肿瘤可以是良性的、潜在恶性的或恶性的。恶性肿瘤是指癌症或癌性肿瘤。Neoplasm: As used herein, the terms "neoplasm" and "tumor" are used interchangeably. They refer to an abnormal growth of cells of a subject. A neoplasm or tumor may be benign, potentially malignant, or malignant. A malignant tumor refers to a cancer or cancerous tumor.

下一代测序:如本文使用的,“下一代测序”或“NGS”是指与传统的基于Sanger和毛细管电泳的方法相比具有增加的通量的测序技术,例如,具有一次产生数十万个相对较小的序列读段的能力。下一代测序技术的一些实例包括但不限于合成测序、连接测序和杂交测序。在一些实施方案中,下一代测序包括使用能够对单个分子进行测序的仪器。用于进行下一代测序的商业上可得的仪器的实例包括但不限于NextSeq、HiSeq、NovaSeq、MiSeq、IonPGM和Ion GeneStudio S5。Next Generation Sequencing: As used herein, "Next Generation Sequencing" or "NGS" refers to sequencing technologies with increased throughput compared to traditional Sanger and capillary electrophoresis-based methods, e.g., the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next generation sequencing technologies include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. In some embodiments, next generation sequencing includes the use of an instrument capable of sequencing a single molecule. Examples of commercially available instruments for performing next generation sequencing include, but are not limited to, NextSeq, HiSeq, NovaSeq, MiSeq, IonPGM, and Ion GeneStudio S5.

核酸标签:如本文使用的,“核酸标签”是指用于区分来自不同样品的核酸(例如,呈现为样品索引(sample index)),区分来自不同分区的核酸(例如,呈现为分区标签)、或同一样品中不同类型的或经历不同处理的或不同核酸分子(例如,呈现为分子条形码)的短核酸(例如,长度小于约500个核苷酸、约100个核苷酸、约50个核苷酸或约10个核苷酸)。核酸标签包含预定的、固定的、非随机的、随机的或半随机的寡核苷酸序列。这样的核酸标签可用于标记不同的核酸分子或不同的核酸样品或子样品。核酸标签可以是单链的、双链的或至少部分双链的。核酸标签任选地具有相同的长度或不同的长度。核酸标签还可以包括具有一个或更多个平末端的双链分子,包括5’或3’单链区域(例如,突出端),和/或包括在特定分子内的其他位置处的一个或更多个其他单链区域。核酸标签可以被附接至其他核酸(例如,待被扩增和/或测序的样品核酸)的一端或两端。核酸标签可以被解码以揭示诸如特定核酸的样品来源、形式或处理的信息。例如,核酸标签也可以用于使包含带有不同分子条形码和/或样品索引的核酸的多个样品的汇集和/或并行处理成为可能,其中核酸随后通过检测(例如,读取)核酸标签被解卷积。核酸标签也可以称为标识符(例如分子标识符、样品标识符)。另外地或可选地,核酸标签可以用作分子标识符(例如,用于区分同一样品或子样品中的不同分子或不同亲本分子的扩增子)。这包括,例如,对特定样品中的不同的核酸分子独特地加标签,或对这样的分子非独特地加标签。在非独特加标签应用的情况下,可以使用有限数目的标签(即分子条形码)对每个核酸分子加标签,使得不同分子可以基于其内源序列信息(例如,其映射至所选择的参考基因组的起始和/或终止位置、序列的一端或两端的子序列和/或序列的长度)与至少一个分子条形码的组合而被区分。通常,使用足够数目的不同分子条形码,使得任何两个分子可具有相同的内源序列信息(例如,起始和/或终止位置、序列的一个或两个末端的子序列和/或长度)以及还具有相同的分子条形码的概率较低(例如,小于约10%、小于约5%、小于约1%或小于约0.1%的概率)。Nucleic acid tag: As used herein, "nucleic acid tag" refers to a short nucleic acid (e.g., less than about 500 nucleotides, about 100 nucleotides, about 50 nucleotides, or about 10 nucleotides in length) used to distinguish nucleic acids from different samples (e.g., presented as a sample index), nucleic acids from different partitions (e.g., presented as a partition tag), or different types of nucleic acid molecules or nucleic acid molecules that have undergone different treatments in the same sample (e.g., presented as a molecular barcode). Nucleic acid tags contain predetermined, fixed, non-random, random, or semi-random oligonucleotide sequences. Such nucleic acid tags can be used to label different nucleic acid molecules or different nucleic acid samples or subsamples. Nucleic acid tags can be single-stranded, double-stranded, or at least partially double-stranded. Nucleic acid tags optionally have the same length or different lengths. 
Nucleic acid tags can also include double-stranded molecules having one or more blunt ends, can include 5' or 3' single-stranded regions (e.g., overhangs), and/or can include one or more other single-stranded regions at other positions within a given molecule. Nucleic acid tags can be attached to one end or both ends of other nucleic acids (e.g., sample nucleic acids to be amplified and/or sequenced). Nucleic acid tags can be decoded to reveal information such as the sample of origin, form, or processing of a given nucleic acid. For example, nucleic acid tags can be used to enable pooling and/or parallel processing of multiple samples comprising nucleic acids bearing different molecular barcodes and/or sample indexes, wherein the nucleic acids are subsequently deconvoluted by detecting (e.g., reading) the nucleic acid tags. Nucleic acid tags can also be referred to as identifiers (e.g., molecular identifiers, sample identifiers). Additionally or alternatively, nucleic acid tags can be used as molecular identifiers (e.g., to distinguish between different molecules, or between amplicons of different parent molecules, in the same sample or subsample). This includes, for example, uniquely tagging different nucleic acid molecules in a given sample, or non-uniquely tagging such molecules. In non-unique tagging applications, a limited number of tags (i.e., molecular barcodes) can be used to tag each nucleic acid molecule, such that different molecules can be distinguished based on the combination of their endogenous sequence information (e.g., the start and/or end positions at which they map to a selected reference genome, a subsequence at one or both ends of the sequence, and/or the length of the sequence) with at least one molecular barcode. 
Typically, a sufficient number of different molecular barcodes is used such that there is a low probability (e.g., less than about 10%, less than about 5%, less than about 1%, or less than about 0.1%) that any two molecules having the same endogenous sequence information (e.g., start and/or end positions, subsequences at one or both ends of the sequence, and/or length) also have the same molecular barcode.
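The requirement on the number of molecular barcodes in non-unique tagging can be made concrete with a small probability calculation. The sketch assumes that each molecule receives one barcode drawn uniformly and independently from the available set, which is a simplifying assumption and not a statement about how tags are attached in practice; the function names are likewise hypothetical.

```python
def pair_collision_probability(num_barcodes):
    """Probability that two molecules with identical endogenous information share a barcode."""
    return 1.0 / num_barcodes


def any_collision_probability(num_barcodes, num_molecules):
    """Probability that at least two of num_molecules molecules share a barcode.

    All molecules are assumed to have identical endogenous sequence information,
    so the barcode is the only feature that can tell them apart.
    """
    if num_molecules > num_barcodes:
        return 1.0
    p_all_distinct = 1.0
    for already_used in range(num_molecules):
        p_all_distinct *= (num_barcodes - already_used) / num_barcodes
    return 1.0 - p_all_distinct
```

Under this assumption, 100 distinct barcodes give a 1% chance that two molecules with the same start and end positions are indistinguishable, which falls within the range of thresholds listed above.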

分区:如本文使用的,“分区”是指基于核酸分子的特征对样品中的核酸分子混合物进行物理分离或分级。分区可以是分子的物理分区。分区可以包括基于表观遗传特征(例如,甲基化)水平将核酸分子分成群或组。例如,核酸分子可以基于核酸分子的甲基化水平来分区。在一些实施方案中,用于分区的方法和系统可见于PCT专利申请第PCT/US2017/068329号中,在此将该申请通过引用以其整体并入。在分区之后,分离或分级的核酸分子的群或组在本文也被称为级分、分区或分区组。Partitioning: As used herein, "partitioning" refers to physically separating or fractionating a mixture of nucleic acid molecules in a sample based on a characteristic of the nucleic acid molecules. Partitioning can be a physical partitioning of molecules. Partitioning can include separating nucleic acid molecules into groups or sets based on the level of an epigenetic feature (e.g., methylation). For example, nucleic acid molecules can be partitioned based on their methylation level. In some embodiments, methods and systems for partitioning can be found in PCT Patent Application No. PCT/US2017/068329, which is incorporated herein by reference in its entirety. After partitioning, the groups or sets of separated or fractionated nucleic acid molecules are also referred to herein as fractions, partitions, or partition groups.

分区组:如本文使用的,“分区组”或“分区”是指基于核酸分子或与核酸分子关联的蛋白与结合剂的不同结合亲和力被分区为组(set)或群(group)的核酸分子组。结合剂优先地结合包含具有表观遗传修饰的核苷酸的核酸分子。例如,如果表观遗传修饰是甲基化,则结合剂可以是甲基结合结构域(MBD)蛋白。在一些实施方案中,分区组可以包含属于特定水平或程度的表观遗传特征(例如甲基化)的核酸分子。例如,核酸分子可以被分区为三个组——一组为高甲基化的核酸分子(高分区组或高甲基化分区组),第二组为低甲基化的核酸分子(低分区组或低甲基化分区组),并且第三组为中等甲基化的核酸分子(中等分区组或中等甲基化分区组)。在另一种实例中,核酸分子可以基于甲基化核苷酸数目来分区——一个分区组可以具有带有9个甲基化核苷酸的核酸分子,并且另一个分区组可以具有未甲基化的核酸分子(零甲基化核苷酸)。Partition group: As used herein, "partition group" or "partition" refers to a group of nucleic acid molecules that are partitioned into a set or group based on the different binding affinities of nucleic acid molecules or proteins associated with nucleic acid molecules and binding agents. The binding agent preferentially binds to nucleic acid molecules containing nucleotides with epigenetic modifications. For example, if the epigenetic modification is methylation, the binding agent can be a methyl binding domain (MBD) protein. In some embodiments, the partition group can include nucleic acid molecules belonging to epigenetic features (e.g., methylation) of a specific level or degree. For example, nucleic acid molecules can be partitioned into three groups-one group is a highly methylated nucleic acid molecule (high partition group or high methylation partition group), a second group is a low methylated nucleic acid molecule (low partition group or low methylation partition group), and a third group is a medium methylated nucleic acid molecule (medium partition group or medium methylation partition group). In another example, nucleic acid molecules can be partitioned based on the number of methylated nucleotides-one partition group can have a nucleic acid molecule with 9 methylated nucleotides, and another partition group can have an unmethylated nucleic acid molecule (zero methylated nucleotide).

多核苷酸:如本文使用的,“多核苷酸”、“核酸”、“核酸分子”或“寡核苷酸”是指通过核苷间键连接的核苷(包括脱氧核糖核苷、核糖核苷或其类似物)的线性聚合物。通常,多核苷酸包含至少三个核苷。寡核苷酸的尺寸范围通常从几个单体单元例如3-4个到几百个单体单元。当多核苷酸以字母序列表示时,诸如“ATGCCTG”,核苷酸以5’→3’顺序从左至右呈现,并且在DNA的情况下,“A”表示脱氧腺苷,“C”表示脱氧胞苷,“G”表示脱氧鸟苷,并且“T”表示脱氧胸苷,除非另有说明。字母A、C、G和T可以用于指碱基本身、核苷或包含碱基的核苷酸。Polynucleotide: As used herein, "polynucleotide", "nucleic acid", "nucleic acid molecule" or "oligonucleotide" refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides or their analogs) linked by internucleoside bonds. Typically, a polynucleotide contains at least three nucleosides. The size of an oligonucleotide typically ranges from a few monomer units, such as 3-4, to hundreds of monomer units. When a polynucleotide is represented by a letter sequence, such as "ATGCCTG", the nucleotides are presented from left to right in a 5'→3' order, and in the case of DNA, "A" represents deoxyadenosine, "C" represents deoxycytidine, "G" represents deoxyguanosine, and "T" represents deoxythymidine, unless otherwise indicated. The letters A, C, G and T can be used to refer to a base itself, a nucleoside, or a nucleotide comprising a base.

处理:如本文使用的,“处理”是指用于生成适于测序的核酸文库的一组步骤。这组步骤可以包括但不限于核酸的分区、末端修复、添加测序衔接子、加标签和/或PCR扩增。Processing: As used herein, "processing" refers to a set of steps used to generate a nucleic acid library suitable for sequencing. This set of steps may include, but is not limited to, partitioning of nucleic acids, end repair, addition of sequencing adapters, tagging, and/or PCR amplification.

定量量度:如本文使用的,“定量量度”是指绝对量度或相对量度。定量量度可以是但不限于数字、统计量度(例如,频率、平均值、中位值、标准差或分位数)或程度或相对量(例如,高、中和低)。定量量度可以是两个定量量度的比值。定量量度可以是定量量度的线性组合。定量度量可以是归一化的度量。Quantitative measure: As used herein, "quantitative measure" refers to an absolute measure or a relative measure. A quantitative measure can be, but is not limited to, a number, a statistical measure (e.g., frequency, mean, median, standard deviation, or quantile) or a degree or relative amount (e.g., high, medium, and low). A quantitative measure can be the ratio of two quantitative measures. A quantitative measure can be a linear combination of quantitative measures. A quantitative measure can be a normalized measure.

参考序列:如本文使用的,“参考序列”是指用于与实验确定的序列进行比较的目的的已知序列。例如,已知序列可以是整个基因组、染色体,或它们的任何区段。参考序列可以与基因组或染色体或染色体臂的单个连续序列对齐,或者可以包括与基因组或染色体的不同区域对齐的非连续区段。参考序列的实例包括,例如,人类基因组,诸如,hg19和hg38。Reference sequence: As used herein, a "reference sequence" refers to a known sequence for the purpose of comparison with an experimentally determined sequence. For example, the known sequence can be an entire genome, a chromosome, or any segment thereof. The reference sequence can be aligned to a single continuous sequence of a genome or chromosome or chromosome arm, or can include non-contiguous segments aligned to different regions of a genome or chromosome. Examples of reference sequences include, for example, human genomes, such as hg19 and hg38.

限制性内切酶:如本文使用的,“限制性内切酶”是在特定识别位点或其附近识别和裂解DNA的酶。Restriction endonuclease: As used herein, "restriction endonuclease" is an enzyme that recognizes and cleaves DNA at or near a specific recognition site.

样品:如本文使用的,“样品”意指能够通过本文公开的方法和/或系统分析的任何事物。Sample: As used herein, "sample" means anything capable of being analyzed by the methods and/or systems disclosed herein.

Sequencing: As used herein, "sequencing" refers to any of a number of techniques used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA. Examples of sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing by synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and combinations thereof. In some embodiments, sequencing can be performed by a gene analyzer such as, for example, gene analyzers commercially available from Illumina, Inc., Pacific Biosciences, Inc., or Applied Biosystems/Thermo Fisher Scientific, among many others.

Sequence information: "Sequence information," as used herein in the context of a nucleic acid polymer, means the order and identity of the monomer units (e.g., nucleotides, etc.) in that polymer.

Sequence-variable target region set: As used herein, a "sequence-variable target region set" refers to a set of target regions that may exhibit changes in sequence, such as nucleotide substitutions, insertions, deletions, or gene fusions or transpositions, in neoplastic cells (e.g., tumor cells and cancer cells). In some embodiments, the nucleotide substitution is a single nucleotide variation.

Somatic mutation: As used herein, the terms "somatic mutation" and "somatic variation" are used interchangeably. They refer to a mutation in the genome that occurs after conception. Somatic mutations can occur in any cell of the body except germ cells and are therefore not passed on to progeny.

Specific binding: As used herein, "specific binding" in the context of a probe or other oligonucleotide and a target sequence means that, under appropriate hybridization conditions, the oligonucleotide or probe hybridizes to its target sequence, or replicates thereof, to form a stable probe:target hybrid, while the formation of stable probe:non-target hybrids is minimized. Thus, a probe hybridizes to a target sequence or replicate thereof to a sufficiently greater extent than to a non-target sequence to enable capture or detection of the target sequence. Appropriate hybridization conditions are well known in the art, may be predicted based on sequence composition, or can be determined by routine testing methods (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989) at §§ 1.90-1.91, 7.37-7.57, 9.47-9.51 and 11.47-11.57, particularly §§ 9.50-9.51, 11.12-11.13, 11.45-11.47 and 11.55-11.57, incorporated by reference herein).

Subject: As used herein, "subject" refers to an animal, such as a mammalian species (e.g., human) or an avian (e.g., bird) species, or another organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian, or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual who has or is suspected of having a disease or a predisposition to the disease, or an individual who is in need of therapy or suspected of needing therapy. The terms "individual" and "patient" are intended to be interchangeable with "subject." For example, a subject can be an individual who has been diagnosed with cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer. As another example, the subject can be an individual who is diagnosed with an autoimmune disease. As another example, the subject can be a female individual who is pregnant or who is planning on becoming pregnant, and who may have been diagnosed with or be suspected of having a disease, e.g., a cancer or an autoimmune disease.

Target region set: As used herein, "target-region set," "set of target regions," "target regions," "target regions of interest," or "genomic regions of interest" refers to more than one genomic locus or more than one genomic region that is targeted for capture and/or targeted by a set of probes (e.g., through sequence complementarity).

Tumor fraction: As used herein, "tumor fraction" refers to the proportion of cfDNA molecules that originate from tumor cells for a given sample or sample-region pair.

The terms "or a combination thereof" and "or combinations thereof" as used herein refer to any and all permutations and combinations of the listed terms preceding the term. For example, "A, B, C, or combinations thereof" is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and, if order is important in a particular context, also BA, CA, CB, ACB, CBA, BCA, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more items or terms, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.

"Or" is used in the inclusive sense, i.e., equivalent to "and/or," unless the context requires otherwise.

Detailed Description

Certain embodiments of the invention are described herein. While the invention will be described in conjunction with such embodiments, it should be understood that they are not intended to limit the invention to those embodiments. On the contrary, the invention is intended to cover all alternatives, modifications, and equivalents that may be included within the invention as defined by the appended claims.

Numeric ranges are inclusive of the numbers defining the range. Measured and measurable values are understood to be approximate, taking into account significant digits and the error associated with the measurement. Also, the use of "comprise," "comprises," "comprising," "contain," "contains," "containing," "include," "includes," and "including" is not intended to be limiting. It is to be understood that both the foregoing general description and the detailed description are exemplary and explanatory only and are not restrictive of the present teachings.

Unless specifically noted in the above specification, embodiments in the specification that recite "comprising" various components are also contemplated as "consisting of" or "consisting essentially of" the recited components; embodiments in the specification that recite "consisting of" various components are also contemplated as "comprising" or "consisting essentially of" the recited components; and embodiments in the specification that recite "consisting essentially of" various components are also contemplated as "consisting of" or "comprising" the recited components (this interchangeability does not apply to the use of these terms in the claims).

The section headings used herein are for organizational purposes and are not to be construed as limiting the disclosed subject matter in any way. In the event that any document or other material incorporated by reference contradicts any explicit content of this specification, including definitions, this specification controls.

I. Overview

Cancer formation and progression may arise from both genetic modifications and epigenetic features of deoxyribonucleic acid (DNA). The present disclosure provides methods and systems for analyzing DNA, such as cell-free DNA (cfDNA). The present disclosure provides methods and systems for improving the signal-to-noise ratio of methylation partitioning assays.

Without wishing to be bound by any particular theory, cells in or around a cancer or neoplasm may shed more DNA than cells of the same tissue type in a healthy subject. As such, the distribution of tissues of origin of certain DNA samples, such as cfDNA, may change upon carcinogenesis. Thus, for example, an increase in the level of hypermethylation variable target regions that show lower methylation in healthy cfDNA than in at least one other tissue type can be an indicator of the presence (or recurrence, depending on the history of the subject) of a cancer. Similarly, an increase in the level of hypomethylation variable target regions in a sample can be an indicator of the presence (or recurrence, depending on the history of the subject) of a cancer.

In addition, cancer can be indicated by non-sequence modifications, such as methylation. Examples of methylation changes in cancer include local gains of DNA methylation in the CpG islands at the transcription start sites (TSSs) of genes involved in normal growth control, DNA repair, cell cycle regulation, and/or cell differentiation. This hypermethylation can be associated with an aberrant loss of transcriptional capacity of the genes involved, and it occurs at least as frequently as the point mutations and deletions that cause altered gene expression. DNA methylation profiling can be used to detect regions of the genome with different extents of methylation ("differentially methylated regions" or "DMRs"), such as methylation that changes during development or aberrant methylation caused by disease (e.g., cancer or any cancer-associated disease). For example, DNA methylation profiling can be used to detect regions that are normally hypermethylated or hypomethylated in a given sample type (e.g., cfDNA from the bloodstream) but that may show an abnormal degree of methylation correlated with a neoplasm or cancer, e.g., because of an abnormally increased contribution of a tissue to the sample type (e.g., due to increased shedding of DNA in or around the neoplasm or cancer) and/or an abnormally increased contribution arising from the degree of methylation.

In some embodiments, DNA methylation comprises addition of a methyl group to a cytosine residue at a CpG site (a cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in the 5'->3' direction of the nucleic acid sequence). In some embodiments, DNA methylation comprises addition of a methyl group to an adenine residue, such as in N6-methyladenine. In some embodiments, DNA methylation is 5-methylation (modification of the 5th carbon of the 6-carbon ring of cytosine). In some embodiments, 5-methylation comprises addition of a methyl group to the 5C position of the cytosine residue to create 5-methylcytosine (m5c or 5-mC or 5mC). In some embodiments, methylation comprises a derivative of m5c. Derivatives of m5c include, but are not limited to, 5-hydroxymethylcytosine (5-hmC or 5hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the 3rd carbon of the 6-carbon ring of the cytosine residue). In some embodiments, 3C methylation comprises addition of a methyl group to the 3C position of the cytosine residue to generate 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites; for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of the methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development, and abnormal methylation may disrupt epigenetic regulation. Such disruption of epigenetic regulation, e.g., repression, may cause diseases such as cancer. Promoter methylation in DNA may be indicative of cancer.
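As an informal illustration of the CpG definition above (a cytosine followed by a guanine in the 5'->3' direction), the following Python sketch lists the candidate CpG cytosines in a sequence. The function name and the example sequence are hypothetical and are not part of the disclosure.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of the C in every 5'-CG-3' dinucleotide.

    Illustrative only: these are candidate methylation sites on the
    given strand; nothing here says whether a site is methylated.
    """
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


# Hypothetical example sequence, read 5'->3'.
print(cpg_positions("ACGTCGCGA"))  # [1, 4, 6]
```

Non-CpG contexts (CpA, CpT, CpC) could be located the same way by changing the dinucleotide that is matched.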

Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after molecules have been partitioned based on their extent of methylation (e.g., the relative number of methylated nucleotides per molecule) and sequenced, the sequences of the molecules in the different partitions can be mapped to a reference genome. This can reveal regions of the genome that are more highly methylated or less highly methylated than other regions. In this way, genomic regions, rather than individual molecules, may differ in their extent of methylation.
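The region-level comparison described above can be sketched as follows, assuming that each sequenced molecule has already been mapped to a named genomic region and carries the label of the partition it came from. The region names, the partition labels, and the choice of "fraction of molecules in the hyper partition" as a summary statistic are assumptions made for illustration, not a method stated in this disclosure.

```python
from collections import Counter, defaultdict


def hyper_fraction_by_region(molecules):
    """Per region, the fraction of mapped molecules from the 'hyper' partition.

    `molecules` is an iterable of (region, partition) pairs, where the
    partition label is one of the hypothetical strings 'hyper' or 'hypo'.
    """
    counts = defaultdict(Counter)
    for region, partition in molecules:
        counts[region][partition] += 1
    return {r: c["hyper"] / sum(c.values()) for r, c in counts.items()}


# Hypothetical mapped molecules.
mols = [("r1", "hyper"), ("r1", "hyper"), ("r1", "hypo"), ("r2", "hypo")]
print(hyper_fraction_by_region(mols))
```

Regions with a high fraction would be read as more methylated than regions with a low fraction, which is the sense in which regions rather than single molecules are compared.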

Combining the signals obtained from methylation profiling with the signals obtained from somatic variations (e.g., SNVs, indels, CNVs, and gene fusions) facilitates the detection of cancer.

Nucleic acid molecules in a sample may be fractionated or partitioned based on the methylation state of the nucleic acid molecules. Partitioning nucleic acid molecules in a sample can increase a rare signal. For example, a genetic variation that is present in hypermethylated DNA but less present (or absent) in hypomethylated DNA can be more easily detected by partitioning a sample into hypermethylated and hypomethylated nucleic acid molecules. By analyzing more than one fraction of a sample, a multi-dimensional analysis of a single molecule can be performed, and greater sensitivity can thereby be achieved. Partitioning may include physically partitioning nucleic acid molecules into subsets or groups based on the presence or absence of one or more methylated nucleotides (e.g., nucleotides comprising a methylated base). A sample may be fractionated or partitioned into one or more partition groups based on a characteristic that is indicative of differential gene expression or a disease state. During analysis of nucleic acids (e.g., cell-free DNA ("cfDNA"), non-cfDNA, tumor DNA, circulating tumor DNA ("ctDNA"), and cell-free nucleic acids ("cfNA")), a sample may be fractionated based on a characteristic, or combination of characteristics, that provides a difference in signal between a normal state and a diseased state.

In some embodiments, the sample can be partitioned into two or more partition groups (e.g., at least 3, 4, 5, 6, or 7 partition groups) based on the differential binding affinity of methylated nucleic acid molecules to a binding agent, i.e., an agent that binds methylated nucleotides (e.g., nucleotides comprising a methylated base). Examples of binding agents include, but are not limited to, methyl binding domains (MBDs) and methyl binding proteins (MBPs). Examples of MBPs contemplated herein include, but are not limited to:

(a) MeCP2 and MBD2, proteins that preferentially bind 5-methyl-cytosine over unmodified cytosine;

(b) RPL26, PRP8, and the DNA mismatch repair protein MSH6, which preferentially bind 5-hydroxymethyl-cytosine over unmodified cytosine;

(c) FOXK1, FOXK2, FOXP1, FOXP4, and FOXI3, which preferentially bind 5-formyl-cytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14, R119 (2013)); and

(d) antibodies specific to one or more methylated nucleotide bases.

In such embodiments, nucleic acids overrepresented in a modification bind to the agent to a greater extent than nucleic acids underrepresented in the modification. Alternatively, nucleic acids having modifications may bind in an all-or-nothing manner, after which the various levels of modification may be sequentially eluted from the binding agent.

For example, in some embodiments, partitioning can be binary or can be based on the degree/level of methylation. For example, all methylated fragments can be partitioned from unmethylated fragments using methyl-binding domain proteins (e.g., the MethylMiner Methylated DNA Enrichment Kit (Thermo Fisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration of a solution containing the methyl-binding domain and the bound fragments. As the salt concentration increases, fragments having greater methylation levels are eluted.

Compared with standard methylation analysis methods (e.g., bisulfite sequencing), methylation partitioning methods are highly efficient at recovering analyte molecules and allow somatic variations to be detected at the same time. However, because such a method identifies the methylation level of a molecule through partitioning, its sensitivity and specificity are challenged by incorrect partitioning of methylated/unmethylated molecules (e.g., unmethylated molecules being partitioned into the hyper partition group). This technical noise from mispartitioned molecules limits the performance of the methylation partitioning assay. To increase the signal-to-noise ratio of the methylation partitioning assay, particular partition groups can be subjected to a methylation-sensitive restriction endonuclease (RE) digestion reaction that specifically removes incorrectly partitioned molecules. For example, a methylation-sensitive restriction endonuclease (MSE) that cleaves only unmethylated molecules bearing the RE recognition site can be applied to the hyper partition group, so that only the mispartitioned unmethylated molecules are selectively cleaved and removed from the assay process. Reducing the number of unmethylated molecules in the hyper partition group thus improves the sensitivity and specificity of the assay, which in turn improves the ability to detect the presence or absence of circulating tumor DNA (ctDNA).
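The signal-to-noise argument above can be made concrete with a small back-of-the-envelope model. The sketch below is an assumption-laden illustration rather than the assay's actual performance model: it treats correctly partitioned methylated molecules as signal, mispartitioned unmethylated molecules as noise, and assumes that only the noise molecules carrying a recognition site can be cleaved, with a given efficiency. All parameter names and numbers are hypothetical.

```python
def snr_after_digestion(signal, noise, site_fraction, cut_efficiency):
    """Signal-to-noise ratio in the hyper partition after MSRE digestion.

    signal         -- count of correctly partitioned methylated molecules
    noise          -- count of mispartitioned unmethylated molecules
    site_fraction  -- fraction of noise molecules with a recognition site
    cut_efficiency -- fraction of site-bearing noise molecules cleaved
    Methylated (signal) molecules are assumed to be untouched.
    """
    remaining = noise * (1 - site_fraction * cut_efficiency)
    return signal / remaining if remaining else float("inf")


# Hypothetical numbers: 10 signal molecules among 100 noise molecules.
before = snr_after_digestion(10, 100, 0.0, 0.0)  # no digestion
after = snr_after_digestion(10, 100, 0.8, 0.9)   # 72% of noise removed
print(before, round(after, 3))
```

Under these assumed values, the ratio rises from 0.1 to roughly 0.36 because 72 of the 100 noise molecules are removed while the signal count is unchanged.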

The present disclosure provides methods and systems for improving the sensitivity and specificity of DNA methylation partitioning assays. These methods and systems can be used in various applications, such as predicting the prognosis of a cancer and the diagnosis, monitoring, recurrence, and/or relapse of a cancer.

Accordingly, in one aspect, the present disclosure provides a method for analyzing nucleic acid molecules in a biological sample, the method comprising: (a) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules, wherein the biological sample comprises methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; and (c) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

In some embodiments, the method further comprises detecting the presence or absence of cancer in the biological sample. In some embodiments, the method comprises determining a level of cancer in the biological sample, e.g., by determining the level of DNA from cancer cells in the biological sample. In some embodiments, the method further comprises, prior to the digesting, attaching one or more adapters to at least one end (i.e., the 5' and/or 3' end) of the nucleic acid molecules in the more than one partition group. In some embodiments, determining the methylation state comprises sequencing at least a subset of the digested nucleic acid molecules. In some embodiments, the method further comprises, prior to determining the methylation state, enriching at least a subset of the nucleic acid molecules in the more than one partition group for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises the digested nucleic acid molecules in one or more of the partition groups. In some such embodiments, the genomic regions of interest comprise an epigenetic target region set. In some such embodiments, the method comprises enriching or capturing a first epigenetic target region set from at least a portion of a first partition group, and enriching or capturing a second epigenetic target region set from at least a portion of a second partition group.

In another aspect, the present disclosure provides a method for determining the methylation state of nucleic acid molecules, the method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; (c) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; (d) enriching at least a subset of the nucleic acid molecules in the more than one partition group for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises the digested nucleic acid molecules in one or more of the partition groups; and (e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups. In some such embodiments, the genomic regions of interest comprise an epigenetic target region set. In some such embodiments, the method comprises enriching or capturing a first epigenetic target region set from at least a portion of a first partition group, and enriching or capturing a second epigenetic target region set from at least a portion of a second partition group.

In some embodiments, the method further comprises detecting the presence or absence of cancer in the biological sample. In some embodiments, the method comprises determining a level of cancer in the biological sample, e.g., by determining the level of DNA from cancer cells in the biological sample. In some embodiments, the method further comprises, prior to the digesting, attaching one or more adapters to at least one end (i.e., the 5' and/or 3' end) of the nucleic acid molecules in the more than one partition group. In some embodiments, determining the methylation state comprises sequencing at least a subset of the digested nucleic acid molecules.

FIG. 2 illustrates an exemplary embodiment of a method 200 for determining the methylation state of nucleic acid molecules in a sample of polynucleotides obtained from a subject. At 202, the sample of polynucleotides is obtained from the subject. In some embodiments, the sample of polynucleotides is a DNA sample obtained from a tumor tissue biopsy. In some embodiments, the sample of polynucleotides is a cell-free DNA (cfDNA) sample obtained from blood. At 204, the sample of polynucleotides is partitioned into at least two partition groups. In some embodiments, the partitioning comprises partitioning the nucleic acid molecules based on the differential binding affinity of the polynucleotides to a binding agent that preferentially binds polynucleotides comprising methylated nucleotides (e.g., nucleotides comprising a methylated base). Examples of binding agents include, but are not limited to, methyl binding domains (MBDs) and methyl binding proteins (MBPs). Examples of MBPs contemplated herein are listed above.


Partitioning may refer to physically separating or fractionating nucleic acid molecules based on a characteristic of the nucleic acid molecules. The partitioning can be physical partitioning of molecules. Partitioning can involve separating the nucleic acid molecules into groups or sets based on the methylation level of the nucleic acid molecules. In some embodiments, the methods and systems used for partitioning may be as described in PCT Patent Application No. WO2018/119452, which is incorporated herein by reference in its entirety. In these embodiments, the nucleic acids are partitioned based on different levels of methylation (e.g., different numbers or frequencies of methylated nucleotides, such as nucleotides comprising a methylated base). In some embodiments, the nucleic acids can be partitioned into two or more partition groups (e.g., at least 3, 4, 5, 6, or 7 partition groups). For example, the nucleic acid molecules can be partitioned into three groups: one group of highly methylated nucleic acid molecules (the hyper partition group or hypermethylated partition group), a second group of lowly methylated nucleic acid molecules (the hypo partition group or hypomethylated partition group), and a third group of intermediately methylated nucleic acid molecules (the intermediate partition group or intermediately methylated partition group). In some embodiments, the partition groups represent nucleic acids having different extents of methylation (over-representative or under-representative of the modification). Over-representation and under-representation can be defined by the number of methylated nucleotides present in a DNA molecule (e.g., a cfDNA molecule) relative to the median number of methylated nucleotides per strand in the population. For example, if the median number of 5-methylcytosine nucleotides in the nucleic acid molecules of a sample is 2, a nucleic acid molecule comprising more than two 5-methylcytosine residues is over-representative, and a nucleic acid with 1 or 0 5-methylcytosine residues is under-representative. The effect of the affinity separation is to enrich for nucleic acids over-represented in the modification (i.e., the methylation level) in the bound phase and for nucleic acids under-represented in the modification in the unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
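The median-based definition of over-representation and under-representation given above can be sketched in a few lines of Python. The function name is hypothetical, and the label given to molecules whose count equals the median is an assumption, since the text above does not say how such molecules are classified.

```python
from statistics import median


def classify_by_median(mc_counts):
    """Label each molecule by its 5-methylcytosine count versus the median.

    Counts above the population median are 'over', counts below are
    'under'; counts equal to the median are labeled 'median' here,
    which is an assumption made only for this illustration.
    """
    m = median(mc_counts)
    return ["over" if n > m else "under" if n < m else "median"
            for n in mc_counts]


# Hypothetical per-molecule counts with a median of 2, as in the text.
print(classify_by_median([0, 1, 2, 3, 5]))
# ['under', 'under', 'median', 'over', 'over']
```

This classification is computed after the fact from known counts. In the assay itself, the physical affinity separation performs the enrichment and the counts are not known in advance.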

At 206, the nucleic acid molecules in at least one partition group are digested with at least one methylation-sensitive restriction endonuclease (MSRE). In some embodiments, the nucleic acids in the at least one partition group are digested with at least two MSREs. In some embodiments, two MSREs are used to digest the nucleic acid molecules in the at least one partition group. In some embodiments, the two MSREs are BstUI and HpaII. In some embodiments, the two MSREs are HhaI and AccII. In some embodiments, three MSREs are used to digest the nucleic acid molecules in the at least one partition group. In some embodiments, the three MSREs are BstUI, HpaII, and Hin6I. In some embodiments, the MSRE is selected from the group consisting of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. In some embodiments, any commercially available MSRE can be used (e.g., MSREs supplied by Takara Bio USA Inc., New England Biolabs Inc., and/or Thermo Fisher Scientific Inc.).
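To illustrate which molecules a three-enzyme digestion could cleave, the sketch below scans a sequence for the recognition sites of HpaII (CCGG), BstUI (CGCG), and Hin6I (GCGC) and reports the sites that carry no methylated CpG cytosine. It is a simplified model with stated assumptions: only the given strand is examined (the three sites are palindromic), methylation is supplied as a set of 0-based cytosine indices, and any methylated CpG cytosine within a site is assumed to block cleavage completely. The function and variable names are hypothetical.

```python
# Recognition sequences (5'->3') of three MSREs named in the text.
SITES = {"HpaII": "CCGG", "BstUI": "CGCG", "Hin6I": "GCGC"}


def unmethylated_sites(seq, methylated, enzymes=SITES):
    """Return (enzyme, start) for each site with no methylated CpG cytosine.

    `methylated` is a set of 0-based indices of methylated cytosines.
    A molecule with at least one returned site would be expected to be
    cleaved under the simplifying assumptions described above.
    """
    seq = seq.upper()
    hits = []
    for name, site in enzymes.items():
        start = seq.find(site)
        while start != -1:
            # Indices of the C of each CG dinucleotide inside this site.
            cpg_cs = [start + i for i in range(len(site) - 1)
                      if site[i:i + 2] == "CG"]
            if not any(p in methylated for p in cpg_cs):
                hits.append((name, start))
            start = seq.find(site, start + 1)
    return hits


molecule = "AACCGGTTGCGCAA"  # hypothetical fragment
print(unmethylated_sites(molecule, set()))  # both sites are cleavable
print(unmethylated_sites(molecule, {3, 9}))  # fully methylated: no sites
```

In this toy model, an unmethylated molecule that was mispartitioned into the hyper partition returns at least one site and is removed, whereas a molecule methylated at every site returns none and survives digestion.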

In some embodiments, FspEI is used to digest the nucleic acid molecules in at least one other partition group (e.g., the hypomethylated partition). In some embodiments, BstUI, HpaII, and Hin6I are used to digest the nucleic acid molecules in at least one partition group (e.g., the hypermethylated partition), and FspEI is used to digest the nucleic acid molecules in at least one other partition group (e.g., the hypomethylated partition). In embodiments that include an intermediately methylated partition, the nucleic acid molecules therein can be digested with at least one methylation-sensitive restriction endonuclease that preferentially cleaves methylated or unmethylated DNA. In some embodiments, the nucleic acid molecules in the intermediately methylated partition are digested with the same MSRE(s) as the hypermethylated partition. For example, the intermediately methylated partition can be pooled with the hypermethylated partition, and the pooled partitions can then be digested. In some embodiments, the nucleic acid molecules in the intermediately methylated partition are digested with the same MSRE(s) as the hypomethylated partition. For example, the intermediately methylated partition can be pooled with the hypomethylated partition, and the pooled partitions can then be digested.

In some embodiments, before restriction digestion with the MSRE, at least one adapter is attached to at least one end of the nucleic acid molecules (i.e., the 5' and/or 3' end of a DNA molecule). In some such embodiments, adapters are attached to both ends of the nucleic acid molecules. In other embodiments, at least one adapter is attached to at least one end of the nucleic acid molecules after digestion but before the enrichment at 208. In some embodiments, the adapters are resistant to digestion by methylation-sensitive restriction endonucleases. In some embodiments, the adapters comprise one or more methylated nucleotides (e.g., nucleotides comprising a methylated base). In some embodiments, the methylated nucleotides can be 5-methylcytosine and/or 5-hydroxymethylcytosine. In some embodiments, the adapters comprise one or more nucleotide analogs that are resistant to methylation-sensitive restriction endonucleases. In some embodiments, the adapters comprise nucleotide sequences that are not recognized by methylation-sensitive restriction endonucleases. In some embodiments, a tag can be provided as a component of an adapter. In some embodiments, the tag comprises a molecular barcode (i.e., a molecular identifier). In some embodiments, the tags attached to the nucleic acid molecules in one partition group differ from the tags attached to the nucleic acid molecules in the other partition groups. In some embodiments, one partition group is tagged differentially from the other partition groups. Differentially tagging the partition groups helps keep track of the nucleic acid molecules belonging to a particular partition group. The nucleic acid molecules in different partition groups receive different tags that distinguish the members of one partition group from the members of another. The tags attached to the nucleic acid molecules of the same partition group can be the same as or different from one another. If they differ, however, part of the tag sequence can be shared so that the molecules to which the tags are attached can be identified as belonging to a particular partition group. For example, if the molecules of a sample are partitioned into two partition groups, P1 and P2, then the molecules in P1 can be tagged with A1, A2, A3, and so on, and the molecules in P2 can be tagged with B1, B2, B3, and so on. Such a tagging system allows the partition groups to be distinguished from one another and the molecules within a partition group to be distinguished from one another. In some embodiments, the tags comprise a partition tag (i.e., a partition identifier). In such embodiments, the nucleic acid molecules within a partition group receive the same partition tag, which differs from the partition tags attached to the nucleic acid molecules of the other partition groups.
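The P1/P2 tagging scheme described above can be sketched in Python as follows. The prefix mapping, the function names, and the molecule identifiers are hypothetical; real tags are nucleotide sequences rather than letter-number strings, so this only illustrates how a shared prefix identifies the partition while the remainder distinguishes molecules.

```python
# Shared tag prefix per partition group, as in the A1/A2/... and B1/B2/... example.
PARTITION_PREFIX = {"P1": "A", "P2": "B"}
PREFIX_TO_PARTITION = {prefix: part for part, prefix in PARTITION_PREFIX.items()}


def assign_tags(molecules_by_partition):
    """Give each molecule a tag made of its partition prefix plus a running number."""
    tags = {}
    for partition, molecules in molecules_by_partition.items():
        prefix = PARTITION_PREFIX[partition]
        for index, molecule in enumerate(molecules, start=1):
            tags[molecule] = f"{prefix}{index}"
    return tags


def decode_tag(tag):
    """Split a tag into (partition group, molecule index within that group)."""
    return PREFIX_TO_PARTITION[tag[0]], int(tag[1:])


tags = assign_tags({"P1": ["mol_a", "mol_b"], "P2": ["mol_c"]})
partition, index = decode_tag(tags["mol_c"])
```

After pooling, decoding the tag recovers the partition of origin for each molecule, which is what allows partition groups to be combined before capture and sequencing.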

At 208, after MSRE digestion, the nucleic acid molecules in one or more partition groups can be enriched for genomic regions of interest. In some embodiments, the genomic regions of interest can include differentially methylated regions used for cancer detection. At 210, at least a subset of the enriched molecules is sequenced with a next-generation sequencer. At 212, the sequencing reads generated by the sequencer are then analyzed using bioinformatics tools/algorithms to determine the number of molecules in one or more partition groups, and the number of molecules is in turn used to determine the methylation state at one or more genetic loci of the nucleic acid molecules in at least one partition group. In some embodiments, the one or more genetic loci can include more than one genetic locus. In some embodiments, the one or more genetic loci can include one or more genomic regions. In some embodiments, a genomic region can be the promoter region of a gene. In some embodiments, the nucleic acid molecules can be amplified by PCR before sequencing. In some embodiments, the primers used in the amplification can comprise at least one sample index.
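The analysis at 212 can be pictured with the following Python sketch, which collapses reads into unique molecules and counts them per locus and partition group. The read representation (locus, partition tag, molecular barcode), the locus names, and the use of the hypermethylated-partition fraction as a readout are all assumptions made for illustration; the text does not specify how molecule counts are converted into a methylation state.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (locus, partition).

    Each read is a (locus, partition, molecular_barcode) tuple; reads that share
    all three values are treated as PCR copies of a single original molecule.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for locus, partition, _barcode in set(reads):
        counts[locus][partition] += 1
    return counts


def hyper_fraction(counts, locus):
    """Fraction of a locus's molecules that were found in the 'hyper' partition."""
    per_partition = counts[locus]
    total = sum(per_partition.values())
    return per_partition.get("hyper", 0) / total if total else 0.0


# Hypothetical reads: the first two are duplicates of one molecule.
reads = [
    ("DMR1", "hyper", "bc1"),
    ("DMR1", "hyper", "bc1"),
    ("DMR1", "hyper", "bc2"),
    ("DMR1", "hypo", "bc3"),
    ("CTRL", "hypo", "bc4"),
]
counts = count_molecules(reads)
dmr1_fraction = hyper_fraction(counts, "DMR1")
```

Counting molecules rather than reads keeps PCR duplication from inflating the apparent number of methylated molecules at a locus.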

In some embodiments, the method can further include detecting the presence or absence of cancer in the subject based on the methylation state at one or more genetic loci of the nucleic acid molecules in at least one partition group. In some embodiments, the method further includes determining the level of cancer in the polynucleotide sample, for example, by determining the level of DNA from cancer cells in the polynucleotide sample.

In another aspect, the present disclosure provides a method for determining the methylation state of nucleic acid molecules, the method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules include methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules; (c) attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition group; (d) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction endonuclease; (e) enriching at least a subset of the nucleic acid molecules in the more than one partition group for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in one or more of the partition groups; and (f) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups. In some such embodiments, the genomic regions of interest include an epigenetic target region set. In some such embodiments, the method includes enriching or capturing a first epigenetic target region set from at least a portion of a first partition group, and enriching or capturing a second epigenetic target region set from at least a portion of a second partition group.

In some embodiments, the method further comprises detecting the presence or absence of cancer in the biological sample. In some embodiments, the method comprises determining the level of cancer in the biological sample, for example, by determining the level of DNA from cancer cells in the biological sample. In some embodiments, the method further comprises attaching one or more adapters to at least one end (i.e., the 5' and/or 3' end) of the nucleic acid molecules in the more than one partition group prior to digestion. In some embodiments, determining the methylation state comprises sequencing at least a subset of the digested nucleic acid molecules.

Fig. 3 shows an exemplary embodiment of a method 300 for detecting the presence or absence of cancer in a subject according to an embodiment of the present disclosure. At 302, a polynucleotide sample is obtained from the subject. In some embodiments, the polynucleotide sample is a DNA sample obtained from a tumor tissue biopsy. In some embodiments, the polynucleotide sample is a cell-free DNA (cfDNA) sample obtained from blood (e.g., from plasma). At 304, the polynucleotide sample is partitioned into at least two partition groups. In some embodiments, the partitioning includes partitioning the nucleic acid molecules based on the differential binding affinity of the polynucleotides for a binding agent that preferentially binds polynucleotides comprising methylated nucleotides (e.g., nucleotides comprising a methylated base). Examples of binding agents include, but are not limited to, methyl binding domains (MBDs) and methyl binding proteins (MBPs). Examples of MBPs contemplated herein are listed above.

In some embodiments, the nucleic acids can be partitioned into two or more partition groups (e.g., at least 3, 4, 5, 6, or 7 partition groups). In some embodiments, the partition groups represent nucleic acids with varying degrees of methylation (overrepresented or underrepresented modifications). For example, the nucleic acid molecules can be partitioned into three groups: one group of highly methylated nucleic acid molecules (the hyper partition group or hypermethylated partition group), a second group of lowly methylated nucleic acid molecules (the hypo partition group or hypomethylated partition group), and a third group of intermediately methylated nucleic acid molecules (the intermediate partition group or intermediately methylated partition group).

At 306, adapters are attached to the nucleic acid molecules in one or more partition groups, wherein the adapters comprise at least one tag and are attached to at least one end of the nucleic acid molecules (i.e., the 5' and/or 3' end of a DNA molecule). In some embodiments, the adapters are resistant to digestion by methylation-sensitive restriction endonucleases. In some embodiments, the adapters comprise one or more methylated nucleotides (e.g., nucleotides comprising a methylated base). In some embodiments, the methylated nucleotides can be 5-methylcytosine and/or 5-hydroxymethylcytosine. In some embodiments, the adapters comprise one or more nucleotide analogs that are resistant to methylation-sensitive restriction endonucleases. In some embodiments, the adapters comprise nucleotide sequences that are not recognized by methylation-sensitive restriction endonucleases. In some embodiments, the adapters do not comprise a nucleotide sequence recognized by the methylation-sensitive restriction endonuclease(s) used in the method. In some embodiments, the adapters comprise one or more modifications that inhibit cleavage by methylation-sensitive restriction endonucleases (e.g., linkage modifications, such as phosphorothioates). In some embodiments, a tag can be provided as a component of an adapter. In some embodiments, the tag comprises a molecular barcode (i.e., a molecular identifier). In some embodiments, the tags attached to the nucleic acid molecules in one partition group differ from the tags attached to the nucleic acid molecules in the other partition groups. In some embodiments, one partition group is tagged differentially from the other partition groups. Differentially tagging the partition groups helps keep track of the nucleic acid molecules belonging to a particular partition group. The nucleic acid molecules in different partition groups receive different tags that distinguish the members of one partition group from the members of another. The tags attached to the nucleic acid molecules of the same partition group can be the same as or different from one another. If they differ, however, part of the tag sequence can be shared so that the molecules to which the tags are attached can be identified as belonging to a particular partition group. For example, if the molecules of a sample are partitioned into two partition groups, P1 and P2, then the molecules in P1 can be tagged with A1, A2, A3, and so on, and the molecules in P2 can be tagged with B1, B2, B3, and so on. Such a tagging system allows the partition groups to be distinguished from one another and the molecules within a partition group to be distinguished from one another. In some embodiments, the tags comprise a partition tag (i.e., a partition identifier). In such embodiments, the nucleic acid molecules within a partition group receive the same partition tag, which differs from the partition tags attached to the nucleic acid molecules of the other partition groups. In some embodiments, the tag sequences used do not comprise a nucleotide sequence recognized by the methylation-sensitive restriction endonuclease(s) used in the method.
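One of the adapter and tag design constraints above, namely that the sequence should not contain a recognition sequence of the MSRE(s) used, can be checked with a short Python sketch. The function name and the example adapter sequences are hypothetical; the recognition sequences shown (BstUI: CGCG, HpaII: CCGG, Hin6I: GCGC) are palindromic, so scanning one strand is sufficient for these three enzymes, whereas a non-palindromic site would also require scanning the reverse complement.

```python
MSRE_SITES = {"BstUI": "CGCG", "HpaII": "CCGG", "Hin6I": "GCGC"}


def sites_in_adapter(adapter_seq, enzymes):
    """Return the enzymes whose recognition sequence occurs in the adapter."""
    seq = adapter_seq.upper()
    # The three sites above are palindromic, so one strand covers both.
    return [enzyme for enzyme in enzymes if MSRE_SITES[enzyme] in seq]


panel = ["BstUI", "HpaII", "Hin6I"]
# Hypothetical adapter sequences, not taken from any real kit.
conflicts_bad = sites_in_adapter("ACGTCCGGAT", panel)
conflicts_ok = sites_in_adapter("ACACTCTTTC", panel)
```

A sequence that returns an empty list is compatible with the panel by sequence alone; the other options in the text (methylated nucleotides, nucleotide analogs, phosphorothioate linkages) protect an adapter even when a site is present and are not modeled here.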

At 308, the nucleic acid molecules in at least one partition group are digested with at least one methylation-sensitive restriction endonuclease (MSRE). In some embodiments, the nucleic acids in at least one partition group are digested with at least two MSREs. In some embodiments, two MSREs are used to digest the nucleic acid molecules in at least one partition group. In some embodiments, the two MSREs are BstUI and HpaII. In some embodiments, the two MSREs are HhaI and AccII. In some embodiments, three MSREs are used to digest the nucleic acid molecules in at least one partition group. In some embodiments, the three MSREs are BstUI, HpaII, and Hin6I. In some embodiments, the MSRE is selected from the group consisting of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. In some embodiments, any commercially available MSRE can be used (e.g., MSREs supplied by TakaraBio USA Inc., New England Biolabs Inc., and/or Thermo Fisher Scientific Inc.).

At 310, after MSRE digestion, the nucleic acid molecules in one or more partition groups can be enriched for genomic regions of interest. In some embodiments, the genomic regions of interest can include differentially methylated regions used for cancer detection. At 312, at least a subset of the enriched molecules is sequenced with a next-generation sequencer. At 314, the sequencing reads generated by the sequencer are then analyzed using bioinformatics tools/algorithms to determine the number of molecules in one or more partition groups, and the number of molecules is in turn used to determine the methylation state at one or more genetic loci of the nucleic acid molecules in at least one partition group. In some embodiments, the one or more genetic loci can include more than one genetic locus. In some embodiments, the one or more genetic loci can include one or more genomic regions. In some embodiments, a genomic region can be the promoter region of a gene. In some embodiments, the nucleic acid molecules can be amplified by PCR before sequencing. In some embodiments, the primers used in the amplification can comprise at least one sample index. In some embodiments, the nucleic acid molecules digested by the MSRE are not amplified. In some such embodiments, substantially all of the nucleic acid molecules in the sample are amplified except those digested by the MSRE.

In some embodiments, the method can further include detecting the presence or absence of cancer in the subject based on the methylation state at one or more genetic loci of the nucleic acid molecules in at least one partition group. In some embodiments, the method further includes determining the level of cancer in the polynucleotide sample, for example, by determining the level of DNA from cancer cells in the polynucleotide sample.

Fig. 4 illustrates an exemplary workflow for detecting the presence or absence of cancer starting from a cfDNA sample, according to certain embodiments of the present disclosure, in which: cfDNA is isolated from a blood sample, and the cfDNA sample contains cfDNA molecules belonging to cancer hypermethylated DMR regions and to unmethylated control regions; the cfDNA is partitioned into hypo, residual (i.e., intermediately methylated), and hyper partition groups using a methyl binding domain protein (MBD); each partition group is molecularly barcoded so that the DNA from the hypo, residual, and hyper partition groups is distinguishably tagged; the hyper partition group is digested with two MSREs (HhaI and AccI), which cleave unmethylated cfDNA molecules at the RE recognition sites; and the partition groups (including the MSRE-digested hyper partition group) are then pooled, captured, amplified, and sequenced. In some embodiments, the nucleic acid molecules digested by the MSRE are not amplified. In some such embodiments, substantially all of the nucleic acid molecules in the sample are amplified except those digested by the MSRE.

In some embodiments, the MSREs are selected to maximize the number of methylation biomarker sequences (i.e., DMRs) targeted. In some embodiments, if two or more MSREs are used in a single digestion, the enzyme buffers should be compatible (as validated by the supplier and/or tested empirically). In addition, the MSREs should have an inactivation mechanism that is compatible with downstream assay processing. For example, if MSRE digestion is performed before ligation, heat inactivation of the MSRE (>65°C) would be unsuitable because it would denature the dsDNA, making it incompatible with the adapter ligation reaction.
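The selection criterion above, choosing MSREs that target as many DMRs as possible, can be sketched in Python as an exhaustive search over enzyme combinations. The DMR names and sequences are hypothetical, "targeted" is taken here to mean that a DMR contains at least one recognition site of a chosen enzyme, and buffer compatibility and inactivation requirements are not modeled.

```python
from itertools import combinations

MSRE_SITES = {"BstUI": "CGCG", "HpaII": "CCGG", "HhaI": "GCGC"}


def covered_dmrs(dmr_seqs, enzymes):
    """Names of DMRs containing at least one recognition site of the chosen enzymes."""
    return {
        name
        for name, seq in dmr_seqs.items()
        if any(MSRE_SITES[enzyme] in seq.upper() for enzyme in enzymes)
    }


def best_enzyme_set(dmr_seqs, candidates, k):
    """Exhaustively pick the k-enzyme combination that covers the most DMRs."""
    return max(
        combinations(sorted(candidates), k),
        key=lambda combo: len(covered_dmrs(dmr_seqs, combo)),
    )


# Hypothetical DMR sequences used only to exercise the search.
dmrs = {
    "dmr1": "AACCGGAA",
    "dmr2": "TTGCGCTT",
    "dmr3": "AACCGGTTGCGC",
    "dmr4": "ATATATAT",
}
best_pair = best_enzyme_set(dmrs, MSRE_SITES, k=2)
```

Exhaustive search is practical here because the candidate list in the text is a few dozen enzymes and only two or three are combined; a DMR with no site for any candidate, such as dmr4 in this toy panel, cannot be targeted by digestion at all.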

In some embodiments, a methylation-sensitive restriction endonuclease that does not cleave DNA when a particular nucleotide base in its recognition sequence is methylated can be used. Such an MSRE can be used in the hyper partition only, to remove unmethylated molecules that were incorrectly partitioned into the hyper partition, thereby improving the specificity of detecting methylated nucleic acid molecules. In some embodiments, a methylation-sensitive restriction endonuclease that cleaves DNA when a particular nucleotide base in its recognition sequence is methylated can be used. Such an MSRE can be used in the hypo partition to remove methylated molecules that were incorrectly partitioned into the hypo partition, thereby improving the specificity of detecting unmethylated nucleic acid molecules. In some embodiments, both the hyper (and residual) partition and the hypo partition are digested with MSREs, such that (i) an MSRE that cleaves DNA having an unmethylated nucleotide at the recognition site is used for the hyper (and residual) partition, and (ii) an MSRE that cleaves DNA having a methylated nucleotide at the recognition site is used for the hypo partition.
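The clean-up logic of the dual digestion can be written out as a small Python sketch. The `Molecule` class and the function name are hypothetical, and the model assumes that every molecule contains a recognition site for the enzyme applied to its partition and that digestion goes to completion, which real digests only approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    partition: str    # "hyper", "residual", or "hypo"
    methylated: bool  # true methylation state at the recognition site


def survives_digestion(mol):
    """Return True if the molecule stays intact after its partition's digest."""
    if mol.partition in ("hyper", "residual"):
        # Enzyme that cuts unmethylated sites: methylated molecules survive.
        return mol.methylated
    # Hypo partition: enzyme that cuts methylated sites, so unmethylated survive.
    return not mol.methylated


# Hypothetical molecules; "b", "d", and "e" were partitioned incorrectly.
molecules = [
    Molecule("a", "hyper", True),
    Molecule("b", "hyper", False),
    Molecule("c", "hypo", False),
    Molecule("d", "hypo", True),
    Molecule("e", "residual", False),
]
survivors = [m.name for m in molecules if survives_digestion(m)]
```

Only the correctly partitioned molecules remain intact, so the molecules that go on to amplification and sequencing from each partition carry less partitioning noise.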

In some embodiments, after adapter ligation, if more than one partition (e.g., the hyper partition and the residual partition) is to be digested with the same MSRE, each partition can be digested separately, or the partitions can be combined and digested in a single reaction. In some embodiments, digesting each partition separately can be advantageous if the required enzyme performance (efficiency, specificity) can only be achieved with separate reactions. In some embodiments, combining the partitions and then performing the MSRE digestion can be advantageous for reducing the assay's cost of goods sold (COGS; e.g., SPRI beads, enzymes, PCR plates, pipette tips) and for simplifying automation of the assay at scale (i.e., a single digestion reaction per sample).

In some embodiments, if MSRE digestion is performed before adapter ligation (where the MSRE cleaves unmethylated DNA at the recognition site), the cleaved fragments of a molecule can be retained, and molecule ends that match the RE recognition site can be used to identify unmethylated molecules in the hyper partition. In such embodiments, when a cfDNA sample is analyzed and genomic DNA contamination is present, the genomic DNA can be cleaved by the MSRE (before adapter ligation), which can result in genomic DNA contamination. This can be avoided by performing adapter ligation before MSRE digestion.
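The idea of using fragment ends that match an RE recognition site can be sketched in Python for the simplest case, a blunt cutter such as BstUI (CG^CG). The function name, the coordinate convention, and the toy reference are assumptions made for illustration; enzymes that leave overhangs would need an end-repair-aware offset, which is not modeled here.

```python
def end_at_cut_site(reference, frag_start, frag_end, site="CGCG", cut_offset=2):
    """Report which fragment ends coincide with a cut inside a recognition site.

    frag_start and frag_end are 0-based, end-exclusive coordinates on reference.
    cut_offset is the number of site bases to the left of the cut (2 for a blunt
    CG^CG cutter such as BstUI).
    """
    def cut_here(boundary):
        left = boundary - cut_offset
        if left < 0:
            return False
        return reference[left:left + len(site)].upper() == site

    return {"start": cut_here(frag_start), "end": cut_here(frag_end)}


# Hypothetical reference with two CGCG sites, cut at positions 6 and 16.
reference = "AAAA" + "CGCG" + "TTTTTT" + "CGCG" + "AAAA"
between_sites = end_at_cut_site(reference, 6, 16)
unrelated = end_at_cut_site(reference, 1, 10)
```

A fragment whose aligned end falls exactly on a cut position within a recognition site is consistent with having been cleaved, and therefore with an unmethylated site, whereas ends elsewhere are uninformative.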

In some embodiments, all of the partition groups, or a subset of the partition groups, can be sequenced. In some embodiments, only the one or more partition groups that underwent MSRE digestion are sequenced to analyze the nucleic acid molecules in cancer DMRs.

In some embodiments, the polynucleotide sample is partitioned into two partition groups. In some embodiments, the polynucleotide sample is partitioned into three partition groups. In some embodiments, the nucleic acid molecules in both the hyper partition and the hypo partition undergo MSRE digestion, where the MSRE used in the hyper partition cleaves DNA if the recognition site has an unmethylated nucleotide, and the MSRE used in the hypo partition cleaves DNA if the recognition site has a methylated nucleotide. This enables sensitive detection of hypermethylated DMRs and hypomethylated DMRs at the same time.

In some embodiments, the polynucleotide sample is between 1 ng and 500 ng. In some embodiments, the polynucleotide sample is less than 500 ng. In some embodiments, the polynucleotide sample is selected from the group consisting of a DNA sample, an RNA sample, a cell-free DNA sample, and a cell-free RNA sample. In some embodiments, the polynucleotide sample is a cfDNA sample obtained from the blood of the subject. In some embodiments, the polynucleotide sample is a DNA sample obtained from a tumor tissue biopsy.

II. General characteristics of the method

A. Sample

The sample can be any biological sample isolated from a subject. Samples can include body tissue, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies (e.g., biopsies from known or suspected solid tumors), cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid (e.g., fluid from intercellular spaces), gingival fluid, gingival crevicular fluid, bone marrow, pleural effusions, saliva, mucus, sputum, semen, sweat, and urine. The sample can therefore be a body fluid, such as blood and fractions thereof, or urine. Such samples can include nucleic acids shed from tumors. The nucleic acids can include DNA and RNA and can be in double-stranded and single-stranded forms. In some embodiments, the sample comprises cell-free DNA. The sample can be in the form originally isolated from the subject, or can have undergone further processing to remove or add components (such as cells), to enrich one component relative to another, or to convert one form of nucleic acid into another, such as converting RNA into DNA or single-stranded nucleic acids into double-stranded nucleic acids. Thus, for example, a body fluid for analysis can be plasma or serum containing cell-free nucleic acids, such as cell-free DNA (cfDNA).

The sample can be isolated or obtained from a subject and transported to a sample analysis site. The sample can be stored and transported at a desirable temperature, for example, room temperature, 4°C, -20°C, and/or -80°C. The sample can also be isolated or obtained from the subject at the sample analysis site. The subject can be a human, a mammal, an animal, a companion animal, a service animal, or a pet. The subject can have cancer. The subject can have no cancer or no detectable cancer symptoms. The subject may have been treated with one or more cancer therapies, for example, any one or more chemotherapies. The subject can be in remission. The subject may or may not have been diagnosed as being susceptible to cancer or to any cancer-associated genetic mutation/disorder.

In some embodiments, the volume of body fluid sampled from the subject depends on the desired read depth for the sequenced regions. Examples of volumes are about 0.4-40 milliliters (mL), about 5-20 mL, and about 10-20 mL. For example, the volume can be about 0.5 mL, about 1 mL, about 5 mL, about 10 mL, about 20 mL, about 30 mL, about 40 mL, or more. The volume of sampled plasma is typically between about 5 mL and about 20 mL.

Samples can contain varying amounts of nucleic acid. Typically, the amount of nucleic acid in a given sample equates to multiple genome equivalents. For example, a sample of about 30 nanograms (ng) of DNA can contain about 10,000 (10^4) haploid human genome equivalents and, in the case of cfDNA, about 200 billion (2 × 10^11) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
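The figures above can be reproduced to within rounding with a short Python calculation. The constants are assumptions not stated in the text: about 3.3 pg of DNA per haploid human genome, an average of about 650 g/mol per base pair of double-stranded DNA, and a mean cfDNA fragment length of about 160 bp.

```python
AVOGADRO = 6.022e23        # molecules per mole
HAPLOID_GENOME_PG = 3.3    # assumed mass of one haploid human genome, in picograms
BP_MASS_G_PER_MOL = 650.0  # assumed average molar mass of one dsDNA base pair
MEAN_CFDNA_BP = 160        # assumed mean cfDNA fragment length, in base pairs


def haploid_genome_equivalents(mass_ng):
    """Number of haploid human genome equivalents in mass_ng of DNA."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG


def cfdna_molecule_count(mass_ng, mean_bp=MEAN_CFDNA_BP):
    """Approximate number of cfDNA fragments in mass_ng of cfDNA."""
    grams_per_fragment = mean_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return mass_ng * 1e-9 / grams_per_fragment


ge_30 = haploid_genome_equivalents(30)    # on the order of 10^4
mol_30 = cfdna_molecule_count(30)         # on the order of 2 x 10^11
ge_100 = haploid_genome_equivalents(100)  # on the order of 3 x 10^4
mol_100 = cfdna_molecule_count(100)       # on the order of 6 x 10^11
```

Under these assumptions, 30 ng corresponds to roughly 9 × 10^3 genome equivalents and roughly 1.7 × 10^11 fragments, in line with the rounded values of 10^4 and 2 × 10^11 given in the text.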

In some embodiments, a sample comprises nucleic acids from different sources, e.g., from cellular sources and from cell-free sources (e.g., blood samples, etc.). Typically, a sample includes nucleic acids carrying mutations. For example, a sample optionally comprises DNA carrying germline mutations and/or somatic mutations. Typically, a sample comprises DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).

Exemplary amounts of cell-free nucleic acids in a sample before amplification typically range from about 1 femtogram (fg) to about 1 microgram (μg), e.g., about 1 picogram (pg) to about 200 nanograms (ng), about 1 ng to about 100 ng, or about 10 ng to about 1000 ng. In some embodiments, a sample includes up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of cell-free nucleic acid molecules. Optionally, the amount is at least about 1 fg, at least about 10 fg, at least about 100 fg, at least about 1 pg, at least about 10 pg, at least about 100 pg, at least about 1 ng, at least about 10 ng, at least about 100 ng, at least about 150 ng, or at least about 200 ng of cell-free nucleic acid molecules. In some embodiments, the amount is up to about 1 fg, about 10 fg, about 100 fg, about 1 pg, about 10 pg, about 100 pg, about 1 ng, about 10 ng, about 100 ng, about 150 ng, or about 200 ng of cell-free nucleic acid molecules. In some embodiments, methods include obtaining between about 1 fg and about 200 ng of cell-free nucleic acid molecules from a sample.

Cell-free nucleic acids typically have a size distribution of between about 100 nucleotides and about 500 nucleotides in length. Molecules of about 110 nucleotides to about 230 nucleotides in length represent about 90% of the molecules in the sample, with a mode of about 168 nucleotides in length (in samples from human subjects) and a second, minor peak in a range between about 240 nucleotides and about 440 nucleotides in length. In some embodiments, cell-free nucleic acids are from about 160 nucleotides to about 180 nucleotides in length, or from about 320 nucleotides to about 360 nucleotides in length, or from about 440 nucleotides to about 480 nucleotides in length.

In some embodiments, cell-free nucleic acids are isolated from body fluids through a partitioning step in which cell-free nucleic acids present in solution are separated from intact cells and other insoluble components of the body fluid. In some embodiments, partitioning includes techniques such as centrifugation or filtration. Alternatively, cells in the body fluid can be lysed, and cell-free and cellular nucleic acids can be processed together. Typically, after addition of buffers and wash steps, cell-free nucleic acids can be precipitated with, for example, an alcohol. In some embodiments, additional clean-up steps, such as silica-based columns, are used to remove contaminants or salts. For example, non-specific bulk carrier nucleic acids are optionally added throughout the reaction to optimize aspects of the exemplary procedure, such as yield. After such processing, samples typically include various forms of nucleic acids, including double-stranded DNA, single-stranded DNA, and/or single-stranded RNA. Optionally, single-stranded DNA and/or single-stranded RNA are converted to double-stranded forms so that they are included in subsequent processing and analysis steps.

Double-stranded DNA molecules in a sample, and single-stranded nucleic acid molecules that have been converted to double-stranded DNA molecules, can be ligated to adapters at one or both ends. Typically, double-stranded molecules are blunt-ended by treatment with a polymerase having 5'-3' polymerase activity and 3'-5' exonuclease (or proofreading) activity in the presence of all four standard nucleotides. Klenow large fragment and T4 polymerase are examples of suitable polymerases. The blunt-ended DNA molecules can be ligated to adapters that are at least partially double-stranded (e.g., Y-shaped or bell-shaped adapters). Alternatively, complementary nucleotides can be added to the blunt ends of the sample nucleic acids and the adapters to facilitate ligation. Both blunt-end ligation and sticky-end ligation are contemplated herein. In blunt-end ligation, both the nucleic acid molecules and the adapter tags have blunt ends. In sticky-end ligation, the nucleic acid molecules typically bear an "A" overhang and the adapters bear a "T" overhang.

B. Partitioning, adding adapters, and tagging

In another embodiment, a partitioning scheme can be performed using the following exemplary procedure. Nucleic acids are ligated at both ends to Y-shaped adapters that include primer binding sites and tags. The molecules are amplified. The amplified molecules are then partitioned, by contact with an antibody that preferentially binds 5-methylcytosine, to produce two partitions. One partition includes original molecules lacking methylation and amplification copies that have lost methylation. The other partition includes original DNA molecules with methylation. The partition including original DNA molecules with methylation is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of that partition, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The two partitions are then processed and sequenced separately, with further amplification of the methylated partition. The sequence data of the two partitions can then be compared. In this example, tags are not used to distinguish between methylated and unmethylated DNA, but rather to distinguish between different molecules within these partitions, so that one can determine whether reads with the same start and stop points are based on the same or different molecules.

Tags can be incorporated into or otherwise joined to adapters by chemical synthesis, ligation (e.g., blunt-end ligation or sticky-end ligation), or overlap extension polymerase chain reaction (PCR), among other methods. Such adapters can ultimately be joined to the target nucleic acid molecule. In other embodiments, one or more rounds of amplification cycles (e.g., PCR amplification) are generally applied to introduce sample indexes to a nucleic acid molecule using conventional nucleic acid amplification methods. The amplifications can be conducted in one or more reaction mixtures (e.g., more than one microwell in an array). Molecular barcodes and/or sample indexes can be introduced simultaneously or in any sequential order. In some embodiments, molecular barcodes and/or sample indexes are introduced before and/or after sequence capture steps are performed. In some embodiments, only the molecular barcodes are introduced before probe capture, and the sample indexes are introduced after sequence capture steps are performed. In some embodiments, both the molecular barcodes and the sample indexes are introduced before probe-based capture steps are performed. In some embodiments, the sample indexes are introduced after sequence capture steps are performed. In some embodiments, molecular barcodes are incorporated into the nucleic acid molecules (e.g., cfDNA molecules) in a sample through adapters via ligation (e.g., blunt-end ligation or sticky-end ligation). In some embodiments, sample indexes are incorporated into the nucleic acid molecules (e.g., cfDNA molecules) in a sample through overlap extension polymerase chain reaction (PCR). Typically, sequence capture protocols involve introducing a single-stranded nucleic acid molecule complementary to a targeted nucleic acid sequence, e.g., a coding sequence of a genomic region in which mutations are associated with a cancer type.

In some embodiments, the tags can be located at one end or at both ends of the sample nucleic acid molecule. In some embodiments, tags are oligonucleotides of predetermined, random, or semi-random sequence. In some embodiments, the tags can be less than or equal to about 500, 200, 100, 50, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotides in length. The tags can be linked to sample nucleic acids randomly or non-randomly.

In some embodiments, each sample is uniquely tagged with a sample index or a combination of sample indexes. In some embodiments, each nucleic acid molecule of a sample or sub-sample is uniquely tagged with a molecular barcode or a combination of molecular barcodes. In other embodiments, more than one barcode can be used such that the molecular barcodes are not necessarily unique relative to one another within that set (e.g., non-unique molecular barcodes). In these embodiments, molecular barcodes are generally attached (e.g., by ligation) to individual molecules so that the combination of the molecular barcode and the sequence to which it is attached creates a unique sequence that can be individually tracked. Detection of non-unique molecular barcodes in combination with endogenous sequence information typically allows a unique identity to be assigned to a particular molecule. Such endogenous sequence information includes, e.g., the beginning (start) and/or end (stop) genomic location/position corresponding to the sequence of the original nucleic acid molecule in the sample, the start and stop genomic positions corresponding to the sequence of the original nucleic acid molecule in the sample, the beginning (start) and/or end (stop) genomic location/position of the sequence read mapped to the reference sequence, the start and stop genomic positions of the sequence read mapped to the reference sequence, sub-sequences of the sequence read at one or both ends, the length of the sequence read, and/or the length of the original nucleic acid molecule in the sample. In some embodiments, the beginning region comprises the first 1, first 2, first 5, first 10, first 15, first 20, first 25, first 30, or at least the first 30 base positions at the 5' end of the sequencing read that align to the reference sequence. In some embodiments, the end region comprises the last 1, last 2, last 5, last 10, last 15, last 20, last 25, last 30, or at least the last 30 base positions at the 3' end of the sequencing read that align to the reference sequence. The length, or number of base pairs, of an individual sequence read is also optionally used to assign a unique identity to a given molecule. As described herein, fragments from a single strand of nucleic acid that has been assigned a unique identity can thereby permit subsequent identification of fragments from the parent strand and/or a complementary strand.
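The combination of a non-unique molecular barcode with endogenous start and stop positions can be illustrated with a minimal grouping sketch. The `Read` record and the `group_by_molecule` function are hypothetical names introduced for illustration; the sketch assumes reads have already been aligned and that exact matching of barcode, start, and stop is sufficient, which ignores sequencing errors and alignment jitter.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # non-unique molecular barcode read from the adapter
    chrom: str    # reference sequence the read maps to
    start: int    # start position of the aligned read
    end: int      # stop position of the aligned read


def group_by_molecule(reads):
    """Group reads that share barcode, start, and stop into one putative molecule."""
    families = defaultdict(list)
    for read in reads:
        key = (read.barcode, read.chrom, read.start, read.end)
        families[key].append(read)
    return dict(families)


if __name__ == "__main__":
    reads = [
        Read("ACGT", "chr1", 100, 268),
        Read("ACGT", "chr1", 100, 268),  # same barcode and coordinates: same molecule
        Read("TTAG", "chr1", 100, 268),  # same coordinates, different barcode
        Read("ACGT", "chr1", 150, 318),  # same barcode, different coordinates
    ]
    for key, members in group_by_molecule(reads).items():
        print(key, len(members))
```

Here the barcode alone would merge the first and fourth reads, and the coordinates alone would merge the first and third; only the combined key separates all three putative molecules.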

In some embodiments, molecular barcodes are introduced at an expected ratio of a set of identifiers (e.g., a combination of unique or non-unique molecular barcodes) to molecules in a sample. One example format uses from about 2 to about 1,000,000 different molecular barcode sequences, or from about 5 to about 150 different molecular barcode sequences, or from about 20 to about 50 different molecular barcode sequences, ligated to both ends of a target molecule. Alternatively, from about 25 to about 1,000,000 different molecular barcode sequences can be used. For example, 20-50 × 20-50 molecular barcode sequences can be used (i.e., one of 20-50 different molecular barcode sequences can be attached to each end of the target molecule). Such numbers of identifiers are typically sufficient for different molecules having the same start and stop points to have a high probability (e.g., at least 94%, 99.5%, 99.99%, or 99.999%) of receiving different combinations of identifiers. In some embodiments, about 80%, about 90%, about 95%, or about 99% of the molecules have the same combination of molecular barcodes.
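The probability that molecules sharing the same start and stop points receive different identifier combinations can be estimated with a small calculation. This is a sketch under stated assumptions, not a model given in the text: each end receives one barcode drawn independently and uniformly from the same set, and the two ends are distinguishable, so n barcodes give n × n combinations. The function name is hypothetical.

```python
def distinct_combination_probability(barcodes_per_end: int, molecules: int = 2) -> float:
    """Probability that `molecules` co-located molecules all get different
    barcode combinations, assuming uniform, independent draws at each end."""
    combos = barcodes_per_end ** 2
    probability = 1.0
    for already_used in range(molecules):
        # Each successive molecule must avoid the combinations already taken.
        probability *= max(combos - already_used, 0) / combos
    return probability


if __name__ == "__main__":
    for n in (20, 50):
        for k in (2, 5):
            p = distinct_combination_probability(n, k)
            print(f"{n} x {n} barcodes, {k} molecules: {p:.4%}")
```

Under these assumptions, two co-located molecules tagged from a 20 × 20 scheme differ with probability 399/400, and the probability falls as more molecules share the same endpoints.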

In some embodiments, the assignment of unique or non-unique molecular barcodes in reactions is performed using methods and systems described in, for example, U.S. Patent Application Nos. 20010053519, 20030152490, and 20110160078, and U.S. Patent Nos. 6,582,908, 7,537,898, 9,598,731, and 9,902,992, each of which is hereby incorporated by reference in its entirety. Alternatively, in some embodiments, different nucleic acid molecules of a sample can be identified using only endogenous sequence information (e.g., start and/or stop positions, sub-sequences at one or both ends of a sequence, and/or lengths).

In certain embodiments described herein, a population of different forms of nucleic acids (e.g., hypermethylated and hypomethylated DNA in a sample) can be physically partitioned before analysis, e.g., sequencing, or tagging and sequencing. For example, in some embodiments, partitioning comprises dividing nucleic acid molecules into partition groups based on their differential binding affinity for a binding agent that preferentially binds nucleic acid molecules comprising methylated nucleotides. In some embodiments, the partition groups are modified by, e.g., digesting at least one subset of at least one partition group with an MSRE. This approach can be used to determine, for example, whether hypermethylation-variable epigenetic target regions show hypermethylation characteristic of tumor cells, or whether hypomethylation-variable epigenetic target regions show hypomethylation characteristic of tumor cells. In addition, partitioning a heterogeneous nucleic acid population can increase rare signals, e.g., by enriching rare nucleic acid molecules that are more prevalent in one fraction (or partition) of the population. For example, a genetic variation that is present in hypermethylated DNA but less so (or absent) in hypomethylated DNA can be detected more easily by partitioning a sample into hypermethylated and hypomethylated nucleic acid molecules. By analyzing more than one fraction of a sample, a multi-dimensional analysis of a single locus of a genome or species of nucleic acid can be performed, and hence greater sensitivity can be achieved.

In some instances, a heterogeneous nucleic acid sample is partitioned into two or more partitions (e.g., at least 3, 4, 5, 6, or 7 partitions). In some embodiments, each partition is differentially tagged; that is, each partition can have a different set of molecular barcodes. The tagged partitions can then be pooled together for collective sample preparation and/or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on a different characteristic (examples provided herein) and tagged using differential tags that are distinguished from other partitions and partitioning means.

Examples of characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and/or proteins that bind to DNA. The resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments, and longer DNA fragments. In some embodiments, partitioning based on a cytosine modification (e.g., cytosine methylation) or on methylation generally is performed, and is optionally combined with at least one additional partitioning step, which can be based on any of the foregoing characteristics or forms of DNA. In some embodiments, a heterogeneous population of nucleic acids is partitioned into nucleic acids with one or more epigenetic modifications and nucleic acids without the one or more epigenetic modifications. Examples of epigenetic modifications include the presence or absence of methylation; the level of methylation; the type of methylation (e.g., 5-methylcytosine versus other types of methylation, such as adenine methylation and/or cytosine hydroxymethylation); and association, and level of association, with one or more proteins, such as histones. Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes. Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA). Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules longer than 160 bp).

In some embodiments, a population of nucleic acids is partitioned into two or more different partitions. Each partition is representative of a different nucleic acid form, and a first partition comprises DNA with a cytosine modification in a greater proportion than a second partition. Each partition is tagged differently. The first partition is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first partition, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The tagged nucleic acids are pooled together and then sequenced. Sequence reads are obtained and analyzed, including distinguishing in silico the first nucleobase from the second nucleobase in the DNA of the first partition. Tags are used to sort reads from different partitions. Analysis to detect genetic variations can be performed on a partition-by-partition level as well as at the level of the whole nucleic acid population. For example, analysis can include in silico analysis to determine genetic variations, such as CNVs, SNVs, indels, and fusions, in the nucleic acids of each partition. In some instances, in silico analysis can include determining chromatin structure. For example, coverage of sequence reads can be used to determine nucleosome positioning in chromatin. Higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome-depleted region (NDR).
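The use of read coverage as a proxy for nucleosome occupancy can be sketched as follows. This is an illustrative simplification rather than the analysis used in the disclosure: reads are assumed to be given as half-open (start, stop) intervals on one reference sequence, and positions whose depth falls below an arbitrary, hypothetical threshold are merely flagged as candidates for low occupancy.

```python
def coverage(reads, region_start, region_end):
    """Per-base read depth over [region_start, region_end) for half-open read intervals."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region before counting.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


def low_coverage_positions(depth, region_start, threshold):
    """Positions with depth below `threshold` (candidate nucleosome-depleted sites)."""
    return [region_start + i for i, d in enumerate(depth) if d < threshold]


if __name__ == "__main__":
    reads = [(0, 5), (3, 8)]
    depth = coverage(reads, 0, 10)
    print(depth)
    print(low_coverage_positions(depth, 0, threshold=1))
```

In practice the threshold would need to be set relative to local sequencing depth, and coverage would be computed separately for each partition after the reads are sorted by tag.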

Samples can include nucleic acids that vary in their modifications, including post-replication modifications to nucleotides and binding, usually noncovalent, to one or more proteins.

In an embodiment, the population of nucleic acids is one obtained from a serum, plasma, or blood sample from a subject suspected of having a neoplasm, a tumor, or cancer, or previously diagnosed with a neoplasm, a tumor, or cancer. The population of nucleic acids includes nucleic acids having varying levels of methylation. Methylation can occur from any one or more post-replication or post-transcriptional modifications. Post-replication modifications include modifications of the nucleotide cytosine, particularly at the 5-position of the nucleobase, e.g., 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine.

The agents used for partitioning, such as binding agents, can be antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28:1106-1114 (2010); Song et al., Nat Biotech 29:68-72 (2011)), or artificial peptides selected, e.g., by phage display, to have specificity for a given target.

Examples of binding agents contemplated herein include methyl binding domains (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2 and antibodies that preferentially bind 5-methylcytosine. When an antibody is used to immunoprecipitate methylated DNA, the methylated DNA may be recovered in single-stranded form. In such embodiments, a second strand can be synthesized. The hypermethylated (and optionally intermediately methylated) partition can then be contacted with an MSRE that does not cleave hemimethylated DNA but cleaves unmethylated DNA, such as HpaII, BstUI, or Hin6I. Alternatively or additionally, the hypomethylated (and optionally intermediately methylated) partition can then be contacted with an MSRE that cleaves hemimethylated DNA but does not cleave unmethylated DNA.

Likewise, partitioning of different forms of nucleic acids can be performed using histone binding proteins, which can separate nucleic acids bound to histones from free or unbound nucleic acids. Examples of histone binding proteins that can be used in the methods disclosed herein include RBBP4, RbAp48, and SANT domain peptides.

For some binding agents and some nucleic acid modifications, binding to the agent may occur in an essentially all-or-nothing manner depending on whether a nucleic acid bears the modification, yet the separation may be one of degree. In such cases, nucleic acids in which the modification is overrepresented bind to the agent to a greater extent than nucleic acids in which the modification is underrepresented. Alternatively, nucleic acids bearing the modification may bind in an all-or-nothing manner, but various levels of modification may then be eluted sequentially from the binding agent.

For example, in some embodiments, partitioning can be binary or based on the degree/level of modification. For example, all methylated fragments can be partitioned from unmethylated fragments using methyl-binding domain proteins (e.g., the MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration of a solution containing the methyl-binding domain and the bound fragments. As the salt concentration increases, fragments having greater methylation levels are eluted.

In some instances, the final partitions are representative of nucleic acids having different extents of modification (overrepresentative or underrepresentative of the modification). Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in the population. For example, if the median number of 5-methylcytosine residues in the nucleic acids of a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresentative of this modification, and a nucleic acid with 1 or 0 5-methylcytosine residues is underrepresentative. The effect of the affinity separation is to enrich for nucleic acids overrepresented in the modification in the bound phase and for nucleic acids underrepresented in the modification in the unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
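The median-based definition of overrepresentation and underrepresentation can be expressed as a minimal sketch. The function name and labels are hypothetical, and because the text does not say how a molecule exactly at the median is classified, the sketch labels that case separately as an assumption.

```python
from statistics import median


def classify_by_median(modification_counts):
    """Label each molecule by its modification count relative to the population median."""
    m = median(modification_counts)
    labels = []
    for count in modification_counts:
        if count > m:
            labels.append("overrepresented")
        elif count < m:
            labels.append("underrepresented")
        else:
            labels.append("at_median")  # assumption: the text leaves this case undefined
    return labels


if __name__ == "__main__":
    # Number of 5-methylcytosine residues per molecule; the median is 2.
    counts = [0, 1, 2, 3, 4]
    for count, label in zip(counts, classify_by_median(counts)):
        print(count, label)
```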

When using the MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific), DNA with various levels of methylation can be partitioned using sequential elutions. For example, a hypomethylated partition (no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate the methylated nucleic acids from the unmethylated nucleic acids. Subsequently, one or more elution steps are performed sequentially to elute nucleic acids having different levels of methylation. For example, a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, 300 mM, 400 mM, 500 mM, 600 mM, 700 mM, 800 mM, 900 mM, 1000 mM, or 2000 mM. After such methylated nucleic acids are eluted, magnetic separation is used once again to separate nucleic acids with a higher level of methylation from those with a lower level of methylation. The elution and magnetic separation steps can themselves be repeated to create various partitions, such as a hypomethylated partition (representative of no methylation), a methylated partition (representative of a low level of methylation), and a hypermethylated partition (representative of a high level of methylation).

In some methods, nucleic acids bound to an agent used for partitioning are subjected to a wash step. The wash step washes off nucleic acids that are weakly bound to the binding agent. Such nucleic acids can be enriched in nucleic acids having a degree of modification close to the mean or median (i.e., intermediate between the nucleic acids that remain bound to the solid phase and the nucleic acids that do not bind to the solid phase on initial contact of the sample with the agent).

Partitioning results in at least two, and sometimes three or more, partitions of nucleic acids with different extents of modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions, are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from members of another. The tags linked to nucleic acid molecules of the same partition can be the same as or different from one another. If they differ from one another, the tags may have part of their code in common, so as to identify the molecules to which they are attached as belonging to a particular partition.
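The idea that tags within a partition may differ while sharing part of their code can be sketched as a simple lookup on a shared prefix. The tag layout (a three-base partition prefix followed by a molecule-specific suffix), the prefix sequences, and the function name are all hypothetical choices made for illustration; the text does not specify how the shared portion is encoded.

```python
from collections import defaultdict

# Hypothetical partition-identifying prefixes; real designs depend on the adapters used.
PARTITION_BY_TAG_PREFIX = {
    "AAC": "hypermethylated",
    "GGT": "hypomethylated",
}
PREFIX_LENGTH = 3


def sort_reads_by_partition(reads):
    """Sort (tag, sequence) pairs into partitions using the shared tag prefix."""
    partitions = defaultdict(list)
    for tag, sequence in reads:
        partition = PARTITION_BY_TAG_PREFIX.get(tag[:PREFIX_LENGTH], "unassigned")
        partitions[partition].append(sequence)
    return dict(partitions)


if __name__ == "__main__":
    pooled_reads = [
        ("AACTTGCA", "CGCGTTAG"),  # prefix AAC: hypermethylated partition
        ("AACGGATC", "CGATCGCG"),  # different suffix, same partition
        ("GGTACCTA", "TTAGGCAT"),  # prefix GGT: hypomethylated partition
        ("TTTACCTA", "ACGTACGT"),  # unknown prefix
    ]
    for name, sequences in sort_reads_by_partition(pooled_reads).items():
        print(name, sequences)
```

The suffix remains available as a molecular barcode, so the same tag can both assign a read to its partition and help distinguish molecules within that partition.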

For further details regarding partitioning nucleic acid samples based on characteristics such as methylation, see WO2018/119452, which is incorporated herein by reference.

In some embodiments, the nucleic acid molecules can be fractionated into different partitions based on whether they are bound to a specific protein or fragment thereof or are not bound to that specific protein or fragment thereof.

Nucleic acid molecules can be partitioned based on DNA-protein binding. Protein-DNA complexes can be partitioned based on a specific property of the protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation), or enzymatic activity. Examples of proteins that can bind to DNA and serve as binding agents for partitioning include, but are not limited to, protein A and protein G. Any suitable method can be used to partition the nucleic acid molecules based on protein-bound regions. Examples of methods for partitioning nucleic acid molecules based on protein-bound regions include, but are not limited to, SDS-PAGE, chromatin immunoprecipitation (ChIP), heparin chromatography, and asymmetrical field flow fractionation (AF4).

Generally, elution is a function of the number of methylated sites per nucleic acid molecule, with more highly methylated molecules eluting at increased salt concentrations. To elute the DNA into distinct populations or partitions based on the extent of methylation, one can use a series of elution buffers of increasing NaCl concentration. Salt concentration can range from about 100 nM to about 2500 mM NaCl. In one embodiment, the process results in three (3) partitions. Molecules are contacted with a solution at a first salt concentration, the solution comprising a molecule that contains a methyl binding domain and that can be attached to a capture moiety, such as streptavidin. At the first salt concentration, one population of molecules will bind to the MBD and one population will remain unbound. The unbound population can be separated as a "hypomethylated" population. For example, a first partition representative of the hypomethylated form of DNA is the one that remains unbound at a low salt concentration (e.g., 100 mM or 160 mM). A second partition representative of intermediately methylated DNA is eluted using an intermediate salt concentration (e.g., a concentration between 100 mM and 2000 mM). This partition is also separated from the sample. A third partition representative of the hypermethylated form of DNA is eluted using a high salt concentration (e.g., at least about 2000 mM).
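The three-partition salt series described above can be sketched as a simple classification rule. This is only an illustration: the function name, the partition labels, and the use of 160 mM and 2000 mM as cutoffs are assumptions taken from the example values in the text, not a protocol.

```python
from typing import Optional

# Illustrative cutoffs (mM NaCl) borrowed from the example values in the text.
# They are assumptions for this sketch; a real protocol would set them empirically.
BINDING_SALT_MM = 160.0
HIGH_SALT_MM = 2000.0


def partition_for_fraction(elution_salt_mm: Optional[float]) -> str:
    """Name the partition a fraction belongs to.

    elution_salt_mm is None for the flow-through, i.e. molecules that never
    bound the MBD at the low-salt binding condition.
    """
    if elution_salt_mm is None:
        return "hypo"
    if elution_salt_mm < BINDING_SALT_MM:
        raise ValueError("elution salt is below the binding condition")
    if elution_salt_mm >= HIGH_SALT_MM:
        return "hyper"
    return "intermediate"


if __name__ == "__main__":
    for salt in (None, 500.0, 2000.0):
        print(salt, partition_for_fraction(salt))
```

The flow-through is handled separately from eluted fractions because, in the scheme above, the hypomethylated population is defined by never binding rather than by an elution step.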

The partitioning procedure may result in imperfect sorting of DNA molecules among the resulting partitions or fractions. For example, a minority of the molecules in the hypomethylated partition may be highly modified (e.g., hypermethylated), and/or a minority of the molecules in the hypermethylated partition may be unmodified or substantially unmodified (e.g., unmethylated or substantially unmethylated). Such molecules are considered nonspecifically partitioned. The methods described herein include steps that can reduce technical noise from nonspecifically partitioned DNA, e.g., by degrading the nonspecifically partitioned DNA and/or by converting certain bases so that nonspecifically partitioned DNA can be identified after sequencing. Accordingly, the methods described herein can provide improved sensitivity and/or simplified analysis.

In some instances, each partition set (representative of a different nucleic acid form) is differentially tagged with molecular barcodes, and the partition sets are pooled together prior to sequencing. In other instances, the different forms are sequenced separately.

In some embodiments, the nucleic acid molecules (from a polynucleotide sample, e.g., after partitioning) can be tagged with sample indexes and/or molecular barcodes (generally referred to as "tags"). Tags can be used to label the nucleic acids in a partition so as to correlate a tag (or tags) with a specific partition. Alternatively, tags can be used in embodiments of the invention that do not employ a partitioning step. A tag or index can be a molecule, such as a nucleic acid, containing information that indicates a feature of the molecule with which the tag is associated. For example, molecules can bear a sample tag or sample index (which distinguishes molecules in one sample from those in a different sample), a partition tag (which distinguishes molecules in one partition from those in a different partition), or a molecular tag/molecular barcode/barcode (which distinguishes different molecules from one another, in both unique and non-unique tagging scenarios). In certain embodiments, a tag can comprise one barcode or a combination of barcodes. In some embodiments, a barcode has, e.g., between 10 and 100 nucleotides. A collection of barcodes can have degenerate sequences or can have sequences with a certain Hamming distance, as desired for the specific purpose. Thus, for example, a molecular barcode can comprise one barcode or a combination of two barcodes, each attached to a different end of the molecule. Additionally or alternatively, different sets of molecular barcodes, molecular tags, or molecular indexes can be used for different partitions and/or samples, such that the barcodes serve as molecular tags through their individual sequences and also serve to identify the partition and/or sample to which they correspond based on the set of which they are a member.
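As an illustration of the Hamming-distance property mentioned above, the following minimal sketch computes the smallest pairwise Hamming distance in a barcode collection. The barcode sequences are hypothetical examples, not sequences from the disclosure.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance over all pairs in the collection."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


if __name__ == "__main__":
    # Hypothetical 10-nt barcodes chosen only for illustration.
    barcodes = ["ACGTACGTAC", "TGCATGCATG", "GATTACAGAT"]
    print(min_pairwise_hamming(barcodes))
```

A larger minimum distance means more sequencing errors are needed before one barcode is mistaken for another, which is the usual reason for designing a set around this quantity.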

Tagging strategies can be divided into unique tagging and non-unique tagging strategies. In unique tagging, all or substantially all of the molecules in a sample bear a different tag, so that reads can be assigned to original molecules based on tag information alone. Tags used in such methods are sometimes referred to as "unique tags". In non-unique tagging, different molecules in the same sample can bear the same tag, so that information in addition to the tag is used to assign a sequence read to an original molecule. Such information may include the start and stop coordinates, the coordinates to which the molecule maps, the start or stop coordinate alone, etc. Tags used in such methods are sometimes referred to as "non-unique tags". Accordingly, it is not necessary to uniquely tag every molecule in a sample. It suffices to uniquely tag molecules falling within an identifiable class within a sample. Thus, molecules in different identifiable families can bear the same tag without loss of information about the identity of the tagged molecule.
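The non-unique tagging idea can be sketched as grouping reads by the combination of mapping coordinates and tag. The `Read` structure and its field names are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record: mapped coordinates plus tag."""
    chrom: str
    start: int
    end: int
    tag: str


FamilyKey = Tuple[str, int, int, str]


def group_into_families(reads: Iterable[Read]) -> Dict[FamilyKey, List[Read]]:
    """Assign reads to original molecules using coordinates and tag together.

    Two molecules may carry the same tag; they are still separated as long
    as their start/end coordinates differ.
    """
    families: Dict[FamilyKey, List[Read]] = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.start, read.end, read.tag)].append(read)
    return dict(families)


if __name__ == "__main__":
    reads = [
        Read("chr1", 100, 266, "AC"),
        Read("chr1", 100, 266, "AC"),  # duplicate of the same molecule
        Read("chr1", 120, 280, "AC"),  # same tag, different molecule
    ]
    for key, members in group_into_families(reads).items():
        print(key, len(members))
```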

In certain embodiments of non-unique tagging, the number of different tags used can be sufficient that there is a very high likelihood (e.g., at least 99%, at least 99.9%, at least 99.99%, or at least 99.999%) that all molecules of a particular group bear a different tag. It is to be noted that when barcodes are used as tags, and when barcodes are attached, e.g., randomly, to both ends of a molecule, the combination of barcodes together can constitute a tag. This number, in turn, is a function of the number of molecules falling into the class. For example, the class can be all molecules mapping to the same start-stop position on a reference genome. The class can be all molecules mapping across a particular genetic locus, e.g., a particular base or a particular region (e.g., up to 100 bases, or a gene or an exon of a gene). In certain embodiments, the number of different tags used to uniquely identify a number of molecules, z, in a class can be between any of 2*z, 3*z, 4*z, 5*z, 6*z, 7*z, 8*z, 9*z, 10*z, 11*z, 12*z, 13*z, 14*z, 15*z, 16*z, 17*z, 18*z, 19*z, 20*z, or 100*z (e.g., as a lower limit) and any of 100,000*z, 10,000*z, 1000*z, or 100*z (e.g., as an upper limit).

For example, in a sample of about 5 ng to 30 ng of cell-free DNA, one expects around 3000 molecules to map to a particular nucleotide coordinate, and between about 3 and 10 molecules having any given start coordinate to share the same stop coordinate. Accordingly, about 50 to about 50,000 different tags (e.g., between about 6 and 220 barcode combinations) can suffice to uniquely tag all such molecules. To uniquely tag all 3000 molecules mapping across one nucleotide coordinate, about 1 million to about 20 million different tags would be required.
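One way to reason about how many tags are "sufficient" is the standard birthday-problem calculation, sketched below. It assumes each molecule independently receives one of n equally likely tags; the source does not state that its figures were derived this way, so the sketch should be read as an illustration rather than a reproduction of those numbers.

```python
def prob_all_distinct(z: int, n: int) -> float:
    """Probability that z molecules all receive different tags out of n,
    assuming independent, uniformly random tag assignment."""
    if z > n:
        return 0.0
    p = 1.0
    for i in range(z):
        p *= (n - i) / n
    return p


def min_tags(z: int, target: float) -> int:
    """Smallest n such that prob_all_distinct(z, n) >= target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    lo, hi = z, max(z, 1)
    while prob_all_distinct(z, hi) < target:  # find an upper bound
        hi *= 2
    while lo < hi:  # binary search; the probability increases with n
        mid = (lo + hi) // 2
        if prob_all_distinct(z, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    for z in (3, 10):
        print(z, min_tags(z, 0.99))
```

Under this model the required tag count grows roughly with the square of the class size, which is why tagging only within a class of a few molecules needs far fewer tags than tagging every molecule at a locus.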

Generally, assignment of unique or non-unique tag barcodes in reactions follows the methods and systems described in US Patent Applications 20010053519, 20030152490, and 20110160078, and in US Patent Nos. 6,582,908, 7,537,898, and 9,598,731. Tags can be linked to sample nucleic acids randomly or non-randomly.

In some embodiments, the tagged nucleic acids are sequenced after loading into a microwell plate. The microwell plate can have 96, 384, or 1536 microwells. In some cases, the tags are introduced at an expected ratio of unique tags to microwells. For example, the unique tags may be loaded so that more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000, or 1,000,000,000 unique tags are loaded per genome sample. In some cases, the unique tags may be loaded so that fewer than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000, or 1,000,000,000 unique tags are loaded per genome sample. In some cases, the average number of unique tags loaded per sample genome is less than, or greater than, about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000, or 1,000,000,000 unique tags per genome sample.

A preferred format uses 20-50 different tags (e.g., barcodes) ligated to both ends of a target nucleic acid. For example, 35 different tags (e.g., barcodes) ligated to both ends of target molecules create 35 × 35 permutations, which equals 1225 permutations for 35 tags. Such numbers of tags are sufficient that different molecules having the same start and stop points have a high probability (e.g., at least 94%, 99.5%, 99.99%, or 99.999%) of receiving different combinations of tags. Other barcode combinations include any number between 10 and 500, e.g., about 15x15, about 35x35, about 75x75, about 100x100, about 250x250, or about 500x500.
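The 35 × 35 arithmetic above can be checked with a short sketch. It assumes that each end of a molecule independently receives one of the tags with equal probability and that the two ends are distinguishable (ordered pairs); the tag names are placeholders.

```python
from itertools import product
from typing import List, Sequence, Tuple


def tag_combinations(tags: Sequence[str]) -> List[Tuple[str, str]]:
    """All ordered (end-1 tag, end-2 tag) pairs."""
    return list(product(tags, repeat=2))


def prob_two_molecules_differ(n_combinations: int) -> float:
    """Chance that two molecules receive different tag pairs, assuming each
    pair is drawn independently and uniformly."""
    return 1.0 - 1.0 / n_combinations


if __name__ == "__main__":
    tags = [f"T{i:02d}" for i in range(35)]  # placeholder tag names
    combos = tag_combinations(tags)
    print(len(combos))  # 1225
    print(round(prob_two_molecules_differ(len(combos)), 5))  # 0.99918
```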

In some cases, unique tags can have predetermined sequences, or random or semi-random sequences. In other cases, a plurality of barcodes may be used such that the barcodes are not necessarily unique relative to one another within the plurality. In this example, barcodes can be ligated to individual nucleic acid molecules such that the combination of the barcode and the sequence to which it is ligated creates a unique sequence that can be individually tracked. As described herein, detection of non-unique barcodes in combination with sequence data from the beginning (start) and end (stop) portions of sequence reads can allow assignment of a unique identity to a particular molecule. The length, or number of base pairs, of an individual sequence read can also be used to assign a unique identity to such a molecule. As described herein, fragments from a single strand of nucleic acid that has been assigned a unique identity can thereby permit subsequent identification of fragments from the parent strand.

In some embodiments, adapters (e.g., adapters comprising tags) are added to the nucleic acids after partitioning; in other embodiments, adapters can be added to the nucleic acids before partitioning. In some such methods, a population of nucleic acids bearing the modification to different extents (e.g., 0, 1, 2, 3, 4, 5, or more methyl groups per nucleic acid molecule) is contacted with adapters before the population is fractionated according to the extent of the modification. Adapters attach to one end or both ends of the nucleic acid molecules in the population. In some embodiments, the adapters include a sufficient number of different tags that the number of tag combinations results in a high probability, e.g., 95%, 99%, or 99.9%, that two nucleic acids with the same start and stop points receive different combinations of tags. Adapters, whether bearing the same or different tags, can include the same or different primer binding sites, but preferably the adapters include the same primer binding site. In some embodiments, following partitioning, the nucleic acids are amplified with primers that bind to the primer binding sites within the adapters. Following amplification, the different partitions can then be subjected to further processing steps in parallel but separately, which can include further amplification (e.g., clonal amplification) and sequence analysis. Sequence data from the different partitions can then be compared.

In some embodiments, a single tag can be used to label a specific partition. In some embodiments, multiple different tags can be used to label a specific partition set. In embodiments employing multiple different tags to label a specific partition, the set of tags used to label one partition can be readily differentiated from the sets of tags used to label other partitions. In some embodiments, a tag can be multifunctional; that is, it can simultaneously act as a molecular identifier (i.e., molecular barcode), a partition identifier (i.e., partition tag), and a sample identifier (i.e., sample index). For example, if there are four DNA samples and each DNA sample is partitioned into three partitions, then the DNA molecules in each of the twelve partitions (i.e., twelve partitions for the four DNA samples in total) can be tagged with a separate set of tags, such that the tag sequence attached to a DNA molecule reveals the identity of the DNA molecule, the partition to which it belongs, and the sample from which it originated. In some embodiments, a tag can be used both as a molecular barcode and as a partition tag. For example, if a DNA sample is partitioned into three partitions, then the DNA molecules in each partition are tagged with a separate set of tags, such that the tag sequence attached to a DNA molecule reveals the identity of the DNA molecule and the partition to which it belongs. In some embodiments, a tag can be used both as a molecular barcode and as a sample index. For example, if there are four DNA samples, then the DNA molecules in each sample are tagged with a separate set of tags that distinguishes the samples from one another, such that the tag sequence attached to a DNA molecule serves as both a molecular identifier and a sample identifier.
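The four-samples-by-three-partitions example can be sketched as a lookup table in which each (sample, partition) pair owns a disjoint set of barcodes. All names here (`build_tag_lookup`, the sample and partition labels, the `BC…` barcode identifiers) are hypothetical and used only for illustration.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def build_tag_lookup(
    samples: Sequence[str],
    partitions: Sequence[str],
    barcodes_per_group: int,
    all_barcodes: Sequence[str],
) -> Dict[str, Tuple[str, str]]:
    """Give every (sample, partition) pair its own disjoint barcode set and
    return a map from barcode to the pair it identifies."""
    if len(set(all_barcodes)) != len(all_barcodes):
        raise ValueError("barcodes must be distinct")
    needed = len(samples) * len(partitions) * barcodes_per_group
    if len(all_barcodes) < needed:
        raise ValueError("not enough barcodes for the requested design")
    lookup: Dict[str, Tuple[str, str]] = {}
    pool = iter(all_barcodes)
    for sample, partition in product(samples, partitions):
        for _ in range(barcodes_per_group):
            lookup[next(pool)] = (sample, partition)
    return lookup


if __name__ == "__main__":
    samples = ["S1", "S2", "S3", "S4"]
    partitions = ["hypo", "intermediate", "hyper"]
    barcodes = [f"BC{i:03d}" for i in range(24)]  # placeholder identifiers
    lookup = build_tag_lookup(samples, partitions, 2, barcodes)
    print(lookup["BC000"], lookup["BC023"])
```

Within a group, the individual barcode still distinguishes molecules, so a single observed tag yields the molecule, partition, and sample information together.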

In some embodiments, the tags can have additional functions. For example, the tags can be used to index sample sources or can be used as unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations, e.g., as in Kinde et al., Proc Nat'l Acad Sci USA 108:9530-9535 (2011); Kou et al., PLoS ONE, 11:e0146638 (2016)) or as non-unique molecular identifiers, e.g., as described in US Patent No. 9,598,731. Similarly, in some embodiments, the tags can have additional functions; for example, the tags can be used to index sample sources or can be used as non-unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations).

In one embodiment, partition tagging comprises tagging the molecules in each partition with a partition tag. After the partitions are recombined (e.g., to reduce the number of sequencing runs needed and avoid unnecessary cost) and the molecules are sequenced, the partition tags identify the source partition. In another embodiment, different partitions are tagged with different sets of molecular tags (e.g., each comprising a pair of barcodes). In this way, each molecular barcode indicates the source partition as well as being useful for distinguishing molecules within a partition. For example, a first set of 35 barcodes can be used to tag molecules in a first partition, while a second set of 35 barcodes can be used to tag molecules in a second partition.

In some embodiments, after partitioning and tagging with partition tags, the molecules may be pooled for sequencing in a single run. In some embodiments, a sample tag is added to the molecules, e.g., in a step subsequent to the addition of partition tags and pooling. Sample tags can facilitate pooling material generated from multiple samples for sequencing in a single sequencing run.

Alternatively, in some embodiments, partition tags may be correlated to the sample as well as to the partition. As a simple example, a first tag can indicate a first partition of a first sample; a second tag can indicate a second partition of the first sample; a third tag can indicate a first partition of a second sample; and a fourth tag can indicate a second partition of the second sample.

While tags may be attached to molecules that have already been partitioned based on one or more epigenetic characteristics, the final tagged molecules in the library may no longer possess that epigenetic characteristic. For example, while single-stranded DNA molecules may be partitioned and tagged, the final tagged molecules in the library may be double-stranded. Similarly, while DNA may be partitioned based on different levels of methylation, the tagged molecules derived from these molecules in the final library may be unmethylated. Accordingly, the tag attached to a molecule in the library typically indicates a characteristic of the "parent molecule" from which the final tagged molecule is derived, not necessarily a characteristic of the tagged molecule itself.

As an example, barcodes 1, 2, 3, 4, etc. are used to tag and label molecules in the first partition; barcodes A, B, C, D, etc. are used to tag and label molecules in the second partition; and barcodes a, b, c, d, etc. are used to tag and label molecules in the third partition. Differentially tagged partitions can be pooled prior to sequencing. Differentially tagged partitions can be sequenced separately or sequenced together concurrently, e.g., in the same flow cell of an Illumina sequencer.
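The barcode scheme in this example can be sketched as a small demultiplexing routine that sorts pooled reads back into their source partitions. The barcode symbols are the ones named in the example; the read representation and the function name are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Barcode sets from the example: digits, upper-case and lower-case letters.
PARTITION_BARCODES = {
    "first": {"1", "2", "3", "4"},
    "second": {"A", "B", "C", "D"},
    "third": {"a", "b", "c", "d"},
}

# Reverse map from barcode to partition name.
BARCODE_TO_PARTITION = {
    barcode: partition
    for partition, barcodes in PARTITION_BARCODES.items()
    for barcode in barcodes
}


def sort_reads_by_partition(
    reads: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Sort (barcode, sequence) pairs from a pooled run by source partition.

    Reads whose barcode is not in any set go to 'unassigned'.
    """
    sorted_reads: Dict[str, List[str]] = {p: [] for p in PARTITION_BARCODES}
    sorted_reads["unassigned"] = []
    for barcode, sequence in reads:
        partition = BARCODE_TO_PARTITION.get(barcode, "unassigned")
        sorted_reads[partition].append(sequence)
    return sorted_reads


if __name__ == "__main__":
    pooled = [("1", "ACGT"), ("B", "GGCC"), ("c", "TTAA"), ("Z", "NNNN")]
    print(sort_reads_by_partition(pooled))
```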

After sequencing, the reads can be analyzed to detect genetic variants at the level of each partition as well as at the level of the whole nucleic acid population. Tags are used to sort reads from different partitions. The analysis can include in silico analysis to determine genetic and epigenetic variation (one or more of methylation, chromatin structure, etc.) using sequence information, genomic coordinates, length, coverage, and/or copy number. In some embodiments, higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome-depleted region (NDR).

C. Digestion of nucleic acid molecules with restriction endonucleases

In some embodiments, a partition or partition set (e.g., a first, second, or third partition set prepared by partitioning a sample as described herein, such as on the basis of the level of a cytosine modification, such as methylation, e.g., 5-methylation) is digested by contacting it with a methylation-sensitive restriction endonuclease (MSRE). In some embodiments in which partitioning is based on a cytosine modification, the first partition is the partition with the higher level of the modification; the second partition is the partition with the lower level of the modification; and, when present, the third partition has a level of the modification intermediate between those of the first and second partitions.

As discussed above, the partitioning procedure may result in imperfect sorting of DNA molecules among the partitions. The MSRE can be chosen so as to degrade nonspecifically partitioned DNA. For example, the second partition can be contacted with an MSRE that selectively digests methylated nucleic acid molecules. This can degrade nonspecifically partitioned DNA (e.g., methylated DNA) in the second partition to produce a treated second partition. Alternatively or additionally, the first partition can be contacted with an MSRE that selectively digests unmethylated nucleic acid molecules, thereby degrading nonspecifically partitioned DNA in the first partition to produce a treated first partition. Degrading nonspecifically partitioned DNA in either or both of the first and second partitions is proposed as an improvement to the performance of methods that rely on accurate partitioning of DNA based on a cytosine modification, e.g., methods for detecting the presence of aberrantly modified DNA in a sample, determining the tissue of origin of DNA, and/or determining whether a subject has cancer. For example, such degradation can provide improved sensitivity and/or simplify downstream analysis.

When contacting a partition with a nuclease such as an MSRE, one or more nucleases can be used. In some embodiments, a partition is contacted with more than one nuclease. The partition can be contacted with the nucleases sequentially or simultaneously. Simultaneous use of nucleases may be advantageous when the nucleases are active under similar conditions (e.g., buffer composition), so as to avoid unnecessary sample manipulation. Contacting the second partition with more than one MSRE can more completely degrade nonspecifically partitioned hypermethylated DNA. Similarly, contacting the first partition with more than one MSRE can more completely degrade nonspecifically partitioned hypomethylated and/or unmethylated DNA.

In some embodiments, the MSRE that selectively digests methylated nucleic acid molecules comprises one or more of MspJI, LpnPI, FspEI, or McrBC. In some embodiments, at least two MSREs that selectively digest methylated nucleic acid molecules are used. In some embodiments, at least three MSREs that selectively digest methylated nucleic acid molecules are used.

In some embodiments, the MSRE that selectively digests unmethylated nucleic acid molecules comprises one or more of the following: AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. In some embodiments, at least two MSREs that selectively digest unmethylated nucleic acid molecules are used. In some embodiments, at least three MSREs that selectively digest unmethylated nucleic acid molecules are used. In some embodiments, the MSREs comprise BstUI and HpaII. In some embodiments, the two MSREs comprise HhaI and AccII. In some embodiments, the MSREs comprise BstUI, HpaII, and Hin6I.
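Because the enzyme class is chosen to remove the molecules that do not belong in a partition, the pairing is easy to invert by mistake. The sketch below encodes the pairing described in this section. The partition labels and function name are hypothetical, and the unmethylated-digesting set shown is just one of the example combinations named in the text.

```python
from typing import Tuple

# Enzymes named in the text as selectively digesting methylated DNA.
DIGESTS_METHYLATED: Tuple[str, ...] = ("MspJI", "LpnPI", "FspEI", "McrBC")

# One example combination named in the text as digesting unmethylated DNA.
DIGESTS_UNMETHYLATED_EXAMPLE: Tuple[str, ...] = ("BstUI", "HpaII", "Hin6I")


def enzymes_for_partition(partition: str) -> Tuple[str, ...]:
    """Return candidate enzymes for removing nonspecifically partitioned DNA.

    'hyper' is the first partition (higher modification level): stray
    unmethylated molecules are the contaminant, so use enzymes that digest
    unmethylated DNA. 'hypo' is the second partition (lower modification
    level): stray methylated molecules are the contaminant.
    """
    if partition == "hyper":
        return DIGESTS_UNMETHYLATED_EXAMPLE
    if partition == "hypo":
        return DIGESTS_METHYLATED
    raise ValueError(f"no default enzyme choice for partition {partition!r}")


if __name__ == "__main__":
    for name in ("hyper", "hypo"):
        print(name, enzymes_for_partition(name))
```

The intermediate partition is deliberately left without a default here, since the text describes treating it with either class of enzyme depending on the embodiment.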

In some embodiments, a partition is contacted with a nuclease as described above after a step of tagging or attaching adapters to both ends of the DNA. The tags or adapters can be made resistant to cleavage by the nuclease using any of the approaches described above. In this approach, cleavage can prevent the nonspecifically partitioned molecules from being analyzed, because the cleavage products lack tags or adapters at both ends.

Alternatively, a step of tagging or attaching adapters can be performed after digestion with a nuclease as described above. Cleaved molecules can then be identified in sequence reads based on having an end (the point of attachment to the tag or adapter) that corresponds to a nuclease recognition site. Processing the molecules in this way can also allow information to be obtained from the cleaved molecules, e.g., observation of somatic mutations. When tagging or attaching adapters after contacting the partition with a nuclease, and when analyzing low molecular weight DNA such as cfDNA, it may be desirable to remove high molecular weight DNA (such as contaminating genomic DNA) from the sample before the contacting step. It may also be desirable to use nucleases that can be heat-inactivated at a relatively low temperature (e.g., 65°C or less, or 60°C or less) to avoid denaturing the DNA, since denaturation may interfere with subsequent ligation steps.
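Identifying cleaved molecules from their ends can be sketched as checking whether a fragment boundary falls inside an occurrence of an enzyme's recognition sequence in the reference. The coordinate convention (0-based, half-open), the function names, and the choice to flag any boundary strictly inside a site (rather than modeling exact cut offsets and end repair) are simplifying assumptions of this sketch. The recognition sequences listed are the commonly documented ones for these enzymes.

```python
from typing import Dict, List

# Recognition sequences for three of the enzymes named in this section.
RECOGNITION_SITES: Dict[str, str] = {
    "HpaII": "CCGG",
    "BstUI": "CGCG",
    "Hin6I": "GCGC",
}


def site_positions(reference: str, site: str) -> List[int]:
    """Start positions of every (possibly overlapping) occurrence of site."""
    positions = []
    i = reference.find(site)
    while i != -1:
        positions.append(i)
        i = reference.find(site, i + 1)
    return positions


def boundary_in_site(reference: str, boundary: int, site: str) -> bool:
    """True if a fragment boundary lies strictly inside an occurrence of site."""
    return any(
        p < boundary < p + len(site) for p in site_positions(reference, site)
    )


def enzymes_matching_ends(reference: str, start: int, end: int) -> List[str]:
    """Enzymes whose recognition site contains the fragment start or end."""
    return [
        name
        for name, site in RECOGNITION_SITES.items()
        if boundary_in_site(reference, start, site)
        or boundary_in_site(reference, end, site)
    ]


if __name__ == "__main__":
    reference = "AAAACCGGTTTT"  # toy reference with one CCGG site at 4..8
    print(enzymes_matching_ends(reference, 5, 12))
    print(enzymes_matching_ends(reference, 0, 12))
```

A production analysis would also need to account for how often fragment ends coincide with such sites by chance, which this sketch does not attempt.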

Where the sample is partitioned into three partitions, including a third partition containing intermediately methylated molecules, in some embodiments the third partition is contacted with an MSRE (e.g., an MSRE that selectively digests unmethylated nucleic acid molecules). Such a step can have any of the features described elsewhere herein for contacting steps, and can be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the first and third partitions are combined before being contacted with the MSRE. Such a step can have any of the features described elsewhere herein for contacting steps, and can be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the first and third partitions are differentially tagged before being combined.

Where the sample is partitioned into three partitions, including a third partition containing intermediately methylated molecules, in some embodiments the third partition is contacted with an MSRE that selectively digests methylated nucleic acid molecules. Such a step can have any of the features described elsewhere herein for contacting steps, and can be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the second and third partitions are combined before being contacted with the MSRE. Such a step can have any of the features described elsewhere herein for contacting steps, and can be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the second and third partitions are differentially tagged before being combined.

In some embodiments, the DNA is purified after being contacted with the nuclease, e.g., using SPRI beads. Such purification can occur after heat inactivation of the nuclease. Alternatively, purification can be omitted; thus, for example, a subsequent step such as amplification can be performed on the partition containing the heat-inactivated nuclease. In another embodiment, the contacting step can be performed in the presence of a purification reagent such as SPRI beads, e.g., to minimize losses associated with tube transfers. After cleavage and heat inactivation, the SPRI beads can be re-purposed for cleanup by adding molecular crowding reagents (e.g., PEG) and salt.

D. Amplification

Sample nucleic acids can be flanked by adapters and amplified by PCR and other amplification methods using nucleic acid primers that bind to primer binding sites in the adapters flanking the DNA molecules to be amplified. In some embodiments, amplification methods involve cycles of extension, denaturation, and annealing resulting from thermocycling, or can be isothermal, as in transcription-mediated amplification. Other examples of amplification methods that can optionally be used include the ligase chain reaction, strand displacement amplification, nucleic acid sequence-based amplification, and self-sustained sequence-based replication.

Typically, the amplification reaction generates a plurality of non-uniquely or uniquely tagged nucleic acid amplicons, with molecular barcodes and sample indexes, ranging in size from about 150 nucleotides (nt) to about 700 nt, from 250 nt to about 350 nt, or from about 320 nt to about 550 nt. In some embodiments, the amplicons have a size of about 180 nt. In some embodiments, the amplicons have a size of about 200 nt.

In some embodiments, the method performs dsDNA ligation with T-tailed and C-tailed adapters, which results in amplification of at least 50%, 60%, 70%, or 80% of the double-stranded nucleic acids present before adapter ligation. Preferably, the method increases the amount or number of amplified molecules by at least 10%, 15%, or 20% relative to a control method performed with T-tailed adapters alone.

In some embodiments, nucleic acid molecules digested by the MSRE are not amplified. In some such embodiments, substantially all nucleic acid molecules in the sample are amplified except for those digested by the MSRE.

E. Enrichment/Capture

In some embodiments, the methods disclosed herein comprise a step of capturing or enriching one or more target regions of nucleic acid molecules. Capture can be performed using any suitable approach known in the art. In some embodiments, capture comprises contacting the DNA to be captured with a set of target-specific probes, such as the probes described herein. Capture can be performed on one or more of the partitions prepared during the methods disclosed herein. In some embodiments, DNA is captured from at least the first partition or the second partition, e.g., from at least the first partition and the second partition. Capture can be performed on any one, any two, or all subsets of the partitions or partition groups. In some embodiments, the partitions are differentially tagged (e.g., as described herein) and then pooled before capture is performed.

The capture step can be performed using conditions suitable for specific nucleic acid hybridization, which generally depend to some extent on features of the probes such as length and base composition. Those skilled in the art will be familiar with appropriate conditions given the general knowledge in the art regarding nucleic acid hybridization. In some embodiments, complexes of target-specific probes and DNA are formed.

In some embodiments, the methods described herein comprise capturing a plurality of sets of target regions of cfDNA obtained from a test subject. The target regions include epigenetic target regions, which can show differences in methylation levels and/or fragmentation patterns depending on whether they originate from a tumor or from healthy cells. The target regions also include sequence-variable target regions, which can show differences in sequence depending on whether they originate from a tumor or from healthy cells. The capture step produces a captured set of cfDNA molecules. In some embodiments, in the captured set of cfDNA molecules, the cfDNA molecules corresponding to the sequence-variable target region set are captured at a greater capture yield than the cfDNA molecules corresponding to the epigenetic target region set. For additional discussion of capture steps, capture yields, and related aspects, see WO2020/160414, which is incorporated herein by reference for all purposes.

In some embodiments, the methods described herein comprise contacting cfDNA obtained from a test subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set.

It can be beneficial to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set, because analyzing the sequence-variable target regions with sufficient confidence or accuracy may require a greater depth of sequencing than analyzing the epigenetic target regions. The volume of data needed to determine fragmentation patterns (e.g., to test for perturbation of transcription start sites or CTCF binding sites) or fragment abundance (e.g., in hypermethylated and hypomethylated partitions) is generally less than the volume of data needed to determine the presence or absence of cancer-related sequence mutations. Capturing the target region sets at different yields can facilitate sequencing the target regions to different depths in the same sequencing run (e.g., using a pooled mixture and/or in the same sequencing cell).
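The relationship between footprint, sequencing depth, and share of a sequencing run can be made concrete with a small calculation. The sketch below assumes that the sequenced bases a target region set consumes are approximately its footprint multiplied by its target depth; the footprints and depths shown are hypothetical values chosen only to make the arithmetic easy to follow, and the function names are illustrative.

```python
def bases_required(footprint_bp, target_depth):
    """Approximate sequenced bases needed to cover a footprint at a depth."""
    return footprint_bp * target_depth


def run_share(target_sets):
    """Fraction of a shared sequencing run consumed by each target region set.

    target_sets maps a set name to (footprint_bp, target_depth).
    """
    totals = {
        name: bases_required(footprint, depth)
        for name, (footprint, depth) in target_sets.items()
    }
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical panel: a small, deeply sequenced sequence-variable set and a
# tenfold larger epigenetic set sequenced to one tenth of the depth.
shares = run_share({
    "sequence_variable": (100_000, 10_000),
    "epigenetic": (1_000_000, 1_000),
})
```

In this hypothetical case the two sets consume equal shares of the run even though their footprints differ tenfold, which illustrates why capture yield per set has to be tuned if both sets are sequenced together.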

In various embodiments, the methods further comprise sequencing the captured cfDNA, e.g., to different depths of sequencing for the epigenetic target region set and the sequence-variable target region set, consistent with the discussion herein.

In some embodiments, complexes of target-specific probes and DNA are separated from DNA not bound to target-specific probes. For example, where the target-specific probes are bound covalently or non-covalently to a solid support, a washing or aspiration step can be used to separate unbound material. Alternatively, where the complexes have chromatographic properties distinct from those of unbound material (e.g., where the probes comprise a ligand that binds a chromatographic resin), chromatography can be used.

As discussed in detail elsewhere herein, the set of target-specific probes can comprise a plurality of sets, such as probes for a sequence-variable target region set and probes for an epigenetic target region set. In some such embodiments, the capture step is performed with the probes for the sequence-variable target region set and the probes for the epigenetic target region set in the same vessel at the same time, e.g., the probes for both sets are in the same composition. This approach provides a relatively streamlined workflow. In some embodiments, the concentration of the probes for the sequence-variable target region set is greater than the concentration of the probes for the epigenetic target region set.

Alternatively, the capture step is performed with the sequence-variable target region probe set in a first vessel and with the epigenetic target region probe set in a second vessel, or the contacting step is performed with the sequence-variable target region probe set at a first time and in a first vessel and with the epigenetic target region probe set at a second time before or after the first time. This approach allows the preparation of separate first and second compositions comprising, respectively, captured DNA corresponding to the sequence-variable target region set and captured DNA corresponding to the epigenetic target region set. The compositions can be processed separately as desired (e.g., fractionated based on methylation as described elsewhere herein) and recombined in appropriate proportions to provide material for further processing and analysis such as sequencing.

In some embodiments, the DNA is amplified. In some embodiments, amplification is performed before the capture step. In some embodiments, amplification is performed after the capture step.

In some embodiments, adapters are included in the DNA. This can be done concurrently with an amplification procedure, e.g., by providing the adapters in a 5' portion of a primer, as described above. Alternatively, adapters can be added by other approaches, such as ligation.

In some embodiments, tags, which can be or include barcodes, are included in the DNA (e.g., in adapters added to the DNA). Tags can facilitate identification of the origin of a nucleic acid. For example, barcodes can be used to allow the origin (e.g., the subject) from which the DNA came to be identified after pooling of a plurality of samples for parallel sequencing. This can be done concurrently with an amplification procedure, e.g., by providing the barcodes in a 5' portion of a primer, as described above. In some embodiments, adapters and tags/barcodes are provided by the same primer or primer set. For example, the barcode can be located 3' of the adapter and 5' of the target-hybridizing portion of the primer. Alternatively, barcodes can be added by other approaches, such as ligation, optionally together with adapters in the same ligation substrate.
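The primer layout described above (adapter at the 5' end, then the barcode, then the target-hybridizing portion) can be sketched as simple string handling. All sequences, sample names, and function names below are hypothetical placeholders; the sketch also assumes barcodes of equal length and exact matching, which a real demultiplexer would relax to tolerate sequencing errors.

```python
def build_primer(adapter, barcode, target_hybridizing):
    """Assemble a primer 5'->3': adapter, then barcode, then target portion."""
    return adapter + barcode + target_hybridizing


def assign_sample(read, adapter, barcodes):
    """Return the sample whose barcode directly follows the adapter, else None."""
    if not read.startswith(adapter):
        return None
    after_adapter = read[len(adapter):]
    for sample, barcode in barcodes.items():
        if after_adapter.startswith(barcode):
            return sample
    return None


ADAPTER = "ACACGACG"  # hypothetical adapter sequence
BARCODES = {"subject_A": "AAGGTC", "subject_B": "CCTTGA"}  # hypothetical barcodes

primer = build_primer(ADAPTER, BARCODES["subject_B"], "GATTACA")
origin = assign_sample(primer + "TTTT", ADAPTER, BARCODES)
```

Placing the barcode between the adapter and the target-hybridizing portion means the barcode is copied into every amplicon, so pooled samples can be separated again after parallel sequencing.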

Additional details regarding amplification, tags, and barcodes are discussed in other sections herein and can be combined, to the extent practicable, with any of the embodiments set forth herein.

In some embodiments, sequences are enriched before the nucleic acids are sequenced. Enrichment can optionally be performed for specific target regions or can be performed non-specifically ("target sequences"). In some embodiments, targeted regions of interest can be enriched or captured with nucleic acid capture probes ("baits"), such as target region probe sets selected for one or more bait set panels, using a differential tiling and capture scheme. A differential tiling and capture scheme generally uses bait sets of different relative concentrations to differentially tile (e.g., at different "resolutions") across the genomic regions associated with the baits, subject to a set of constraints (e.g., sequencer constraints such as sequencing load, the utility of each bait, etc.), and to capture the targeted nucleic acids at a level desired for downstream sequencing. These targeted genomic regions of interest optionally include natural nucleotide sequences or synthetic nucleotide sequences of nucleic acid constructs. In some embodiments, biotin-labeled beads carrying probes for one or more regions of interest can be used to capture target sequences, optionally followed by amplification of those regions, to enrich for the regions of interest. In some embodiments, the nucleic acid capture probes can be single-stranded RNA or double-stranded DNA molecules.

Sequence capture typically involves the use of oligonucleotide probes that hybridize to the target nucleic acid sequence. In some embodiments, a probe set strategy involves tiling the probes across a region of interest. Such probes can be, for example, from about 60 to about 120 nucleotides in length. The set can have a depth (e.g., depth of coverage) of about 2X, 3X, 4X, 5X, 6X, 7X, 8X, 9X, 10X, 15X, 20X, 50X, or more than 50X. The effectiveness of sequence capture generally depends, in part, on the length of the sequence in the target molecule that is complementary (or nearly complementary) to the sequence of the probe.
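One way to picture probe tiling at a given depth is to place probes of a fixed length at a step equal to the probe length divided by the tiling depth, so that interior positions are covered by that many probes. The sketch below takes this interpretation, which is an assumption for illustration rather than a scheme stated in the disclosure; the coordinates and function names are hypothetical, and intervals are half-open.

```python
def tile_probes(region_start, region_end, probe_length=120, tiling_depth=2):
    """Return half-open (start, end) probe intervals tiled across a region."""
    if probe_length % tiling_depth != 0:
        raise ValueError("probe_length must be divisible by tiling_depth")
    step = probe_length // tiling_depth
    probes = []
    start = region_start
    # Only emit probes that fit entirely inside the region.
    while start + probe_length <= region_end:
        probes.append((start, start + probe_length))
        start += step
    return probes


def coverage_at(position, probes):
    """Number of probes overlapping a single base position."""
    return sum(1 for start, end in probes if start <= position < end)


probes = tile_probes(0, 360, probe_length=120, tiling_depth=2)
```

With these hypothetical values the region receives five probes and an interior position such as base 150 is covered twice, while positions near the region edges are covered once.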

In some embodiments, a first target region set comprising at least epigenetic target regions is captured from the first partition. The epigenetic target regions captured from the first partition can comprise hypermethylation variable target regions. In some embodiments, the hypermethylation variable target regions are CpG-containing regions that are unmethylated or have low methylation (e.g., below-average methylation relative to bulk cfDNA) in cfDNA from healthy subjects. In some embodiments, the hypermethylation variable target regions are regions that show lower methylation in healthy cfDNA than in at least one other tissue type. Without wishing to be bound by any particular theory, cancer cells may shed more DNA into the bloodstream than healthy cells of the same tissue type. As such, the distribution of the tissue of origin of cfDNA may change upon carcinogenesis. Thus, an increase in the level of hypermethylation variable target regions in the first partition can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.

In some embodiments, a second target region set comprising at least epigenetic target regions is captured from the second partition. The epigenetic target regions can comprise hypomethylation variable target regions. In some embodiments, the hypomethylation variable target regions are CpG-containing regions that are methylated or have high methylation (e.g., above-average methylation relative to bulk cfDNA) in cfDNA from healthy subjects. In some embodiments, the hypomethylation variable target regions are regions that show higher methylation in healthy cfDNA than in at least one other tissue type. Without wishing to be bound by any particular theory, cancer cells may shed more DNA into the bloodstream than healthy cells of the same tissue type. As such, the distribution of the tissue of origin of cfDNA may change upon carcinogenesis. Thus, an increase in the level of hypomethylation variable target regions in the second partition can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.

In some embodiments, the enriched DNA molecules (or captured set) can comprise DNA corresponding to a sequence-variable target region set and an epigenetic target region set. In some embodiments, the quantity of captured sequence-variable target region DNA is greater than the quantity of captured epigenetic target region DNA when normalized for the difference in the size of the targeted regions (footprint size). In some embodiments, compositions, methods, and systems are described in PCT Patent Application No. PCT/US2020/016120, which is hereby incorporated by reference in its entirety.

Alternatively, first and second captured sets can be provided, comprising, respectively, DNA corresponding to the sequence-variable target region set and DNA corresponding to the epigenetic target region set. The first and second captured sets can be combined to provide a combined captured set.

In a captured set comprising DNA corresponding to the sequence-variable target region set and the epigenetic target region set, including a combined captured set as discussed above, the DNA corresponding to the sequence-variable target region set can be present at a greater concentration than the DNA corresponding to the epigenetic target region set, e.g., a 1.1- to 1.2-fold, 1.2- to 1.4-fold, 1.4- to 1.6-fold, 1.6- to 1.8-fold, 1.8- to 2.0-fold, 2.0- to 2.2-fold, 2.2- to 2.4-fold, 2.4- to 2.6-fold, 2.6- to 2.8-fold, 2.8- to 3.0-fold, 3.0- to 3.5-fold, 3.5- to 4.0-fold, 4.0- to 4.5-fold, 4.5- to 5.0-fold, 5.0- to 5.5-fold, 5.5- to 6.0-fold, 6.0- to 6.5-fold, 6.5- to 7.0-fold, 7.0- to 7.5-fold, 7.5- to 8.0-fold, 8.0- to 8.5-fold, 8.5- to 9.0-fold, 9.0- to 9.5-fold, 9.5- to 10.0-fold, 10- to 11-fold, 11- to 12-fold, 12- to 13-fold, 13- to 14-fold, 14- to 15-fold, 15- to 16-fold, 16- to 17-fold, 17- to 18-fold, 18- to 19-fold, 19- to 20-fold, 20- to 30-fold, 30- to 40-fold, 40- to 50-fold, 50- to 60-fold, 60- to 70-fold, 70- to 80-fold, 80- to 90-fold, or 90- to 100-fold greater concentration. The degree of difference in concentration is calculated with normalization for the footprint sizes of the target regions, as discussed in the definitions section.
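The footprint normalization mentioned above can be illustrated with a short calculation. The definitions section is not reproduced here, so the sketch assumes that normalization means dividing the captured quantity of each set by its footprint in base pairs before taking the ratio; the quantities, footprints, and function name are hypothetical.

```python
def normalized_fold_concentration(seq_var_quantity, seq_var_footprint_bp,
                                  epi_quantity, epi_footprint_bp):
    """Fold difference in captured DNA per base pair of footprint.

    Quantities may be in any unit (e.g., ng or molecule counts) as long as
    both sets use the same unit.
    """
    seq_var_per_bp = seq_var_quantity / seq_var_footprint_bp
    epi_per_bp = epi_quantity / epi_footprint_bp
    return seq_var_per_bp / epi_per_bp


# Hypothetical captured set: the epigenetic set holds twice the total DNA,
# but its footprint is ten times larger.
fold = normalized_fold_concentration(2.0, 50_000, 4.0, 500_000)
```

In this hypothetical case the sequence-variable set is present at about a 5-fold greater footprint-normalized concentration even though it contributes less total DNA, which is why the comparison is made per unit of footprint rather than on raw quantity.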

a. Epigenetic target region set

The epigenetic target region set can comprise one or more types of target regions likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells from DNA from healthy cells (e.g., non-neoplastic circulating cells). Exemplary types of such regions are discussed in detail herein. In some embodiments, methods according to the present disclosure comprise determining whether cfDNA molecules corresponding to the epigenetic target region set comprise or indicate cancer-associated epigenetic modifications (e.g., hypermethylation in one or more hypermethylation variable target regions; one or more perturbations of CTCF binding; and/or one or more perturbations of transcription start sites) and/or copy number variations (e.g., focal amplifications). The epigenetic target region set can also comprise one or more control regions, e.g., as described herein.

In some embodiments, the epigenetic target region set has a footprint of at least 100 kbp (e.g., at least 200 kbp, at least 300 kbp, or at least 400 kbp). In some embodiments, the epigenetic target region set has a footprint in the range of 100 kbp to 20 Mbp (e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, 1.5-2 Mbp, 2-3 Mbp, 3-4 Mbp, 4-5 Mbp, 5-6 Mbp, 6-7 Mbp, 7-8 Mbp, 8-9 Mbp, 9-10 Mbp, or 10-20 Mbp). In some embodiments, the epigenetic target region set has a footprint of at least 20 Mbp.

i. Hypermethylation variable target regions

In some embodiments, the epigenetic target region set comprises one or more hypermethylation variable target regions. In general, hypermethylation variable target regions are regions in which an increase in the observed level of methylation indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells. For example, hypermethylation of promoters of tumor suppressor genes has been observed repeatedly. See, e.g., Kang et al., Genome Biol. 18:53 (2017) and references cited therein. In another example, as discussed above, hypermethylation variable target regions can include regions whose methylation in cancerous tissue does not necessarily differ from that of DNA from healthy tissue of the same type, but does differ (e.g., is greater) relative to the cfDNA typical of healthy subjects.

An extensive discussion of methylation variable target regions in colorectal cancer is provided in Lam et al., Biochim Biophys Acta. 1866:106-20 (2016). These include VIM, SEPT9, ITGA4, OSM4, GATA4, and NDRG4. An exemplary set of hypermethylation variable target regions, comprising genes or portions thereof, based on colorectal cancer (CRC) studies is provided in Table 1. Many of these genes likely have relevance to cancers beyond colorectal cancer; for example, TP53 is widely recognized as a critically important tumor suppressor, and hypermethylation-based inactivation of this gene may be a common oncogenic mechanism.

Table 1. Exemplary hypermethylation target regions (genes or portions thereof) based on CRC studies.

In some embodiments, the hypermethylation variable target regions comprise a plurality of the genes or portions thereof listed in Table 1, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the genes or portions thereof listed in Table 1. For example, for each locus included as a target region, there can be one or more probes with a hybridization site that binds between the transcription start site and the stop codon (the last stop codon for genes that are alternatively spliced) of the gene. In some embodiments, the one or more probes bind within 300 bp upstream and/or downstream of the genes or portions thereof listed in Table 1 (e.g., within 200 bp or 100 bp).
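The positional criterion above can be expressed as an interval check. The sketch below treats a probe as acceptable if its hybridization site lies entirely within the targeted region extended by a flank on each side; the coordinates are hypothetical, the function name is illustrative, and the requirement that the whole probe fall inside the window is an assumption made for simplicity.

```python
def probe_in_window(probe_start, probe_end, region_start, region_end, flank=300):
    """True if the probe lies within the region extended by `flank` bp per side."""
    window_start = region_start - flank
    window_end = region_end + flank
    return probe_start >= window_start and probe_end <= window_end


# Hypothetical targeted region spanning positions 1,000 to 5,000.
REGION = (1_000, 5_000)

upstream_ok = probe_in_window(750, 870, *REGION)            # within 300 bp upstream
too_far = probe_in_window(600, 720, *REGION)                # beyond the 300 bp flank
stricter = probe_in_window(750, 870, *REGION, flank=100)    # fails a 100 bp flank
```

Tightening the flank from 300 bp to 200 bp or 100 bp simply narrows the window, so a probe accepted at one setting may be rejected at a stricter one.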

Methylation variable target regions in various types of lung cancer are discussed in detail in, e.g., Ooki et al., Clin. Cancer Res. 23:7141-52 (2017); Belinsky, Annu. Rev. Physiol. 77:453-74 (2015); Hulbert et al., Clin. Cancer Res. 23:1998-2005 (2017); Shi et al., BMC Genomics 18:901 (2017); Schneider et al., BMC Cancer. 11:102 (2011); Lissa et al., Transl Lung Cancer Res 5(5):492-504 (2016); Skvortsova et al., Br. J. Cancer. 94(10):1492-1495 (2006); Kim et al., Cancer Res. 61:3419-3424 (2001); Furonaka et al., Pathology International 55:303-309 (2005); Gomes et al., Rev. Port. Pneumol. 20:20-30 (2014); Kim et al., Oncogene. 20:1765-70 (2001); Hopkins-Donaldson et al., Cell Death Differ. 10:356-64 (2003); Kikuchi et al., Clin. Cancer Res. 11:2954-61 (2005); Heller et al., Oncogene 25:959-968 (2006); Licchesi et al., Carcinogenesis. 29:895-904 (2008); Guo et al., Clin. Cancer Res. 10:7917-24 (2004); Palmisano et al., Cancer Res. 63:4620-4625 (2003); and Toyooka et al., Cancer Res. 61:4556-4560 (2001). In an example, hypermethylation variable target regions can include regions whose methylation in cancerous tissue does not necessarily differ from that of DNA from healthy tissue of the same type, but does differ (e.g., is greater) relative to the cfDNA typical of healthy subjects. For example, where the presence of a cancer results in increased cell death, such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such hypermethylation variable target regions. In some embodiments, the hypermethylation variable target regions comprise one or more genomic regions in which the methylation status of the cfDNA molecules in cancer subjects does not differ from that of cfDNA from healthy subjects, but in which the presence or increased amount of hypermethylated cfDNA indicates a particular tissue type (e.g., the cancer's tissue of origin) and appears as cfDNA entering circulation with increased apoptosis (e.g., tumor shedding).

An exemplary set of hypermethylation variable target regions based on lung cancer studies, comprising genes or portions thereof, is provided in Table 2. Many of these genes are likely to be relevant to cancers other than lung cancer; for example, Casp8 (caspase 8) is a key enzyme in programmed cell death, and hypermethylation-based inactivation of this gene may be a common oncogenic mechanism not limited to lung cancer. In addition, several genes appear in both Table 1 and Table 2, indicating generality.

Table 2. Exemplary hypermethylation target regions (genes or portions thereof) based on lung cancer studies.

Any of the foregoing embodiments concerning target regions identified in Table 2 may be combined with any of the foregoing embodiments concerning target regions identified in Table 1. In some embodiments, the hypermethylation variable target regions include more than one of the genes, or portions thereof, listed in Table 1 or Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the genes, or portions thereof, listed in Table 1 or Table 2.

Additional hypermethylation target regions can be obtained, e.g., from The Cancer Genome Atlas. Kang et al., Genome Biology 18:53 (2017), describe the construction of a probabilistic method called CancerLocator using hypermethylation target regions from breast, colon, kidney, liver, and lung. In some embodiments, the hypermethylation target regions can be specific to one or more types of cancer. Accordingly, in some embodiments, the hypermethylation target regions include one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers.

In some embodiments, where different epigenetic target regions are captured from the first and second partitions, the epigenetic target regions captured from the first partition include hypermethylation variable target regions.

ii. Hypomethylation variable target regions

Global hypomethylation is a commonly observed phenomenon in various cancers. See, e.g., Hon et al., Genome Res. 22:246-258 (2012) (breast cancer); Ehrlich, Epigenomics 1:239-259 (2009) (review article noting observations of hypomethylation in colon, ovarian, prostate, leukemia, hepatocellular, and cervical cancers). For example, regions that are normally methylated in healthy cells, such as repeated elements (e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA) and intergenic regions, may show reduced methylation in tumor cells. Accordingly, in some embodiments, the epigenetic target region set includes hypomethylation variable target regions, where an observed decrease in the level of methylation indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells. In an example, a hypomethylation variable target region may be a region whose methylation state in cancerous tissue does not necessarily differ from that of DNA from healthy tissue of the same type, but does differ (e.g., is less methylated) relative to the cfDNA typical of healthy subjects. For example, where the presence of a cancer results in increased cell death, such as apoptosis of cells of the tissue type corresponding to the cancer, such a hypomethylation variable target region can be used, at least in part, to detect that cancer. In some embodiments, the hypomethylation variable target regions include one or more genomic regions in which the methylation state of cfDNA molecules in cancer subjects does not differ from that of cfDNA from healthy subjects, but in which the presence or increased quantity of hypomethylated cfDNA is indicative of a particular tissue type (e.g., cancer origin) and appears in circulating cfDNA with concomitantly increased apoptosis (e.g., tumor shedding).

In some embodiments, the hypomethylation variable target regions include repeated elements and/or intergenic regions. In some embodiments, the repeated elements include one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.

Exemplary specific genomic regions that show cancer-associated hypomethylation include nucleotides 8403565-8953708 and 151104701-151106035 of human chromosome 1, e.g., according to the hg19 or hg38 human genome build. In some embodiments, the hypomethylation variable target regions overlap or include one or both of these regions.
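The "overlaps or includes" test in the preceding paragraph can be sketched in Python as follows. This is only an illustration: the `Region` type, the function names, and the candidate coordinates are hypothetical and are not part of the disclosure, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (an assumption of this sketch)
    end: int    # 1-based, inclusive


# The two chromosome 1 regions named in the text.
HYPO_REGIONS = [
    Region("chr1", 8403565, 8953708),
    Region("chr1", 151104701, 151106035),
]


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def includes(outer: Region, inner: Region) -> bool:
    """True if `outer` fully contains `inner`."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


def count_overlapped(target: Region) -> int:
    """Number of the listed regions (0, 1, or 2) that `target` overlaps."""
    return sum(overlaps(target, region) for region in HYPO_REGIONS)


# Hypothetical candidate target region, for illustration only.
candidate = Region("chr1", 8950000, 8960000)
n_hits = count_overlapped(candidate)
```

Here `candidate` overlaps the first listed region without including it, so `n_hits` is 1.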

In some embodiments, where different epigenetic target regions are captured from the first and second partitions, the epigenetic target regions captured from the second partition include hypomethylation variable target regions.

iii. CTCF binding regions

CTCF is a DNA-binding protein that contributes to chromatin organization and often colocalizes with cohesin. Perturbation of CTCF binding sites has been reported in a variety of different cancers. See, e.g., Katainen et al., Nature Genetics, doi:10.1038/ng.3335, published online June 8, 2015; Guo et al., Nat. Commun. 9:1520 (2018). CTCF binding results in recognizable patterns in cfDNA that can be detected by sequencing, e.g., through fragment length analysis. Details regarding sequencing-based fragment length analysis are provided, for example, in Snyder et al., Cell 164:57-68 (2016); WO 2018/009723; and US20170211143A1, each of which is incorporated herein by reference.

Thus, perturbations of CTCF binding result in variation in the fragmentation patterns of cfDNA. As such, CTCF binding sites represent a type of fragmentation variable target region.

There are many known CTCF binding sites. See, e.g., the CTCFBSDB (CTCF Binding Site Database), available on the Internet at insulatordb.uthsc.edu/; Cuddapah et al., Genome Res. 19:24-32 (2009); Martin et al., Nat. Struct. Mol. Biol. 18:708-14 (2011); Rhee et al., Cell 147:1408-19 (2011), each of which is incorporated by reference. Exemplary CTCF binding sites are at nucleotides 56014955-56016161 on chromosome 8 and nucleotides 95359169-95360473 on chromosome 13, e.g., according to the hg19 or hg38 human genome build.

Accordingly, in some embodiments, the epigenetic target region set includes CTCF binding regions. In some embodiments, the CTCF binding regions comprise at least 10, 20, 50, 100, 200, or 500 CTCF binding regions, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 CTCF binding regions, e.g., such as the CTCF binding regions described above, or CTCF binding regions in one or more of the CTCFBSDB or the Cuddapah et al., Martin et al., or Rhee et al. articles cited above.

In some embodiments, at least some of the CTCF sites can be methylated or unmethylated, wherein the methylation state is correlated with whether or not the cell is a cancer cell. In some embodiments, the epigenetic target region set comprises at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp of the regions upstream and/or downstream of the CTCF binding sites.
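Extending a binding site by a fixed amount of flanking sequence can be sketched as below. The function name and the symmetric, strand-agnostic extension are assumptions of this illustration; the chromosome 8 coordinates are the exemplary CTCF site given earlier, treated here as 1-based and inclusive.

```python
def flanked_window(start: int, end: int, flank_bp: int) -> tuple:
    """Extend a 1-based inclusive site by flank_bp on each side.

    The upstream edge is clamped at position 1 so the window never
    runs off the beginning of the chromosome.
    """
    if flank_bp < 0:
        raise ValueError("flank_bp must be non-negative")
    return max(1, start - flank_bp), end + flank_bp


# Exemplary CTCF binding site on chromosome 8 from the text.
CTCF_SITE_CHR8 = (56014955, 56016161)

# The minimum flank sizes listed in the paragraph above, in bp.
FLANKS_BP = (100, 200, 300, 400, 500, 750, 1000)

windows = {f: flanked_window(*CTCF_SITE_CHR8, f) for f in FLANKS_BP}
```

The same helper would apply unchanged to transcription start sites, for which the same flank sizes are recited.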

iv. Transcription start sites

Transcription start sites may also show perturbations in neoplastic cells. For example, nucleosome organization at various transcription start sites in healthy cells of the hematopoietic lineage, which contributes substantially to cfDNA in healthy individuals, may differ from nucleosome organization at those transcription start sites in neoplastic cells. This results in different cfDNA patterns that can be detected by sequencing, e.g., as discussed generally in Snyder et al., Cell 164:57-68 (2016); WO 2018/009723; and US20170211143A1. In another example, transcription start sites in cancerous tissue may not necessarily differ epigenetically from DNA from healthy tissue of the same type, but do differ epigenetically (e.g., with respect to nucleosome organization) relative to the cfDNA typical of healthy subjects. For example, where the presence of a cancer results in increased cell death, such as apoptosis of cells of the tissue type corresponding to the cancer, such transcription start sites can be used, at least in part, to detect that cancer.

Thus, perturbations of transcription start sites also result in variation in the fragmentation patterns of cfDNA. As such, transcription start sites also represent a type of fragmentation variable target region.

Human transcription start sites are available from DBTSS (DataBase of Human Transcription Start Sites), available on the Internet at dbtss.hgc.jp and described in Yamashita et al., Nucleic Acids Res. 34(Database issue):D86–D89 (2006), which is incorporated herein by reference.

Accordingly, in some embodiments, the epigenetic target region set includes transcription start sites. In some embodiments, the transcription start sites comprise at least 10, 20, 50, 100, 200, or 500 transcription start sites, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 transcription start sites, e.g., such as transcription start sites listed in DBTSS. In some embodiments, at least some of the transcription start sites can be methylated or unmethylated, wherein the methylation state is correlated with whether or not the cell is a cancer cell. In some embodiments, the epigenetic target region set comprises at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp of the regions upstream and/or downstream of the transcription start sites.

v. Copy number variations; focal amplifications

Although copy number variations such as focal amplifications are somatic mutations, they can be detected by sequencing based on read frequency, in a manner analogous to approaches for detecting certain epigenetic changes such as changes in methylation. As such, regions that may show copy number variations such as focal amplifications in cancer can be included in the epigenetic target region set and may comprise one or more of AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2, FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, and RAF1. For example, in some embodiments, the epigenetic target region set comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 of the foregoing targets.
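One simple way to picture read-frequency-based detection of a focal amplification is a log2 ratio of normalized read counts against a baseline. The sketch below is an illustration under stated assumptions, not the method of the disclosure: the read counts are invented, the threshold of 1.0 (a two-fold increase) is arbitrary, and a practical pipeline would also correct for GC content, capture efficiency, and tumor fraction.

```python
import math


def log2_copy_ratio(sample_reads: int, sample_total: int,
                    baseline_reads: int, baseline_total: int) -> float:
    """log2 of (sample read fraction) / (baseline read fraction) for one gene."""
    if min(sample_reads, sample_total, baseline_reads, baseline_total) <= 0:
        raise ValueError("all read counts must be positive")
    ratio = (sample_reads * baseline_total) / (baseline_reads * sample_total)
    return math.log2(ratio)


def flag_amplified(ratios: dict, threshold: float = 1.0) -> list:
    """Genes whose log2 ratio meets a (hypothetical) amplification threshold."""
    return sorted(gene for gene, value in ratios.items() if value >= threshold)


# Hypothetical (sample, baseline) read counts for three of the listed genes.
counts = {"ERBB2": (4000, 1000), "EGFR": (1050, 1000), "MYC": (980, 1000)}
TOTAL_READS = 1_000_000  # assumed equal library size for sample and baseline

ratios = {gene: log2_copy_ratio(s, TOTAL_READS, b, TOTAL_READS)
          for gene, (s, b) in counts.items()}
amplified = flag_amplified(ratios)
```

With these invented counts, only ERBB2 (a four-fold read excess, log2 ratio of 2) is flagged.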

vi. Methylation control regions

It can be useful to include control regions to facilitate data validation. In some embodiments, the epigenetic target region set includes control regions that are expected to be methylated or unmethylated in essentially all samples, regardless of whether the DNA is derived from a cancer cell or a normal cell. In some embodiments, the epigenetic target region set includes control hypomethylated regions that are expected to be hypomethylated in essentially all samples. In some embodiments, the epigenetic target region set includes control hypermethylated regions that are expected to be hypermethylated in essentially all samples.
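A data-validation check built on such control regions might look like the sketch below. It assumes, purely for illustration, that each control region is summarized by the fraction of its molecules recovered in the hypermethylated partition; the function name and the 0.9 and 0.1 cut-offs are hypothetical and do not come from the disclosure.

```python
def controls_pass(hyper_fractions, hypo_fractions,
                  hyper_min=0.9, hypo_max=0.1):
    """Return True if every control region behaves as expected.

    hyper_fractions: per-region fraction of molecules found in the
        hypermethylated partition, for control hypermethylated regions.
    hypo_fractions: the same quantity for control hypomethylated regions.
    The two thresholds are illustrative placeholders.
    """
    hyper_ok = all(f >= hyper_min for f in hyper_fractions)
    hypo_ok = all(f <= hypo_max for f in hypo_fractions)
    return hyper_ok and hypo_ok


# Invented example values for two runs.
good_run = controls_pass([0.95, 0.97], [0.02, 0.05])
bad_run = controls_pass([0.95, 0.60], [0.02, 0.05])
```

A run in which a control hypermethylated region is recovered mostly from the wrong partition, as in `bad_run`, would be flagged for review rather than interpreted.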

b. Sequence-variable target region set

In some embodiments, the sequence-variable target region set comprises a plurality of regions known to undergo somatic mutations in cancer (referred to herein as cancer-associated mutations). Accordingly, the methods may comprise determining whether cfDNA molecules corresponding to the sequence-variable target region set comprise cancer-associated mutations.

In some embodiments, the sequence-variable target region set targets a plurality of different genes or genomic regions (a "panel") selected such that a determined proportion of subjects having a cancer exhibits a genetic variant or tumor marker in one or more of the different genes or genomic regions in the panel. The panel may be selected to limit the region for sequencing to a fixed number of base pairs. The panel may be selected to sequence a desired amount of DNA, e.g., by adjusting the affinity and/or amount of the probes as described elsewhere herein. The panel may be further selected to achieve a desired sequence read depth. The panel may be selected to achieve a desired sequence read depth or sequence read coverage for a given amount of sequenced base pairs. The panel may be selected to achieve a theoretical sensitivity, a theoretical specificity, and/or a theoretical accuracy for detecting one or more genetic variants in a sample.

Probes for detecting the panel of regions can include probes for detecting genomic regions of interest (hotspot regions) as well as nucleosome-aware probes (e.g., KRAS codons 12 and 13), and may be designed to optimize capture based on analysis of cfDNA coverage and of fragment size variation as affected by nucleosome binding patterns and GC sequence composition. Regions used herein can also include non-hotspot regions optimized based on nucleosome positions and GC models.

Examples of listings of genomic locations of interest may be found in Table 3 and Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the genes of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the SNVs of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 1, at least 2, or 3 of the indels of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the genes of Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the SNVs of Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, or 18 of the indels of Table 4. Each of these genomic locations of interest may be identified as a backbone region or a hotspot region for a given panel. An example of a listing of hotspot genomic locations of interest may be found in Table 5. The coordinates in Table 5 are based on the hg19 assembly of the human genome, but one skilled in the art will be familiar with other assemblies and can identify the coordinate sets corresponding to the indicated exons, introns, codons, etc. in an assembly of their choice. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 of the genes of Table 5. Several characteristics are listed for each hotspot genomic region, including the associated gene, the chromosome on which it resides, the start and stop positions of the gene's locus in the genome, the length of the locus in base pairs, the exons covered, and the critical feature (e.g., type of mutation) that a given genomic region of interest may seek to capture.

Table 3

Table 4

Table 5

Additionally or alternatively, suitable target region sets are available from the literature. For example, Gale et al., PLoS One 13:e0194630 (2018), which is incorporated herein by reference, describes a panel of 35 cancer-related gene targets that can be used as part or all of a sequence-variable target region set. These 35 targets are AKT1, ALK, BRAF, CCND1, CDKN2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.

In some embodiments, the sequence-variable target region set comprises target regions from at least 10, 20, 30, or 35 cancer-related genes, such as the cancer-related genes listed above.

In some embodiments, the sequence-variable target region set has a footprint of at least 50 kbp, e.g., at least 100 kbp, at least 200 kbp, at least 300 kbp, or at least 400 kbp. In some embodiments, the sequence-variable target region set has a footprint in the range of 100-2000 kbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, or 1.5-2 Mbp. In some embodiments, the sequence-variable target region set has a footprint of at least 2 Mbp.
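The footprint of a target region set can be read as the number of distinct bases it covers. A minimal sketch of that calculation follows, assuming regions are given as hypothetical `(chromosome, start, end)` tuples with 1-based inclusive coordinates and that overlapping regions should be counted only once; the example regions are invented.

```python
def footprint_bp(regions):
    """Total distinct bases covered by (chrom, start, end) regions.

    Coordinates are assumed 1-based and inclusive. Overlapping or
    directly adjacent regions on the same chromosome are merged first,
    so no base is counted twice.
    """
    total = 0
    current = None  # (chrom, start, end) of the interval being merged
    for chrom, start, end in sorted(regions):
        if current and chrom == current[0] and start <= current[2] + 1:
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                total += current[2] - current[1] + 1
            current = (chrom, start, end)
    if current:
        total += current[2] - current[1] + 1
    return total


# Invented regions for illustration; two of them overlap on chr7.
example_regions = [("chr7", 100, 199), ("chr7", 150, 249), ("chr12", 1, 50)]
example_footprint = footprint_bp(example_regions)
meets_50_kbp = example_footprint >= 50_000
```

For the toy input the footprint is 200 bp (150 bp on chr7 after merging plus 50 bp on chr12), far below the 50 kbp minimum recited above.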

c. Collections of target-specific probes

In some embodiments, a collection of target-specific probes is used in the methods described herein. In some embodiments, the collection of target-specific probes comprises target-binding probes specific for a sequence-variable target region set and target-binding probes specific for an epigenetic target region set. In some embodiments, the capture yield of the target-binding probes specific for the sequence-variable target region set is higher (e.g., at least 2-fold higher) than the capture yield of the target-binding probes specific for the epigenetic target region set. In some embodiments, the collection of target-specific probes is configured to have a capture yield specific for the sequence-variable target region set that is higher (e.g., at least 2-fold higher) than its capture yield specific for the epigenetic target region set.

In some embodiments, the capture yield of the target-binding probes specific for the sequence-variable target region set is at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than the capture yield of the target-binding probes specific for the epigenetic target region set. In some embodiments, the capture yield of the target-binding probes specific for the sequence-variable target region set is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than the capture yield of the target-binding probes specific for the epigenetic target region set.

In some embodiments, the collection of target-specific probes is configured to have a capture yield specific for the sequence-variable target region set that is at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than its capture yield for the epigenetic target region set. In some embodiments, the collection of target-specific probes is configured to have a capture yield specific for the sequence-variable target region set that is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than its capture yield specific for the epigenetic target region set.
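The fold differences recited above can be made concrete with a small calculation. The sketch below assumes, for illustration only, that capture yield is the fraction of input molecules for a target region set that is recovered after capture; this section does not define the term, and the function names and molecule counts are hypothetical.

```python
def capture_yield(captured_molecules: int, input_molecules: int) -> float:
    """Fraction of on-target input molecules recovered (assumed definition)."""
    if input_molecules <= 0:
        raise ValueError("input_molecules must be positive")
    return captured_molecules / input_molecules


def fold_difference(seq_variable_yield: float, epigenetic_yield: float) -> float:
    """How many times higher the sequence-variable yield is."""
    if epigenetic_yield <= 0:
        raise ValueError("epigenetic_yield must be positive")
    return seq_variable_yield / epigenetic_yield


# Invented counts: 50% recovery for the sequence-variable set,
# 25% recovery for the epigenetic set.
seq_yield = capture_yield(5000, 10000)
epi_yield = capture_yield(2500, 10000)
fold = fold_difference(seq_yield, epi_yield)
meets_two_fold = fold >= 2.0
```

With these invented numbers the sequence-variable set is captured at exactly twice the yield of the epigenetic set, which would satisfy the "at least 2-fold" embodiment.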

The collection of probes can be configured to provide higher capture yields for the sequence-variable target region set in various ways, including concentration, different lengths, and/or chemistries (e.g., chemistries that affect affinity), and combinations thereof. Affinity can be modulated by adjusting probe length and/or including nucleotide modifications, as discussed below.

In some embodiments, the target-specific probes specific for the sequence-variable target region set are present at a higher concentration than the target-specific probes specific for the epigenetic target region set. In some embodiments, the concentration of the target-binding probes specific for the sequence-variable target region set is at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than the concentration of the target-binding probes specific for the epigenetic target region set. In some embodiments, the concentration of the target-binding probes specific for the sequence-variable target region set is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than the concentration of the target-binding probes specific for the epigenetic target region set. In such embodiments, concentration may refer to the average mass-per-volume concentration of the individual probes in each set.

In some embodiments, the target-specific probes specific for the sequence-variable target region set have a higher affinity for their targets than the target-specific probes specific for the epigenetic target region set. Affinity can be modulated in any way known to those skilled in the art, including by using different probe chemistries. For example, certain nucleotide modifications, such as cytosine 5-methylation (in certain sequence contexts), modifications that provide a heteroatom at the 2' sugar position, and LNA nucleotides, can increase the stability of double-stranded nucleic acids, meaning that oligonucleotides with such modifications have relatively higher affinity for their complementary sequences. See, e.g., Severin et al., Nucleic Acids Res. 39:8740–8751 (2011); Freier et al., Nucleic Acids Res. 25:4429–4443 (1997); U.S. Patent No. 9,738,894. Also, longer sequence lengths will generally provide increased affinity. Other nucleotide modifications, such as the substitution of the nucleobase hypoxanthine for guanine, reduce affinity by reducing the amount of hydrogen bonding between the oligonucleotide and its complementary sequence. In some embodiments, the target-specific probes specific for the sequence-variable target region set have modifications that increase their affinity for their targets. In some embodiments, alternatively or additionally, the target-specific probes specific for the epigenetic target region set have modifications that decrease their affinity for their targets. In some embodiments, the target-specific probes specific for the sequence-variable target region set have longer average lengths and/or higher average melting temperatures than the target-specific probes specific for the epigenetic target region set. These embodiments may be combined with each other and/or with differences in concentration as discussed above to achieve a desired fold difference in capture yield, such as any fold difference or range thereof described above.

In some embodiments, the target-specific probes comprise a capture moiety. The capture moiety may be any of the capture moieties described herein, e.g., biotin. In some embodiments, the target-specific probes are linked to a solid support, e.g., covalently or non-covalently, such as through the interaction of a binding pair of capture moieties. In some embodiments, the solid support is a bead, such as a magnetic bead.

In some embodiments, the target-specific probes specific for the sequence-variable target region set and/or the target-specific probes specific for the epigenetic target region set are a bait set as discussed above, e.g., probes comprising capture moieties and sequences selected to tile across a panel of regions, such as genes.

In some embodiments, the target-specific probes are provided in a single composition. The single composition may be a solution (liquid or frozen). Alternatively, it may be a lyophilizate.

Alternatively, the target-specific probes may be provided as more than one composition, e.g., a first composition comprising probes specific for the epigenetic target region set and a second composition comprising probes specific for the sequence-variable target region set. These probes may be mixed in appropriate proportions to provide a combined probe composition with any of the foregoing fold differences in concentration and/or capture yield. Alternatively, they may be used in separate capture procedures (e.g., on aliquots of a sample, or sequentially on the same sample) to provide first and second compositions comprising captured epigenetic target regions and captured sequence-variable target regions, respectively.
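
The "appropriate proportions" for mixing two probe stocks follow from simple dilution arithmetic. The sketch below is illustrative only: the function name, the inputs, and the assumption that each stock is characterized by a single average per-probe mass/volume concentration are ours, not part of the disclosure.

```python
def mixing_volumes(stock_seq_var, stock_epi, fold, total_volume):
    """Volumes of two probe stocks to combine (hypothetical helper).

    stock_seq_var, stock_epi: average per-probe concentration of each stock,
    in the same units. Returns (v_seq_var, v_epi) such that, in the mixture,
    the sequence-variable probes are `fold` times as concentrated as the
    epigenetic probes.
    """
    if min(stock_seq_var, stock_epi, fold, total_volume) <= 0:
        raise ValueError("all inputs must be positive")
    # In the mixture: c_sv = stock_sv * v_sv / V and c_epi = stock_epi * v_epi / V.
    # Requiring c_sv / c_epi == fold gives v_sv / v_epi = fold * stock_epi / stock_sv.
    volume_ratio = fold * stock_epi / stock_seq_var
    v_epi = total_volume / (1.0 + volume_ratio)
    v_seq_var = total_volume - v_epi
    return v_seq_var, v_epi
```

With equal stock concentrations, a 2-fold target, and a 30-unit final volume, this yields 20 units of the sequence-variable stock and 10 units of the epigenetic stock.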

ii. Probes specific for epigenetic target regions

The probes for the epigenetic target region set may comprise probes specific for one or more types of target regions that can differentiate DNA from neoplastic (e.g., tumor or cancer) cells from DNA from healthy cells (e.g., non-neoplastic circulating cells). Exemplary types of such regions are discussed in detail herein (e.g., in the sections above concerning capture sets). The probes for the epigenetic target region set may also comprise probes for one or more control regions, e.g., as described herein.

In some embodiments, the probes of the epigenetic target region probe set have a footprint of at least 100 kbp (e.g., at least 200 kbp, at least 300 kbp, or at least 400 kbp). In some embodiments, the epigenetic target region set has a footprint in the range of 100 kbp to 20 Mbp (e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, 1.5-2 Mbp, 2-3 Mbp, 3-4 Mbp, 4-5 Mbp, 5-6 Mbp, 6-7 Mbp, 7-8 Mbp, 8-9 Mbp, 9-10 Mbp, or 10-20 Mbp). In some embodiments, the epigenetic target region set has a footprint of at least 20 Mbp.
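
As an illustration of how a footprint figure might be computed, the sketch below sums the genomic bases covered by a list of target intervals, counting overlapping intervals once. It assumes that "footprint" means the total non-redundant number of base pairs covered, and that intervals are given as 0-based, half-open (chromosome, start, end) tuples; the function name and example coordinates are hypothetical.

```python
def footprint_bp(regions):
    """Total bases covered by (chrom, start, end) intervals, overlaps merged."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or abutting interval: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical toy panel, not coordinates from the disclosure.
panel = [("chr1", 100, 300), ("chr1", 250, 400), ("chr2", 0, 120)]
panel_footprint = footprint_bp(panel)  # 300 bp on chr1 plus 120 bp on chr2
```

A panel meets the "at least 100 kbp" embodiment when the returned value is at least 100,000.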

a. Hypermethylation variable target regions

In some embodiments, the probes for the epigenetic target region set comprise probes specific for one or more hypermethylation variable target regions. Hypermethylation variable target regions may also be referred to herein as hypermethylated DMRs (differentially methylated regions). The hypermethylation variable target regions may be any of those set forth above. For example, in some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for more than one of the loci listed in Table 1 (e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1). In some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for more than one of the loci listed in Table 2 (e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 2). In some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for more than one of the loci listed in Table 1 or Table 2 (e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1 or Table 2). In some embodiments, for each locus included as a target region, there may be one or more probes with a hybridization site that binds between the transcription start site and the stop codon (the last stop codon for genes that are alternatively spliced) of the gene. In some embodiments, the one or more probes bind within 300 bp (e.g., within 200 bp or 100 bp) of the listed position. In some embodiments, a probe has a hybridization site overlapping the position listed above. In some embodiments, the probes specific for the hypermethylation target regions include probes specific for one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers.

b. Hypomethylation variable target regions

In some embodiments, the probes for the epigenetic target region set comprise probes specific for one or more hypomethylation variable target regions. Hypomethylation variable target regions may also be referred to herein as hypomethylated DMRs (differentially methylated regions). The hypomethylation variable target regions may be any of those set forth above. For example, the probes specific for one or more hypomethylation variable target regions may include probes for regions such as repeated elements (e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA) and intergenic regions, which are ordinarily methylated in healthy cells and may show reduced methylation in tumor cells.

In some embodiments, the probes specific for hypomethylation variable target regions include probes specific for repeated elements and/or intergenic regions. In some embodiments, the probes specific for repeated elements include probes specific for one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.

Exemplary probes specific for genomic regions that show cancer-associated hypomethylation include probes specific for nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1. In some embodiments, the probes specific for hypomethylation variable target regions include probes specific for regions overlapping or comprising nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1.

c. CTCF binding regions

In some embodiments, the probes for the epigenetic target region set include probes specific for CTCF binding regions. In some embodiments, the probes specific for CTCF binding regions comprise probes specific for at least 10, 20, 50, 100, 200, or 500 CTCF binding regions, or for 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 CTCF binding regions, e.g., such as the CTCF binding regions described above, or CTCF binding regions in one or more of CTCFBSDB or the Cuddapah et al., Martin et al., or Rhee et al. articles cited above. In some embodiments, the probes for the epigenetic target region set comprise at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp of the regions upstream and downstream of the CTCF binding sites.

d. Transcription start sites

In some embodiments, the probes for the epigenetic target region set include probes specific for transcription start sites. In some embodiments, the probes specific for transcription start sites comprise probes specific for at least 10, 20, 50, 100, 200, or 500 transcription start sites, or for 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 transcription start sites, e.g., such as transcription start sites listed in DBTSS. In some embodiments, the probes for the epigenetic target region set comprise probes for sequences at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp upstream and downstream of the transcription start sites.
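
The flanking windows described for CTCF binding sites and transcription start sites can be sketched as a simple coordinate expansion. The helper below is an illustration under stated assumptions: sites are single 0-based positions, windows are half-open, strand is ignored, and the function name and example positions are hypothetical.

```python
def flank_windows(sites, flank):
    """Expand each (chrom, pos) site into a window covering `flank` bp on each side.

    Returns half-open (chrom, start, end) tuples; the start is clipped at 0.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return [(chrom, max(0, pos - flank), pos + flank + 1) for chrom, pos in sites]


# Hypothetical site positions, for illustration only.
windows = flank_windows([("chr1", 50), ("chr1", 5000)], flank=100)
```

Each resulting window could then be tiled with probes; windows from neighboring sites may overlap and would need merging before a footprint is computed.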

e. Focal amplifications

As noted above, although focal amplifications are somatic mutations, they can be detected by sequencing based on read frequency in a manner analogous to approaches for detecting certain epigenetic changes such as changes in methylation. As such, regions that may show focal amplifications in cancer can be included in the epigenetic target region set, as discussed above. In some embodiments, the probes specific for the epigenetic target region set include probes specific for focal amplifications. In some embodiments, the probes specific for focal amplifications include probes specific for one or more of AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2, FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, and RAF1. For example, in some embodiments, the probes specific for focal amplifications include probes specific for one or more of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 of the foregoing targets.
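
A read-frequency approach to focal amplifications can be illustrated with a per-gene log2 ratio of depth-normalized read counts against a reference. This is a minimal sketch, not the method of the disclosure: the normalization by total counts, the pseudocount, the threshold of 1.0, and the read counts are all assumptions chosen for illustration.

```python
import math


def log2_copy_ratio(sample_counts, reference_counts, pseudocount=0.5):
    """Per-gene log2 ratio of each gene's share of reads, sample vs reference."""
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    ratios = {}
    for gene, count in sample_counts.items():
        sample_share = (count + pseudocount) / sample_total
        reference_share = (reference_counts[gene] + pseudocount) / reference_total
        ratios[gene] = math.log2(sample_share / reference_share)
    return ratios


def call_amplified(ratios, threshold=1.0):
    """Genes whose log2 ratio meets an (assumed) amplification threshold."""
    return sorted(gene for gene, value in ratios.items() if value >= threshold)


# Hypothetical read counts over five of the genes named above.
sample = {"ERBB2": 4000, "KRAS": 1000, "MYC": 1000, "EGFR": 1000, "MET": 1000}
reference = {gene: 1000 for gene in sample}
amplified = call_amplified(log2_copy_ratio(sample, reference))
```

Normalizing by total counts lets a strongly amplified gene depress the ratios of the others; a median-based normalization is a common alternative when that matters.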

f. Control regions

It can be useful to include control regions to facilitate data validation. In some embodiments, the probes specific for the epigenetic target region set include probes specific for control methylated regions that are expected to be methylated in essentially all samples. In some embodiments, the probes specific for the epigenetic target region set include probes specific for control hypomethylated regions that are expected to be hypomethylated in essentially all samples.

ii. Probes specific for sequence-variable target regions

The probes for the sequence-variable target region set may comprise probes specific for more than one region known to undergo somatic mutations in cancer. The probes may be specific for any sequence-variable target region set described herein. Exemplary sequence-variable target region sets are discussed in detail herein (e.g., in the sections above concerning capture sets).

In some embodiments, the sequence-variable target region probe set has a footprint of at least 0.5 kb (e.g., at least 1 kb, at least 2 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 30 kb, or at least 40 kb). In some embodiments, the epigenetic target region probe set has a footprint in the range of 0.5-100 kb (e.g., 0.5-2 kb, 2-10 kb, 10-20 kb, 20-30 kb, 30-40 kb, 40-50 kb, 50-60 kb, 60-70 kb, 70-80 kb, 80-90 kb, and 90-100 kb). In some embodiments, the sequence-variable target region probe set has a footprint of at least 50 kbp (e.g., at least 100 kbp, at least 200 kbp, at least 300 kbp, or at least 400 kbp). In some embodiments, the sequence-variable target region probe set has a footprint in the range of 100-2000 kbp (e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, or 1.5-2 Mbp). In some embodiments, the sequence-variable target region probe set has a footprint of at least 2 Mbp.

In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the genes of Table 3. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the SNVs of Table 3. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 3. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, or 3 of the indels of Table 3. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the genes of Table 4. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the SNVs of Table 4. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 4. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, or 18 of the indels of Table 4. In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 of the genes of Table 5.

In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for target regions from at least 10, 20, 30, or 35 cancer-related genes, such as AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.

F. Sequencing

Sample nucleic acids, optionally flanked by adapters, are generally subjected to sequencing with or without prior amplification. Sequencing methods or commercially available formats that are optionally utilized include, for example, Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore-based sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), next generation sequencing (NGS), Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively parallel sequencing, Clonal Single Molecule Array (Solexa), shotgun sequencing, Ion Torrent, Oxford Nanopore, Roche Genia, Maxam-Gilbert sequencing, primer walking, and sequencing using PacBio, SOLiD, Ion Torrent, or nanopore platforms. Sequencing reactions can be performed in a variety of sample processing units, which may include multiple lanes, multiple channels, multiple wells, or other means of processing more than one sample set substantially simultaneously. Sample processing units can also include multiple sample chambers to enable the processing of multiple runs simultaneously.

In some embodiments, the sequencing step is performed on a library comprising a captured set of target regions, which may comprise any of the target region sets described herein. In some embodiments, the sequencing step is performed on a library comprising a partition that did not undergo capture/enrichment (e.g., a whole-genome sample). For example, target regions may be captured from the first partition and the second sample and then sequenced; or target regions may be captured from the first partition and combined with the second partition after treatments such as the contacting and tagging steps; or target regions may be captured from the second partition and combined with the first partition after treatments such as the contacting and tagging steps; or both the first and second partitions may be processed and combined without undergoing capture/enrichment.

The sequencing reactions can be performed on one or more nucleic acid fragment types or regions containing markers of cancer or of other diseases. The sequencing reactions can also be performed on any nucleic acid fragment present in the sample. The sequencing reactions may be performed on at least about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9%, or 100% of the genome. In other cases, sequencing reactions may be performed on less than about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9%, or 100% of the genome. Sequence coverage can be performed on at least 5%, 10%, 20%, 70%, or 100% of the genome, on at least 200 or 500 different genes, or on up to 5000, 2500, 1000, 500, or 100 different genes.

Simultaneous sequencing reactions may be performed using multiplex sequencing techniques. In some embodiments, cell-free polynucleotides are sequenced with at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. In other embodiments, cell-free polynucleotides are sequenced with fewer than about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. Sequencing reactions are typically performed sequentially or simultaneously. Subsequent data analysis is generally performed on all or part of the sequencing reactions. In some embodiments, data analysis is performed on at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. In other embodiments, data analysis is performed on fewer than about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. One example of read depth is from about 1000 to about 50000 reads per locus (e.g., base position). Another example of read depth is at least 50000 reads per locus (e.g., base position).

1. Differential depth of sequencing

In some embodiments, nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set. For example, the depth of sequencing for nucleic acids corresponding to the sequence-variable target region set may be at least 1.25-fold, 1.5-fold, 1.75-fold, 2-fold, 2.25-fold, 2.5-fold, 2.75-fold, 3-fold, 3.5-fold, 4-fold, 4.5-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 11-fold, 12-fold, 13-fold, 14-fold, or 15-fold greater, or 1.25-fold to 1.5-fold, 1.5-fold to 1.75-fold, 1.75-fold to 2-fold, 2-fold to 2.25-fold, 2.25-fold to 2.5-fold, 2.5-fold to 2.75-fold, 2.75-fold to 3-fold, 3-fold to 3.5-fold, 3.5-fold to 4-fold, 4-fold to 4.5-fold, 4.5-fold to 5-fold, 5-fold to 5.5-fold, 5.5-fold to 6-fold, 6-fold to 7-fold, 7-fold to 8-fold, 8-fold to 9-fold, 9-fold to 10-fold, 10-fold to 11-fold, 11-fold to 12-fold, 13-fold to 14-fold, 14-fold to 15-fold, or 15-fold to 100-fold greater, than the depth of sequencing for nucleic acids corresponding to the epigenetic target region set. In some embodiments, said depth of sequencing is at least 2-fold greater. In some embodiments, said depth of sequencing is at least 5-fold greater. In some embodiments, said depth of sequencing is at least 10-fold greater. In some embodiments, said depth of sequencing is 4-fold to 10-fold greater. In some embodiments, said depth of sequencing is 4-fold to 100-fold greater. Each of these embodiments refers to the extent to which nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set.
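
The fold difference in sequencing depth can be checked with a short calculation. The sketch below assumes that mean depth for a target region set is approximated as aligned bases divided by the footprint of that set; the function name and the numbers in the usage line are hypothetical.

```python
def depth_fold(seq_var_bases, seq_var_footprint, epi_bases, epi_footprint):
    """Ratio of mean depth over the sequence-variable set to that over the epigenetic set.

    *_bases: aligned bases falling in each target region set.
    *_footprint: size of each target region set in bp.
    """
    if min(seq_var_footprint, epi_footprint, epi_bases) <= 0:
        raise ValueError("footprints and epigenetic base count must be positive")
    seq_var_depth = seq_var_bases / seq_var_footprint
    epi_depth = epi_bases / epi_footprint
    return seq_var_depth / epi_depth


# Illustrative numbers: 10,000x over a 500 kbp set vs 1,000x over a 1 Mbp set.
fold = depth_fold(5_000_000_000, 500_000, 1_000_000_000, 1_000_000)
```

In this example the sequence-variable set is sequenced 10-fold deeper, which falls within the "at least 10-fold" and "4-fold to 10-fold" embodiments.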

In some embodiments, the captured DNA corresponding to the sequence-variable target region set and the captured DNA corresponding to the epigenetic target region set are sequenced concurrently, e.g., in the same sequencing cell (such as the flow cell of an Illumina sequencer) and/or in the same composition. The same composition may be a pooled composition resulting from recombining separately captured sets, or a composition obtained by capturing the cfDNA corresponding to the sequence-variable target region set and the captured DNA corresponding to the epigenetic target region set in the same vessel.

G. Additional features of certain methods

a. Subjecting the sample or a partition to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA

Methods disclosed herein may comprise a step of subjecting the sample or the first partition to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first partition, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity (e.g., when the second partition is contacted with an MSRE according to any of the embodiments described elsewhere herein). In some embodiments, if the first nucleobase is a modified or unmodified adenine, then the second nucleobase is a modified or unmodified adenine; if the first nucleobase is a modified or unmodified cytosine, then the second nucleobase is a modified or unmodified cytosine; if the first nucleobase is a modified or unmodified guanine, then the second nucleobase is a modified or unmodified guanine; and if the first nucleobase is a modified or unmodified thymine, then the second nucleobase is a modified or unmodified thymine (where modified and unmodified uracil are encompassed within modified thymine for the purpose of this step). Such a procedure can be used to identify nucleotides within a partition that have or lack certain modifications, such as methylation.

In some embodiments, the first nucleobase is a modified or unmodified cytosine, and the second nucleobase is then also a modified or unmodified cytosine. For example, the first nucleobase may comprise unmodified cytosine (C) and the second nucleobase may comprise one or more of 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC). Alternatively, the second nucleobase may comprise C and the first nucleobase may comprise one or more of mC and hmC. Other combinations are also possible, as indicated, e.g., in the summary above and the discussion below, such as where one of the first and second nucleobases comprises mC and the other comprises hmC.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first partition comprises bisulfite conversion. Treatment with bisulfite converts unmodified cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine (fC) or 5-carboxylcytosine (caC)) to uracil, whereas other modified cytosines (e.g., 5-methylcytosine and 5-hydroxymethylcytosine) are not converted. Thus, where bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5-formylcytosine, 5-carboxylcytosine, or other cytosine forms affected by bisulfite, and the second nucleobase may comprise one or more of mC and hmC, such as mC and optionally hmC. Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being mC or hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5-formylcytosine, or 5-carboxylcytosine. Performing bisulfite conversion on a first partition as described herein thus facilitates identifying positions containing mC or hmC using the sequence reads obtained from the first partition. For an exemplary description of bisulfite conversion, see, e.g., Moss et al., Nat Commun. 2018; 9:5068.
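
The read-out logic of bisulfite conversion described above can be mimicked in silico. The following is a toy sketch, not a description of any actual pipeline: the token representation of a molecule, the function names, and the example molecule are assumptions made for illustration, and strand handling and incomplete conversion are ignored.

```python
# Cytosine forms that resist bisulfite and are therefore still read as C.
BISULFITE_RESISTANT = {"mC", "hmC"}
# Cytosine forms converted to uracil, which is read as T after amplification.
BISULFITE_SUSCEPTIBLE = {"C", "fC", "caC"}


def bisulfite_read(bases):
    """Return the sequence read from a molecule after idealized bisulfite conversion.

    bases: list of tokens such as "A", "G", "T", "C", "mC", "hmC", "fC", "caC".
    """
    read = []
    for base in bases:
        if base in BISULFITE_RESISTANT:
            read.append("C")
        elif base in BISULFITE_SUSCEPTIBLE:
            read.append("T")
        else:
            read.append(base)
    return "".join(read)


def protected_positions(reference, read):
    """Reference cytosine positions still read as C, i.e. candidate mC or hmC sites."""
    return [i for i, (ref, obs) in enumerate(zip(reference, read))
            if ref == "C" and obs == "C"]


molecule = ["A", "mC", "G", "C", "hmC", "T", "fC"]
read = bisulfite_read(molecule)                    # "ACGTCTT"
sites = protected_positions("ACGCCTC", read)       # [1, 4]
```

Positions 1 and 4 are reported as protected, but the read alone cannot say which is mC and which is hmC, consistent with the text above.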

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises oxidative bisulfite (Ox-BS) conversion. This procedure first converts hmC to fC, which is bisulfite susceptible, followed by bisulfite conversion. Thus, when oxidative bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, fC, caC, hmC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises mC. Sequencing of Ox-BS converted DNA identifies positions that are read as cytosine as being mC positions. Meanwhile, positions that are read as T are identified as being T, hmC, or a bisulfite-susceptible form of C, such as unmodified cytosine, fC, or hmC. Performing Ox-BS conversion on the first partition as described herein thus facilitates identifying positions containing mC using the sequence reads obtained from the first partition. For an exemplary description of oxidative bisulfite conversion, see, e.g., Booth et al., Science 2012; 336:934-937.

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises Tet-assisted bisulfite (TAB) conversion. In TAB conversion, hmC is protected from conversion and mC is oxidized in advance of bisulfite treatment, so that positions originally occupied by mC are converted to U while positions originally occupied by hmC remain as a protected form of cytosine. For example, as described in Yu et al., Cell 2012; 149:1368-80, β-glucosyltransferase can be used to protect hmC (forming 5-glucosylhydroxymethylcytosine (ghmC)), then a TET protein such as mTet1 can be used to convert mC to caC, and then bisulfite treatment can be used to convert C and caC to U, while ghmC remains unaffected. Thus, when TAB conversion is used, the first nucleobase comprises one or more of unmodified cytosine, fC, caC, mC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises hmC. Sequencing of TAB-converted DNA identifies positions that are read as cytosine as being hmC positions. Meanwhile, positions that are read as T are identified as being T, mC, or a bisulfite-susceptible form of C, such as unmodified cytosine, fC, or caC. Performing TAB conversion on the first partition as described herein thus facilitates identifying positions containing hmC using the sequence reads obtained from the first partition.

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, pyridine borane, tert-butylamine borane, or ammonia borane. In Tet-assisted conversion with a substituted borane reducing agent, a TET protein is used to convert mC and hmC to caC, without affecting unmodified C. Then caC, and fC if present, are converted to dihydrouracil (DHU) by treatment with 2-picoline borane (pic-borane) or another substituted borane reducing agent such as pyridine borane, tert-butylamine borane, or ammonia borane, also without affecting unmodified C. See, e.g., Liu et al., Nature Biotechnology 2019; 37:424–429 (e.g., at Supplementary Fig. 1 and Supplementary Note 7). DHU is read as T in sequencing. Thus, when this type of conversion is used, the first nucleobase comprises one or more of mC, fC, caC, or hmC, and the second nucleobase comprises unmodified cytosine. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T, mC, fC, caC, or hmC. Performing TAP conversion on the first partition as described herein thus facilitates identifying positions containing unmodified C using the sequence reads obtained from the first partition. This procedure encompasses Tet-assisted pyridine borane sequencing (TAPS), described in further detail in Liu et al. 2019, supra.

Alternatively, protection of hmC (e.g., using βGT) can be combined with Tet-assisted conversion with a substituted borane reducing agent. hmC can be protected as noted above through glucosylation using βGT, forming ghmC. Treatment with a TET protein such as mTet1 then converts mC to caC but does not convert C or ghmC. caC is then converted to DHU by treatment with pic-borane or another substituted borane reducing agent such as pyridine borane, tert-butylamine borane, or ammonia borane, also without affecting unmodified C or ghmC. Thus, when Tet-assisted conversion with a substituted borane reducing agent is used in this way, the first nucleobase comprises mC, and the second nucleobase comprises one or more of unmodified cytosine or hmC, such as unmodified cytosine and optionally hmC, fC, and/or caC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, fC, caC, or mC. Performing TAPSβ conversion on the first partition as described herein thus facilitates distinguishing positions containing unmodified C or hmC on the one hand from positions containing mC on the other, using the sequence reads obtained from the first partition. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424–429.

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, pyridine borane, tert-butylamine borane, or ammonia borane. In chemical-assisted conversion with a substituted borane reducing agent, an oxidizing agent such as potassium perruthenate (KRuO4) (also suitable for use in Ox-BS conversion) is used to specifically oxidize hmC to fC. Treatment with pic-borane or another substituted borane reducing agent such as pyridine borane, tert-butylamine borane, or ammonia borane converts fC and caC to DHU but does not affect mC or unmodified C. Thus, when this type of conversion is used, the first nucleobase comprises one or more of hmC, fC, and caC, and the second nucleobase comprises one or more of unmodified cytosine or mC, such as unmodified cytosine and optionally mC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either mC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, fC, caC, or hmC. Performing this type of conversion on the first partition as described herein thus facilitates distinguishing positions containing unmodified C or mC on the one hand from positions containing hmC on the other, using the sequence reads obtained from the first partition. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424–429.

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises APOBEC-coupled epigenetic (ACE) conversion. In ACE conversion, an AID/APOBEC family DNA deaminase enzyme such as APOBEC3A (A3A) is used to deaminate unmodified cytosine and mC without deaminating hmC, fC, or caC. Thus, when ACE conversion is used, the first nucleobase comprises unmodified C and/or mC (e.g., unmodified C and optionally mC), and the second nucleobase comprises hmC. Sequencing of ACE-converted DNA identifies positions that are read as cytosine as being hmC, fC, or caC positions. Meanwhile, positions that are read as T are identified as being T, unmodified C, or mC. Performing ACE conversion on the first partition as described herein thus facilitates distinguishing positions containing hmC from positions containing mC or unmodified C using the sequence reads obtained from the first subsample. For an exemplary description of ACE conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36:1083–1090.

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. (2019) EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv; DOI: 10.1101/2019.12.20.884692, available at www.biorxiv.org/content/10.1101/2019.12.20.884692v1. For example, TET2 and T4-βGT can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.
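The read-outs described for the conversion procedures above can be collected into a small lookup table. The following is a minimal sketch, not part of the disclosed method: the dictionary keys (including the label `chem-borane` for chemical-assisted substituted borane conversion) and the function names are illustrative assumptions, and the table covers only positions where the reference base is C. EM-Seq is omitted because the text above does not state how fC and caC are read.

```python
# Cytosine forms that are read as T after each conversion procedure,
# transcribed from the descriptions in the text. Keys are illustrative labels.
CYTOSINE_FORMS = {"C", "mC", "hmC", "fC", "caC"}

READS_AS_T = {
    "BS": {"C", "fC", "caC"},                 # bisulfite
    "Ox-BS": {"C", "fC", "caC", "hmC"},       # oxidative bisulfite
    "TAB": {"C", "fC", "caC", "mC"},          # Tet-assisted bisulfite
    "TAPS": {"mC", "hmC", "fC", "caC"},       # Tet-assisted borane
    "TAPS-beta": {"mC", "fC", "caC"},         # TAPS with hmC protection
    "chem-borane": {"hmC", "fC", "caC"},      # chemical-assisted borane (assumed label)
    "ACE": {"C", "mC"},                       # APOBEC-coupled epigenetic
}


def reads_as_c(method):
    """Return the cytosine forms that are still read as C after conversion."""
    return CYTOSINE_FORMS - READS_AS_T[method]


def candidates(method, observed_base):
    """List cytosine forms consistent with a base call at a reference-C position."""
    if observed_base == "C":
        return sorted(reads_as_c(method))
    if observed_base == "T":
        return sorted(READS_AS_T[method])
    raise ValueError("observed_base must be 'C' or 'T'")
```

For instance, `candidates("Ox-BS", "C")` leaves only mC, which matches the statement that Ox-BS sequencing identifies positions read as cytosine as mC positions.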

In some embodiments, the procedure that differentially affects a first nucleobase in the DNA of the first partition and a second nucleobase in the DNA comprises separating DNA originally comprising the first nucleobase from DNA not originally comprising the first nucleobase. In some such embodiments, the first nucleobase is hmC. DNA originally comprising the first nucleobase may be separated from other DNA using a labeling procedure comprising biotinylating positions that originally comprised the first nucleobase. In some embodiments, the first nucleobase is first derivatized with an azide-containing moiety, such as a glucosyl-azide containing moiety. The azide-containing moiety may then serve as a reagent for attaching biotin, e.g., through Huisgen cycloaddition chemistry. Then, the DNA originally comprising the first nucleobase, now biotinylated, can be separated from DNA not originally comprising the first nucleobase using a biotin-binding agent, such as avidin, neutravidin (deglycosylated avidin with an isoelectric point of about 6.3), or streptavidin. An example of a procedure for separating DNA originally comprising the first nucleobase from DNA not originally comprising the first nucleobase is hmC-seal, which labels hmC to form β-6-azide-glucosyl-5-hydroxymethylcytosine and then attaches a biotin moiety through Huisgen cycloaddition, followed by separation of the biotinylated DNA from other DNA using a biotin-binding agent. For an exemplary description of hmC-seal, see, e.g., Han et al., Mol. Cell 2016; 63:711-719. This approach is useful for identifying fragments that include one or more hmC nucleobases.

In some embodiments, following such a separation, the method further comprises differentially tagging each of the DNA originally comprising the first nucleobase, the DNA not originally comprising the first nucleobase, and the DNA of the second partition. The method may further comprise pooling the DNA originally comprising the first nucleobase, the DNA not originally comprising the first nucleobase, and the DNA of the second partition following differential tagging. The DNA originally comprising the first nucleobase, the DNA not originally comprising the first nucleobase, and the DNA of the second partition may then be sequenced in the same sequencing cell while retaining the ability to resolve, using the differential tags, whether a given read came from a molecule of DNA originally comprising the first nucleobase, DNA not originally comprising the first nucleobase, or DNA of the second partition.
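The resolution of pooled reads by their differential tags can be sketched as follows. This is only an illustration under stated assumptions: the tag sequences, the fixed tag length, the placement of the tag at the start of each read, and the function name `demultiplex` are all hypothetical and are not specified in the text.

```python
from collections import defaultdict

# Hypothetical 4-nt tags, one per DNA source (illustrative values only).
TAGS = {
    "ACGT": "first_partition_with_first_nucleobase",
    "TGCA": "first_partition_without_first_nucleobase",
    "GATC": "second_partition",
}
TAG_LEN = 4  # assumed fixed tag length at the 5' end of each read


def demultiplex(reads):
    """Group pooled reads by source, using the tag at the start of each read."""
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LEN], read[TAG_LEN:]
        # Reads whose tag is not recognized are kept apart rather than guessed.
        source = TAGS.get(tag, "unassigned")
        groups[source].append(insert)
    return dict(groups)
```

A practical design would typically also tolerate sequencing errors in the tag; that is left out here to keep the sketch short.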

In some embodiments, the first nucleobase is a modified or unmodified adenine, and the second nucleobase is a modified or unmodified adenine. In some embodiments, the modified adenine is N6-methyladenine (mA). In some embodiments, the modified adenine is one or more of N6-methyladenine (mA), N6-hydroxymethyladenine (hmA), or N6-formyladenine (fA).

Techniques comprising methylated DNA immunoprecipitation (MeDIP) can be used to separate DNA containing modified bases such as mA from other DNA. See, e.g., Kumar et al., Frontiers Genet. 2018; 9:640; Greer et al., Cell 2015; 161:868-878. An antibody specific for mA is described in Sun et al., Bioessays 2015; 37:1155-62. Antibodies for various modified nucleobases, such as forms of thymine/uracil including halogenated forms such as 5-bromouracil, are commercially available. Various modified bases can also be detected based on alterations in their base-pairing specificity. For example, hypoxanthine is a modified form of adenine that can result from deamination and is read in sequencing as a G. See, e.g., U.S. Patent No. 8,486,630; Brown, Genomes, 2nd Ed., John Wiley & Sons, Inc., New York, N.Y., 2002, chapter 14, "Mutation, Repair, and Recombination."

b. Subjects

In some embodiments, the nucleic acid molecules such as DNA (e.g., cfDNA) are obtained from a subject having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a neoplasm. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a neoplasm. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject in remission from a tumor, cancer, or neoplasm (e.g., following chemotherapy, surgical resection, radiation, or a combination thereof). In any of the foregoing embodiments, the cancer, tumor, or neoplasm or the suspected cancer, tumor, or neoplasm may be of the lung, colon, rectum, kidney, breast, prostate, or liver. In some embodiments, the cancer, tumor, or neoplasm or the suspected cancer, tumor, or neoplasm is of the lung. In some embodiments, the cancer, tumor, or neoplasm or the suspected cancer, tumor, or neoplasm is of the colon or rectum. In some embodiments, the cancer, tumor, or neoplasm or the suspected cancer, tumor, or neoplasm is of the breast. In some embodiments, the cancer, tumor, or neoplasm or the suspected cancer, tumor, or neoplasm is of the prostate. In any of the foregoing embodiments, the subject may be a human subject.

c. Quantification

In some embodiments, epigenetic target regions captured from one or more of the first partition, the treated first partition, or the treated second partition are quantified. For example, hypomethylation variable target regions may be quantified in the treated second partition, and/or hypermethylation variable target regions may be quantified in the first partition or the treated first partition. Quantification may be performed by any appropriate technique, e.g., quantitative amplification such as quantitative PCR. In some embodiments, quantification is based on sequencing data (e.g., the number of sequence reads or the number of unique molecules sequenced).

As described above, quantification of epigenetic target regions can be used to determine the presence, absence, or likelihood of cancer in the subject. For example, the presence or absence of cancer may be determined based at least in part on whether the amount of hypermethylation variable target regions in the first partition or the treated first partition, and/or the amount of hypomethylation variable target regions in the treated second partition, exceeds a predetermined threshold. In some embodiments, such amounts can be used together with other data collected from the sample, such as the presence of mutations and/or other epigenetic features described elsewhere herein, such as perturbations of transcription start sites and/or CTCF binding sites.
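Quantification from sequencing data followed by comparison against a predetermined threshold can be sketched as below. This is a hedged illustration rather than the disclosed implementation: the read representation `(chrom, start, end, umi)`, the definition of a unique molecule as a distinct combination of those four fields, and the function names are assumptions made for the example, and no threshold value is implied by the text.

```python
def count_unique_molecules(reads, regions):
    """Count unique molecules overlapping any target region.

    reads:   iterable of (chrom, start, end, umi) tuples, half-open coordinates
    regions: list of (chrom, start, end) target regions, half-open coordinates
    A molecule is treated as unique by its (chrom, start, end, umi) combination,
    which is an assumption of this sketch.
    """
    seen = set()
    for chrom, start, end, umi in reads:
        overlaps = any(
            chrom == r_chrom and start < r_end and end > r_start
            for r_chrom, r_start, r_end in regions
        )
        if overlaps:
            seen.add((chrom, start, end, umi))
    return len(seen)


def exceeds_threshold(amount, threshold):
    """Return True when the quantified amount is above the predetermined threshold."""
    return amount > threshold
```

In use, the count for hypermethylation variable target regions in the first partition would be passed to `exceeds_threshold` together with a threshold chosen in advance; as the text notes, the result would be only one input among several.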

d. Pooling of DNA from the first and second partitions or portions thereof

In some embodiments, the methods comprise preparing a pool comprising at least a portion of the DNA of the second partition (e.g., the hypomethylated partition) and at least a portion of the DNA of the first partition (e.g., the hypermethylated partition). Target regions, e.g., including epigenetic target regions and/or sequence-variable target regions, may be captured from the pool. The steps of capturing a set of target regions from at least a portion of a partition, described elsewhere herein, encompass capture steps performed on a pool comprising DNA from the first and second partitions. A step of amplifying the DNA in the pool may be performed before capturing target regions from the pool. The capturing step may have any of the features described elsewhere herein.

The epigenetic target regions may show differences in methylation levels and/or fragmentation patterns depending on whether they originated from a tumor or from healthy cells, or on what type of tissue they originated from, as discussed elsewhere herein. The sequence-variable target regions may show differences in sequence depending on whether they originated from a tumor or from healthy cells.

In some applications, analysis of epigenetic target regions from the hypomethylated partition may be less informative than analysis of sequence-variable target regions from the hypermethylated and hypomethylated partitions and of epigenetic target regions from the hypermethylated partition. As such, in methods where sequence-variable target regions and epigenetic target regions are being captured, the epigenetic target regions from the hypomethylated partition may be captured to a lesser extent than one or more of the sequence-variable target regions from the hypermethylated and hypomethylated partitions and the epigenetic target regions from the hypermethylated partition. For example, sequence-variable target regions can be captured from the portion of the hypomethylated partition that is not pooled with the hypermethylated partition, and the pool can be prepared with some (e.g., a majority, substantially all, or all) of the DNA from the hypermethylated partition and none or some (e.g., a minority) of the DNA from the hypomethylated partition. Such approaches can reduce or eliminate sequencing of epigenetic target regions from the hypomethylated partition, thereby reducing the amount of sequencing data that suffices for further analysis.

In some embodiments, including a minority of the DNA of the hypomethylated partition in the pool facilitates quantification of one or more epigenetic features (e.g., methylation or other epigenetic features discussed in detail elsewhere herein), e.g., on a relative basis.

In some embodiments, the pool comprises a minority of the DNA of the hypomethylated partition, e.g., less than about 50% of the DNA of the hypomethylated partition, such as less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 5%-25% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10%-20% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 15% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 20% of the DNA of the hypomethylated partition.

In some embodiments, the pool comprises a portion of the hypermethylated partition, which may be at least about 50% of the DNA of the hypermethylated partition. For example, the pool may comprise at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the DNA of the hypermethylated partition. In some embodiments, the pool comprises 50%-55%, 55%-60%, 60%-65%, 65%-70%, 70%-75%, 75%-80%, 80%-85%, 85%-90%, 90%-95%, or 95%-100% of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises all or substantially all of the hypermethylated partition.
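The pooling fractions above translate into a simple calculation of pool composition. The sketch below is illustrative only: expressing partition amounts in nanograms, the function name `pool_composition`, and the example inputs are assumptions, not values taken from the text.

```python
def pool_composition(hyper_ng, hypo_ng, hyper_fraction, hypo_fraction):
    """Compute how much DNA from each partition ends up in the pool.

    hyper_ng, hypo_ng: DNA available in the hypermethylated and hypomethylated
                       partitions (nanograms; the unit is an assumption)
    hyper_fraction, hypo_fraction: fraction of each partition added to the pool
    """
    for fraction in (hyper_fraction, hypo_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    hyper_in_pool = hyper_ng * hyper_fraction
    hypo_in_pool = hypo_ng * hypo_fraction
    total = hyper_in_pool + hypo_in_pool
    return {
        "hyper_ng": hyper_in_pool,
        "hypo_ng": hypo_in_pool,
        # Share of the pool contributed by the hypomethylated partition.
        "hypo_share": hypo_in_pool / total if total else 0.0,
    }
```

With hypothetical inputs of 10 ng hypermethylated DNA pooled in full and 40 ng hypomethylated DNA pooled at 10%, the pool would contain about 10 ng and 4 ng respectively, so the hypomethylated partition would contribute roughly 29% of the pooled DNA even though only a minority of that partition was added.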

In some embodiments, the methods comprise preparing a first pool comprising at least a portion of the DNA of the hypomethylated partition. In some embodiments, the methods comprise preparing a second pool comprising at least a portion of the DNA of the hypermethylated partition. In some embodiments, the first pool further comprises a portion of the DNA of the hypermethylated partition. In some embodiments, the second pool further comprises a portion of the DNA of the hypomethylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and optionally a minority of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises a majority of the DNA of the hypermethylated partition and a minority of the DNA of the hypomethylated partition. In some embodiments involving an intermediately methylated partition, the second pool comprises at least a portion of the DNA of the intermediately methylated partition, e.g., a majority of the DNA of the intermediately methylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and the second pool comprises a majority of the DNA of the hypermethylated partition and a majority of the DNA of the intermediately methylated partition.

In some embodiments, the methods comprise capturing at least a first set of target regions from the first pool, e.g., wherein the first pool is as set forth in any of the embodiments above. In some embodiments, the first set comprises sequence-variable target regions. In some embodiments, the first set comprises hypomethylation variable target regions and/or fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions and fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions, hypomethylation variable target regions, and fragmentation variable target regions. A step of amplifying the DNA in the first pool may be performed before this capture step. In some embodiments, capturing the first set of target regions from the first pool comprises contacting the DNA of the first pool with a first set of target-specific probes. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions, hypomethylation variable target regions, and/or fragmentation variable target regions.

In some embodiments, the methods comprise capturing a second set of target regions or a plurality of sets of target regions from the second pool, e.g., wherein the second pool is as set forth in any of the embodiments above. In some embodiments, the second plurality of target region sets comprises epigenetic target regions, such as hypermethylation variable target regions and/or fragmentation variable target regions. In some embodiments, the second plurality of target region sets comprises sequence-variable target regions and epigenetic target regions, such as hypermethylation variable target regions and/or fragmentation variable target regions. A step of amplifying the DNA in the second pool may be performed before this capture step. In some embodiments, capturing the second plurality of target region sets from the second pool comprises contacting the DNA of the second pool with a second set of target-specific probes, wherein the second set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions and target-binding probes specific for the epigenetic target regions. In some embodiments, the first set of target regions and the second set of target regions are not identical. For example, the first set of target regions may comprise one or more target regions not present in the second set of target regions. Alternatively or in addition, the second set of target regions may comprise one or more target regions not present in the first set of target regions. In some embodiments, at least one hypermethylation variable target region is captured from the second pool but not from the first pool. In some embodiments, a plurality of hypermethylation variable target regions are captured from the second pool but not from the first pool. In some embodiments, the first set of target regions comprises sequence-variable target regions and/or the second set of target regions comprises epigenetic target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions and fragmentation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions, fragmentation variable target regions, and hypomethylation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions.

In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition and a portion (e.g., about half) of the DNA of the hypermethylated partition, and the second pool comprises a portion (e.g., about half) of the DNA of the hypermethylated partition. In some such embodiments, the first target region set comprises sequence-variable target regions and/or the second target region set comprises epigenetic target regions. The sequence-variable target regions and/or the epigenetic target regions may be as set forth in any of the embodiments described elsewhere herein.

f. Capture moieties; bait sets

As described above, nucleic acids in a sample can be subjected to a capture step, in which molecules having target sequences are captured for subsequent analysis. Target capture can involve use of a bait set comprising oligonucleotide baits, such as target-specific probes labeled with a capture moiety (such as biotin or the other examples noted below). The probes can have sequences selected to tile across a panel of regions, such as genes. In some embodiments, a bait set can have higher and lower capture yields for different target region sets (such as a sequence-variable target region set and an epigenetic target region set, respectively), as discussed elsewhere herein. Such bait sets are combined with a sample under conditions that allow the target molecules to hybridize with the baits. The captured molecules are then isolated using the capture moiety; for example, a biotin capture moiety can be bound by bead-based streptavidin. Such methods are further described in, for example, U.S. Patent 9,850,523, issued December 26, 2017, which is incorporated herein by reference.

Capture moieties include, without limitation, biotin, avidin, streptavidin, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, and magnetically attractable particles. The extraction moiety can be a member of a binding pair, such as biotin/streptavidin or hapten/antibody. In some embodiments, a capture moiety attached to an analyte is captured by its binding partner, which is attached to an isolatable moiety, such as a magnetically attractable particle or a large particle that can be sedimented by centrifugation. The capture moiety can be any type of molecule that allows affinity separation of nucleic acids bearing the capture moiety from nucleic acids lacking it. Exemplary capture moieties are biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, and an oligonucleotide, which allows affinity separation by binding to a complementary oligonucleotide linked or linkable to a solid phase.

H. Analysis

In some embodiments, the methods described herein include identifying the presence of DNA produced by a tumor (or by neoplastic cells or cancer cells).

In some embodiments, the methods herein include analyzing nucleic acid molecules in which at least some of the nucleic acids comprise one or more modified cytosine residues, such as 5-methylcytosine or any of the other modifications described previously. In some such methods, after partitioning, the nucleic acid partitions are contacted with adapters comprising one or more cytosine residues modified at the 5C position, such as 5-methylcytosine. In some embodiments, all cytosine residues in such adapters are also modified, or all such cytosines in the primer binding regions of the adapters are modified. Adapters are attached to both ends of the nucleic acid molecules in the population. In some embodiments, the adapters include a sufficient number of different tags that the number of tag combinations results in a high probability (e.g., 95%, 99%, or 99.9%) that two nucleic acids with the same start and end points receive different tag combinations. The primer binding sites in such adapters can be the same or different, but are preferably the same. After attachment of adapters, the nucleic acids are amplified from primers that bind to the primer binding sites of the adapters. The amplified nucleic acids are split into a first aliquot and a second aliquot. The first aliquot is assayed for sequence data, with or without further processing. The sequence data of the molecules in the first aliquot are thus determined irrespective of the initial methylation state of the nucleic acid molecules. The nucleic acid molecules in the second aliquot are subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase comprises a cytosine modified at the 5 position and the second nucleobase comprises an unmodified cytosine. This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils. The nucleic acids subjected to the procedure are then amplified with primers directed to the original primer binding sites of the adapters attached to the nucleic acids. Only the nucleic acid molecules originally attached to adapters (as distinct from their amplification products) are now amplifiable, because these nucleic acids retain cytosines in the primer binding sites of the adapters, whereas the amplification products have lost the methylation of these cytosine residues, which have therefore been converted to uracils in the bisulfite treatment. Thus, only the original nucleic acid molecules in the population, at least some of which are methylated, undergo amplification. After amplification, these nucleic acids are subjected to sequence analysis. Comparison of the sequences determined from the first and second aliquots can indicate, among other things, which cytosines in the nucleic acid population were methylated.

Such an analysis can be performed using the following exemplary procedure. After partitioning, both ends of the methylated DNA are ligated to Y-shaped adapters that contain primer binding sites and tags. The cytosines in the adapters are modified at the 5 position (e.g., 5-methylated). The modification of the adapters serves to protect the primer binding sites in a subsequent conversion step (e.g., bisulfite treatment, TAP conversion, or any other conversion that affects unmodified cytosines but not modified cytosines). After attachment of adapters, the DNA molecules are amplified. The amplification product is split into two aliquots for sequencing with and without conversion. The aliquot not subjected to conversion can undergo sequence analysis with or without further processing. The other aliquot is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase comprises a cytosine modified at the 5 position and the second nucleobase comprises an unmodified cytosine. This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils. When contacted with primers specific for the original primer binding sites, only primer binding sites protected by cytosine modification can support amplification. Thus, only the original molecules, and not the copies from the first amplification, undergo further amplification. The further amplified molecules are then subjected to sequence analysis. The sequences from the two aliquots can then be compared. As in the separation scheme discussed above, the nucleic acid tags in the adapters are not used to distinguish methylated DNA from unmethylated DNA, but to distinguish nucleic acid molecules within the same partition.
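The comparison of the two aliquots can be illustrated with a minimal sketch. It assumes, purely for illustration, that the unconverted and converted reads come from the same strand of the same molecule and are already aligned without gaps, and that a converted cytosine (uracil) is read as T after amplification. The function name `call_methylation` is hypothetical and not part of the disclosure.

```python
def call_methylation(unconverted: str, converted: str) -> dict[int, bool]:
    """Return {0-based cytosine position: True if it appears modified}.

    Assumes both sequences are the same strand, equal length, and
    aligned position by position (an illustrative simplification).
    """
    if len(unconverted) != len(converted):
        raise ValueError("sequences must be aligned and of equal length")
    calls = {}
    for pos, (base, conv) in enumerate(zip(unconverted.upper(), converted.upper())):
        if base != "C":
            continue  # only cytosines in the unconverted read are informative
        if conv == "C":
            calls[pos] = True   # resisted conversion: modified cytosine
        elif conv == "T":
            calls[pos] = False  # converted to uracil, read as T: unmodified
        # any other base is left uncalled (sequencing error or variant)
    return calls


# Cytosines at positions 1 and 5 are retained; the one at position 4 is converted.
calls = call_methylation("ACGTCCGA", "ACGTTCGA")
```

A production pipeline would also need to handle strand orientation, indels, and sequencing error, none of which this sketch attempts.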

Sequencing can generate a plurality of sequence reads. A sequence read can include data for a nucleotide sequence less than about 150 bases in length, or less than about 90 bases in length. In some embodiments, reads are between about 80 bases and about 90 bases in length, e.g., about 85 bases. In some embodiments, the methods of the present disclosure are applied to very short reads, e.g., less than about 50 bases or about 30 bases in length. Sequence read data can include the sequence data as well as meta information. Sequence read data can be stored in any suitable file format, including, for example, VCF files, FASTA files, or FASTQ files.

FASTA can refer to a computer program for searching sequence databases, and the name FASTA can also refer to a standard file format. FASTA is described by, for example, Pearson & Lipman, 1988, Improved tools for biological sequence comparison, PNAS 85:2444-2448, which is incorporated herein by reference in its entirety. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. The word following the ">" symbol is the identifier of the sequence, and the rest of the line is the description (both are optional). There should be no space between the ">" and the first letter of the identifier. It is recommended that all lines of text be shorter than 80 characters. The sequence ends if another line starting with ">" appears; this indicates the start of another sequence.
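The FASTA layout described above can be sketched as a small parser. This is a minimal illustration under the stated conventions (a ">" description line whose first word is the identifier, followed by sequence lines until the next ">"); the function name `parse_fasta` is hypothetical.

```python
from typing import Iterator


def parse_fasta(text: str) -> Iterator[tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA record."""
    identifier, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line ends the previous sequence.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            header = line[1:].split(maxsplit=1)
            identifier = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequence data may span several lines
    if identifier is not None:
        yield identifier, description, "".join(chunks)


example = ">read1 example fragment\nACGT\nTTGA\n>read2\nGGCC\n"
records = list(parse_fasta(example))
```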

The FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. It is similar to the FASTA format, but with quality scores following the sequence data. Both the sequence letter and the quality score are each encoded with a single ASCII character for brevity. The FASTQ format is a de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer, as described by, for example, Cock et al. ("The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants," Nucleic Acids Res 38(6):1767-1771, 2009), which is incorporated herein by reference in its entirety.
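As a minimal sketch of the single-character quality encoding, the following parses one four-line FASTQ record and decodes its quality string. It assumes the Sanger convention (an ASCII offset of 33), which is one of several variants; the function name `parse_fastq_record` and the `offset` parameter are illustrative choices, not part of the disclosure.

```python
def parse_fastq_record(lines: list[str], offset: int = 33) -> dict:
    """Parse one 4-line FASTQ record; decode qualities with the given offset.

    offset=33 corresponds to the Sanger variant (an assumption here).
    """
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    fields = header[1:].split(maxsplit=1)
    return {
        "id": fields[0] if fields else "",
        "sequence": sequence,
        # One ASCII character per base: score = code point minus offset.
        "quality": [ord(ch) - offset for ch in quality],
    }


record = parse_fastq_record(["@read1", "ACGT", "+", "II5!"])
```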

For FASTA and FASTQ files, the meta information includes the description line and not the lines of sequence data. In some embodiments, for FASTQ files, the meta information includes the quality scores. For FASTA and FASTQ files, the sequence data begins after the description line and is typically presented using some subset of the IUPAC ambiguity codes, optionally with "-". In one embodiment, the sequence data may use the characters A, T, C, G, and N, optionally including "-" or U as needed (e.g., to represent gaps or uracil).

In some embodiments, at least one of the master sequence read file and the output file is stored as a plain text file (e.g., using an encoding such as ASCII, ISO/IEC 646, EBCDIC, UTF-8, or UTF-16). A computer system provided by the present disclosure may include a text editor program capable of opening plain text files. A text editor program may refer to a computer program capable of presenting the contents of a text file (such as a plain text file) on a computer screen and allowing a person to edit the text (e.g., using a monitor, keyboard, and mouse). Examples of text editors include, without limitation, Microsoft Word, emacs, pico, vi, BBEdit, and TextWrangler. The text editor program may be capable of displaying a plain text file on a computer screen, showing the meta information and the sequence reads in a human-readable format (e.g., not binary encoded but using alphanumeric characters as they would be used in print or human writing).

Although the methods have been discussed with reference to FASTA or FASTQ files, the methods and systems of the present disclosure can be used to compress any suitable sequence file format, including, for example, files in the Variant Call Format (VCF). A typical VCF file includes a header section and a data section. The header contains an arbitrary number of meta-information lines, each starting with the characters '##', and a TAB-delimited field definition line starting with a single '#' character. The field definition line names eight mandatory columns, and the body section contains lines of data populating the columns defined by the field definition line. The VCF format is described by, for example, Danecek et al. ("The variant call format and VCFtools," Bioinformatics 27(15):2156-2158, 2011), which is incorporated herein by reference in its entirety. The header section can be treated as the meta information to be written to the compressed file, and the data section can be treated as lines, each of which is stored in the master file only if it is unique.
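The header/data split and the keep-only-unique-lines idea can be sketched as follows. This is an illustration under simplifying assumptions (the whole file fits in memory, and uniqueness is judged on the exact text of a data line); the function name `split_vcf` is hypothetical and does not represent the compression scheme of the disclosure.

```python
def split_vcf(text: str) -> tuple[list[str], list[str], list[list[str]]]:
    """Split VCF text into meta lines, column names, and unique data rows."""
    meta, columns, rows, seen = [], [], [], set()
    for line in text.splitlines():
        if line.startswith("##"):
            meta.append(line)                  # meta-information line
        elif line.startswith("#"):
            columns = line[1:].split("\t")     # TAB-delimited field definition line
        elif line and line not in seen:
            seen.add(line)                     # store each data line only once
            rows.append(line.split("\t"))
    return meta, columns, rows


vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tC\tT\t50\tPASS\t.\n"
    "chr1\t100\t.\tC\tT\t50\tPASS\t.\n"
)
meta, columns, rows = split_vcf(vcf_text)
```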

Some embodiments provide for the assembly of sequence reads. In assembly by alignment, for example, the sequence reads are aligned to each other or to a reference sequence. By aligning each read, in turn, to a reference genome, all of the reads are positioned in relation to one another to create the assembly. In addition, aligning or mapping the sequence reads to a reference sequence can also be used to identify variant sequences within the sequence reads. Identifying variant sequences can be used in combination with the methods and systems described herein to further aid in the diagnosis or prognosis of a disease or condition, or to guide treatment decisions.

In some embodiments, any or all of the steps are automated. Alternatively, the methods of the present disclosure may be embodied wholly or partially in one or more dedicated programs, for example, each optionally written in a compiled language such as C++ and then compiled and distributed as a binary. The methods of the present disclosure may be implemented wholly or in part as modules within, or by invoking functionality within, existing sequence analysis platforms. In some embodiments, the methods of the present disclosure include a number of steps that are all invoked automatically in response to a single starting queue (e.g., one or a combination of triggering events sourced from human activity, another computer program, or a machine). Thus, the present disclosure provides methods in which any of the steps, or any combination of the steps, can occur automatically in response to a queue. "Automatically" generally means without intervening human input, influence, or interaction (e.g., responding only to original or pre-queue human activity).

The methods of the present disclosure may also encompass various forms of output, which include an accurate and sensitive interpretation of a subject's nucleic acid sample. Retrieved output can be provided in the format of a computer file. In some embodiments, the output is a FASTA file, a FASTQ file, or a VCF file. The output may be processed to produce a text file or an XML file containing sequence data, such as a sequence of the nucleic acid aligned to a sequence of a reference genome. In other embodiments, processing yields output containing coordinates or a string describing one or more mutations in the subject's nucleic acid relative to the reference genome. Alignment strings may include Simple UnGapped Alignment Report (SUGAR), Verbose Useful Labeled Gapped Alignment Report (VULGAR), and Compact Idiosyncratic Gapped Alignment Report (CIGAR) (as described by, for example, Ning et al., Genome Research 11(10):1725-9, 2001, which is incorporated herein by reference in its entirety). These strings may be implemented, for example, in the Exonerate sequence alignment software from the European Bioinformatics Institute (Hinxton, UK).

In some embodiments, a sequence alignment is produced, such as, for example, a sequence alignment map (SAM) or binary alignment map (BAM) file, comprising a CIGAR string (the SAM format is described in, for example, Li et al., "The Sequence Alignment/Map format and SAMtools," Bioinformatics, 25(16):2078-9, 2009, which is incorporated herein by reference in its entirety). In some embodiments, CIGAR displays or includes gapped alignments one per line. CIGAR is a compressed pairwise alignment format reported as a CIGAR string. A CIGAR string can be useful for representing long (e.g., genomic) pairwise alignments. A CIGAR string can be used in the SAM format to represent alignments of reads to a reference genome sequence.

A CIGAR string can follow an established motif. Each character is preceded by a number giving the base count of the event. The characters used can include M, I, D, N, and S (M = match; I = insertion; D = deletion; N = gap; S = substitution). The CIGAR string defines the sequence of matches and/or mismatches and deletions (or gaps). For example, the CIGAR string 2MD3M2D2M can indicate that the alignment contains 2 matches, 1 deletion (the number 1 is omitted to save some space), 3 matches, 2 deletions, and 2 matches.
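The CIGAR convention described above, including the omitted count of 1, can be sketched with a short parser. The letter set M, I, D, N, S follows the text; the function name `parse_cigar` is hypothetical, and other CIGAR dialects use additional or differently defined letters that this sketch does not handle.

```python
import re

# An optional count followed by one event letter from the set named in the text.
_CIGAR_EVENT = re.compile(r"(\d*)([MIDNS])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Return [(count, letter), ...]; a missing count is treated as 1."""
    events = []
    position = 0
    for match in _CIGAR_EVENT.finditer(cigar):
        if match.start() != position:
            raise ValueError(f"unexpected characters at offset {position}")
        count, letter = match.groups()
        events.append((int(count) if count else 1, letter))
        position = match.end()
    if position != len(cigar):
        raise ValueError(f"unexpected characters at offset {position}")
    return events


events = parse_cigar("2MD3M2D2M")
# 2 matches, 1 deletion, 3 matches, 2 deletions, 2 matches
```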

In some embodiments, a population of nucleic acids is prepared for sequencing by enzymatically forming blunt ends on double-stranded nucleic acids that have single-stranded overhangs at one or both ends. In these embodiments, the population is typically treated with an enzyme having 5'-3' DNA polymerase activity and 3'-5' exonuclease activity in the presence of nucleotides (e.g., A, C, G, and T or U). Examples of enzymes, or catalytic fragments thereof, that may optionally be used include the Klenow large fragment and T4 polymerase. At a 5' overhang, the enzyme typically extends the recessed 3' end of the opposing strand until it is flush with the 5' end, producing a blunt end. At a 3' overhang, the enzyme generally digests from the 3' end up to, and sometimes beyond, the 5' end of the opposing strand. If this digestion proceeds beyond the 5' end of the opposing strand, the gap can be filled in by an enzyme having the same polymerase activity as that used for 5' overhangs. The formation of blunt ends on double-stranded nucleic acids facilitates, for example, the attachment of adapters and subsequent amplification.

In some embodiments, the nucleic acid population is subjected to additional processing, such as conversion of single-stranded nucleic acids to double-stranded nucleic acids and/or conversion of RNA to DNA (e.g., complementary DNA or cDNA). These forms of nucleic acid are also optionally ligated to adapters and amplified.

With or without prior amplification, the nucleic acids subjected to the blunt-end formation process described above, and optionally other nucleic acids in the sample, can be sequenced to produce sequenced nucleic acids. A sequenced nucleic acid can refer either to the sequence of a nucleic acid (e.g., sequence information) or to a nucleic acid whose sequence has been determined. Sequencing can be performed so as to provide sequence data for individual nucleic acid molecules in the sample, either directly or indirectly from a consensus sequence of the amplification products of an individual nucleic acid molecule in the sample.

In some embodiments, double-stranded nucleic acids with single-stranded overhangs in a sample are, after blunt-end formation, ligated at both ends to adapters comprising barcodes, and sequencing determines the nucleic acid sequences as well as the in-line barcodes introduced by the adapters. The blunt-ended DNA molecules are optionally ligated to the blunt end of an at least partially double-stranded adapter (e.g., a Y-shaped or bell-shaped adapter). Alternatively, the blunt ends of the sample nucleic acids and the adapters can be tailed with complementary nucleotides to facilitate ligation (e.g., sticky-end ligation).

The nucleic acid sample is typically contacted with a sufficient number of adapters that there is a low probability (e.g., less than about 1% or 0.1%) that any two copies of the same nucleic acid receive the same combination of adapter barcodes from the adapters ligated at both ends. Using adapters in this manner can permit the identification of families of nucleic acid sequences that have the same start and end points on a reference nucleic acid and are linked to the same combination of barcodes. Such a family can represent the sequences of the amplification products of a nucleic acid in the sample before amplification. The sequences of the family members can be compiled to derive a consensus nucleotide or a complete consensus sequence for a nucleic acid molecule in the original sample, as modified by blunt-end formation and adapter attachment. In other words, the nucleotide occupying a specified position of a nucleic acid in the sample can be determined to be the consensus of the nucleotides occupying the corresponding position in the family member sequences. A family can include sequences of one or both strands of a double-stranded nucleic acid. If members of a family include sequences of both strands of a double-stranded nucleic acid, the sequences of one strand can be converted to their complements for the purpose of compiling the sequences to derive consensus nucleotides or sequences. Some families include only a single member sequence. In this case, the sequence can be taken as the sequence of a nucleic acid in the sample before amplification. Alternatively, families with only a single member sequence can be eliminated from subsequent analysis.
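Family grouping and consensus derivation can be sketched as below. The sketch assumes, for illustration only, that each read is given as a tuple of start, end, barcode pair, and sequence; that all members of a family have the same length and strand orientation; and that the consensus at each position is a simple majority vote. The function name `consensus_by_family` and the `min_family_size` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_family(reads, min_family_size=1):
    """Group reads by (start, end, barcode pair) and majority-vote a consensus.

    reads: iterable of (start, end, barcodes, sequence) tuples.
    Families smaller than min_family_size are dropped from the result.
    """
    families = defaultdict(list)
    for start, end, barcodes, sequence in reads:
        families[(start, end, barcodes)].append(sequence)

    result = {}
    for key, sequences in families.items():
        if len(sequences) < min_family_size:
            continue  # e.g., optionally eliminate single-member families
        # Most common base in each column of the family's reads.
        result[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return result


reads = [
    (100, 104, ("AAT", "GGC"), "ACGT"),
    (100, 104, ("AAT", "GGC"), "ACGT"),
    (100, 104, ("AAT", "GGC"), "ACTT"),  # one amplification or sequencing error
    (100, 104, ("CCA", "GGC"), "ACGA"),  # same coordinates, different barcodes
]
all_families = consensus_by_family(reads)
multi_member = consensus_by_family(reads, min_family_size=2)
```

Setting `min_family_size=2` corresponds to the option of eliminating single-member families from subsequent analysis.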

Nucleotide variations (e.g., SNVs or indels) in sequenced nucleic acids can be determined by comparing the sequenced nucleic acids with a reference sequence. The reference sequence is often a known sequence, e.g., a known whole or partial genome sequence from a subject (e.g., a whole genome sequence of a human subject). The reference sequence can be an external reference sequence, for example, hG19 or hG38. As described above, a sequenced nucleic acid can represent a sequence determined directly for a nucleic acid in a sample, or a consensus of the sequences of the amplification products of such a nucleic acid. A comparison can be performed at one or more designated positions on the reference sequence. A subset of sequenced nucleic acids can be identified that includes a position corresponding to a designated position of the reference sequence when the respective sequences are maximally aligned. Within such a subset, it can be determined which, if any, sequenced nucleic acids include a nucleotide variation at the designated position, and optionally which, if any, include the reference nucleotide (i.e., the same nucleotide as in the reference sequence). If the number of sequenced nucleic acids in the subset that include a nucleotide variant exceeds a selected threshold, then a variant nucleotide can be called at the designated position. The threshold can be a number, such as at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 sequenced nucleic acids within the subset that include the nucleotide variant, or the threshold can be a ratio of sequenced nucleic acids within the subset that include the nucleotide variant, such as at least about 0.5, 1, 2, 3, 4, 5, 10, 15, or 20, among other possibilities. The comparison can be repeated for any designated position of interest in the reference sequence. Sometimes a comparison can be performed for designated positions occupying at least about 20, 100, 200, or 300 contiguous positions on the reference sequence, e.g., about 20-500 or about 50-300 contiguous positions.
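Threshold-based calling at a single designated position can be sketched as follows. The sketch takes the bases observed at that position across the subset of sequenced nucleic acids and applies a count threshold, a ratio threshold, or both. Interpreting the ratio as a fraction of the subset between 0 and 1 is an assumption made for this illustration, as are the function name `call_variants` and its parameters.

```python
from collections import Counter


def call_variants(observed_bases, reference_base, min_count=None, min_ratio=None):
    """Return the non-reference bases that meet every threshold supplied.

    min_count: minimum number of supporting sequenced nucleic acids.
    min_ratio: minimum fraction of the subset (0 to 1; an assumed convention).
    """
    if min_count is None and min_ratio is None:
        raise ValueError("supply min_count, min_ratio, or both")
    total = len(observed_bases)
    counts = Counter(base for base in observed_bases if base != reference_base)
    called = []
    for base, support in sorted(counts.items()):
        if min_count is not None and support < min_count:
            continue
        if min_ratio is not None and support / total < min_ratio:
            continue
        called.append(base)
    return called


# Ten sequenced nucleic acids cover the position: 7 reference C, 2 T, 1 A.
observed = list("CCCCCCCTTA")
by_count = call_variants(observed, "C", min_count=2)
by_ratio = call_variants(observed, "C", min_ratio=0.15)
permissive = call_variants(observed, "C", min_count=1)
```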

Additional details regarding nucleic acid sequencing, including the formats and applications described herein, are also provided in, for example, Levy et al., Annual Review of Genomics and Human Genetics, 17:95-115 (2016); Liu et al., J. of Biomedicine and Biotechnology, Volume 2012, Article ID 251364:1-11 (2012); Voelkerding et al., Clinical Chem., 55:641-658 (2009); MacLean et al., Nature Rev. Microbiol., 7:287-296 (2009); Astier et al., J Am Chem Soc., 128(5):1705-10 (2006); U.S. Patent No. 6,210,891, U.S. Patent No. 6,258,568, U.S. Patent No. 6,833,246, U.S. Patent No. 7,115,400, U.S. Patent No. 6,969,488, U.S. Patent No. 5,912,148, U.S. Patent No. 6,130,073, U.S. Patent No. 7,169,560, U.S. Patent No. 7,282,337, U.S. Patent No. 7,482,120, U.S. Patent No. 7,501,245, U.S. Patent No. 6,818,395, U.S. Patent No. 6,911,345, U.S. Patent No. 7,501,245, U.S. Patent No. 7,329,492, U.S. Patent No. 7,170,050, U.S. Patent No. 7,302,146, U.S. Patent No. 7,313,308, and U.S. Patent No. 7,476,503, each of which is hereby incorporated by reference in its entirety.

I. Exemplary workflows

Exemplary workflows are provided herein. In some embodiments, some or all features of the partitioning and library preparation workflows may be used in combination with one another, or with other features of the methods described herein.

a. Partitioning

In some embodiments, sample nucleic acid molecules, such as DNA (e.g., between 5 ng and 200 ng), are mixed with methyl binding domain (MBD) buffer and magnetic beads conjugated to MBD protein and incubated overnight. During this incubation, methylated DNA (hypermethylated DNA) binds the MBD protein on the magnetic beads. Unmethylated (hypomethylated) or less methylated (intermediately methylated) DNA is washed off the beads with buffers containing increasing salt concentrations. For example, one, two, or more fractions containing unmethylated, hypomethylated, and/or intermediately methylated DNA can be obtained from such washes. Finally, heavily methylated DNA (hypermethylated DNA) is eluted from the MBD protein using a high-salt buffer. In some embodiments, these washes yield three partitions of DNA with increasing levels of methylation (a hypomethylated partition, an intermediately methylated fraction, and a hypermethylated partition).

In some embodiments, the three partitions of DNA are desalted and concentrated in preparation for the enzymatic steps of library preparation.

b. Library Preparation

In some embodiments (e.g., after the DNA in the partitions has been concentrated), the partitioned DNA is made ligatable, e.g., by extending the end overhangs of the DNA molecules, adding adenosine residues to the 3’ ends of the fragments, and phosphorylating the 5’ end of each DNA fragment. DNA ligase and adapters are added to ligate an adapter to each end of each partitioned DNA molecule. These adapters contain partition tags (e.g., non-random, non-unique barcodes) that are distinguishable from the partition tags in the adapters used for the other partitions. Before or after the partitioned DNA is made ligatable and ligated, at least one partition (e.g., the hypermethylated partition or, where applicable, the hypermethylated and intermediately methylated partitions) is digested with an MSRE (e.g., an MSRE that preferentially cleaves unmethylated DNA, such as one or more, or each, of HpaII, BstUI, and Hin6I). Optionally, the hypomethylated partition can be digested with an MSRE that preferentially cleaves methylated DNA, such as FspEI. Optionally, the hypermethylated partition can be subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of the procedures described herein. Where such a procedure further partitions the hypermethylated partition, adapter ligation should be performed after the procedure so that the sub-partitions of the hypermethylated partition can be differentially tagged. The three (or more) partitions are then pooled and amplified (e.g., by PCR, such as with primers specific for the adapters).
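As a rough illustration of the digestion step, the following sketch locates recognition sites of the three MSREs named above within a DNA sequence. The recognition sequences used (HpaII: CCGG, BstUI: CGCG, Hin6I: GCGC) are the commonly documented ones and are not stated in this disclosure; the function name `find_msre_sites` and the example sequence are hypothetical. The sketch only finds candidate sites and does not model whether a site is cleaved, which depends on its methylation state.

```python
import re

# Recognition sequences as commonly documented for these enzymes
# (an assumption; they are not given in this disclosure).
MSRE_SITES = {
    "HpaII": "CCGG",
    "BstUI": "CGCG",
    "Hin6I": "GCGC",
}


def find_msre_sites(sequence):
    """Return {enzyme: [0-based start positions]} of recognition sites."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in MSRE_SITES.items():
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = re.compile("(?=" + site + ")")
        hits[enzyme] = [match.start() for match in pattern.finditer(seq)]
    return hits


# Hypothetical example fragment.
example = "ATCCGGTTCGCGCAA"
sites = find_msre_sites(example)
```

A fragment from the hypermethylated partition that contains such a site and is nevertheless cut would be consistent with the site being unmethylated, which is the kind of mis-partitioned molecule the digestion is intended to remove.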

After PCR, the amplified DNA can be cleaned up and concentrated prior to enrichment. The amplified DNA is contacted with a collection of probes described herein (which can be, e.g., biotinylated RNA probes) that target specific regions of interest. The mixture is incubated, e.g., overnight, e.g., in a salt buffer. The probes are captured (e.g., using streptavidin magnetic beads) and separated from the amplified DNA that was not captured, such as by a series of salt washes, thereby enriching the sample. After enrichment, the enriched sample is amplified by PCR. In some embodiments, the PCR primers contain a sample tag, thereby incorporating the sample tag into the DNA molecules. In some embodiments, DNA from different samples is pooled and then subjected to multiplexed sequencing, e.g., using an Illumina NovaSeq sequencer.

J. Compositions Comprising Captured Nucleic Acid Molecules

Provided herein is a combination comprising first and second populations of DNA, wherein the second population comprises DNA fragments having ends, or attached tags or adapters, at a recognition site of at least one MSRE, which can be any one or any combination of the MSREs described herein. In some embodiments, the first and second populations are differentially tagged. The first population can comprise or be derived from DNA with a cytosine modification in a greater proportion than the second population. The first population can comprise a form of a first nucleobase originally present in the DNA with altered base pairing specificity and a second nucleobase without altered base pairing specificity, wherein the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity and the second nucleobase have the same base pairing specificity. In some embodiments, the cytosine modification is cytosine methylation. In some embodiments, the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine. The first and second nucleobases can be any of those discussed herein, including those discussed with respect to subjecting the first partition to a procedure that affects a first nucleobase in the DNA of the first partition differently from a second nucleobase in that DNA. In some embodiments, the first population comprises DNA fragments having ends, or attached tags or adapters, at a recognition site of at least one MSRE, which can be any one or any combination of the MSREs described herein.

In some embodiments, the first population comprises a sequence tag selected from a first set of one or more sequence tags and the second population comprises a sequence tag selected from a second set of one or more sequence tags, and the second set of sequence tags is different from the first set of sequence tags. The sequence tags can comprise barcodes.
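Because the two tag sets are different, a molecule can be attributed to its population from its tag alone. The sketch below illustrates that idea under simplifying assumptions: the tag sequences, the fixed tag length, the placement of the tag at the start of the read, and the names `assign_population` and `group_reads` are all hypothetical and are not taken from this disclosure.

```python
# Hypothetical tag sets; real tags would be defined by the assay design.
FIRST_SET = {"ACGT", "TGCA"}
SECOND_SET = {"GGAA", "CCTT"}


def assign_population(read, tag_length=4):
    """Classify a read by the tag assumed to sit at its first bases."""
    tag = read[:tag_length].upper()
    if tag in FIRST_SET:
        return "first"
    if tag in SECOND_SET:
        return "second"
    # Tags that match neither set are kept apart rather than guessed at.
    return "unassigned"


def group_reads(reads):
    """Group reads into first, second, and unassigned populations."""
    groups = {"first": [], "second": [], "unassigned": []}
    for read in reads:
        groups[assign_population(read)].append(read)
    return groups


grouped = group_reads(["ACGTTTTT", "ggaaCCCC", "NNNNAAAA", "TGCAGGGG"])
```

Keeping the two sets disjoint is what makes the assignment unambiguous; a tag shared by both sets could not identify the population of origin.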

In some embodiments, the first population comprises protected hmC, such as glucosylated hmC.

In some embodiments, the first population has been subjected to any of the conversion procedures discussed herein, such as bisulfite conversion, Ox-BS conversion, TAB conversion, ACE conversion, TAP conversion, TAPSβ conversion, or CAP conversion. In some embodiments, the first population has been subjected to protection of hmC followed by deamination of mC and/or C.

In some embodiments of the combination, the first population comprises or is derived from DNA with a cytosine modification in a greater proportion than the second population, the first population comprises first and second subpopulations, the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the second population does not comprise the first nucleobase. In some embodiments, the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine, optionally wherein the modified cytosine is mC or hmC. In some embodiments, the first nucleobase is a modified or unmodified adenine and the second nucleobase is a modified or unmodified adenine, optionally wherein the modified adenine is mA.

In some embodiments, the first nucleobase (e.g., a modified cytosine) is biotinylated. In some embodiments, the first nucleobase (e.g., a modified cytosine) is a product of a Huisgen cycloaddition to β-6-azido-glucosyl-5-hydroxymethylcytosine that comprises an affinity label (e.g., biotin).

In any of the combinations described herein, the captured DNA can comprise cfDNA.

The captured DNA can have any of the features described herein concerning captured sets, including, e.g., a concentration of DNA corresponding to the sequence-variable target region set that is greater than the concentration of DNA corresponding to the epigenetic target region set (normalized for footprint size, as discussed above). In some embodiments, the DNA of the captured set comprises sequence tags, which can be added to the DNA as described herein. In general, the inclusion of sequence tags results in the DNA molecules differing from their naturally occurring, untagged form. The combination can further comprise a probe set or sequencing primers described herein, each of which can differ from naturally occurring nucleic acid molecules. For example, a probe set described herein can comprise a capture moiety, and sequencing primers can comprise a non-naturally occurring label.

III. Computer Systems

Methods of the present disclosure can be implemented using, or with the aid of, computer systems. For example, such methods can comprise: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partitioned set based on the methylation state of the nucleic acid molecules; (c) digesting at least a subset of one or more of the more than one partitioned sets with at least one methylation-sensitive restriction enzyme; (d) enriching at least a subset of the nucleic acid molecules in the more than one partitioned set for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in the one or more partitioned sets; and (e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partitioned sets, which in turn is used to determine the presence or absence of cancer in a subject.

FIG. 5 shows a computer system 501 that is programmed or otherwise configured to implement the methods of the present disclosure. The computer system 501 can regulate various aspects of sample preparation, sequencing, and/or analysis. In some examples, the computer system 501 is configured to perform sample preparation and sample analysis, including nucleic acid sequencing.

In some embodiments, the method further comprises obtaining, from the sequencing, more than one sequence read generated by a nucleic acid sequencer; mapping the more than one sequence read to one or more reference sequences to generate mapped sequence reads; and processing the mapped sequence reads to determine the likelihood that the subject has cancer.
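The disclosure does not specify how the mapped sequence reads are processed. Purely as an illustrative sketch, one intermediate quantity that could be computed from partition-tagged, mapped reads is the fraction of reads at each locus that came from the hypermethylated partition. The input format (pairs of locus name and partition label), the partition labels, and the function name `hyper_fraction_by_locus` are assumptions introduced for this example, and the step from such fractions to a likelihood of cancer is not shown.

```python
from collections import defaultdict


def hyper_fraction_by_locus(mapped_reads):
    """Compute the share of reads per locus from the 'hyper' partition.

    mapped_reads: iterable of (locus, partition) pairs, where partition
    is a label such as "hypo", "intermediate", or "hyper" (hypothetical
    labels decoded from partition tags).
    Returns {locus: fraction of that locus's reads labeled "hyper"}.
    """
    totals = defaultdict(int)
    hyper = defaultdict(int)
    for locus, partition in mapped_reads:
        totals[locus] += 1
        if partition == "hyper":
            hyper[locus] += 1
    # Every locus in totals has at least one read, so no division by zero.
    return {locus: hyper[locus] / totals[locus] for locus in totals}


# Hypothetical mapped reads.
reads = [
    ("locusA", "hyper"),
    ("locusA", "hypo"),
    ("locusB", "hyper"),
    ("locusA", "hyper"),
    ("locusA", "intermediate"),
]
fractions = hyper_fraction_by_locus(reads)
```

In practice such per-locus values would be compared against reference values or passed to a classifier; that choice is outside what this section describes.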

The computer system 501 includes a central processing unit (CPU, also "processor" and "computer processor" herein) 505, which can be a single-core or multi-core processor, or more than one processor for parallel processing. The computer system 501 also includes memory or memory location 510 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 515 (e.g., hard disk), communication interface 520 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 525, such as cache, other memory, data storage, and/or electronic display adapters. The memory 510, storage unit 515, interface 520, and peripheral devices 525 are in communication with the CPU 505 through a communication network or bus (solid lines), such as a motherboard. The storage unit 515 can be a data storage unit (or data repository) for storing data. The computer system 501 can be operatively coupled to a computer network 530 with the aid of the communication interface 520. The computer network 530 can be the Internet, an intranet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. In some cases, the computer network 530 is a telecommunication and/or data network. The computer network 530 can include one or more computer servers, which can enable distributed computing, such as cloud computing. In some cases, with the aid of the computer system 501, the computer network 530 can implement a peer-to-peer network, which can enable devices coupled to the computer system 501 to behave as a client or a server.

The CPU 505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions can be stored in a memory location, such as the memory 510. Examples of operations performed by the CPU 505 can include fetch, decode, execute, and writeback.

The storage unit 515 can store files, such as drivers, libraries, and saved programs. The storage unit 515 can store programs generated by users and recorded sessions, as well as output associated with the programs. The storage unit 515 can store user data, e.g., user preferences and user programs. In some cases, the computer system 501 can include one or more additional data storage units that are external to the computer system 501, such as located on a remote server that is in communication with the computer system 501 through an intranet or the Internet. Data can be transferred from one location to another using, for example, a communication network or physical data transfer (e.g., using a hard drive, thumb drive, or other data storage mechanism).

The computer system 501 can communicate with one or more remote computer systems through the network 530. For example, the computer system 501 can communicate with a remote computer system of a user (e.g., an operator). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., iPad, Galaxy Tab), telephones, smartphones (e.g., iPhone, Android-enabled device), or personal digital assistants. The user can access the computer system 501 via the network 530.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored in an electronic storage location of the computer system 501, such as, for example, the memory 510 or the electronic storage unit 515. The machine-executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 505. In some cases, the code can be retrieved from the storage unit 515 and stored in the memory 510 for ready access by the processor 505. In some situations, the electronic storage unit 515 can be omitted, and the machine-executable instructions are stored in the memory 510.

In one aspect, the present disclosure provides a non-transitory computer-readable medium comprising computer-executable instructions which, when executed by at least one electronic processor, perform at least a portion of a method comprising: (a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules; (b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partitioned set based on the methylation state of the nucleic acid molecules; (c) digesting at least a subset of one or more of the more than one partitioned sets with at least one methylation-sensitive restriction enzyme; (d) enriching at least a subset of the nucleic acid molecules in the more than one partitioned set for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in the one or more partitioned sets; and (e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partitioned sets, which in turn is used to detect the presence or absence of cancer in a subject.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 501, can be embodied in programming. Various aspects of the technology can be thought of as "products" or "articles of manufacture," typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. "Storage" type media can include any or all of the tangible memory of the computers, processors, or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives, and the like, which can provide non-transitory storage at any time for the software programming.

All or portions of the software can at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, can enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that can bear the software elements includes optical, electrical, and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks, and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also can be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine-readable medium, such as one bearing computer-executable code, can take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as can be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media can take the form of electric or electromagnetic signals, or acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media can be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 501 can include or be in communication with an electronic display 535 that comprises a user interface (UI) 540 for providing, for example, one or more results of sample analysis. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.

Additional details relating to computer systems and networks, databases, and computer program products are also provided in, for example, Peterson, Computer Networks: A Systems Approach, Morgan Kaufmann, 5th Ed. (2011); Kurose, Computer Networking: A Top-Down Approach, Pearson, 7th Ed. (2016); Elmasri, Fundamentals of Database Systems, Addison Wesley, 6th Ed. (2010); Coronel, Database Systems: Design, Implementation, & Management, Cengage Learning, 11th Ed. (2014); Tucker, Programming Languages, McGraw-Hill Science/Engineering/Math, 2nd Ed. (2006); and Rhoton, Cloud Computing Architected: Solution Design Handbook, Recursive Press (2011), each of which is hereby incorporated by reference in its entirety.

IV. Applications

A. Cancer and Other Diseases

The present methods can be used to diagnose the presence or absence of a condition, particularly cancer, in a subject, to characterize the condition (e.g., to stage cancer or determine the heterogeneity of a cancer), to monitor the response of the condition to treatment, and to provide a prognosis of the risk of developing the condition or of the subsequent course of the condition. The present disclosure can also be useful in determining the efficacy of a particular treatment option. If a treatment is successful, a successful treatment option may increase the amount of copy number variation or rare mutations detected in the subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur. In another example, certain treatment options may be correlated with the genetic profile of a cancer over time. This correlation can be useful in selecting a therapy. In some embodiments, hypermethylation variable epigenetic target regions are analyzed to determine whether they show hypermethylation characteristic of tumor cells or of cells that do not normally contribute significantly to cfDNA, and/or hypomethylation variable epigenetic target regions are analyzed to determine whether they show hypomethylation characteristic of tumor cells or of cells that do not normally contribute significantly to cfDNA.

Additionally, if a cancer is observed to be in remission after treatment, the present methods can be used to monitor for residual disease or recurrence of disease.

In some embodiments, the methods and systems disclosed herein can be used to identify customized or targeted therapies to treat a given disease or condition in a patient based on the classification of a nucleic acid variant as being of somatic or germline origin. Typically, the disease under consideration is a type of cancer. Non-limiting examples of such cancers include biliary tract cancer, bladder cancer, transitional cell carcinoma, urothelial carcinoma, brain cancer, glioma, astrocytoma, breast cancer, metaplastic carcinoma, cervical cancer, cervical squamous cell carcinoma, rectal cancer, colorectal cancer, colon cancer, hereditary nonpolyposis colorectal cancer, colon adenocarcinoma, gastrointestinal stromal tumor (GIST), endometrial cancer, endometrial stromal sarcoma, esophageal cancer, esophageal squamous cell carcinoma, esophageal adenocarcinoma, ocular melanoma, uveal melanoma, gallbladder cancer, gallbladder adenocarcinoma, renal cell carcinoma, clear cell renal cell carcinoma, transitional cell carcinoma, urothelial carcinoma, Wilms tumor, leukemia, acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), chronic myelomonocytic leukemia (CMML), liver cancer, liver carcinoma, hepatoma, hepatocellular carcinoma, cholangiocarcinoma, hepatoblastoma, lung cancer, non-small cell lung cancer (NSCLC), mesothelioma, B-cell lymphoma, non-Hodgkin lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, T-cell lymphoma, non-Hodgkin lymphoma, precursor T-lymphoblastic lymphoma/leukemia, peripheral T-cell lymphoma, multiple myeloma, nasopharyngeal carcinoma (NPC), neuroblastoma, oropharyngeal cancer, oral squamous cell carcinoma, osteosarcoma, ovarian cancer, pancreatic cancer, pancreatic ductal adenocarcinoma, pseudopapillary neoplasm, acinar cell carcinoma, prostate cancer, prostate adenocarcinoma, skin cancer, melanoma, malignant melanoma, cutaneous melanoma, small intestine cancer, stomach cancer, gastric carcinoma, gastrointestinal stromal tumor (GIST), uterine cancer, or uterine sarcoma. The type and/or stage of cancer can be detected from genetic variations, including mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplifications, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid 5-methylcytosine.

Genetic data can also be used to characterize a specific form of cancer. Cancers are often heterogeneous in both composition and stage. Genetic profile data can allow characterization of a specific subtype of cancer, which may be important in the diagnosis or treatment of that subtype. This information can also give a subject or practitioner clues about the prognosis of a specific type of cancer, and allow the subject or practitioner to adapt treatment options to the progression of the disease. Some cancers progress to become more aggressive and genetically unstable. Other cancers remain benign, inactive, or dormant. The systems and methods of the present disclosure can be used to determine disease progression.

In addition, the methods of the present disclosure can be used to characterize the heterogeneity of an abnormal condition in a subject. Such methods can include, for example, generating a genetic profile of extracellular polynucleotides derived from the subject, wherein the genetic profile includes a plurality of data resulting from copy number variation and rare mutation analyses. In some embodiments, the abnormal condition is cancer. In some embodiments, the abnormal condition can be one that results in a heterogeneous genomic population. In the example of cancer, some tumors are known to contain tumor cells at different stages of the cancer. In other examples, heterogeneity can include multiple foci of disease. Again, in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.

The present methods can be used to generate or profile a fingerprint or data set that is a summation of genetic information derived from different cells in a heterogeneous disease. This data set can include copy number variation, epigenetic variation, and mutation analyses, alone or in combination.

The present methods can be used to diagnose, prognose, monitor, or observe cancers or other diseases. In some embodiments, the methods herein do not involve the diagnosis, prognosis, or monitoring of a fetus, and therefore do not involve non-invasive prenatal testing. In other embodiments, these methods can be used in a pregnant subject to diagnose, prognose, monitor, or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.

Non-limiting examples of other genetically based diseases, disorders, or conditions that are optionally evaluated using the methods and systems disclosed herein include achondroplasia, alpha-1 antitrypsin deficiency, antiphospholipid syndrome, autism, autosomal dominant polycystic kidney disease, Charcot-Marie-Tooth disease (CMT), cri du chat syndrome, Crohn's disease, cystic fibrosis, Dercum disease, Down syndrome, Duane syndrome, Duchenne muscular dystrophy, factor V Leiden thrombophilia, familial hypercholesterolemia, familial Mediterranean fever, fragile X syndrome, Gaucher disease, hemochromatosis, hemophilia, holoprosencephaly, Huntington's disease, Klinefelter syndrome, Marfan syndrome, myotonic dystrophy, neurofibromatosis, Noonan syndrome, osteogenesis imperfecta, Parkinson's disease, phenylketonuria, Poland anomaly, porphyria, progeria, retinitis pigmentosa, severe combined immunodeficiency (SCID), sickle cell disease, spinal muscular atrophy, Tay-Sachs disease, thalassemia, trimethylaminuria, Turner syndrome, velocardiofacial syndrome, WAGR syndrome, Wilson disease, and the like.

In some embodiments, a method described herein comprises detecting the presence or absence of DNA originating or derived from a tumor cell at a preselected time point following a previous cancer treatment of a subject previously diagnosed with cancer, using a set of sequence information obtained as described herein. The method may further comprise determining a cancer recurrence score that is indicative of the presence or absence of DNA originating or derived from a tumor cell of the test subject.

Where a cancer recurrence score is determined, it may further be used to determine a cancer recurrence status. For example, the cancer recurrence status may be at risk for cancer recurrence when the cancer recurrence score is above a predetermined threshold, and at low or lower risk of cancer recurrence when the cancer recurrence score is below the predetermined threshold. In particular embodiments, a cancer recurrence score equal to the predetermined threshold may result in a cancer recurrence status of either at risk for cancer recurrence or at low or lower risk of cancer recurrence.

In some embodiments, the cancer recurrence score is compared with a predetermined cancer recurrence threshold, and the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold, or as not a candidate for treatment when the cancer recurrence score is below the cancer recurrence threshold. In particular embodiments, a cancer recurrence score equal to the cancer recurrence threshold may result in classification as either a candidate for a subsequent cancer treatment or not a candidate for treatment.

The methods discussed above may further comprise any compatible feature or features set forth elsewhere herein, including in the section regarding methods of determining a risk of cancer recurrence in a test subject and/or classifying a test subject as a candidate for a subsequent cancer treatment.

B. Methods of Determining a Test Subject's Risk of Cancer Recurrence and/or Classifying a Test Subject as a Candidate for Subsequent Cancer Treatment

In some embodiments, a method provided herein is a method of determining a risk of cancer recurrence in a test subject. In some embodiments, a method provided herein is a method of classifying a test subject as a candidate for a subsequent cancer treatment.

Any of such methods may comprise collecting DNA (e.g., originating or derived from a tumor cell) from a test subject diagnosed with cancer at one or more preselected time points following one or more previous cancer treatments of the test subject. The subject may be any of the subjects described herein. The DNA may be cfDNA. The DNA may be obtained from a tissue sample.

Any of such methods may comprise capturing a plurality of sets of target regions from DNA from the subject, wherein the plurality of target region sets comprises a sequence-variable target region set and an epigenetic target region set, whereby a captured set of DNA molecules is produced. The capturing step may be performed according to any of the embodiments described elsewhere herein.

In any of such methods, the previous cancer treatment may comprise surgery, administration of a therapeutic composition, and/or chemotherapy.

Any of such methods may comprise sequencing the captured DNA molecules, whereby a set of sequence information is produced. The captured DNA molecules of the sequence-variable target region set may be sequenced to a greater depth of sequencing than the captured DNA molecules of the epigenetic target region set.

Any of such methods may comprise detecting the presence or absence of DNA originating or derived from a tumor cell at a preselected time point using the set of sequence information. The detection of the presence or absence of DNA originating or derived from a tumor cell may be performed according to any of the embodiments thereof described elsewhere herein.

Methods of determining a risk of cancer recurrence in a test subject may comprise determining a cancer recurrence score that is indicative of the presence or absence, or amount, of DNA originating or derived from a tumor cell of the test subject. The cancer recurrence score may further be used to determine a cancer recurrence status. For example, the cancer recurrence status may be at risk for cancer recurrence when the cancer recurrence score is above a predetermined threshold, and at low or lower risk of cancer recurrence when the cancer recurrence score is below the predetermined threshold. In particular embodiments, a cancer recurrence score equal to the predetermined threshold may result in a cancer recurrence status of either at risk for cancer recurrence or at low or lower risk of cancer recurrence.

Methods of classifying a test subject as a candidate for a subsequent cancer treatment may comprise comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, thereby classifying the test subject as a candidate for the subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold, or as not a candidate for treatment when the cancer recurrence score is below the cancer recurrence threshold. In particular embodiments, a cancer recurrence score equal to the cancer recurrence threshold may result in classification as either a candidate for a subsequent cancer treatment or not a candidate for treatment. In some embodiments, the subsequent cancer treatment comprises chemotherapy or administration of a therapeutic composition.
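The threshold comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RecurrenceStatus`, `classify_recurrence`, and `is_treatment_candidate` are hypothetical, and the `tie_is_at_risk` flag is an assumed way of representing the statement that a score equal to the threshold may be assigned to either class.

```python
from enum import Enum


class RecurrenceStatus(Enum):
    AT_RISK = "at risk for cancer recurrence"
    LOWER_RISK = "at low or lower risk of cancer recurrence"


def classify_recurrence(score: float, threshold: float,
                        tie_is_at_risk: bool = True) -> RecurrenceStatus:
    """Map a cancer recurrence score to a status by comparing it to a threshold.

    A score exactly equal to the threshold may go either way in the text,
    so the caller chooses the tie behaviour (hypothetical parameter).
    """
    if score > threshold:
        return RecurrenceStatus.AT_RISK
    if score < threshold:
        return RecurrenceStatus.LOWER_RISK
    return RecurrenceStatus.AT_RISK if tie_is_at_risk else RecurrenceStatus.LOWER_RISK


def is_treatment_candidate(score: float, threshold: float,
                           tie_is_at_risk: bool = True) -> bool:
    """A subject is a candidate for subsequent treatment when classified at risk."""
    return classify_recurrence(score, threshold, tie_is_at_risk) is RecurrenceStatus.AT_RISK
```

Making the tie behaviour an explicit argument keeps the two permitted outcomes at equality visible to the caller instead of hiding one of them in a `>=` comparison.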

Any of such methods may comprise determining a disease-free survival (DFS) period for the test subject based on the cancer recurrence score; for example, the DFS period may be 1 year, 2 years, 3 years, 4 years, 5 years, or 10 years.

In some embodiments, the set of sequence information comprises sequence-variable target region sequences, and determining the cancer recurrence score may comprise determining at least a first subscore indicative of the amount of SNVs, insertions/deletions, CNVs, and/or fusions present in the sequence-variable target region sequences.

In some embodiments, a number of mutations in the sequence-variable target regions chosen from 1, 2, 3, 4, or 5 is sufficient for the first subscore to result in a cancer recurrence score classified as positive for cancer recurrence. In some embodiments, the number of mutations is chosen from 1, 2, or 3.

In some embodiments, the set of sequence information comprises epigenetic target region sequences, and determining the cancer recurrence score comprises determining a second subscore indicative of changes in epigenetic features in the epigenetic target region sequences (e.g., methylation of hypermethylation variable target regions and/or perturbed fragmentation of fragmentation variable target regions, where "perturbed" means different from DNA found in a corresponding sample from a healthy subject). In some such embodiments, determining the cancer recurrence score comprises determining a second subscore indicative of the amount of molecules (obtained from the epigenetic target region sequences) that represent an epigenetic state different from DNA found in a corresponding sample from a healthy subject (e.g., cfDNA found in a blood sample from a healthy subject, or DNA found in a tissue sample from a healthy subject, where the tissue sample is of the same type of tissue as was obtained from the test subject). These abnormal molecules (i.e., molecules with an epigenetic state different from DNA found in a corresponding sample from a healthy subject) may be consistent with epigenetic changes associated with cancer (e.g., methylation of hypermethylation variable target regions and/or perturbed fragmentation of fragmentation variable target regions).

In some embodiments, a proportion of molecules corresponding to the hypermethylation variable target region set and/or the fragmentation variable target region set that indicate hypermethylation in the hypermethylation variable target region set and/or abnormal fragmentation in the fragmentation variable target region set, where that proportion is greater than or equal to a value in the range of 0.001%-10%, is sufficient for the second subscore to be classified as positive for cancer recurrence. The range may be 0.001%-1%, 0.005%-1%, 0.01%-5%, 0.01%-2%, or 0.01%-1%.

In some embodiments, any of such methods may comprise determining a tumor DNA fraction from the fraction of molecules in the set of sequence information that show one or more features indicative of origination from a tumor cell. This may be done for molecules corresponding to some or all of the epigenetic target regions, e.g., including one or both of hypermethylation variable target regions and fragmentation variable target regions (hypermethylation of a hypermethylation variable target region and/or abnormal fragmentation of a fragmentation variable target region may be considered indicative of origination from a tumor cell). This may also be done for molecules corresponding to sequence-variable target regions, e.g., molecules comprising alterations consistent with cancer, such as SNVs, insertions/deletions, CNVs, and/or fusions. The tumor DNA fraction may be determined based on a combination of molecules corresponding to epigenetic target regions and molecules corresponding to sequence-variable target regions.

Determination of the cancer recurrence score may be based at least in part on the tumor DNA fraction, wherein a tumor DNA fraction greater than a threshold in the range of 10⁻¹¹ to 1 or 10⁻¹⁰ to 1 is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence. In some embodiments, a tumor DNA fraction greater than or equal to a threshold in the range of 10⁻¹⁰ to 10⁻⁹, 10⁻⁹ to 10⁻⁸, 10⁻⁸ to 10⁻⁷, 10⁻⁷ to 10⁻⁶, 10⁻⁶ to 10⁻⁵, 10⁻⁵ to 10⁻⁴, 10⁻⁴ to 10⁻³, 10⁻³ to 10⁻², or 10⁻² to 10⁻¹ is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence. In some embodiments, a tumor DNA fraction greater than a threshold of at least 10⁻⁷ is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence. A determination that the tumor DNA fraction is greater than a threshold, such as a threshold corresponding to any of the foregoing embodiments, may be made based on a cumulative probability. For example, the sample is considered positive if the cumulative probability that the tumor fraction is greater than a threshold in any of the foregoing ranges exceeds a probability threshold of at least 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.995, or 0.999. In some embodiments, the probability threshold is at least 0.95, such as 0.99.
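The text does not specify how the cumulative probability is computed. As one possible illustration, the sketch below assumes a simple binomial model: `tumor_molecules` out of `total_molecules` observed molecules carry tumor-indicative features, the prior on the tumor fraction is uniform on a log scale between 10⁻¹¹ and 1, and the posterior is evaluated on a grid. The function names, the prior, and the grid are all assumptions made for this example, not details taken from the disclosure.

```python
import math


def prob_fraction_above(tumor_molecules: int, total_molecules: int,
                        threshold: float, min_exponent: float = -11.0,
                        grid_size: int = 2000) -> float:
    """Posterior probability that the tumor fraction exceeds `threshold`.

    Assumed model: binomial likelihood, log-uniform prior on the fraction
    over [10**min_exponent, 1), evaluated on a log-spaced grid.
    """
    if not 0 <= tumor_molecules <= total_molecules:
        raise ValueError("need 0 <= tumor_molecules <= total_molecules")
    k, n = tumor_molecules, total_molecules
    # Log-spaced candidate fractions; equal spacing in log10 gives each grid
    # point equal prior mass under the assumed log-uniform prior.
    fractions = [10 ** (min_exponent * (1 - i / grid_size)) for i in range(grid_size)]
    log_lik = [k * math.log(f) + (n - k) * math.log1p(-f) for f in fractions]
    peak = max(log_lik)  # subtract the peak to avoid underflow in exp()
    weights = [math.exp(value - peak) for value in log_lik]
    above = sum(w for f, w in zip(fractions, weights) if f > threshold)
    return above / sum(weights)


def is_positive(tumor_molecules: int, total_molecules: int,
                threshold: float = 1e-7, probability_threshold: float = 0.95) -> bool:
    """Call the sample positive when P(fraction > threshold) meets the probability threshold."""
    return prob_fraction_above(tumor_molecules, total_molecules, threshold) >= probability_threshold
```

Under this model, a very low fraction threshold combined with few observed molecules leaves the result dominated by the assumed prior, so the choice of prior would need justification in any real use.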

In some embodiments, the set of sequence information comprises sequence-variable target region sequences and epigenetic target region sequences, and determining the cancer recurrence score comprises determining a first subscore indicative of the amount of SNVs, insertions/deletions, CNVs, and/or fusions present in the sequence-variable target region sequences and a second subscore indicative of the amount of abnormal molecules in the epigenetic target region sequences, and combining the first and second subscores to provide the cancer recurrence score. Where the first and second subscores are combined, they may be combined by applying a threshold to each subscore independently (e.g., greater than a predetermined number of mutations (e.g., >1) in the sequence-variable target regions, and greater than a predetermined fraction of abnormal (e.g., tumor) molecules in the epigenetic target regions, where abnormal molecules are those with an epigenetic state different from DNA found in a corresponding sample from a healthy subject), or by training a machine learning classifier to determine status based on a plurality of positive and negative training samples.
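The independent-threshold way of combining the two subscores can be sketched as follows. The class and function names, the default cutoffs, and the `require_both` switch are assumptions for illustration; in particular, the text does not say whether one positive subscore or both are needed, so that choice is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScores:
    variant_count: int        # first subscore: SNVs, indels, CNVs, fusions detected
    abnormal_fraction: float  # second subscore: fraction of epigenetically abnormal molecules


def combine_by_thresholds(scores: SubScores,
                          min_variants: int = 1,
                          min_abnormal_fraction: float = 1e-4,
                          require_both: bool = False) -> bool:
    """Return True when the combined result is positive for cancer recurrence.

    Each subscore is compared to its own threshold independently.
    The defaults (1 variant; 0.01% abnormal molecules) are illustrative values
    chosen from within the ranges mentioned in the text, not prescribed cutoffs.
    """
    variant_hit = scores.variant_count >= min_variants
    epigenetic_hit = scores.abnormal_fraction >= min_abnormal_fraction
    if require_both:
        return variant_hit and epigenetic_hit
    return variant_hit or epigenetic_hit
```

For example, `combine_by_thresholds(SubScores(variant_count=0, abnormal_fraction=5e-4))` is positive under the default either-subscore rule and negative when `require_both=True`.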

In some embodiments, a value for the combined score in the range of -4 to 2 or -3 to 1 is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence.

In any embodiment in which the cancer recurrence score is classified as positive for cancer recurrence, the cancer recurrence status of the subject may be at risk for cancer recurrence and/or the subject may be classified as a candidate for a subsequent cancer treatment.

In some embodiments, the cancer is any one of the types of cancer described elsewhere herein, e.g., colorectal cancer.

C. Treatment and Related Management

In certain embodiments, the methods disclosed herein relate to identifying a customized therapy and administering it to a patient in view of the status of a nucleic acid variant as being of somatic or germline origin. In some embodiments, essentially any cancer therapy (e.g., surgical therapy, radiation therapy, chemotherapy, and/or the like) may be included as part of these methods. Typically, customized therapies include at least one immunotherapy (or immunotherapeutic agent). Immunotherapy refers generally to methods of enhancing an immune response against a given cancer type. In certain embodiments, immunotherapy refers to methods of enhancing a T cell response against a tumor or cancer.

In certain embodiments, the status of a nucleic acid variant from a sample from a subject as being of somatic or germline origin may be compared with a database of comparator results from a reference population to identify customized or targeted therapies for that subject. Typically, the reference population includes patients with the same cancer or disease type as the test subject and/or patients who are receiving, or who have received, the same therapy as the test subject. A customized or targeted therapy (or therapies) may be identified when the nucleic acid variant and the comparator results satisfy certain classification criteria (e.g., are a substantial or an approximate match).

In certain embodiments, the customized therapies described herein are typically administered parenterally (e.g., intravenously or subcutaneously). Pharmaceutical compositions containing an immunotherapeutic agent are typically administered intravenously. Certain therapeutic agents are administered orally. However, customized therapies (e.g., immunotherapeutic agents, etc.) may also be administered by routes such as buccal, sublingual, rectal, vaginal, intraurethral, topical, intraocular, intranasal, and/or intra-aural, which administration may include tablets, capsules, granules, aqueous suspensions, gels, sprays, suppositories, salves, ointments, or the like.

D. Kits

Also provided are kits comprising the compositions described herein. The kits may be useful in performing the methods described herein. A kit comprises at least one MSRE. In some embodiments, a kit further comprises a first reagent for partitioning a sample into a plurality of partitions as described herein, such as any of the partitioning reagents described elsewhere herein. In some embodiments, a kit comprises a second reagent for subjecting the first partition to a procedure that affects a first nucleobase in the DNA of the first partition differently from a second nucleobase in that DNA (e.g., any of the reagents described elsewhere herein for converting a nucleobase such as cytosine or methylated cytosine to a different nucleobase), wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The kit may comprise the first and second reagents together with additional elements as discussed below and/or elsewhere herein.

The kit may further comprise a plurality of oligonucleotide probes that selectively hybridize to at least 5, 6, 7, 8, 9, 10, 20, 30, 40, or all genes selected from the group consisting of ALK, APC, BRAF, CDKN2A, EGFR, ERBB2, FBXW7, KRAS, MYC, NOTCH1, NRAS, PIK3CA, PTEN, RB1, TP53, MET, AR, ABL1, AKT1, ATM, CDH1, CSF1R, CTNNB1, ERBB4, EZH2, FGFR1, FGFR2, FGFR3, FLT3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KDR, KIT, MLH1, MPL, NPM1, PDGFRA, PROC, PTPN11, RET, SMAD4, SMARCB1, SMO, SRC, STK11, VHL, TERT, CCND1, CDK4, CDKN2B, RAF1, BRCA1, CCND2, CDK6, NF1, TP53, ARID1A, BRCA2, CCNE1, ESR1, RIT1, GATA3, MAP2K1, RHEB, ROS1, ARAF, MAP2K2, NFE2L2, RHOA, and NTRK1. The number of genes to which the oligonucleotide probes can selectively hybridize can vary. For example, the number of genes can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54. The kit can include a container that holds the plurality of oligonucleotide probes, together with instructions for performing any of the methods described herein.

The oligonucleotide probes can selectively hybridize to exon regions of the genes, e.g., of the at least 5 genes. In some cases, the oligonucleotide probes can selectively hybridize to at least 30 exons of the genes, e.g., of the at least 5 genes. In some cases, multiple probes can selectively hybridize to each of the at least 30 exons. The probes that hybridize to each exon can have sequences that overlap with at least 1 other probe. In some embodiments, the oligonucleotide probes can selectively hybridize to non-coding regions of the genes disclosed herein, for example, intronic regions of the genes. The oligonucleotide probes can also selectively hybridize to regions of the genes comprising both exonic and intronic regions of the genes disclosed herein.
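As a small illustration of probes that overlap with at least one other probe on the same exon, the sketch below tiles an exon with fixed-length windows whose step is smaller than the probe length. The function name `tile_probes`, the 120-base probe length, and the 60-base step are hypothetical values chosen for the example; the text does not specify a probe length or tiling scheme.

```python
def tile_probes(exon_start: int, exon_end: int,
                probe_length: int = 120, step: int = 60) -> list[tuple[int, int]]:
    """Return half-open [start, end) probe windows that cover an exon.

    Because step < probe_length, each window overlaps its neighbour.
    The last window may extend past exon_end into flanking sequence.
    """
    if exon_end <= exon_start:
        raise ValueError("exon_end must be greater than exon_start")
    if not 0 < step < probe_length:
        raise ValueError("step must be positive and smaller than probe_length")
    windows = []
    start = exon_start
    while True:
        windows.append((start, start + probe_length))
        if start + probe_length >= exon_end:
            break  # the exon is fully covered
        start += step
    return windows
```

An exon shorter than one probe yields a single window, which then overlaps no other probe; a real design would need an extra offset probe in that case.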

The oligonucleotide probes can target any number of exons. For example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 400, 500, 600, 700, 800, 900, 1,000, or more exons can be targeted.

The kit can comprise at least 4, 5, 6, 7, or 8 different library adapters having distinct molecular barcodes and identical sample barcodes. The library adapters may not be sequencing adapters. For example, the library adapters do not include flow cell sequences or sequences that permit the formation of hairpin loops for sequencing. The different variations and combinations of molecular barcodes and sample barcodes are described throughout and are applicable to the kit. Further, in some cases, the adapters are not sequencing adapters. Additionally, the adapters provided with the kit can also comprise sequencing adapters. A sequencing adapter can comprise a sequence that hybridizes to one or more sequencing primers. A sequencing adapter can further comprise a sequence that hybridizes to a solid support, e.g., a flow cell sequence. For example, a sequencing adapter can be a flow cell adapter. The sequencing adapters can be attached to one or both ends of a polynucleotide fragment. In some cases, the kit can comprise at least 8 different library adapters having distinct molecular barcodes and identical sample barcodes. The library adapters may not be sequencing adapters. The kit can further include a sequencing adapter having a first sequence that selectively hybridizes to the library adapters and a second sequence that selectively hybridizes to a flow cell sequence. In another example, a sequencing adapter can be hairpin shaped. For example, the hairpin-shaped adapter can comprise a complementary double-stranded portion and a loop portion, where the double-stranded portion can be attached (e.g., ligated) to a double-stranded polynucleotide. Hairpin-shaped sequencing adapters can be attached to both ends of a polynucleotide fragment to generate a circular molecule, which can be sequenced multiple times. A sequencing adapter can be up to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more bases from end to end. The sequencing adapter can comprise 20-30, 20-40, 30-50, 30-60, 40-60, 40-70, 50-60, or 50-70 bases from end to end. In a particular example, the sequencing adapter can comprise 20-30 bases from end to end. In another example, the sequencing adapter can comprise 50-60 bases from end to end. A sequencing adapter can comprise one or more barcodes. For example, a sequencing adapter can comprise a sample barcode. The sample barcode can comprise a predetermined sequence. The sample barcodes can be used to identify the source of the polynucleotides. The sample barcode can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more (or any length as described throughout) nucleic acid bases, e.g., at least 8 bases. The barcode can be a contiguous or non-contiguous sequence, as described above.

The library adapters can be blunt ended and Y-shaped and can be less than or equal to 40 nucleic acid bases in length. Other variations of the library adapters can be found throughout and are applicable to the kit.

Examples

The following examples are provided to illustrate certain aspects of the disclosed methods. The examples do not limit the present disclosure.

Example 1: Reduction of technical noise by digestion of nonspecifically partitioned DNA

cfDNA pools from two healthy normal samples were combined, and 18.6 ng was used as input for the MBD partitioning assay described herein. cfDNA from a colorectal cancer (CRC) sample with 0.5% MAF (somatic allele fraction) was added to a subset of the samples to give diluted CRC samples with 0.16% MAF. Three sets of normal samples and diluted CRC samples were used for the assay. The three sets of samples were then partitioned using MBD protein into three partitions: a hypermethylated (hyper) partition, an intermediate (residual) partition, and a hypomethylated (hypo) partition. After cleanup, the cfDNA molecules in each partition were ligated to partition-specific adapters comprising molecular barcodes. The molecular barcodes for the hyper and residual partitions were selected such that they contain no MSRE recognition sites and are therefore not digested in downstream processing, regardless of the methylation state of the cfDNA. Ligation was followed by a ligation cleanup. After the ligation cleanup, the hyper and residual partitions were subjected to an MSRE digestion reaction. The first sample set (normal and diluted CRC samples) was treated with BstUI and HpaII, and another sample set was treated with BstUI, HpaII, and Hin6I. The third sample set was mock digested (no MSRE) as a control for the MBD partitioning assay. After MSRE digestion, the enzymes were heat inactivated (65°C, 20 minutes) and the reactions were cleaned up using SPRI beads. After the digestion cleanup, the hyper partition, the residual partition, and the (undigested) hypo partition (adapter-ligated cfDNA) were combined and processed through an NGS assay workflow comprising: PCR amplification; enrichment of molecules for genomic regions of interest; pooling of samples to allow multiplexed sequencing; and sequencing of the pooled samples on a NovaSeq. In an alternative method, the hypo partition can additionally be contacted with one or more MSREs having methylated recognition sites to cleave nonspecifically partitioned DNA in the hypo partition.
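The barcode selection rule in this example (no MSRE recognition site inside a hyper or residual partition barcode) can be illustrated with a minimal sketch. The recognition motifs below are the sequences commonly cited for BstUI, HpaII, and Hin6I and should be checked against supplier documentation; the candidate barcodes and the function names are hypothetical and are not taken from the disclosure. The sketch also ignores sites that could be created at the junction between a barcode and its flanking adapter sequence.

```python
# Recognition motifs commonly cited for the three MSREs named in Example 1
# (assumption: verify against the enzyme supplier's documentation).
# All three motifs are palindromic, so scanning one strand covers both.
MSRE_SITES = {"BstUI": "CGCG", "HpaII": "CCGG", "Hin6I": "GCGC"}


def msre_sites_in(sequence, sites=MSRE_SITES):
    """Return the names of the enzymes whose motif occurs in the sequence."""
    sequence = sequence.upper()
    return [enzyme for enzyme, motif in sites.items() if motif in sequence]


def digest_safe_barcodes(candidates, sites=MSRE_SITES):
    """Keep only the barcodes that contain none of the recognition motifs."""
    return [bc for bc in candidates if not msre_sites_in(bc, sites)]


# Hypothetical candidate barcodes, for illustration only.
candidate_barcodes = ["ACGCGTAA", "TTCCGGAT", "AGCGCTTA", "ATTGACCA", "GGATCCTA"]
safe = digest_safe_barcodes(candidate_barcodes)
print(safe)
```

A real design would likely also screen the full adapter sequence, not only the barcode, for the same motifs.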

Figure 6 shows the increase in the cancer methylation signal at DMRs relative to the technical noise from unmethylated molecules in normal samples when MSRE digestion is applied. In the negative control regions shown in Figure 6 (where DNA molecules are unmethylated almost all of the time, irrespective of disease state), "a" indicates that MSRE digestion removed most of the unmethylated molecules that were erroneously partitioned into the hyper partition: 90 molecules were partitioned into the hyper partition in the mock digestion, whereas the molecule count was reduced to 10 in the BstUI, HpaII, and Hin6I digestion. In the classification DMRs shown in Figure 6, after MSRE digestion, a much larger proportion of cfDNA molecules was removed in the normal sample (b; 350→100) than in the diluted CRC sample (c; 1500→1100).
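The molecule counts quoted for Figure 6 can be turned into removal fractions and a simple signal-to-noise comparison. The following is a minimal sketch that uses only the counts stated in the text; the function name and the choice of the CRC-to-normal count ratio as a signal-to-noise proxy are illustrative assumptions, not metrics defined in the disclosure.

```python
def fraction_removed(before, after):
    """Fraction of molecules lost between the mock and MSRE-digested conditions."""
    return 1 - after / before


# (mock count, digested count) as quoted in the text for Figure 6.
counts = {
    "a: negative control region": (90, 10),
    "b: normal sample, classification DMRs": (350, 100),
    "c: diluted CRC sample, classification DMRs": (1500, 1100),
}

for label, (mock, digested) in counts.items():
    print(f"{label}: {fraction_removed(mock, digested):.1%} removed")

# Illustrative signal-to-noise proxy: CRC count divided by normal count.
snr_mock = 1500 / 350
snr_digested = 1100 / 100
print(f"CRC/normal ratio: {snr_mock:.2f} (mock) vs {snr_digested:.2f} (digested)")
```

Under this proxy, digestion removes about 71% of the molecules in the normal sample but only about 27% in the diluted CRC sample, so the ratio rises from roughly 4.3 to 11.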

Example 2: Analysis of cfDNA to detect the presence or absence of a tumor

A set of patient samples is analyzed by a blood-based NGS assay at Guardant Health (Redwood City, CA, USA) to detect the presence or absence of cancer. cfDNA is extracted from the plasma of these patients. The cfDNA of each patient sample is then combined with methyl binding domain (MBD) buffer and magnetic beads conjugated with an MBD protein and incubated overnight. During this incubation, methylated cfDNA (if present in the cfDNA sample) binds to the MBD protein. Unmethylated or less methylated DNA is washed from the beads with buffers containing increasing salt concentrations. Finally, a high-salt buffer is used to wash the heavily methylated DNA off the MBD protein. These washes yield three partitions of increasingly methylated cfDNA: a hypomethylated partition, a residual methylation partition, and a hypermethylated partition.

Optionally, the cfDNA molecules in the hypermethylated partition are subjected to enzymatic modification (EM), whereby unmodified cytosines, but not mC and hmC, undergo deamination. Converting unmodified cytosines to uracil in this way marks hypomethylated molecules that were nonspecifically partitioned into the first partition.

After the cfDNA in the partitions is concentrated, the end overhangs of the partitioned cfDNA are extended, and adenosine residues are added to the 3' ends of the cfDNA fragments by the polymerase during extension. The 5' end of each fragment is phosphorylated. These modifications make the partitioned cfDNA ligatable. DNA ligase and adapters are added to ligate each end of each partitioned cfDNA molecule to an adapter. These adapters contain non-unique molecular barcodes, and each partition is ligated to adapters whose non-unique molecular barcodes are distinguishable from the barcodes in the adapters used for the other partitions.

The cfDNA in the hypomethylated partition is contacted with one or more MSREs having methylated recognition sites. These enzymes cleave at least a portion of the nonspecifically partitioned DNA in the hypomethylated partition. Alternatively or additionally, the cfDNA in the hypermethylated partition is contacted with one or more MSREs having unmethylated recognition sites. These enzymes cleave at least a portion of the nonspecifically partitioned DNA in the hypermethylated partition.

After ligation, the four partitions are pooled together and amplified by PCR. Molecules cleaved by the one or more MSREs do not undergo exponential amplification because they do not have an adapter at each end.
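The reason cleaved molecules drop out of the library can be illustrated with an idealized model. The sketch below assumes perfect PCR efficiency, doubling per cycle for a molecule with adapters at both ends, and at most one new copy per cycle (linear amplification) for a cleaved molecule that retains an adapter at only one end. These assumptions and the function name are illustrative and are not stated in the disclosure.

```python
def copies_after_pcr(cycles, adapters_at_both_ends):
    """Idealized copy number after a given number of PCR cycles.

    Assumes 100% efficiency. A molecule with adapters at both ends doubles
    every cycle; a cleaved molecule with one adapter is copied at most
    linearly, one new strand per cycle from the original template.
    """
    if adapters_at_both_ends:
        return 2 ** cycles
    return 1 + cycles


for cycles in (5, 10, 15):
    intact = copies_after_pcr(cycles, adapters_at_both_ends=True)
    cleaved = copies_after_pcr(cycles, adapters_at_both_ends=False)
    print(f"{cycles} cycles: intact={intact}, cleaved={cleaved}, "
          f"ratio={intact / cleaved:.0f}")
```

Even under this generous model for the cleaved molecule, intact molecules outnumber it by about two orders of magnitude after 10 cycles.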

After PCR, the amplified DNA is washed and concentrated prior to enrichment. Once concentrated, the amplified DNA is combined with a salt buffer and biotinylated RNA probes, which include probes for a sequence-variable target region set and probes for an epigenetic target region set, and this mixture is incubated overnight. The probes for the sequence-variable target region set have a footprint of about 50 kb, and the probes for the epigenetic target region set have a footprint of about 500 kb. The probes for the sequence-variable target region set comprise oligonucleotides targeting at least a subset of the genes identified in Tables 3-5, and the probes for the epigenetic target region set comprise oligonucleotides targeting a selection of hypermethylation variable target regions, hypomethylation variable target regions, CTCF binding target regions, transcription start site target regions, focal amplification target regions, and methylation control regions.

The biotinylated RNA probes (hybridized to DNA) are captured by streptavidin magnetic beads and separated from the amplified DNA that was not captured by a series of salt-based washes, thereby enriching the sample. After enrichment, an aliquot of the enriched sample is sequenced using an Illumina NovaSeq sequencer. The sequence reads generated by the sequencer are then analyzed using bioinformatic tools/algorithms. The molecular barcodes are used to identify unique molecules and to deconvolute the sample into molecules that were differentially MBD-partitioned. The method described in this example, apart from providing information on the overall level of methylation (i.e., methylated cytosine residues) of a molecule based on its partition (including with increased accuracy and/or confidence due to the cleavage of nonspecifically partitioned cfDNA in the hypomethylated partition), can also provide higher-resolution information about the position of methylated cytosines based on the conversion of unmethylated cytosines in the hypermethylated partition. The sequence-variable target region sequences are analyzed by detecting genomic alterations, such as SNVs, insertions, deletions, and fusions, that can be called with enough support to differentiate real tumor variants from technical errors (e.g., PCR errors, sequencing errors). The epigenetic target region sequences are analyzed independently to detect the methylation state of cfDNA molecules in regions that have been shown to be differentially methylated (e.g., in potentially cancerous tissue compared with healthy cfDNA). Finally, the results of both analyses are combined to produce a final tumor present/absent call.
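The use of partition-specific, non-unique molecular barcodes to deconvolute pooled reads can be sketched as follows. This is a minimal illustration, assuming that a unique molecule is defined by its barcode together with its alignment start and end coordinates; the barcode sequences, the barcode-to-partition table, and the read tuple layout are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical barcode-to-partition table; each partition uses its own
# distinguishable set of non-unique molecular barcodes.
BARCODE_TO_PARTITION = {
    "AAGT": "hyper", "TTCA": "hyper",
    "CCTA": "residual", "AGGC": "residual",
    "GGAC": "hypo", "CTTG": "hypo",
}


def deconvolve(reads):
    """Count unique molecules per partition.

    Each read is a (barcode, chrom, start, end) tuple. Reads sharing the
    same barcode and coordinates are treated as PCR duplicates of one
    molecule. Reads with an unknown barcode are skipped.
    """
    molecules = defaultdict(set)
    for barcode, chrom, start, end in reads:
        partition = BARCODE_TO_PARTITION.get(barcode)
        if partition is None:
            continue
        molecules[partition].add((barcode, chrom, start, end))
    return {partition: len(found) for partition, found in molecules.items()}


reads = [
    ("AAGT", "chr1", 100, 266),
    ("AAGT", "chr1", 100, 266),  # PCR duplicate of the read above
    ("TTCA", "chr1", 100, 266),  # same coordinates, different barcode
    ("CCTA", "chr2", 50, 210),
    ("GGAC", "chr3", 10, 180),
    ("NNNN", "chr3", 10, 180),   # unrecognized barcode, skipped
]
molecule_counts = deconvolve(reads)
print(molecule_counts)
```

A production pipeline would typically also tolerate sequencing errors in the barcode, which this sketch does not attempt.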

Example 3: Analysis of methylation at single-nucleotide resolution in cfDNA samples from healthy subjects and subjects with early-stage colorectal cancer

cfDNA samples from healthy subjects and from subjects with early-stage colorectal cancer were analyzed as follows. The cfDNA was partitioned using MBD to provide hypermethylated, intermediately methylated, and hypomethylated partitions. The partitioned DNA in each partition was ligated to adapters and subjected to an EM-seq conversion procedure, whereby unmodified cytosines, but not mC and hmC, undergo deamination; in an alternative procedure, the partitioned DNA in the hypermethylated partition can be contacted with an MSRE having an unmethylated recognition site as described herein. Following such deamination, the partitions were prepared for sequencing and subjected to whole-genome sequencing. Each partition was sequenced separately, although in an alternative procedure the partitions can be differentially tagged (e.g., after partitioning and before EM-seq conversion, or after partitioning and EM-seq conversion and before further preparation for sequencing), pooled, processed, and sequenced in parallel.

Sequence data from hypermethylation variable target regions were isolated bioinformatically, although in an alternative procedure target regions can be enriched in vitro before sequencing. Per-base methylation was quantified for the hypermethylation variable target regions as shown in Figure 7, which shows the number of methylated CpGs per molecule in the hypermethylation variable target regions from the hypermethylated partition. The x-axis indicates the total number of CpGs per molecule, so points along the diagonal represent molecules with methylation at every CpG. Methylation can thus be analyzed at single-base resolution, and both the per-base methylation and the partial-molecule methylation of MBD-partitioned material can be quantified. Samples from subjects with colorectal cancer showed much higher overall methylation in these regions than samples from healthy subjects.
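The per-molecule quantities plotted in Figure 7 (methylated CpGs against total CpGs) can be computed from a converted read with a short sketch. It assumes the read is aligned to the forward strand of the reference with no insertions or deletions and has the same length as the reference window, and that after conversion an unmodified cytosine reads as T while mC and hmC still read as C. The function name and the toy sequences are hypothetical.

```python
def cpg_methylation(reference, read):
    """Return (methylated CpGs, total CpGs) for one converted, aligned read.

    Assumes forward-strand alignment without indels and equal lengths.
    At each reference CpG, a C in the read is counted as methylated and
    any other base (typically T after conversion) as unmethylated.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    reference, read = reference.upper(), read.upper()
    total = methylated = 0
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            total += 1
            if read[i] == "C":
                methylated += 1
    return methylated, total


reference = "ACGTCGACGT"                             # three CpGs
fully_methylated = cpg_methylation(reference, "ACGTCGACGT")
partly_methylated = cpg_methylation(reference, "ACGTTGACGT")
print(fully_methylated, partly_methylated)
```

A molecule whose two values are equal lies on the diagonal described for Figure 7.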

Example 4: Analysis of MDRE-digested cfDNA

Multiple aliquots of cfDNA from two healthy donors were isolated, and the methylated cfDNA was subjected to MBD-based partitioning. The hypomethylated cfDNA partition was then subjected to ligation of NGS adapters to the cfDNA molecules. The ligated cfDNA from each donor was then subjected to digestion with an MSRE that preferentially cleaves methylated DNA, also referred to as methylation-dependent restriction enzyme (MDRE) digestion. The MDRE used was FspEI, LpnPI, MspJI, or SgeI; the control reactions were a "mock" digestion (no enzyme added to the digestion) and an undigested condition in which the MDRE reaction was skipped. After the MDRE step, the hypomethylated cfDNA partition was amplified in a universal PCR, in which MDRE-cleaved DNA is not exponentially amplified because adapters are not present at each end. The PCR products were then enriched for targeted genomic regions using a hybrid capture panel, amplified in a second PCR, and sequenced by NGS. The hybrid capture panel targets included "positive control (ctrl)" and "negative control (ctrl)" regions of the genome for enrichment. The positive control regions are CpG-dense regions of the genome found to be universally highly methylated (>85% methylation by bisulfite-seq) across all human tissues, including blood and cancerous tissue. Conversely, the negative control regions are universally unmethylated (<15% methylation) across all human tissues. From the NGS analysis, the numbers of positive control molecules (i.e., molecules in the positive control regions) and negative control molecules (i.e., molecules in the negative control regions) sequenced under all conditions were compared to estimate MDRE sensitivity and specificity, respectively. Figures 8A-8B show that FspEI treatment reduced the number of positive control molecules by >100-fold compared with the "mock" condition, demonstrating ~99% sensitivity for digestion of methylated molecules. Figures 8C-8D show that FspEI treatment did not significantly reduce negative control molecules, indicating high specificity of FspEI digestion (unmethylated molecules are not digested). Note that MspJI showed some sensitivity but poorer specificity than FspEI, whereas LpnPI and SgeI showed little or no sensitivity.
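The link between a >100-fold reduction and ~99% sensitivity can be made explicit with a small calculation. In this sketch, sensitivity is taken as the fraction of positive control molecules removed relative to the mock condition, and specificity as the fraction of negative control molecules retained; these working definitions, the function names, and the molecule counts are illustrative assumptions, since the text reports only the fold change.

```python
def mdre_sensitivity(positive_mock, positive_mdre):
    """Fraction of methylated (positive control) molecules removed by the MDRE."""
    return 1 - positive_mdre / positive_mock


def mdre_specificity(negative_mock, negative_mdre):
    """Fraction of unmethylated (negative control) molecules retained, capped at 1."""
    return min(1.0, negative_mdre / negative_mock)


# Hypothetical counts chosen to give a 100-fold reduction of positive controls.
sensitivity = mdre_sensitivity(positive_mock=10_000, positive_mdre=100)
specificity = mdre_specificity(negative_mock=5_000, negative_mdre=4_900)
print(f"sensitivity={sensitivity:.2%}, specificity={specificity:.2%}")
```

A 100-fold reduction leaves 1% of the positive control molecules, which corresponds to the ~99% figure quoted above.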

MDRE digestion efficiency was calculated using molecules with different recognition sites and different numbers of sites per molecule. Digestion efficiency was calculated as 1 - [number of positive control molecules in the MDRE condition]/[number of positive control molecules in the mock condition]. The general recognition sequence of FspEI that includes 5mCpG is C5mCGH (H = A, C, or T), with cleavage occurring 12-16 bases downstream. The palindromic FspEI site C5mCGG contains two FspEI recognition sites, one on the top strand and one on the bottom strand in opposite orientations. A general consensus sequence containing 5mCpG is 5mCpGNR, which can overlap with the FspEI consensus sequence. Figures 9A-9D show that digestion efficiency increased with the minimum number of C5mCGH or C5mCGG sites per molecule and was higher at the palindromic site (C5mCGG). Positive control molecules having at least one C5mCGG site or at least two C5mCGH sites were cleaved with 95% efficiency.
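The stratified efficiency calculation described above can be sketched in two steps: count candidate sites per molecule, then apply the stated formula to molecules meeting a minimum site count. The sketch works on sequence motifs only (CCGH on either strand, and palindromic CCGG) and assumes the positive control molecules are methylated at those sites, because methylation cannot be read from the sequence itself. The function names and the toy sequences are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_CCGH = re.compile(r"(?=CCG[ACT])")   # lookahead so overlapping hits are counted
_CCGG = re.compile(r"(?=CCGG)")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_ccgh_sites(seq):
    """Count CCGH motifs (H = A, C, or T) on both strands."""
    seq = seq.upper()
    return len(_CCGH.findall(seq)) + len(_CCGH.findall(reverse_complement(seq)))


def count_ccgg_sites(seq):
    """Count palindromic CCGG motifs (one strand is enough)."""
    return len(_CCGG.findall(seq.upper()))


def efficiency_by_min_sites(mock_seqs, mdre_seqs, min_sites, counter):
    """1 - [MDRE molecules]/[mock molecules], among molecules with >= min_sites."""
    mock = sum(counter(s) >= min_sites for s in mock_seqs)
    mdre = sum(counter(s) >= min_sites for s in mdre_seqs)
    if mock == 0:
        raise ValueError("no mock-condition molecules meet the site threshold")
    return 1 - mdre / mock


# Toy data: four molecules in the mock condition, one surviving digestion.
mock_seqs = ["ACCGATTCGGTA", "ACCGATTCGGTA", "AACCGGTT", "ACCGATTTTTTA"]
mdre_seqs = ["ACCGATTTTTTA"]
eff_two_ccgh = efficiency_by_min_sites(mock_seqs, mdre_seqs, 2, count_ccgh_sites)
print(count_ccgh_sites("ACCGATTCGGTA"), count_ccgg_sites("AACCGGTT"), eff_two_ccgh)
```

As the text notes for Figures 9C-9D, raising the threshold leaves fewer molecules in both conditions, so the estimate becomes noisier.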

In addition, simultaneous and sequential digestion with FspEI and MspJI were tested. Sequential digestion with the two MDREs (FspEI followed by MspJI) had the highest efficiency. It may be that in the simultaneous digestion (FspEI and MspJI), MspJI sometimes binds the DNA without cleaving it (lower individual efficiency), thereby sterically blocking FspEI activity. Although FspEI followed by MspJI had higher overall efficiency here than FspEI alone, FspEI alone had better cleavage specificity. Thus, digestion with FspEI alone or with FspEI followed by MspJI may be preferred in different circumstances. Note that the larger the minimum number of sites, the fewer positive control molecules are observed (Figures 9C-9D), and the digestion efficiency estimate therefore becomes noisier.

Example 5: Detection of tumor DNA after MDRE treatment

cfDNA isolated from four healthy donors was used to create "normal" and mock "cancer" cfDNA samples. The donor samples alone were used as the "normal" samples, and the "cancer" samples were created by spiking in cfDNA from a colorectal cancer (CRC) patient. The circulating tumor DNA fraction of the CRC cfDNA sample had been measured previously and was used to spike a calculated amount of CRC cfDNA into the normal donor cfDNA such that the resulting "cancer" samples contained 0.5% circulating tumor DNA ("0.5% CRC" in Figures 10A-10J). All samples were subjected to MBD-based partitioning, separating the cfDNA into hypermethylated and hypomethylated cfDNA partitions. The hypomethylated cfDNA partition was then ligated to NGS adapters. The ligated cfDNA from each donor was then subjected to MDRE digestion with FspEI, MspJI, or FspEI+MspJI. "Mock digest" (no enzyme added to the digestion reaction) and "no digest" (MDRE reaction skipped entirely) conditions were used as control reactions. After the MDRE step, the undigested hypomethylated partition cfDNA was amplified in a universal PCR, enriched for targeted genomic regions using a hybrid capture panel, amplified in a second PCR, and sequenced by NGS. The hybrid capture panel targets included hypomethylation variable target regions and "negative control (ctrl)" regions of the genome for enrichment. The negative control regions are CpG-dense regions of the genome found to be universally hypomethylated (<15% methylation by bisulfite-seq) across all human tissues, including blood and cancerous tissue. The hypomethylation variable target regions are genomic regions annotated in the literature as having a reduced percentage of methylation in CRC tissue compared with healthy colon tissue and blood. From the NGS analysis, the numbers of hypomethylation variable target region molecules with 2 or more CCGG sites (which should be digested by the MDRE with high efficiency) were compared between the "normal" and "cancer" samples under all digestion conditions (Figures 10A-10E). The ratio of hypomethylation variable target region molecule counts to negative control molecule counts was also compared; this ratio normalizes for varying cfDNA input amounts, which can affect the hypomethylation variable target region molecule counts (Figures 10F-10J). No discernible hypomethylation variable target region cancer signal was detected in the conditions without MDRE digestion ("no digest" and "mock digest"). That is, the hypomethylation variable target region molecule counts and the normalized ratio levels were indistinguishable (not significantly different) between the "cancer" and "normal" samples (marked by the horizontal arrows in Figures 10C, 10E, 10H, and 10J). In contrast, with MDRE treatment, a shift (increase) in the hypomethylation variable target region counts and normalized ratios was detected in the "cancer" samples compared with the "normal" samples (marked by the arrows pointing up and to the right in Figures 10A, 10B, 10D, 10F, 10G, and 10I). Thus, MDRE treatment enabled detection of a cancer hypomethylation variable target region signal in "cancer" samples with 0.5% CRC ctDNA that was not detectable by the MBD partitioning assay alone.
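The normalized ratio used in Figures 10F-10J can be illustrated with a short sketch. The ratio itself (hypomethylation variable target region molecule count divided by negative control molecule count) follows the text; the molecule counts below are invented solely to show how the normalization cancels differences in input amount and are not data from the disclosure.

```python
def normalized_ratio(hypo_target_count, negative_control_count):
    """Hypomethylation variable target count normalized to negative control count."""
    return hypo_target_count / negative_control_count


# Hypothetical counts: (hypo target molecules, negative control molecules).
samples = {
    ("mock digest", "normal"): (5000, 10000),
    ("mock digest", "cancer"): (5500, 11000),   # more input, same ratio
    ("FspEI", "normal"): (200, 9800),
    ("FspEI", "cancer"): (330, 10780),          # more input, higher ratio
}

ratios = {key: normalized_ratio(*value) for key, value in samples.items()}
for (condition, sample), ratio in ratios.items():
    print(f"{condition:12s} {sample:7s} ratio={ratio:.4f}")
```

In this toy data the raw counts differ between the mock-digested samples only because of input amount, and the ratio is identical; with digestion, the ratio separates the two samples.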

******

While preferred embodiments of the present invention have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the foregoing specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations, or relative proportions set forth herein, which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the invention. It is therefore contemplated that the disclosure shall also cover any such alternatives, modifications, variations, or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

While the foregoing disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be clear to one of ordinary skill in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the disclosure and may be practiced within the scope of the appended claims. For example, all the methods, systems, computer-readable media, and/or component features, steps, elements, or other aspects thereof can be used in various combinations.

All patents, patent applications, websites, other publications or documents, accession numbers, and the like cited herein are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or the filing date of a priority application referring to the accession number, if applicable. Likewise, if different versions of a publication, website, or the like are published at different times, the version most recently published at the effective filing date of the application is meant, unless otherwise indicated.

Claims (44)

1. A method for analyzing nucleic acid molecules in a biological sample, the method comprising:
a) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules, wherein the biological sample comprises methylated nucleic acid molecules and unmethylated nucleic acid molecules;
b) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme; and
c) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

2. A method for determining the methylation state of nucleic acid molecules, the method comprising:
a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules;
b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules;
c) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme;
d) enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in the one or more partition groups; and
e) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

3. A method of analyzing nucleic acid molecules in a biological sample, the method comprising:
a) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules, wherein the biological sample comprises methylated nucleic acid molecules and unmethylated nucleic acid molecules, and the more than one partition groups comprise a first partition group and a second partition group, wherein methylated nucleic acid molecules are overrepresented in the first partition group relative to the second partition group;
b) digesting at least a subset of the first partition group of the more than one partition groups with at least one methylation-sensitive restriction enzyme; and
c) capturing a first target region set comprising epigenetic target regions from at least a portion of the first partition group, and capturing a second target region set comprising epigenetic target regions from at least a portion of the second partition group.

4. The method of claim 3, wherein capturing the first target region set comprises contacting the DNA in the first partition group with a first set of target-specific probes, and capturing the second target region set comprises contacting the DNA in the second partition group with a second set of target-specific probes.

5. The method of claim 3 or 4, further comprising determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups or target region sets.

6. The method of any one of the preceding claims, wherein the genomic regions of interest, the first target region set, and/or the second target region set comprise sequence-variable target regions.

7. The method of any one of the preceding claims, further comprising, prior to the digesting step, attaching one or more adapters to at least one end of at least a portion of the nucleic acid molecules in the more than one partition groups.

8. A method for determining the methylation state of nucleic acid molecules, the method comprising:
a) providing a biological sample of nucleic acid molecules, wherein the nucleic acid molecules comprise methylated nucleic acid molecules and unmethylated nucleic acid molecules;
b) partitioning at least a subset of the nucleic acid molecules in the biological sample into more than one partition group based on the methylation state of the nucleic acid molecules;
c) attaching one or more adapters to at least one end of the nucleic acid molecules in the more than one partition groups;
d) digesting at least a subset of one or more of the more than one partition groups with at least one methylation-sensitive restriction enzyme;
e) enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in the one or more partition groups; and
f) determining the methylation state at one or more genetic loci of the nucleic acid molecules in at least one of the partition groups.

9. The method of claim 7 or 8, wherein adapters are attached to both ends of at least a portion of the nucleic acid molecules in the more than one partition groups.

10. The method of claim 1, further comprising, prior to c), enriching at least a subset of the nucleic acid molecules in the more than one partition groups for genomic regions of interest, wherein the at least a subset of the nucleic acid molecules comprises digested nucleic acid molecules in the one or more partition groups.

11. The method of any one of the preceding claims, further comprising detecting the presence or absence of cancer in the biological sample.

12. The method of any one of the preceding claims, further comprising determining the level of cancer in the biological sample.

13. The method of any one of the preceding claims, wherein determining the methylation state comprises sequencing at least a subset of the digested nucleic acid molecules.

14. The method of any one of claims 7-13, wherein the one or more adapters comprise at least one tag.

15. The method of any one of the preceding claims, wherein the methylation-sensitive restriction enzyme selectively digests nucleic acid molecules that are unmethylated at the recognition site of the methylation-sensitive restriction enzyme.
16.根据上述权利要求中任一项所述的方法,其中在所述消化步骤之后对至少一部分核酸分子进行扩增和/或测序,并且被所述甲基化敏感性限制性内切酶消化的核酸分子不被扩增和/或不被测序。16. The method according to any one of the preceding claims, wherein at least a portion of the nucleic acid molecule is amplified and/or sequenced after said digestion step and digested by said methylation sensitive restriction endonuclease The nucleic acid molecules are not amplified and/or not sequenced. 17.根据上述权利要求中任一项所述的方法,所述方法包括用至少两种甲基化敏感性限制性内切酶消化所述多于一个分区组中的一个或更多个分区组的至少一个亚组。17. The method according to any one of the preceding claims, comprising digesting one or more of the more than one compartmental groups with at least two methylation-sensitive restriction endonucleases at least one subgroup of . 18.根据权利要求17所述的方法,其中所述至少两种甲基化敏感性限制性内切酶由两种甲基化敏感性限制性内切酶组成。18. The method of claim 17, wherein the at least two methylation-sensitive restriction enzymes consist of two methylation-sensitive restriction enzymes. 19.根据权利要求17或18所述的方法,其中所述甲基化敏感性限制性内切酶包括BstUI和HpaII或由BstUI和HpaII组成。19. The method according to claim 17 or 18, wherein the methylation sensitive restriction enzyme comprises or consists of BstUI and HpaII. 20.根据权利要求17或18所述的方法,其中所述甲基化敏感性限制性内切酶包括HhaI和AccII或由HhaI和AccII组成。20. The method according to claim 17 or 18, wherein the methylation sensitive restriction enzyme comprises or consists of HhaI and AccII. 21.根据权利要求17或18所述的方法,其中所述至少两种甲基化敏感性限制性内切酶包括三种甲基化敏感性限制性内切酶或由三种甲基化敏感性限制性内切酶组成。21. The method of claim 17 or 18, wherein the at least two methylation-sensitive restriction enzymes comprise three methylation-sensitive restriction enzymes or consist of three methylation-sensitive restriction enzymes. constituting restriction endonucleases. 22.根据权利要求17或21所述的方法,其中所述甲基化敏感性限制性内切酶包括BstUI、HpaII和Hin6I或由BstUI、HpaII和Hin6I组成。22. The method according to claim 17 or 21, wherein the methylation sensitive restriction enzyme comprises or consists of BstUI, HpaII and Hin6I. 
23.根据上述权利要求中任一项所述的方法,其中所述甲基化敏感性限制性内切酶选自由以下组成的组:AatII、AccII、AciI、Aor13HI、Aor15HI、BspT104I、BssHII、BstUI、Cfr10I、ClaI、CpoI、Eco52I、HaeII、HapII、HhaI、Hin6I、HpaII、HpyCH4IV、MluI、MspI、NaeI、NotI、NruI、NsbI、PmaCI、Psp1406I、PvuI、SacII、SalI、SmaI和SnaBI。23. The method according to any one of the preceding claims, wherein the methylation-sensitive restriction enzyme is selected from the group consisting of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI , Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. 24.根据权利要求7-23中任一项所述的方法,其中所述一个或更多个衔接子耐受所述甲基化敏感性限制性内切酶的消化。24. The method of any one of claims 7-23, wherein the one or more adapters are resistant to digestion by the methylation-sensitive restriction enzyme. 25.根据权利要求24所述的方法,其中所述一个或更多个耐受性衔接子包含一个或更多个甲基化核苷酸,任选地其中所述甲基化核苷酸包括5-甲基胞嘧啶和/或5-羟甲基胞嘧啶。25. The method of claim 24, wherein the one or more tolerant adapters comprise one or more methylated nucleotides, optionally wherein the methylated nucleotides comprise 5-methylcytosine and/or 5-hydroxymethylcytosine. 26.根据权利要求24所述的方法,其中所述一个或更多个耐受性衔接子包含一个或更多个耐受甲基化敏感性限制性内切酶的核苷酸类似物。26. The method of claim 24, wherein the one or more tolerant adapters comprise one or more nucleotide analogs that are tolerant to methylation-sensitive restriction enzymes. 27.根据权利要求24所述的方法,其中所述一个或更多个耐受性衔接子包含不被甲基化敏感性限制性内切酶识别的核苷酸序列。27. The method of claim 24, wherein the one or more tolerant adapters comprise a nucleotide sequence that is not recognized by a methylation-sensitive restriction enzyme. 28.根据权利要求14-27中任一项所述的方法,其中所述标签包含分子条形码。28. The method of any one of claims 14-27, wherein the tag comprises a molecular barcode. 29.根据权利要求28所述的方法,其中与所述多于一个分区组中的第一分区组中的核酸分子附接的分子条形码不同于与所述多于一个分区组中的第二分区组中的核酸分子附接的分子条形码。29. 
The method of claim 28, wherein the molecular barcode attached to the nucleic acid molecule in the first partition group of the more than one partition group is different from the second partition in the more than one partition group. Molecular barcodes attached to the nucleic acid molecules in the set. 30.根据权利要求1-29所述的方法,其中对所述多于一个分区组中的第一分区组和所述多于一个分区组中的第二分区组差异性加标签。30. The method of claims 1-29, wherein a first of the more than one partition groups and a second of the more than one partition groups differences are tagged. 31.根据权利要求30所述的方法,其中将第一分区标签与所述第一分区组中的核酸分子附接,并且将第二分区标签与所述第二分区组中的核酸分子附接。31. The method of claim 30, wherein a first partition label is attached to the nucleic acid molecules in the first partition group, and a second partition label is attached to the nucleic acid molecules in the second partition group . 32.根据上述权利要求中任一项所述的方法,其中所述甲基化的核酸分子包括5-甲基胞嘧啶和/或5-羟甲基胞嘧啶。32. The method according to any one of the preceding claims, wherein the methylated nucleic acid molecule comprises 5-methylcytosine and/or 5-hydroxymethylcytosine. 33.根据权利要求13-32中任一项所述的方法,其中所述测序由下一代测序仪进行。33. The method of any one of claims 13-32, wherein the sequencing is performed by a next generation sequencer. 34.根据前述权利要求中任一项所述的方法,其中所述生物样品选自由以下组成的组:DNA样品、RNA样品、多核苷酸样品、无细胞DNA样品和无细胞RNA样品。34. The method of any preceding claim, wherein the biological sample is selected from the group consisting of a DNA sample, an RNA sample, a polynucleotide sample, a cell-free DNA sample, and a cell-free RNA sample. 35.根据前述权利要求中任一项所述的方法,其中所述生物样品是无细胞DNA样品。35. The method of any one of the preceding claims, wherein the biological sample is a cell-free DNA sample. 36.根据权利要求35所述的方法,其中所述无细胞DNA在1ng和500ng之间。36. The method of claim 35, wherein the cell-free DNA is between 1 ng and 500 ng. 37.根据前述权利要求中任一项所述的方法,其中所述分区包括基于所述核酸分子与优先结合包含甲基化核苷酸的核酸分子的结合剂的不同结合亲和力对所述核酸分子进行分区。37. 
The method according to any one of the preceding claims, wherein the partitioning comprises classifying the nucleic acid molecule based on the different binding affinities of the nucleic acid molecule to binding agents that preferentially bind nucleic acid molecules comprising methylated nucleotides. Partitioned. 38.根据权利要求37所述的方法,其中所述结合剂是甲基结合结构域(MBD)蛋白。38. The method of claim 37, wherein the binding agent is a methyl binding domain (MBD) protein. 39.根据权利要求37所述的方法,其中所述结合剂是对一种或更多种甲基化核苷酸碱基特异性的抗体。39. The method of claim 37, wherein the binding agent is an antibody specific for one or more methylated nucleotide bases. 40.根据权利要求2-39中任一项所述的方法,其中所述感兴趣的基因组区域或表观遗传靶区包含用于癌症检测的差异性甲基化区域。40. The method of any one of claims 2-39, wherein the genomic region or epigenetic target region of interest comprises a differentially methylated region for cancer detection. 41.根据权利要求13-40中任一项所述的方法,所述方法还包括在测序之前对至少一部分所述核酸分子进行扩增。41. The method of any one of claims 13-40, further comprising amplifying at least a portion of the nucleic acid molecule prior to sequencing. 42.根据权利要求41所述的方法,其中在所述扩增中使用的引物包含至少一种样品索引。42. The method of claim 41, wherein primers used in said amplification comprise at least one sample index. 43.根据上述权利要求中任一项所述的方法,其中所述一个或更多个遗传基因座包括多于一个遗传基因座。43. The method of any one of the preceding claims, wherein the one or more genetic loci comprise more than one genetic locus. 44.根据权利要求43所述的方法,其中所述多于一个遗传基因座包含一个或更多个基因组区域。44. The method of claim 43, wherein the more than one genetic locus comprises one or more genomic regions.
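The digestion logic recited in claims 15 and 16 can be illustrated with a small in-silico sketch. This is not part of the claims and not the applicant's implementation; it is a toy model under stated assumptions. The recognition sequences for HpaII (CCGG), BstUI (CGCG), and HhaI (GCGC) are standard values, but the `Molecule` class, the function names, and the rule that methylation of any CpG inside a recognition site fully blocks cutting are simplifications introduced here for illustration.

```python
from dataclasses import dataclass

# Recognition sequences for three of the enzymes named in the claims.
RECOGNITION_SITES = {"HpaII": "CCGG", "BstUI": "CGCG", "HhaI": "GCGC"}


@dataclass(frozen=True)
class Molecule:
    """Hypothetical toy model of one DNA fragment (illustration only)."""
    sequence: str
    # 0-based indices of methylated cytosines within `sequence`.
    methylated_c: frozenset = frozenset()


def cpg_offsets(site):
    """Offsets of the C of each CpG dinucleotide inside a recognition site."""
    return [i for i in range(len(site) - 1) if site[i:i + 2] == "CG"]


def cut_sites(molecule, enzymes):
    """Return (enzyme, start) for every recognition site that would be cut.

    Assumption: a site is protected if any CpG within it is methylated.
    Real enzymes differ in their sensitivity to partial methylation.
    """
    cuts = []
    for enzyme in enzymes:
        site = RECOGNITION_SITES[enzyme]
        start = molecule.sequence.find(site)
        while start != -1:
            cpgs = [start + off for off in cpg_offsets(site)]
            if not any(pos in molecule.methylated_c for pos in cpgs):
                cuts.append((enzyme, start))
            start = molecule.sequence.find(site, start + 1)
    return cuts


def surviving_molecules(partition, enzymes):
    """Molecules left intact after digestion; cut molecules drop out."""
    return [m for m in partition if not cut_sites(m, enzymes)]


# A toy "hypermethylated" partition contaminated with one unmethylated molecule.
hyper_partition = [
    Molecule("AACCGGTT", frozenset({3})),     # methylated HpaII site: protected
    Molecule("AACCGGTT"),                     # unmethylated HpaII site: cut
    Molecule("TTCGCGAA", frozenset({2, 4})),  # methylated BstUI site: protected
    Molecule("TTTTAAAA"),                     # no recognition site: unaffected
]
survivors = surviving_molecules(hyper_partition, ["HpaII", "BstUI"])
```

In this model the unmethylated molecule that was mis-partitioned into the hypermethylated group is removed by digestion, which mirrors how cut molecules would not be amplified or sequenced downstream.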
CN202180080053.7A 2020-09-30 2021-09-29 Methods and systems for improving the signal-to-noise ratio of DNA methylation partition assays Pending CN116568822A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/086,000 2020-09-30
US202063105183P 2020-10-23 2020-10-23
US63/105,183 2020-10-23
PCT/US2021/071648 WO2022073011A1 (en) 2020-09-30 2021-09-29 Methods and systems to improve the signal to noise ratio of dna methylation partitioning assays

Publications (1)

Publication Number Publication Date
CN116568822A true CN116568822A (en) 2023-08-08

Family

ID=87405130

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202180080053.7A Pending CN116568822A (en) 2020-09-30 2021-09-29 Methods and systems for improving the signal-to-noise ratio of DNA methylation partition assays
CN202180080104.6A Pending CN116529394A (en) 2020-09-30 2021-09-29 Compositions and methods for analyzing DNA using zoning and methylation dependent nucleases

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202180080104.6A Pending CN116529394A (en) 2020-09-30 2021-09-29 Compositions and methods for analyzing DNA using zoning and methylation dependent nucleases

Country Status (1)

Country Link
CN (2) CN116568822A (en)

Also Published As

Publication number Publication date
CN116529394A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US20230323474A1 (en) Compositions and methods for isolating cell-free dna
US11946106B2 (en) Methods and systems to improve the signal to noise ratio of DNA methylation partitioning assays
US20240229112A1 (en) Compositions and methods for analyzing cell-free dna in methylation partitioning assays
US11939636B2 (en) Methods and systems for improving patient monitoring after surgery
US12234518B2 (en) Compositions and methods for analyzing DNA using partitioning and base conversion
US20240229113A1 (en) Methods and compositions for detecting nucleic acid variants
EP4065725B1 (en) Methods, compositions and systems for improving the binding of methylated polynucleotides
US20230313288A1 (en) Methods for sequence determination using partitioned nucleic acids
CN116568822A (en) Methods and systems for improving the signal-to-noise ratio of DNA methylation partition assays
US20240002946A1 (en) Methods and systems for improving patient monitoring after surgery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination