HK40032105B

HK40032105B - A method to quantify telomere length and genomic motifs

Info

Publication number: HK40032105B
Application number: HK42020022528.2A
Authority: HK
Inventors: 邓亮生; 马淑玲; 胡令芳
Original assignee: 香港中文大学 (The Chinese University of Hong Kong)
Priority date: 2019-06-03
Filing date: 2020-12-22
Publication date: 2024-11-01

Description

Methods for quantifying telomere length and genomic motifs

Cross-reference to related applications

This application claims priority to U.S. Provisional Patent Application No. 62/856,449, filed June 3, 2019, which is incorporated herein by reference in its entirety.

Background of the Invention

Molecular inversion probes (MIPs) are single-stranded DNA probes whose 5' and 3' ends target (are complementary to) the same strand of a template. In all previous applications, MIP assays rely on forming a circularized product through a template-dependent ligation event. This event is specific to a template sequence in the sample and is used to interrogate the sample's sequence content. To eliminate non-specific MIPs, all non-circular products are removed after the ligation step, for example with a combination of DNA-digesting enzymes such as exonuclease I and exonuclease III.

However, classic MIP probes and the methods that use them (e.g., for nucleic acid detection and quantification) are not suitable for tandem repeat sequences such as telomeres. Classic MIP assays depend on the 5' and 3' ends of the MIP hybridizing to specific, invariant sites on the template DNA, with the 5' and 3' sequences binding different target sequences. In classic MIP-based methods, the bound 5' and 3' ends either lie immediately adjacent to each other, which allows direct ligation and circularization, or are separated by one or more nucleotides. In the second case, circularization follows extension of the 3' end (also known as "gap filling") and subsequent ligation. In every case, the circularized products are then separated from linear products (unligated MIPs and ligated but non-circularized MIPs) before detection and/or quantification.

A MIP targeting a tandem repeat necessarily has 5' and 3' ends that target the same repeat sequence. Such probes are therefore unsuitable for traditional MIP-based methods, because the MIP ends can bind promiscuously to any of the many repeat units within a given repeat array. The 5' and 3' ends can then be separated by a highly variable number of intervening template nucleotides, which produces a heterogeneous set of products. Classic MIP-based assays also remove all linear products before detection or quantification. Given this product diversity, classic MIP-based methods would give highly inaccurate and unreliable results for tandem repeat sequences. MIP-based interrogation of telomere length, or of the length of other tandem repeats, has therefore not previously been shown to succeed.

Measuring the length of telomeres or other variable-length tandem repeat sequences is useful for many purposes, including as a biomarker of biological aging. Human telomeres consist of a 6-base-pair repeat motif located at the end of each chromosome. Telomeres shorten during cell division, and their rate of attrition can be influenced by disease and health status. For example, lifestyle factors such as diet, smoking, physical activity and alcohol consumption may affect the rate of shortening, as may stress and psychosocial factors (Epel et al., 2004; Starkweather et al., 2013). Telomere length measurements can therefore indicate an individual's biological age and suggest the kinds of lifestyle changes that individual could make to enjoy better health.

Several methods have been developed to measure telomere length, but each has its own limitations. The classic method is the terminal restriction fragment (TRF) assay (Kimura et al., 2010), in which telomeric DNA from peripheral blood leukocytes is restriction-digested and then analysed by Southern blotting. However, TRF is not well suited to processing large numbers of DNA samples (Samani et al., 2001; Valdes et al., 2005). Its results are also sensitive to the type of gel electrophoresis used, the amount of DNA available, the quality of the image-capture device and the analysis software (Samani et al., 2001; Valdes et al., 2005).

Another method for measuring telomere length is the T/S assay, which uses real-time quantitative PCR and compares each sample with a reference sample. However, studies have shown that the coefficients of variation originally reported for this assay were underestimated. They have also shown that DNA quality, the PCR instrument and the analysis software can all affect the results (Aubert et al., 2012; Cunningham et al., 2013; Nussey et al., 2014). Other assays in use include quantitative fluorescence in situ hybridization (Q-FISH) and flow-cytometry FISH (Flow-FISH). These methods can achieve high precision, but they require freshly drawn blood samples, which is impractical for applications such as epidemiological studies.

There is therefore a need for new, reliable methods for measuring the length of telomeres or other tandem repeat sequences that can be applied on a large scale, for example in epidemiological studies or in routine hospital laboratory practice. The present invention addresses these and other needs.

Summary of the Invention

In one aspect, the present invention provides a single-stranded DNA probe for determining telomere length and/or the copy number of a tandem repeat sequence. The probe comprises:

i) a 5' homologous region that extends to the 5' end of the probe and comprises a nucleotide sequence complementary to the tandem repeat sequence;

ii) a linker region; and

iii) a 3' homologous region that extends to the 3' end of the probe and comprises a nucleotide sequence complementary to the tandem repeat sequence.

The 5' and 3' homologous regions can therefore bind the same strand of a template DNA comprising the tandem repeat sequence. When they do, the 3' homologous region binds immediately 3' of the 5' homologous region within a single repeat unit on the template. As a result, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated on the template DNA by a nucleotide gap that is shorter than one complete repeat unit and comprises at most two different bases.
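The probe geometry above can be sketched in Python as a minimal illustration. It is not the patented probe design, and it rests on several assumptions:

- The template is an idealized, perfectly periodic repeat, written 5'->3' as X | gap | Y.
- The 5' homologous region pairs with X and the 3' homologous region pairs with Y, so extension from the probe's 3' end crosses the gap toward the bound 5' end.
- The example template is the C-rich human telomere strand (CCCTAA)n.
- The helper names (`periodic`, `design_repeat_mip`) and the placeholder linker are hypothetical.

With this template choice, the gap-fill bases come out as G, or as A and G. That appears consistent with the embodiments that name those bases.

```python
# Illustrative sketch of tandem-repeat MIP geometry; helper names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def periodic(unit: str, start: int, length: int) -> str:
    """Take `length` bases, starting at `start`, from an idealized endless repeat of `unit` (5'->3')."""
    return "".join(unit[(start + i) % len(unit)] for i in range(length))


def design_repeat_mip(template_unit: str, hr_len: int, gap_len: int,
                      offset: int, linker: str) -> dict:
    """Place X | gap | Y on the template strand and derive the probe parts.

    The probe's 5' homologous region (HR) pairs with X and its 3' HR pairs with Y.
    A polymerase extends the probe's 3' end across the gap until it meets the 5' end.
    """
    if not 0 < gap_len < len(template_unit):
        raise ValueError("gap must be 1 nt or longer and shorter than one repeat unit")
    x = periodic(template_unit, offset, hr_len)
    gap = periodic(template_unit, offset + hr_len, gap_len)
    y = periodic(template_unit, offset + hr_len + gap_len, hr_len)
    hr5, hr3, fill = revcomp(x), revcomp(y), revcomp(gap)
    return {
        "probe": hr5 + linker + hr3,     # 5' HR + linker + 3' HR, written 5'->3'
        "hr5": hr5,
        "hr3": hr3,
        "fill": fill,                    # bases added by gap filling
        "dntps": sorted(set(fill)),      # dNTPs the reaction must supply
        "junction": hr3 + fill + hr5,    # sequence spanning the ligation junction
    }


if __name__ == "__main__":
    d = design_repeat_mip("CCCTAA", hr_len=16, gap_len=2, offset=2, linker="ACGTACGTAC")
    print(d["hr5"], d["fill"], d["dntps"])
```

For example, a 16-nt homologous region with a 2-nt gap at offset 2 gives identical 5' and 3' homologous regions (TTAGGGTTAGGGTTAG) and a fill of GG, so only dGTP is needed. A 15-nt region with a 3-nt gap at offset 4 needs dATP and dGTP.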

In some embodiments of the invention, the tandem repeat sequence is a telomere sequence. In some embodiments, the telomere is a human telomere. In some embodiments, the nucleotide sequences of the 5' and 3' homologous regions are 100% complementary to the tandem repeat sequence. In some embodiments, the 5' and 3' homologous regions are each 15-25 nucleotides long. In some embodiments, each repeat unit of the tandem repeat sequence is 2-10 nucleotides long. In some embodiments, the linker region comprises one or more sequence elements selected from universal primer sequences, probe-specific primer sequences, TaqMan probe sequences and tag sequences.

In some embodiments of the invention, when the probe is bound within a single repeat unit of the template DNA, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated by one or two nucleotides. In some embodiments, the one or two nucleotides comprise the base G. In some embodiments, when the probe is bound within a single repeat unit of the template DNA, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated by three nucleotides. In some embodiments, the three nucleotides comprise two different bases, namely A and G. In some embodiments, the 5' homologous region comprises the nucleotide sequence of SEQ ID NO:4 or SEQ ID NO:5. In some embodiments, the 3' homologous region comprises the nucleotide sequence of any one of SEQ ID NOs:6-8. In some embodiments, the probe comprises the nucleotide sequence of any one of SEQ ID NOs:1-3.

In another aspect, the present invention provides a kit for determining the copy number of a variable tandem repeat sequence within a genome. The kit comprises two probes as described herein:

- a first single-stranded DNA probe whose 5' and 3' homologous regions are complementary to a tandem repeat sequence whose copy number in the genome varies between individuals; and
- a second single-stranded DNA probe whose 5' and 3' homologous regions are complementary to a tandem repeat sequence whose copy number in the genome is stable between individuals.

The at most two bases contained in the nucleotide gap of the first probe are also contained in the nucleotide gap of the second probe.

In some embodiments, the kit further comprises a reaction mixture containing a DNA polymerase, an enzyme with ligase activity, and deoxyribonucleoside triphosphates corresponding to the at most two bases contained in the nucleotide gaps of the first and second probes. In some embodiments, the DNA polymerase is T4 DNA polymerase. In some embodiments, the ligase activity is provided by Ampligase. In some embodiments, the reaction mixture does not contain an exonuclease. In some embodiments, the kit further comprises test-sample genomic DNA. In some embodiments, the kit further comprises a universal primer complementary to a sequence within the linker region of the first and/or second probe, two or more probe-specific primers, and/or two or more TaqMan probes.
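The kit only works as a single reaction if one restricted dNTP mix serves both probes. The compatibility rule can be checked with a minimal sketch. It assumes that each probe's gap-fill bases are already known as strings. The function name and the example fills are hypothetical.

```python
def kit_dntp_mix(fill_first: str, fill_second: str) -> set[str]:
    """Return the dNTP bases needed by a two-probe reaction, or raise if incompatible.

    fill_first:  gap-fill bases of the probe against the variable repeat (e.g. telomere).
    fill_second: gap-fill bases of the probe against the stable reference repeat.
    """
    first, second = set(fill_first.upper()), set(fill_second.upper())
    for label, bases in (("first", first), ("second", second)):
        # Each probe's gap may use only one or two distinct DNA bases.
        if not bases or not bases <= set("ACGT") or len(bases) > 2:
            raise ValueError(f"{label} probe gap must use one or two of A/C/G/T")
    # Every base in the first probe's gap must also occur in the second probe's gap.
    if not first <= second:
        raise ValueError("gap bases of the first probe must also occur in the second probe's gap")
    return first | second


if __name__ == "__main__":
    print(sorted(kit_dntp_mix("GG", "AGG")))  # hypothetical fills
```

Because the first probe's gap bases are a subset of the second probe's, the union never exceeds two bases. At most two dNTPs therefore need to be supplied.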

In some embodiments of the kit of the present invention, the tandem repeat sequence recognized by the 5' and 3' homologous regions of the first probe is a telomere sequence. In some embodiments, the telomere is a human telomere. In some embodiments, the tandem repeat sequence recognized by the 5' and 3' homologous regions of the second probe is a 4-bp short tandem repeat. In some embodiments, the nucleotide sequences of the 5' and 3' homologous regions of the first probe are 100% complementary to a tandem repeat sequence whose copy number in the genome varies between individuals. In some embodiments, the nucleotide sequences of the 5' and 3' homologous regions of the second probe are 100% complementary to a tandem repeat sequence whose copy number in the genome is stable between individuals. In some embodiments, the 5' and/or 3' homologous regions of the first and/or second probe are 15-25 nucleotides long. In some embodiments, the repeat unit of the tandem repeat sequence recognized by the first and/or second probe is 3-6 nucleotides long.

In another aspect, the present invention provides a method for determining the length of a variable-length tandem repeat region in an individual's genome. The method comprises:

i) providing a first single-stranded DNA probe as described herein, whose 5' and 3' homologous regions are complementary to a tandem repeat sequence whose copy number in the genome varies between individuals;

ii) providing a second single-stranded DNA probe as described herein, whose 5' and 3' homologous regions are complementary to a tandem repeat sequence whose copy number in the genome is stable between individuals, where the at most two bases contained in the nucleotide gap of the first probe are also contained in the nucleotide gap of the second probe;

iii) contacting a biological sample from the individual with the first and second probes in the presence of a DNA polymerase, ligase activity, and at most two deoxyribonucleoside triphosphates corresponding to the bases contained in the nucleotide gaps of the first and second probes, under conditions conducive to extension of the 3' end of each probe and its ligation to the 5' end of a probe bound to the same template;

iv) quantifying the circularized and ligated linear probe products of the first and/or second probes generated in step iii); and

v) using the relative amounts of the circularized and ligated linear probe products of the first and/or second probes to determine the normalized abundance of the ligation junctions of the first and/or second probes, which serves as an index of the length of the corresponding tandem repeat in the individual's genome.
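When step iv) uses qPCR, one common way to turn Ct values into the normalized abundance of step v) is the comparative Δ-ΔCt calculation, which the figure descriptions also mention. The sketch below rests on two assumptions:

- Both junction assays amplify with the same known efficiency, with a per-cycle amplification factor of 2.0 by default.
- A single reference sample serves as calibrator.

The function and parameter names are hypothetical.

```python
def relative_junction_abundance(ct_variable: float, ct_stable: float,
                                ct_variable_ref: float, ct_stable_ref: float,
                                amplification_factor: float = 2.0) -> float:
    """Comparative-Ct (delta-delta-Ct) estimate of normalized junction abundance.

    The variable-repeat (e.g. telomere) junction Ct is first normalized to the
    stable-repeat junction Ct in the same sample, then to the reference sample.
    Assumes both assays share the same per-cycle amplification factor.
    """
    delta_sample = ct_variable - ct_stable
    delta_reference = ct_variable_ref - ct_stable_ref
    delta_delta = delta_sample - delta_reference
    return amplification_factor ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    print(relative_junction_abundance(20.0, 25.0, 21.0, 25.0))
```

In this example, the test sample's telomere junctions cross threshold one cycle earlier than the reference sample's, relative to the stable junctions. The estimated relative abundance is therefore 2.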

In some embodiments of the method of the present invention, the nucleotide sequences of the 5' and 3' homologous regions of the first and second probes are 100% complementary to the first and second tandem repeat sequences, respectively. In some embodiments, the 5' and 3' homologous regions of the probes are 15-25 nucleotides long. In some embodiments, the variable tandem repeat region is a telomere. In some embodiments, the telomere is a human telomere. In some embodiments, the linker region of each probe comprises one or more sequence elements selected from universal primer sequences, probe-specific primer sequences, TaqMan probe sequences and tag sequences.

In some embodiments, when the probe is bound within a single repeat unit of a tandem repeat sequence, the 3' end of its 3' homologous region and the 5' end of its 5' homologous region are separated by a gap of 1 or 2 nucleotides consisting of a single base. In these embodiments, only one deoxyribonucleoside triphosphate, corresponding to that base, is provided in step iii). In some embodiments, the single base is G. In other embodiments, when the probe of step i) or step ii) is bound within a single repeat unit of a tandem repeat sequence, the 3' end of its 3' homologous region and the 5' end of its 5' homologous region are separated by a 3-nucleotide gap comprising two bases. In these embodiments, two deoxyribonucleoside triphosphates corresponding to those two bases are provided in step iii). In some embodiments, the two bases are A and G.

In some embodiments of the method of the present invention, the 5' homologous region of the probe in step i) comprises the nucleotide sequence of SEQ ID NO:4 or SEQ ID NO:5. In some embodiments, the 3' homologous region of the probe in step i) comprises the nucleotide sequence of any one of SEQ ID NOs:6-8. In some embodiments, the probe in step i) comprises the nucleotide sequence of any one of SEQ ID NOs:1-3. In some embodiments, the probes of steps i) and ii) are provided in large excess relative to the amount of genomic DNA in the biological sample. In some embodiments, no exonuclease is added during steps i) to v). In some embodiments, the only exonuclease added during steps i) to v) is exonuclease I, used at no more than 20 units per 50 ng of genomic DNA for no more than 1 hour. In some embodiments, the amounts of circularized and ligated linear probe products are determined by a method selected from quantitative PCR, digital PCR and sequencing. In some embodiments, only the first single-stranded probe is used, and the amount of input sample DNA is quantified by another method.

Brief Description of the Drawings

Figures 1A-1C. Formation of circularized MIP products in conventional MIP assays, and of the ligated linear MIP (LL-MIP) products used in the present invention. Figure 1A shows a simple circularized MIP in a conventional MIP assay. Figure 1B shows other types of ligation products, namely LL-MIPs. Figure 1C shows the sequence of an exemplary LL-MIP product obtained by cloning a PCR product; in this LL-MIP, six MIP probes are ligated together.

Figure 2. Principle by which the number of ligation junctions represents telomere length or the number of genomic motifs. The abundance of ligation junctions is an index of telomere length.

Figure 3. Gel showing LL-MIP products after PCR. Lanes 1, 3, 4 and 5 show bands at multiples of ~100 bp (the length of one MIP). The size markers on the left are spaced at 100 bp intervals.

Figure 4. Comparison of existing MIP assays with the method of the present invention for measuring telomere length.

Figure 5. Correlation between telomere length (kbp) measured by the TRF assay and the abundance ratio, relative to a reference sample, of the ligation junctions representing the telomere motif, as determined from Δ-ΔCt values.

Figure 6. Examples and data showing that the method of the present invention responds proportionally (linearly) to the amount of DNA template added.

Figure 7. Examples and data for efficiency (slope)-corrected Δ-ΔCt quantification. The Δ-ΔCt value can be used as another measure of telomere length. In this example, a short MIP is used in the hybridization reaction.
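The document does not spell out how the efficiency (slope) correction is performed. One widely used approach is a Pfaffl-style ratio; it is assumed here and not taken from the source. Each assay's per-cycle amplification factor E is derived from its standard-curve slope as E = 10^(-1/slope). The Ct difference between the reference and the test sample is then raised to that assay's own E. The function names are hypothetical.

```python
import math


def amplification_factor_from_slope(slope: float) -> float:
    """E = 10**(-1/slope), from a standard curve of Ct against log10(input amount)."""
    if not math.isfinite(slope) or slope >= 0:
        raise ValueError("standard-curve slope should be a finite negative number")
    return 10 ** (-1.0 / slope)


def efficiency_corrected_ratio(e_variable: float, e_stable: float,
                               ct_variable_ref: float, ct_variable_sample: float,
                               ct_stable_ref: float, ct_stable_sample: float) -> float:
    """Pfaffl-style ratio E_var**dCt_var / E_stable**dCt_stable, with dCt = Ct(ref) - Ct(sample)."""
    return (e_variable ** (ct_variable_ref - ct_variable_sample)) / (
        e_stable ** (ct_stable_ref - ct_stable_sample))


if __name__ == "__main__":
    e = amplification_factor_from_slope(-3.32)  # hypothetical slope, close to 100% efficiency
    print(round(e, 3))
```

When both amplification factors equal 2.0, this reduces to the plain Δ-ΔCt result. A less efficient telomere-junction assay (E = 1.9, say) yields a proportionally smaller ratio for the same Ct shift.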

Figure 8. Schematic workflow for quantifying ligation junctions after the hybridization and ligation reactions. Each ligation junction represents one unit of length of the target tandem repeat. The abundance of ligation junctions is quantified by a method such as qPCR, and the normalized abundance can then be used as a biomarker of tandem repeat length. Using calibrated samples, it can also be converted to units of length (e.g., kbp for telomeres) with a regression equation such as those shown in the examples and in Figure 5.
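The conversion to kbp described above can be sketched as an ordinary least-squares line fitted to calibration samples with known lengths (for example, lengths measured by TRF). This is a minimal sketch. It assumes a straight-line relationship between the index and the length, and it does not reproduce the regression equations from the examples. The calibration values below are hypothetical, and so are the function names.

```python
def fit_calibration(index_values: list[float], tl_kbp: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of TL (kbp) = slope * index + intercept."""
    n = len(index_values)
    if n != len(tl_kbp) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(index_values) / n
    mean_y = sum(tl_kbp) / n
    sxx = sum((x - mean_x) ** 2 for x in index_values)
    if sxx == 0:
        raise ValueError("calibration index values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(index_values, tl_kbp))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def index_to_kbp(index: float, slope: float, intercept: float) -> float:
    """Convert a junction-abundance index to telomere length in kbp."""
    return slope * index + intercept


if __name__ == "__main__":
    # Hypothetical calibrators: (index, TRF-measured length in kbp).
    s, b = fit_calibration([0.5, 1.0, 1.5, 2.0], [2.5, 4.0, 5.5, 7.0])
    print(s, b, index_to_kbp(1.2, s, b))
```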

Figure 9. The telomere length index value is not affected by the amount of starting DNA used for hybridization.

Figure 10. Calibration of index values obtained from the method of the present invention to telomere length (TL, in kbp).

Figure 11. Two-dimensional cluster plot from the ddPCR quantification of the ligation junctions of two MIPs, with channel 1 fluorescence (FAM, telomere junctions) plotted against channel 2 fluorescence (HEX, α-satellite junctions).
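A two-channel ddPCR readout like the one in Figure 11 is usually converted to copy numbers with the standard Poisson correction, λ = -ln(negative droplets / total droplets), applied separately to each channel. This is a minimal sketch that assumes droplets are classified into the four clusters of the 2-D plot. It also assumes that junction-bearing templates are distributed independently across droplets. The function names and counts are hypothetical.

```python
import math


def copies_per_partition(negative: int, total: int) -> float:
    """Poisson estimate of mean target copies per droplet: lambda = -ln(negative / total)."""
    if total <= 0 or not 0 < negative <= total:
        raise ValueError("need 0 < negative <= total")
    return -math.log(negative / total)


def junction_ratio(neg_neg: int, fam_only: int, hex_only: int, double_pos: int) -> float:
    """Telomere (FAM) to alpha-satellite (HEX) junction ratio from 2-D cluster counts."""
    total = neg_neg + fam_only + hex_only + double_pos
    lam_fam = copies_per_partition(neg_neg + hex_only, total)  # FAM-negative droplets
    lam_hex = copies_per_partition(neg_neg + fam_only, total)  # HEX-negative droplets
    if lam_hex == 0:
        raise ValueError("no HEX-positive droplets; ratio undefined")
    return lam_fam / lam_hex


if __name__ == "__main__":
    print(junction_ratio(4000, 4000, 1000, 1000))  # hypothetical cluster counts
```

The droplet volume cancels out of the ratio, so only the cluster counts are needed.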

Detailed Description of the Invention

1. Introduction

The present invention provides methods and compositions for determining tandem repeat length by quantifying the abundance of the corresponding repeat motif in a sample. One type of tandem repeat of particular interest herein is the telomere. The methods and compositions use molecular inversion probes (MIPs) designed specifically for tandem repeat sequences.

The methods and compositions of the present invention provide MIPs designed specifically for tandem-repeat-containing regions of the genome. They exploit the diversity of ligation products, both circularized and ligated linear, that MIP-based assays generate on such regions. By quantifying both product types, the methods allow reliable and rapid assessment of the copy number of tandem repeat sequences, such as telomere sequences, that vary between individuals.

In preferred methods, the present invention uses MIPs directed against tandem repeat sequences in which the 5' and 3' ends of the MIP comprise the same sequence. That sequence is typically 100% complementary to the template DNA when bound within a single repeat unit. These MIPs are designed to bind a template DNA strand with a small gap (e.g., 1-3 nucleotides) between the MIP's 3' and 5' ends. The gap typically comprises only one or two different bases, so only one or two dNTPs are used in the methods. This ensures that ligation of the ends can occur only within a single repeat unit of the tandem repeat and not across multiple repeats.

In preferred embodiments, the methods omit any step that removes linear products before detection and/or quantification, such as an exonuclease-based step. Instead, they quantify the ligation junctions of the circularized and ligated linear products to determine the abundance of a specific repeat motif in the sample. That abundance serves as an index of telomere length or of the length of another genomic motif.

2. Definitions

As used herein, the term “about”, when applied to the size of a polynucleotide, a percentage of homology or any other quantitative measurement, means that the value may vary slightly from the stated value. For example, “about” means that the value may differ from the stated value by +/-1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9% or 10%.

The term “nucleic acid” or “polynucleotide” refers to deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), and polymers thereof, in single-stranded or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have binding properties similar to the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses the explicitly indicated sequence together with its conservatively modified variants (e.g., degenerate codon substitutions), alleles, orthologs, SNPs and complementary sequences. Specifically, degenerate codon substitutions can be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).

A “molecular inversion probe” (MIP) is a nucleic acid probe that hybridizes to a complementary sequence of a target nucleic acid of interest. The 5' end of the MIP hybridizes to the target at a position 5' of the site where the 3' end of the MIP hybridizes; in this case, the 5' and 3' ends bind adjacently within a single repeat. When hybridized to the target, the MIP forms a loop from one end to the other, so that the ends are “inverted” relative to each other: the 3' end lies 5' of the 5' end of the MIP. The MIP sequences at the two ends are configured to hybridize with the target sequence and are referred to herein as the 5' and 3' homologous regions. The linker region lies between them. It is substantially non-complementary to the target DNA sequence and therefore does not hybridize with the template DNA. It contains one or more functional elements, such as primer binding sites and tag sequences. A MIP typically requires 5' phosphorylation for ligation to occur.

As used herein, “circularization” refers to joining the 3' end of a nucleic acid to its own 5' end, producing a continuous, uncut circle or loop. A “ligated linear MIP”, “LL-MIP” or “ligated linear probe” refers to two or more MIPs ligated to each other. In such a product, the 3' end of one MIP has been extended (by “gap filling”) and ligated to the 5' end of a different MIP bound adjacently on the same DNA template (e.g., immediately 3' within a single repeat unit on the template).
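When products are quantified by sequencing, LL-MIPs can be scored by counting junction sequences. This minimal sketch assumes a specific product structure: each probe's 3' end is extended by the gap-fill bases and ligated to the next probe's 5' end. A linear LL-MIP of n probes then reads (probe + fill) repeated n-1 times, followed by one more probe. It carries n-1 junctions of the form 3' HR + fill + 5' HR, and its length is close to a multiple of the probe length. The linker sequence and function names are hypothetical.

```python
def assemble_ll_mip(probe: str, fill: str, n_probes: int) -> str:
    """Linear product (5'->3') of n probes, each 3' end extended by `fill` and ligated onward."""
    if n_probes < 1:
        raise ValueError("need at least one probe")
    return (probe + fill) * (n_probes - 1) + probe


def count_junctions(product: str, hr3: str, fill: str, hr5: str) -> int:
    """Count non-overlapping occurrences of the ligation-junction sequence."""
    return product.count(hr3 + fill + hr5)


if __name__ == "__main__":
    hr = "TTAGGGTTAGGGTTAG"              # 5' and 3' HR (identical in this example)
    linker = "ACACACACACACACACACAC"      # hypothetical non-telomeric linker
    probe = hr + linker + hr
    product = assemble_ll_mip(probe, "GG", 6)
    print(len(product), count_junctions(product, hr, "GG", hr))
```

Because the linker interrupts the repeat inside each probe, the full junction sequence appears only where two probes have been joined. Six ligated probes therefore give five junctions.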

As used herein, the term “complementary” refers to hybridization or Watson-Crick base pairing between nucleotides or nucleic acids. Examples are the pairing between the two strands of a double-stranded DNA molecule, and between an oligonucleotide primer and its binding site on a single-stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are typically A and T (or A and U), or C and G. Two single-stranded RNA or DNA molecules are considered complementary when, after optimal alignment and comparison with appropriate nucleotide insertions or deletions, the nucleotides of one strand pair with at least about 80% of the nucleotides of the other, usually with at least about 90% to 95%, and more preferably with about 98% to 100%. Alternatively, complementarity exists when an RNA or DNA strand hybridizes to its complement under selective hybridization conditions. Selective hybridization typically occurs when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, and more preferably at least about 90%. See M. Kanehisa, Nucleic Acids Res. 12:203 (1984). The homologous regions of a MIP typically have 100% complementarity with the target nucleic acid of interest, although mismatches may be included in some embodiments.
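The percentage-based definition above can be illustrated with a minimal sketch that scores Watson-Crick pairs between two DNA strands of equal length. The strands are aligned antiparallel: the first base of one strand pairs with the last base of the other. The definition also allows insertions and deletions in the optimal alignment. This sketch deliberately covers only the simpler ungapped case, and the function name is hypothetical.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent Watson-Crick pairs for two equal-length DNA strands (5'->3') aligned antiparallel."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a, b = strand_a.upper(), strand_b.upper()
    paired = sum((x, y) in PAIRS for x, y in zip(a, reversed(b)))  # a[i] pairs with b[-1-i]
    return 100.0 * paired / len(a)


if __name__ == "__main__":
    print(percent_complementarity("TTAGGG", "CCCTAA"))
```

TTAGGG and CCCTAA are fully complementary. Changing one base of the second strand (CCCTAT) leaves 5 of 6 positions paired, about 83%.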

“Gap filling” is a reaction described herein in which a polymerase fills the gap between the 5' and 3' ends of a molecular inversion probe hybridized to a complementary target nucleic acid. Gap filling is also referred to herein as “extension” of the 3' end of a bound MIP probe. The extension spans the gap and allows ligation between the extended 3' end and the bound 5' end. In preferred embodiments, the filled gap consists of 1-3 nucleotides or is confined within a single repeat motif unit.
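Restricting the dNTPs confines gap filling to a single repeat unit, and this can be illustrated with a small simulation. It is a minimal sketch under several assumptions:

- The template is an idealized periodic repeat indexed 5'->3'.
- The extending 3' end copies the template toward its 5' end, i.e. toward decreasing indices.
- Extension stops at the first template base whose complementary dNTP is not supplied.
- Polymerase misincorporation is ignored.

The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gap_fill_length(template_unit: str, start: int, dntps: set[str], limit: int = 100) -> int:
    """Nucleotides added when extension begins opposite template index `start`.

    Copies template[start], template[start - 1], ... on an endless repeat of
    `template_unit`, stopping when the needed dNTP is absent or `limit` is reached.
    """
    n = len(template_unit)
    added, pos = 0, start
    while added < limit and COMPLEMENT[template_unit[pos % n]] in dntps:
        added += 1
        pos -= 1
    return added


def confined_to_one_unit(template_unit: str, dntps: set[str]) -> bool:
    """True if, from every start position, extension stalls before copying a whole repeat unit."""
    n = len(template_unit)
    return all(gap_fill_length(template_unit, s, dntps, limit=n) < n for s in range(n))


if __name__ == "__main__":
    print(gap_fill_length("CCCTAA", 3, {"A", "G"}))       # stalls at the A-A step
    print(confined_to_one_unit("CCCTAA", {"A", "G"}))
```

On the C-rich telomere strand with only dATP and dGTP, extension can copy at most the T-C-C-C stretch before stalling at an A. With all four dNTPs, it would run on indefinitely, which is the uncontrolled gap filling that makes classic MIP assays unreliable on repeats.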

As used herein, a "homologous region", such as a "3' homologous region" or a "5' homologous region", is a portion of a molecular inverted probe that is complementary to the target nucleic acid of interest. A MIP typically has two homologous regions (HRs), one at or near the 5' end of the probe and one at or near the 3' end. In preferred embodiments, the HRs of the present invention are adapted to hybridize to target nucleic acids of interest containing tandem repeats such that, when bound, they are separated by a gap of 1-3 nucleotides within a single repeat unit of the target DNA sequence. In preferred embodiments, the homologous regions are 100% complementary to the target sequence; however, in some embodiments a given homologous region may contain mismatches, for example one, two, three, or more mismatches.

As used herein, the term "hybridization" refers to a process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide; triple-stranded hybridization is also theoretically possible. The resulting (usually) double-stranded polynucleotide is a "hybrid." Hybridization is typically carried out under stringent conditions, for example at a salt concentration of no more than about 1 M and a temperature of at least 25 °C. Hybridization may be carried out in the presence of reagents such as about 0.1 mg/ml herring sperm DNA or about 0.5 mg/ml (acetylated) BSA. Because other factors, including base composition and the length of the complementary strands, can affect hybridization stringency, the combination of parameters is more important than the absolute value of any single one. Further guidance on hybridization conditions suitable for various assays can be found, for example, in Michael R. Green & Joseph Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., 2012).

"Polymerase chain reaction" or "PCR" means a reaction, well known in the art, for the in vitro amplification of a specific DNA sequence by simultaneous primer extension of complementary DNA strands. PCR is a reaction for making multiple copies or replicates of a target nucleic acid flanked by primer binding sites, comprising one or more repetitions of the following steps: (i) denaturing the target nucleic acid, (ii) annealing primers to the primer binding sites, and (iii) extending the primers with a nucleic acid polymerase in the presence of nucleoside triphosphates. Typically, the reaction is cycled in a thermal cycler through different temperatures optimized for each step.
The particular temperatures, the duration of each step, and the rate of change between steps depend on many factors well known to those of ordinary skill in the art, as exemplified in McPherson et al. (eds.), PCR: A Practical Approach and PCR2: A Practical Approach (IRL Press, Oxford, 1991 and 1995, respectively). For example, in conventional PCR using Taq DNA polymerase, a double-stranded target nucleic acid may be denatured at a temperature >90 °C, primers annealed at a temperature in the range of 50-75 °C, and primers extended at a temperature in the range of 72-78 °C. The term "PCR" encompasses derivative forms of the reaction, including but not limited to RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplex PCR, and the like. "Multiplex PCR" means a PCR in which multiple target sequences (or a single target sequence and one or more reference sequences) are amplified simultaneously in the same reaction mixture, e.g., Bernard et al., Anal. Biochem., 273:221-228 (1999) (two-color real-time PCR). Typically, a distinct set of primers is used for each sequence being amplified. "Quantitative PCR" or "qPCR" means a PCR designed to measure the abundance of one or more specific target sequences in a sample or specimen. Quantitative PCR includes both absolute and relative quantification of such target sequences. Quantitative measurements are made using one or more reference sequences, which may be assayed separately from or together with the target sequences. A reference sequence may be endogenous or exogenous to the sample or specimen; in the latter case, it may comprise one or more competitor templates. Typical endogenous reference sequences include segments of transcripts of genes such as β-actin, GAPDH, β2-microglobulin, and ribosomal RNA.
Techniques for quantitative PCR are well known to those of ordinary skill in the art, as exemplified in the following references: Freeman et al., Biotechniques, 26:112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17:9437-9447 (1989); Zimmerman et al., Biotechniques, 21:268-279 (1996); Diviacco et al., Gene, 122:3013-3020 (1992); Becker-Andre et al., Nucleic Acids Research, 17:9437-9446 (1989).
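To make relative quantification against a reference sequence concrete, the sketch below uses the widely used comparative-Ct idea. This is a swapped-in general technique, not the quantification scheme of this document; it assumes that target and reference amplify with the same per-cycle efficiency (2.0 means perfect doubling), and the function names are hypothetical.

```python
def relative_abundance(ct_target: float, ct_reference: float, efficiency: float = 2.0) -> float:
    """Abundance of a target relative to a reference sequence in the same sample.

    Assumes both amplicons multiply by `efficiency` per cycle. A target whose
    threshold cycle (Ct) is delta cycles later than the reference therefore
    started efficiency**delta times less abundant.
    """
    delta_ct = ct_target - ct_reference
    return efficiency ** (-delta_ct)


def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_calibrator: float, ct_ref_calibrator: float,
                efficiency: float = 2.0) -> float:
    """Reference-normalized abundance of a sample divided by that of a calibrator sample."""
    sample = relative_abundance(ct_target_sample, ct_ref_sample, efficiency)
    calibrator = relative_abundance(ct_target_calibrator, ct_ref_calibrator, efficiency)
    return sample / calibrator


print(relative_abundance(20.0, 22.0))  # 4.0: the target crossed threshold 2 cycles earlier
```

Normalizing to a reference measured in the same sample cancels differences in input amount; real assays usually verify the efficiency assumption with a standard curve.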

As used herein, the term "primer" refers to a single-stranded oligonucleotide that, under suitable conditions (e.g., buffer and temperature) and in the presence of four different nucleoside triphosphates and a polymerizing agent (e.g., DNA polymerase, RNA polymerase, or reverse transcriptase), can act as the point of initiation of template-directed DNA synthesis. The length of a primer in any given case depends on, for example, its intended use, and typically ranges from 15 to 30 nucleotides. Short primer molecules generally require lower temperatures to form sufficiently stable, specific hybrid complexes with the template. A primer need not reflect the exact sequence of the template, but must be sufficiently complementary to hybridize to it. A primer site is the region of the template to which a primer hybridizes. A primer pair is a set of primers comprising a 5' upstream primer that hybridizes to the 5' end of the sequence to be amplified and a 3' downstream primer that hybridizes to the complement of the 3' end of the sequence to be amplified.

"Sample" means a quantity of material from a biological, environmental, medical, animal, bacterial, plant, or patient source in which detection or measurement of a target nucleic acid is sought. Typically, a sample is a lysate of cells or of tissue from an organism. Generally, in the context of the present invention, a sample comprises nucleic acid-containing material. Biological samples may be fluids, solids (e.g., feces), or tissues of animals, including humans, as well as liquid and solid food and feed products and ingredients (such as dairy products, vegetables, meat, and meat by-products) and waste. Biological samples may include material taken from a patient, including but not limited to cultures, blood, saliva, cerebrospinal fluid, pleural fluid, milk, lymph, sputum, semen, needle aspirates, and the like. Biological samples may be obtained from all of the various families of domestic animals, as well as from feral or wild animals, including but not limited to ungulates, bears, fish, rodents, and the like. Biological samples may also be obtained from plants such as corn, rice, wheat, lettuce, and pepper.

As used herein, the term "target nucleic acid of interest" refers to a nucleic acid in a sample that is presumed to contain a target sequence of interest, such as a region containing tandem repeats, e.g., telomeres or ATGG-containing regions of the genome. For MIPs, target sequences of interest include those sequences that are complementary to the homologous regions of the MIP.

A tandem repeat is a nucleotide sequence in which a short sequence unit, 2-60 nucleotides long, is repeated multiple times, with the repeat units directly adjacent to one another. The exact nucleotide sequence of each repeat unit is called the repeat motif. Common examples of repeat motifs (also referred to herein as repeat units) include [CA], [CAG], [GATA], and the like. Tandem repeats are classified as microsatellites or minisatellites according to the length of the repeat motif. Microsatellite repeat motifs are typically short, less than 10 nucleotides. Dinucleotide repeats (e.g., [CA]_n repeats) are among the best-known microsatellites, or short tandem repeats, in the human genome. Repeat motifs in minisatellites range from 10 to 60 nucleotides in length. As an example, the repeat motif [TTAGGG] of human telomeres can be repeated hundreds to thousands of times within the telomere at each chromosome end. [TTAGGG] is therefore a sequence motif specific to human telomeric repeats, and [TTAGGG]_n is used to denote the repetitive nature of the motif in telomeres. It should be understood that a tandem repeat sequence can refer to either one or both strands of the DNA containing the tandem repeat.
Tandem repeats can be variable in the genome, as in the case of telomeres, meaning that the number of repeat units (copy number), or the total size of the genomic region carrying the repeat, can vary to any degree between individuals, or between different cells or cell types within an individual, for example by 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10-fold or more. Tandem repeats can also be "stable" or relatively "constant" in the genome, as in the case of ATGG repeats, meaning that the number of repeat units, the copy number, or the total size of the genomic region carrying the repeat varies between individuals, or between different cells or cell types within an individual, by no more than, for example, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or more.
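The definitions above can be illustrated with a minimal sketch that finds the longest run of directly adjacent copies of a motif in a sequence and classifies the motif by length using the cut-offs stated above (less than 10 nt for microsatellites, 10-60 nt for minisatellites). The function names are hypothetical, and the exact-match scan ignores the imperfect copies found in real genomes.

```python
from typing import Tuple


def longest_tandem_run(seq: str, motif: str) -> Tuple[int, int]:
    """Return (start, copies) for the longest run of directly adjacent copies of `motif`.

    Every start offset is tried and a run grows one whole motif at a time, so a
    partial copy at the end of a run is not counted. Returns (-1, 0) if absent.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best_start, best_copies = -1, 0
    for start in range(len(seq) - k + 1):
        copies = 0
        while seq[start + copies * k:start + (copies + 1) * k] == motif:
            copies += 1
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies


def satellite_class(motif: str) -> str:
    """Classify a repeat unit using the length cut-offs given in the definition above."""
    k = len(motif)
    if not 2 <= k <= 60:
        raise ValueError("repeat units are 2-60 nt under this definition")
    return "microsatellite" if k < 10 else "minisatellite"


print(longest_tandem_run("GATTAGGGTTAGGGTTAGGGCC", "TTAGGG"))  # (2, 3)
```

Three adjacent TTAGGG copies starting at offset 2 span 18 nt; multiplying copies by motif length in this way is how copy number and repeat-region length relate.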

"Hybridization conditions" are conditions expected to result in specific hybridization between complementary sequences. For example, a test nucleic acid is considered to hybridize specifically to a probe nucleic acid if it hybridizes to the probe at least 50% as well as the perfectly matched complementary target does (e.g., as quantified under the same hybridization conditions), i.e., with a signal-to-noise ratio at least half that of probe-target hybridization, under conditions in which the perfectly matched probe binds the perfectly matched complementary target with a signal-to-noise ratio at least about 5x-10x higher than that observed for hybridization to any unmatched target nucleic acid.
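The nested criterion above can be read as two checks, sketched below under the assumption that each hybridization is summarized by a single signal-to-noise ratio (SNR). The function name and the default 5x stringency floor (the low end of the stated 5x-10x range) are illustrative choices.

```python
def specifically_hybridizes(test_snr: float, perfect_match_snr: float,
                            mismatch_snr: float, min_fold: float = 5.0) -> bool:
    """Sketch of the specificity criterion for a test nucleic acid.

    First, the conditions only qualify if the perfect-match probe/target SNR is at
    least `min_fold` times the SNR seen with unmatched targets. Under qualifying
    conditions, the test nucleic acid counts as specific if its SNR is at least
    half the perfect-match SNR.
    """
    if perfect_match_snr < min_fold * mismatch_snr:
        raise ValueError("conditions are not stringent enough to judge specificity")
    return test_snr >= 0.5 * perfect_match_snr
```

For example, with a perfect-match SNR of 10 and an unmatched SNR of 1, a test SNR of 6 passes and a test SNR of 4 does not.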

As used herein, in the context of two or more polynucleotide sequences, the terms "identical" or "percent identity" refer to two or more sequences or specified subsequences that are the same, or that have a specified percentage of nucleotides that are the same. Two sequences are "substantially identical" if they have at least 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity, when compared and aligned for maximum correspondence over a comparison window or designated region, as measured using a sequence comparison algorithm or by manual alignment and visual inspection (if no particular region is designated). With respect to polynucleotide sequences, this definition also applies to the complement of a test sequence.

For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, the test and reference sequences are entered into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identity of the test sequence relative to the reference sequence, based on the program parameters. For sequence comparisons of nucleic acids and proteins, the BLAST 2.0 algorithm, discussed below, is used with default parameters.

As used herein, a "comparison window" includes reference to a segment of any one of a number of contiguous positions selected from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
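Once two sequences have been aligned (by BLAST or by hand), percent identity over a comparison window reduces to counting identical columns. The minimal sketch below assumes pre-aligned input with `-` for gaps and counts gapped columns as non-identical, which is one common convention among several; the function names are hypothetical.

```python
from typing import Optional, Tuple


def percent_identity(aligned_test: str, aligned_ref: str,
                     window: Optional[Tuple[int, int]] = None) -> float:
    """Percent identical columns between two aligned sequences within a window.

    `window` is a half-open (start, end) range of alignment columns; by default the
    whole alignment is used. Columns containing a gap count as non-identical.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    start, end = window if window is not None else (0, len(aligned_test))
    columns = list(zip(aligned_test[start:end].upper(), aligned_ref[start:end].upper()))
    if not columns:
        raise ValueError("empty comparison window")
    identical = sum(t == r and t != "-" for t, r in columns)
    return 100.0 * identical / len(columns)


def substantially_identical(aligned_test: str, aligned_ref: str, threshold: float = 60.0,
                            window: Optional[Tuple[int, int]] = None) -> bool:
    """Apply the 'at least 60% identity' floor from the definition (adjustable)."""
    return percent_identity(aligned_test, aligned_ref, window) >= threshold


print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```

Here one gap column and one mismatch leave 8 of 10 columns identical.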

An algorithm used to determine percent sequence identity and sequence similarity is the BLAST 2.0 algorithm, described in Altschul et al. (1990) J. Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information website (ncbi.nlm.nih.gov). The algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. For nucleotide sequences, cumulative scores are calculated using the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score.
Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 28, an expectation (E) of 10, M = 1, N = -2, and a comparison of both strands (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).
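The seed-and-extend procedure described above can be sketched as a toy, ungapped version. It is not the NCBI implementation: it assumes exact word seeds (as used for nucleotides), applies only the X-drop and end-of-sequence stopping rules (not the zero-score rule), and omits gapped extension, two-strand search, and statistics. The `x_drop` default is an arbitrary illustrative value; W, M, and N default to the values stated above.

```python
from typing import Dict, List, Set, Tuple

Hsp = Tuple[int, int, int, int]  # (score, query_start, subject_start, length)


def _extend(query: str, subject: str, q: int, s: int, step: int,
            match: int, mismatch: int, x_drop: int) -> Tuple[int, int]:
    """Extend from (q, s) in direction `step` (+1 right, -1 left); return (best_gain, best_len)."""
    best, running, best_len, k = 0, 0, 0, 0
    while 0 <= q + step * k < len(query) and 0 <= s + step * k < len(subject):
        running += match if query[q + step * k] == subject[s + step * k] else mismatch
        k += 1
        if running > best:
            best, best_len = running, k
        elif best - running >= x_drop:
            break  # score fell x_drop below the best value reached: stop extending
    return best, best_len


def find_hsps(query: str, subject: str, w: int = 28, match: int = 1,
              mismatch: int = -2, x_drop: int = 5) -> Set[Hsp]:
    """Seed with exact words of length w, then extend each seed left and right."""
    index: Dict[str, List[int]] = {}
    for i in range(len(subject) - w + 1):
        index.setdefault(subject[i:i + w], []).append(i)
    hsps: Set[Hsp] = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            gain_r, len_r = _extend(query, subject, q + w, s + w, +1, match, mismatch, x_drop)
            gain_l, len_l = _extend(query, subject, q - 1, s - 1, -1, match, mismatch, x_drop)
            score = w * match + gain_r + gain_l
            hsps.add((score, q - len_l, s - len_l, len_l + w + len_r))
    return hsps
```

With `w=8`, the query `ACGTACGTACGT` against `GGACGTACGTACGTGG` yields a best HSP of score 12 covering the whole query; several seeds on the same diagonal extend to that same HSP, which the set de-duplicates.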

The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which indicates the probability that a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
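For intuition about these probabilities, the sketch below applies the Karlin-Altschul formulas for a single HSP: the expected number of chance hits E = K·m·n·e^(-λS) and the Poisson probability P = 1 - e^(-E). It does not implement the multi-HSP "sum" statistic that P(N) refers to. The parameters λ and K depend on the scoring scheme and must be supplied; the values used in the usage line are arbitrary placeholders.

```python
import math


def expected_chance_hsps(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value for one HSP: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def chance_probability(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability of at least one chance HSP scoring >= score: P = 1 - exp(-E)."""
    return -math.expm1(-expected_chance_hsps(score, m, n, lam, k))


def is_similar(score: float, m: int, n: int, lam: float, k: float, cutoff: float = 0.001) -> bool:
    """Apply the 'most preferably less than about 0.001' cut-off to the single-HSP P-value."""
    return chance_probability(score, m, n, lam, k) < cutoff


# Placeholder parameters: query length 100, database length 1000, lambda = 1.0, K = 0.1.
print(is_similar(30, 100, 1000, 1.0, 0.1), is_similar(5, 100, 1000, 1.0, 0.1))  # True False
```

Higher scores shrink E exponentially, so a score of 30 is very unlikely to arise by chance under these placeholder parameters, whereas a score of 5 is expected to occur many times.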

3. Tandem repeat-targeting molecular inverted probes

The MIP of the present invention is a single-stranded DNA probe comprising three regions: a 5' homologous region in the 5' portion of the MIP, a 3' homologous region in the 3' portion of the MIP, and a central adapter region between the 5' and 3' homologous regions. As described in more detail below, the 5' and 3' homologous regions are designed to be homologous to the same strand of a template DNA containing a tandem repeat sequence, such that the 3' end can bind the template adjacent to the 5' end (i.e., immediately in the 3' direction along the template DNA), allowing the 3' end to be extended by a DNA polymerase and ligated to the adjacent 5' end. It should be understood, however, that because of the repetitive nature of the template DNA, the 5' and 3' ends can bind to any repeat unit within the tandem repeat, and can therefore be separated by multiple repeat units when bound. The adapter region, in contrast, forms a loop between the hybridized 3' and 5' homologous regions and does not bind the template DNA.

The homologous regions are typically 15-25 nucleotides in length, e.g., 18-22 nucleotides, and are typically 100% complementary to the tandem repeat sequence within the template (i.e., 100% identical to the template DNA strand to which they do not hybridize), although homologous regions that are less than 100% complementary, e.g., 90%, 95%, 96%, 97%, 98%, or 99% complementary, or that contain, e.g., one, two, three, or more mismatches, can also be used. The 5' and 3' regions typically extend to the respective ends of the MIP; that is, the 5' end of the 5' homologous region is also the 5' end of the MIP, and the 3' end of the 3' homologous region is also the 3' end of the MIP. Where a homologous region is less than 100% complementary to the template tandem repeat, the mismatched nucleotides typically do not include the terminal 5' and 3' nucleotides of the MIP (i.e., the terminal nucleotides that are extended by the DNA polymerase and/or ligated in the present methods). Unlike some other assays (e.g., the real-time PCR telomere T/S ratio assay), the hybridizing sequences at the 5' and 3' ends of the MIPs of the present invention are preferably 100% identical to a segment of the telomere repeat [TTAGGG]_n, with no mismatched bases.

The central adapter region can vary in size but is typically about 30-200 nucleotides long, e.g., 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides, or any integer value within this range. In addition to being long enough to allow the 5' and 3' homologous regions to loop around and bind the same template DNA, the central adapter region can contain any of a variety of functional sequences, e.g., for isolating, purifying, detecting, cleaving, or quantifying the products of the methods described herein. For example, the adapter region can contain universal primer sequences, MIP-specific primer sequences, probe sequences (e.g., TaqMan probe sequences), tag ID or barcode sequences, and restriction enzyme recognition or cleavage sites.
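The three-part layout and the design rules above can be captured in a small data structure. In this sketch the class and helper names are hypothetical, the 15-25 nt and 30-200 nt bounds are the typical ranges from the text enforced as hard limits for simplicity, and the adapter used in the usage example is a placeholder rather than a real primer or barcode design. The two arms are the telomere-targeting ends from the worked example in this section.

```python
from dataclasses import dataclass


def arm_matches_repeat(arm: str, motif: str) -> bool:
    """True if `arm` is 100% identical to some stretch of the tandem repeat [motif]_n."""
    copies = len(arm) // len(motif) + 2  # enough copies for any starting offset
    return arm.upper() in motif.upper() * copies


@dataclass(frozen=True)
class MolecularInvertedProbe:
    """Single-stranded MIP written 5'->3': 5' homologous region, adapter, 3' homologous region."""
    five_prime_arm: str
    adapter: str
    three_prime_arm: str

    def __post_init__(self) -> None:
        # Typical sizes from the text; a real design may deliberately fall outside them.
        for arm in (self.five_prime_arm, self.three_prime_arm):
            if not 15 <= len(arm) <= 25:
                raise ValueError(f"homologous region of {len(arm)} nt is outside 15-25 nt")
        if not 30 <= len(self.adapter) <= 200:
            raise ValueError(f"adapter of {len(self.adapter)} nt is outside 30-200 nt")

    @property
    def sequence(self) -> str:
        return self.five_prime_arm + self.adapter + self.three_prime_arm

    def targets_repeat(self, motif: str) -> bool:
        """Both arms must match the same strand of [motif]_n with no mismatches."""
        return arm_matches_repeat(self.five_prime_arm, motif) and arm_matches_repeat(
            self.three_prime_arm, motif)


probe = MolecularInvertedProbe(
    five_prime_arm="GTTAGGGTTAGGGTTAGGGTT",
    adapter="ACGT" * 10,  # placeholder adapter, not a real primer/barcode design
    three_prime_arm="TAGGGTTAGGGTTAGGGTTA",
)
print(len(probe.sequence), probe.targets_repeat("TTAGGG"))  # 81 True
```

Checking both arms against the same repeat string also enforces that they target the same template strand.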

Tandem repeats and homologous regions

The compositions and methods of the present invention can be used to detect the copy number or length of any genomic region comprising tandem repeats, including regions whose total length or size, i.e., repeat copy number, varies between individuals, such as telomeres. Any such sequence can be assessed using the methods and compositions of the invention. In many embodiments, the methods and compositions are used to determine the exact or approximate copy number, i.e., the total length, of telomeres, e.g., human telomeres. In many embodiments, tandem repeats whose size or copy number does not substantially vary between individuals, and which are therefore referred to as "stable" or "constant", are also assessed, e.g., the 4-bp short tandem repeat [ATGG] sequences in the human genome.

Tandem repeats can be analyzed regardless of the length of the individual repeat unit or the copy number of the tandem repeat in the genome. For example, the 6-nucleotide human telomere sequence (TTAGGG) can be repeated hundreds to thousands of times, spanning, e.g., 1-11 kb of the genome; however, the copy number of tandem repeats with fewer repetitions, e.g., tens to hundreds of repeats in the genome, or with more repetitions, e.g., tens of thousands of repeats in the genome, can also be assessed.

The methods of the present invention can be used to assess the total length of the telomeres or other tandem repeat regions in an individual, e.g., to assess at a single time point whether telomeres are longer or shorter than a standard reference length, for example to determine whether this indicates the presence or absence of a condition associated with altered telomere length. They can also be used to follow changes in telomeres or other tandem repeat regions in a single individual over time, e.g., to assess the efficacy of a treatment regimen on telomere length.

The individual repeat units within tandem repeats assessed using the methods of the invention can be of any length, e.g., 2-20 nucleotides, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or more. In some embodiments, the repeat unit is 6 nucleotides, as in human telomeres, or 4 nucleotides, as in human ATGG or AAGG repeats. In certain embodiments, repeats that expand in certain disease states are assessed, e.g., the CGG triplet repeat that expands in fragile X syndrome or the CAG triplet repeat that expands in Huntington's disease.

In addition to assessing variable tandem repeats such as telomeres, the methods and compositions can be used to assess the length or copy number of repeat sequences that are relatively constant or stable between individuals, e.g., for use as an internal control when assessing the size of a variable genomic region such as telomeres. The size of such constant sequences preferably varies between individuals by, e.g., less than 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, or less on average, or their relative levels vary between individuals by no more than, e.g., 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, or 2-fold on average. In some embodiments, the constant sequences used in the methods of the invention are present in the genome at high copy number, e.g., comprising hundreds or thousands of tandem repeats, such that they are present at levels similar to those of the variable repeat sequence (e.g., telomeres). Using such high-copy-number control sequences provides a significant advantage over conventional methods for assessing tandem repeats (e.g., telomere length), which tend to use non-repetitive sequences as controls.
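The role of a constant repeat as an internal control can be sketched as a ratio. This assumes, hypothetically, that the assay yields a signal for each repeat that is proportional to its copy number in the sample; the function names and signal values are illustrative, not the readout defined in this document.

```python
def normalized_content(variable_signal: float, control_signal: float) -> float:
    """Variable-repeat signal per unit of constant-repeat (control) signal in one sample."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return variable_signal / control_signal


def relative_to_reference(sample_variable: float, sample_control: float,
                          ref_variable: float, ref_control: float) -> float:
    """Sample's normalized content divided by a reference sample's (1.0 means equal)."""
    return (normalized_content(sample_variable, sample_control)
            / normalized_content(ref_variable, ref_control))


# Doubling the DNA input doubles both signals, so the ratio is unchanged.
print(relative_to_reference(300, 100, 200, 100), relative_to_reference(600, 200, 200, 100))  # 1.5 1.5
```

Because the control is itself a high-copy repeat, its signal can be expected to fall in a similar range to the variable repeat's, which is the advantage the paragraph attributes to such controls over single-copy references.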

For both variable and constant repeats, the methods can be used to assess the total size of a repeat-containing region of the genome, e.g., the length of the region with an accuracy of, e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or higher, or the approximate or exact number of repeats with an accuracy of, e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or higher.

In one embodiment, the sequence ATGG is used as a constant control repeat. Other tandem repeats that can be used include AAGG, GAATG, AATGG, ACTCC, CAGC, AACGG, AACAT, ACAGAG, AGGGTC, AAAAT, AAAAG, GAGAGG, AACC, and others. Any of a number of tandem repeats can be used, provided that their amount is relatively stable among the genomes of individuals of the species under study (e.g., among humans when assessing human telomere length) and that suitable MIPs can be designed to target the tandem repeat using the methods described herein.

5' and 3' homologous regions

The 5' and 3' homologous regions of the MIP contain nucleotide sequences homologous to a tandem repeat sequence within the template DNA. In preferred embodiments, the 5' and 3' regions are 100% homologous to the tandem repeat, although in some embodiments the 5' and/or 3' region can contain mismatches, e.g., one, two, three, or more mismatches relative to the tandem repeat within a given homologous region. The 5' homologous region begins at the 5' end of the MIP and extends, e.g., 15-25 nucleotides into the MIP, and the 3' homologous region begins at the 3' end of the MIP and extends, e.g., 15-25 nucleotides into the MIP. The 5' and 3' homologous regions are complementary to the same strand of the template DNA, i.e., both contain the same repeat sequence, e.g., comprising two, three, four, five, six, or more repeat units (for example, an 18-nucleotide homologous region targeting human telomeres would contain three 6-nucleotide repeats). The 5' and 3' regions are designed such that the 3' homologous region can bind the template next to the 5' end of the 5' homologous region, within the same repeat unit of the template, but separated from it by a small number of template nucleotides (e.g., 1-3 nucleotides), so that ligation can occur only after the 3' end has been extended by a DNA polymerase to add 1-3 nucleotides into the gap (i.e., "gap filling").
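The design rule above implies that, read on the probe strand, "3' arm + gap + 5' arm" is one uninterrupted stretch of the repeat. The hypothetical helper below generates arms from that rule: given the repeat as written on the probe strand, the offset within a unit at which the 3' arm should end, and the gap size, it slices the arms out of a long repeat string. With the human telomere motif it reproduces the two arms of the worked example given later in this section.

```python
from typing import Tuple


def design_arms(probe_motif: str, end_offset: int, gap: int,
                len3: int = 20, len5: int = 21) -> Tuple[str, str, str]:
    """Return (3' arm, gap bases, 5' arm), all written 5'->3' on the probe strand.

    The 3' arm ends `end_offset` bases into a repeat unit, the next `gap` bases are
    left for polymerase gap filling, and the 5' arm starts right after the gap.
    """
    k = len(probe_motif)
    if not 0 <= end_offset < k or not 1 <= gap < k:
        raise ValueError("end_offset must lie in one unit and the gap must be shorter than a unit")
    strand = probe_motif * ((len3 + gap + len5) // k + 3)
    p = k * -(-len3 // k) + end_offset  # ceil(len3 / k) whole units leave room for the 3' arm
    arm3 = strand[p - len3:p]
    gap_bases = strand[p:p + gap]
    arm5 = strand[p + gap:p + gap + len5]
    return arm3, gap_bases, arm5


print(design_arms("TTAGGG", end_offset=3, gap=2))
# ('TAGGGTTAGGGTTAGGGTTA', 'GG', 'GTTAGGGTTAGGGTTAGGGTT')
```

The returned gap bases are the nucleotides the polymerase must add, which is what determines the dNTPs needed in the extension step.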

When designing the MIPs for use in the invention, particularly the 5' and 3' homologous regions, an important consideration is the gap that separates the 5' and 3' ends of the MIP when both ends hybridize next to each other within a single repeat unit of the same DNA template. Typically, for use in the methods of the invention, the 5' and 3' ends of the MIP will be separated by a small number of nucleotides, e.g., 1, 2, or 3 nucleotides. It should be understood that the size of the gap can differ between the MIPs used in a given assay; i.e., the gap of a MIP used to assess the copy number of a variable tandem repeat (e.g., telomeres) can be, e.g., 1, 2, or 3 nucleotides, and the gap of a control MIP used to assess the copy number of a constant tandem repeat (e.g., ATGG) can be, e.g., 1, 2, or 3 nucleotides, with the two gap sizes chosen independently of each other.

It should be understood that the 5' and 3' homologous regions can be designed to hybridize with either strand of the template DNA, as long as the two regions share the same sequence so that both target the same strand.

When the MIP is bound within a single repeat unit of the template strand, a further consideration, beyond the number of nucleotides separating its 3' and 5' ends, is the identity of the template bases within the gap. In a preferred embodiment for designing telomere-targeting MIPs, the gap comprises only one or two distinct bases, regardless of how many nucleotides it contains. The extension step of the method can therefore be carried out in the presence of only the deoxyribonucleoside triphosphates (dNTPs) corresponding to those one or two bases. As illustrated below, the one or two dNTPs present in the reaction mixture are then sufficient for extension across the gap within a single repeat unit. They are not sufficient for extension across multiple repeat units, which would require, for example, three or four dNTPs.
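The dNTP restriction described above reduces to a set comparison: a polymerase can copy a stretch of template only if the complementary dNTP for every base in that stretch is present. This minimal sketch writes template bases as they appear on the template strand. The helper names are hypothetical. For the telomeric example, the sketch assumes that bridging one extra repeat unit means copying the template bases CCCAATCC.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dntps_needed(template_bases: str) -> frozenset:
    """dNTPs a polymerase must incorporate to copy `template_bases`."""
    return frozenset(f"d{PAIR[b]}TP" for b in template_bases)


def can_extend_through(template_bases: str, available: set) -> bool:
    """True if every dNTP required for this stretch is present in the reaction."""
    return dntps_needed(template_bases) <= frozenset(available)


if __name__ == "__main__":
    # Telomeric template unit AATCCC: the within-unit gap is the two C's,
    # while bridging one extra repeat unit means copying CCCAATCC.
    print(sorted(dntps_needed("CC")))                # ['dGTP']
    print(can_extend_through("CC", {"dGTP"}))        # True
    print(can_extend_through("CCCAATCC", {"dGTP"}))  # False
```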

For illustrative purposes, consider the human telomeric motif GGGTTA. The template (complementary) DNA strand for a MIP targeting this sequence is [AATCCC]n; a segment spanning three motifs reads ...CCCAATCCCAATCCCAAT.... The 5' end of a MIP targeting this sequence can therefore read 5'-GTTAGGGTTAGGGTTAGGGTT-, and the 3' end can read -TAGGGTTAGGGTTAGGGTTA-3'. Suppose both ends bind adjacently on the same template, i.e., the 3' end lies immediately 3' of the 5' end within a single repeat. The G at the 5' end of the probe then hybridizes to the last (sixth) C of an AATCCC motif on the template. The A at the 3' end of the probe hybridizes to the T (third base) of the same repeat; reading left to right, it terminates at the third nucleotide of AATCCC. The two ends are therefore separated by the first two Cs (fourth and fifth bases) of the motif. If a DNA polymerase adds two Gs to the 3' end of the MIP, and the second added G is then ligated to the G at the 5' end of the MIP, the two ends are joined. When the two ends belong to the same MIP, the MIP is circularized. When the two ends belong to different MIPs, the result is instead a ligated linear product containing two MIPs joined together.
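The telomere example can be checked with a short simulation. As in the text, this sketch assumes the template is written 3'→5' from left to right, so each MIP arm (written 5'→3') pairs with it base for base without reversal. The code places the 3' arm, finds the nearest downstream site for the 5' arm, and reports the gap, the bases needed to fill it, and the sequence across the ligation point. Variable names are illustrative only.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    # Base-by-base complement (no reversal): the template here is written 3'->5'.
    return "".join(PAIR[b] for b in seq)


TEMPLATE = "AATCCC" * 10                 # telomeric C-strand, written 3'->5'
ARM_5 = "GTTAGGGTTAGGGTTAGGGTT"          # MIP 5' homology arm, 5'->3'
ARM_3 = "TAGGGTTAGGGTTAGGGTTA"           # MIP 3' homology arm, 5'->3'

# Where the 3' arm anneals, and where polymerase extension would begin.
p3 = TEMPLATE.find(complement(ARM_3))
gap_start = p3 + len(ARM_3)
# Nearest 5'-arm site at or after the gap = adjacent binding within one unit.
p5 = TEMPLATE.find(complement(ARM_5), gap_start)

gap = TEMPLATE[gap_start:p5]             # template bases left unpaired
fill = complement(gap)                   # bases the polymerase adds to the 3' end
junction = ARM_3 + fill + ARM_5          # sequence across the ligation point

print(p3, gap, fill)                     # 1 CC GG
print(junction in "GGGTTA" * 20)         # True: the repeat is reconstituted
```

The last check confirms that after gap filling and ligation, the joined arms read as an uninterrupted telomeric repeat.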

Similarly, consider the human ATGG 4-bp repeat, which can be targeted by a control MIP to assess the copy number of constant or stable repeat sequences in the genome. The template is [CCTA]n, i.e., ...TACCTACCTACC.... In this case, the 5' end of the MIP can be 5'-ATGGATGGATGGATGGATGGATGG-, and the 3' end of the MIP can be -GGATGGATGGATGGATGGAT-3'. If the two ends bind adjacently on the same template, the A at the 5' end hybridizes to the T in one of the template's repeat units; reading right to left, it ends at the seventh nucleotide of CCTACCTA. The T at the 3' end hybridizes to the A in the template; reading left to right, it ends at the fourth nucleotide of CCTACCTA. Here too, the two ends of the MIP are separated by two Cs within a single repeat. If the DNA polymerase adds two Gs to the 3' end of the MIP, and the second G is then ligated to the A at the adjacent 5' end, the two ends are joined.
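The ATGG control example can be verified the same way with a small reusable function. This is a sketch under the same assumptions stated in the text: the template is written 3'→5', the arms 5'→3', and the two arms bind adjacently within one repeat unit. The function name `gap_fill` and the default number of template copies are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    # Base-by-base complement; no reversal because the template is written 3'->5'.
    return "".join(PAIR[b] for b in seq)


def gap_fill(template_unit: str, arm5: str, arm3: str, copies: int = 12):
    """Return (gap, fill) for two arms binding adjacently on a tandem template."""
    template = template_unit * copies
    p3 = template.find(complement(arm3))
    if p3 < 0:
        raise ValueError("3' arm has no perfect match on the template")
    gap_start = p3 + len(arm3)
    p5 = template.find(complement(arm5), gap_start)
    if p5 < 0:
        raise ValueError("no downstream 5' arm site; increase copies")
    gap = template[gap_start:p5]
    return gap, complement(gap)


ARM_5 = "ATGGATGGATGGATGGATGGATGG"   # control MIP 5' arm, 5'->3'
ARM_3 = "GGATGGATGGATGGATGGAT"       # control MIP 3' arm, 5'->3'
gap, fill = gap_fill("CCTA", ARM_5, ARM_3)
print(gap, fill)                      # CC GG
```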

In both examples above, i.e., the MIP targeting the human telomeric sequence and the MIP targeting the ATGG repeat, every template nucleotide in the gap between the 3' and 5' ends within a single repeat is a C. DNA polymerase-mediated extension of the 3' end of either MIP can therefore proceed with dGTP as the only dNTP in the reaction mixture. Because this holds for both MIPs, a multiplex reaction can be performed in which both MIPs are used simultaneously in the same reaction on the same genomic DNA template. Furthermore, when only dGTP is supplied, extension can occur only within a single repeat. Extension across multiple repeats (i.e., when the 3' and 5' ends bind the template more than one repeat unit apart) is not possible, because it would require dATP and dTTP as well as dGTP. For this reason, many embodiments of the methods and compositions of the present invention use or include only one or two dNTPs.
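For the multiplex case, the dNTP pool must cover the within-unit gaps of every MIP in the reaction while still blocking each MIP's cross-unit extension. This minimal sketch checks both conditions for the two worked examples. The gap strings are written on the template (3'→5'). The "across" strings are the template bases that would need to be copied if the arms bound one extra repeat unit apart; they are derived here, not quoted from the source.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dntps_for(template_bases: str) -> set:
    """dNTPs needed to copy the given template bases."""
    return {f"d{PAIR[b]}TP" for b in template_bases}


# Template bases in each gap: "within" = adjacent binding in one repeat unit,
# "across" = the arms bound one additional repeat unit apart.
PROBES = {
    "telomere MIP (template AATCCC)": {"within": "CC", "across": "CCCAATCC"},
    "ATGG control MIP (template CCTA)": {"within": "CC", "across": "CCTACC"},
}

# Smallest pool that fills every within-unit gap in the multiplex reaction.
pool = set().union(*(dntps_for(p["within"]) for p in PROBES.values()))
print(sorted(pool))  # ['dGTP']

for name, p in PROBES.items():
    blocked = not dntps_for(p["across"]) <= pool
    print(f"{name}: extension across an extra unit blocked = {blocked}")
```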

In one embodiment, the only dNTP included in the reaction is dGTP. In other embodiments, the only two dNTPs included are dGTP and dATP. It should be understood, however, that MIPs can be designed so that any combination of one or two dNTPs can be used. The only condition is that the dNTP(s) used allow the telomere-targeting MIP to be extended within, but not across, its target tandem repeat units.
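To see which one- or two-dNTP combinations satisfy this condition for a given gap design, the candidates can be enumerated. This sketch makes three assumptions. The template repeat unit is written 3'→5'. `gap_offset` is the hypothetical 0-based position of the first gap base within that unit. Cross-unit extension means copying the gap plus one full extra repeat unit. The function names are illustrative.

```python
from itertools import combinations

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNTPS = ("dATP", "dCTP", "dGTP", "dTTP")


def needed(template_bases: str) -> frozenset:
    return frozenset(f"d{PAIR[b]}TP" for b in template_bases)


def usable_dntp_sets(unit: str, gap_offset: int, gap_len: int) -> list:
    """All 1- or 2-dNTP sets that fill the within-unit gap but cannot
    complete extension across one additional repeat unit."""
    tandem = unit * 3  # enough copies to span gap + one extra unit
    within = needed(tandem[gap_offset:gap_offset + gap_len])
    across = needed(tandem[gap_offset:gap_offset + gap_len + len(unit)])
    result = []
    for k in (1, 2):
        for combo in combinations(DNTPS, k):
            s = frozenset(combo)
            if within <= s and not across <= s:
                result.append(s)
    return result


# Telomeric template unit AATCCC with the 2-nt gap over the 4th and 5th bases.
for s in usable_dntp_sets("AATCCC", 3, 2):
    print(sorted(s))
```

For the telomeric 2-C gap, every set that contains dGTP qualifies. A hypothetical 3-nt gap starting at the T (offset 2) would instead require exactly dATP plus dGTP.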

Linker region

The linker region lies between the 5' and 3' homologous regions and does not hybridize with the template DNA, i.e., it is not complementary to the template DNA. It serves as a spacer that allows the probe to adopt a circular (loop) conformation when the 5' and 3' homologous regions bind the same template DNA. It can also include any of a number of elements that allow, for example, identification, isolation, cleavage, or amplification of the probe. Examples of such sequence elements include:

- primer binding sites common to all MIPs used in a given assay, such as a universal forward primer binding site;
- probe-specific primer binding sites, such as a reverse primer binding site specific to each MIP;
- probe binding sequences, such as a sequence bound by a probe used in a quantitative amplification reaction (e.g., a TaqMan probe);
- tag or barcode sequences; and
- cleavage or restriction sites, which can be used, for example, to linearize the circularized MIP before the amplification reaction.

The linker region can be of any size, for example 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200 or more nucleotides. It can include any number of elements, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more elements.
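A MIP can be pictured as its 5' arm, then the ordered linker elements, then its 3' arm. This minimal sketch assembles one from named parts. The element names are hypothetical and the sequences are placeholders: N stands for an unspecified base, and real primer, barcode and cleavage sequences are assay-specific. The `forbidden` check is a crude, illustrative guard against a linker that could pair with the target repeat; it is not a substitute for proper probe design.

```python
def assemble_mip(arm5: str, linker_elements: dict, arm3: str,
                 forbidden: tuple = ()) -> str:
    """Concatenate 5' arm + linker elements (in insertion order) + 3' arm.

    `forbidden` lists substrings (e.g. repeat motifs on either strand) that
    the linker should not contain, since the linker must not hybridize
    to the template.
    """
    linker = "".join(linker_elements.values())
    for motif in forbidden:
        if motif in linker:
            raise ValueError(f"linker contains {motif!r}")
    return arm5 + linker + arm3


# Placeholder linker elements, 5'->3' (N = unspecified base).
LINKER = {
    "universal_forward_primer_site": "N" * 20,
    "barcode": "N" * 8,
    "probe_specific_reverse_primer_site": "N" * 20,
}

mip = assemble_mip("GTTAGGGTTAGGGTTAGGGTT", LINKER, "TAGGGTTAGGGTTAGGGTTA",
                   forbidden=("TTAGGG", "CCCTAA"))
print(len(mip))  # 21 + 48 + 20 = 89
```

The sketch uses a dict because it keeps insertion order, so the elements appear in the linker in the order they are listed.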

The tag sequence or barcode that can be included in the linker region is a unique sequence that can encode any desired information, such as a sample identifier, target information, or the time of sample collection. It allows the MIP to be identified by, for example, sequencing or hybridization. Including such sequences allows MIP products from multiple samples to be unambiguously detected in a single detection event. A tag element can carry information, for example within a "barcode" nucleic acid sequence, that can be read by, for example, a sequencing procedure or specific hybridization. Optionally, one or more tag elements can be affinity moieties that specifically bind the MIP to an identifiable array location. For example, tags can be interrogated using a nucleic acid array bearing polynucleotide probes complementary to the various tag sequences. As another example, a tag can contain a sequence complementary to a capture probe of a branched DNA (bDNA) assay. The bDNA capture probes can be arranged in an array on a solid support, each location corresponding to a specific tag and sample. Optionally, a tag can correspond to beads with appropriate capture and signaling elements, for example for detection in a FACS flow instrument or by imaging with a charge-coupled device.
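When barcodes are read by sequencing, pooled MIP products can be assigned back to their samples by looking up the barcode in each read. This minimal sketch makes three assumptions: the barcode sits at a fixed, known offset in each read, only exact matches are accepted, and the barcodes and reads are invented for illustration. Real pipelines often tolerate sequencing errors; this sketch does not.

```python
from collections import Counter


def demultiplex(reads, barcodes, offset, length):
    """Count reads per sample by exact match of the barcode at a fixed offset."""
    lookup = {seq: sample for sample, seq in barcodes.items()}
    counts = Counter()
    for read in reads:
        tag = read[offset:offset + length]
        counts[lookup.get(tag, "unassigned")] += 1
    return counts


BARCODES = {"sample_1": "ACGTACGT", "sample_2": "TGCATGCA"}  # hypothetical tags
reads = [
    "GGGG" + "ACGTACGT" + "TTAGGG",
    "GGGG" + "TGCATGCA" + "TTAGGG",
    "GGGG" + "ACGTACGT" + "ATGGAT",
    "GGGG" + "AAAAAAAA" + "TTAGGG",  # unknown barcode
]
counts = demultiplex(reads, BARCODES, offset=4, length=8)
print(counts["sample_1"], counts["sample_2"], counts["unassigned"])  # 2 1 1
```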

A cleavage site can be introduced in two ways, which can also be combined. One is to incorporate any of a number of modified nucleotides or bases into the linker region at the desired cleavage site. The other is to design the linker to be cleaved by, for example, uracil N-glycosylase or another suitable enzyme known to those of ordinary skill in the art, such as a restriction enzyme.

Other nucleic acid derivatives may also be incorporated into the MIPs of the present invention, for example to protect them from digestion, particularly when they are exposed to biological samples that may contain nucleases. As used herein, a nucleic acid derivative is a non-naturally occurring nucleic acid or unit thereof. Nucleic acid derivatives may contain non-naturally occurring elements, such as non-naturally occurring nucleotides and non-naturally occurring backbone linkages.

Nucleic acid derivatives may contain backbone modifications such as, but not limited to, phosphorothioate linkages, phosphodiester-modified nucleic acids, phosphorothioate modifications, combinations of phosphodiester and phosphorothioate nucleic acids, methylphosphonates, alkylphosphonates, phosphate esters, alkylphosphonothioates, phosphoramidates, carbamates, carbonates, phosphotriesters, acetamidates, carboxymethyl esters, methylphosphorothioates, phosphorodithioates, p-ethoxy, and combinations thereof. The backbone composition of a nucleic acid can be homogeneous or heterogeneous.

Nucleic acid derivatives may contain substitutions or modifications in the sugar and/or the base. For example, they may comprise nucleic acids whose backbone sugars are covalently attached to low-molecular-weight organic groups other than a hydroxyl group at the 3' position and a phosphate group at the 5' position (e.g., 2'-O-alkylated ribose groups). Nucleic acid derivatives may include non-ribose sugars, such as arabinose. They may also contain substituted purines and pyrimidines, such as C-5 propyne-modified bases, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine, hypoxanthine, 2-thiouracil, and pseudoisocytosine. In some embodiments, the substitutions may include one or more of the following:

- substitutions or modifications of the sugar or base;
- groups attached to the base, including biotin and fluorescent groups such as fluorescein, cyanine, and rhodamine;
- chemically reactive groups, including carboxyl, NHS, and thiol groups; or
- any combination thereof.

In some embodiments, other approaches are used to enhance formation of LL-MIP products. One example is to use locked nucleic acid bases in the linker region of the MIP to pre-form an internal self-loop. Locked nucleic acids (LNAs) contain one or more nucleotide building blocks in which an additional methylene bridge locks the ribose moiety in the C3'-endo (β-D-LNA) or C2'-endo (α-L-LNA) conformation. LNAs can be synthesized on commercial nucleic acid synthesizers using standard phosphoramidite chemistry.

4. Biological samples and isolation of genomic DNA

To carry out the methods of the present invention, one or more MIP probes are hybridized, for example in a multiplex reaction, to a biological sample containing genomic DNA from an individual. The goal is to determine the length (e.g., the approximate or precise copy number) of a tandem repeat region, such as the telomeres, in that individual.

In some embodiments, samples are taken from individuals with characteristics, lifestyle factors, or conditions associated with altered telomere length (e.g., telomere loss). These include:

- aging and various age-related conditions;
- cognitive decline;
- psychiatric disorders such as depression, schizophrenia, and stress/anxiety;
- cancer, cardiovascular disease, and diabetes; and
- lifestyle factors such as diet, smoking, physical activity, and alcohol consumption.

The method can be performed on such individuals, or on individuals considered susceptible to such characteristics, factors, or conditions. It can serve screening or diagnostic purposes, i.e., a one-off assessment comparing telomere length with a reference telomere length. It can also be used to monitor response to a given treatment regimen, i.e., by measuring a given individual's telomeres repeatedly over time to track their progression.

Genomic DNA from any source can be used in this invention, such as genomic DNA from peripheral blood cells. Other cell types, for example specific cell types such as cancer cells, can also be used to monitor telomere length in those cell types. Suitable samples include, but are not limited to, biological samples such as tissues and body fluids. For example, samples may be obtained from blood, urine, serum, lymph, saliva, anal and vaginal secretions, sweat, semen, skin, organs, etc.

Isolates of target nucleic acids are obtained from samples using methods known in the art. Isolating nucleic acids from a biological sample typically involves processing the sample so as to extract the genomic nucleic acids present and make them available for analysis. Any isolation method that yields extracted/isolated genomic nucleic acids can be used in the practice of this invention.

Nucleic acids are typically extracted using techniques such as those described in Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory. Other methods include salting-out DNA extraction (P. Sunnucks et al., Genetics, 1996, 144:747-756; S. M. Aljanabi and I. Martinez, Nucl. Acids Res., 1997, 25:4692-4693), trimethylammonium bromide DNA extraction (S. Gustincich et al., BioTechniques, 1991, 11:298-302), and guanidine thiocyanate DNA extraction (J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). Several protocols have been developed for extracting genomic DNA from blood.

Many kits for extracting DNA from tissues and body fluids are also commercially available. Suppliers include BD Biosciences Clontech (Palo Alto, Calif.), Epicentre Technologies (Madison, Wis.), Gentra Systems, Inc. (Minneapolis, Minn.), MicroProbe Corp. (Bothell, Wash.), Organon Teknika (Durham, N.C.), Qiagen Inc. (Valencia, Calif.), Autogen (Holliston, Mass.), and Beckman Coulter (Brea, Calif.); an example is the AutoGen FlexSTAR robot with Qiagen FlexiGene chemistry. Autogen manufactures the FlexSTAR automated extraction kit for use with Qiagen FlexiGene chemistry, and Beckman Coulter manufactures the Agencourt GenFind kit for bead-based extraction chemistry. User guides detailing the protocols to follow are typically included with all of these kits. One example is Qiagen's documentation for its PureGene extraction chemistry, entitled "Qiagen PureGene Handbook," 3rd edition, dated June 2011.

After cells are obtained from the sample, they are preferably lysed to isolate the genomic nucleic acids. The cell extract can be subjected to further steps to complete nucleic acid isolation, such as differential precipitation, column chromatography, or extraction with organic solvents. The extract can then be further processed to denature any contaminating and potentially interfering proteins. Suitable treatments include filtration and/or centrifugation and/or treatment with a chaotrope such as guanidine isothiocyanate or urea, or with an organic solvent such as phenol and/or chloroform (CHCl₃). The genomic nucleic acids can be resuspended in a hydration solution such as an aqueous buffer, for example water, Tris buffer, or another buffer. In some embodiments, the genomic nucleic acids are resuspended in Qiagen DNA Hydration Solution or another Tris-based buffer at approximately pH 7.5.

The size of the genomic nucleic acids obtained can vary with the extraction method used. Their integrity and size can be determined, for example, by pulsed-field gel electrophoresis (PFGE) on agarose gels.

In some embodiments, genomic DNA is fragmented before use in the methods of the present invention. Nucleic acids, including genomic nucleic acids, can be fragmented using any of a variety of methods, such as mechanical, chemical, or enzymatic fragmentation. Methods for nucleic acid fragmentation are known in the art and include, but are not limited to, DNase digestion, sonication, and mechanical shearing (J. Sambrook et al., "Molecular Cloning: A Laboratory Manual", 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.; P. Tijssen, "Hybridization with Nucleic Acid Probes--Laboratory Techniques in Biochemistry and Molecular Biology (Parts I and II)", 1993, Elsevier; C. P. Ordahl et al., Nucleic Acids Res., 1976, 3:2985-2999; P. J. Oefner et al., Nucleic Acids Res., 1996, 24:3879-3889; Y. R. Thorstenson et al., Genome Res., 1998, 8:848-855). U.S. Patent Publication 2005/0112590 provides a general overview of various fragmentation methods known in the art.

In many embodiments, the genomic DNA is digested with a restriction enzyme, for example an enzyme whose restriction site does not occur in any sequence targeted by the MIPs used in the assay. Restriction endonucleases recognize specific sequences within double-stranded nucleic acids. They typically cleave both strands within or near the recognition site, thereby fragmenting the nucleic acid. Naturally occurring restriction endonucleases are classified into four groups (Types I, II, III, and IV) according to three criteria: their composition and cofactor requirements, the nature of their target sequences, and the position of their DNA cleavage site relative to the target sequence. Bickle T. A., Kruger D. H. (June 1993). "Biology of DNA restriction". Microbiol. Rev. 57(2):434-50; Boyer H. W. (1971). "DNA restriction and modification mechanisms in bacteria". Annu. Rev. Microbiol. 25:153-76; Yuan R. (1981). "Structure and mechanism of multifunctional restriction endonucleases". Annu. Rev. Biochem. 50:285-319. All types recognize specific short DNA sequences and carry out endonucleolytic cleavage of DNA to give specific fragments with terminal 5'-phosphates. The enzymes differ in their recognition sequences, subunit composition, cleavage position, and cofactor requirements. Williams R. J. (2003). "Restriction endonucleases: classification, properties, and applications". Mol. Biotechnol. 23(3):225-43.

In one embodiment, the DNA is digested with AluI (e.g., from New England Biolabs), for example by digesting 50 ng of DNA with 10 units of AluI at 37°C for 2 hours. The enzyme is then inactivated (by incubation at 90°C for 20 minutes), the DNA is centrifuged (e.g., at 7,000 rpm for 10 minutes), and the supernatant is collected. The DNA can then be stored, for example at 4°C, until used in the method of the present invention.
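The criterion that the enzyme's site must not occur in any MIP-targeted sequence can be checked in silico before choosing an enzyme. AluI recognizes AGCT (cutting AG^CT). The minimal sketch below tests whether that site occurs anywhere in a tandem array of a target motif, on either strand and across unit junctions. The function name is hypothetical, and the check covers only the repeat targets themselves, not the MIP linker or flanking genomic sequence.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
ALUI_SITE = "AGCT"  # AluI recognition sequence (AG^CT)


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def site_in_tandem_repeat(motif: str, site: str) -> bool:
    """True if `site` occurs anywhere in a tandem array of `motif`
    (either strand), including windows that span unit junctions."""
    copies = len(site) // len(motif) + 2  # covers every phase and junction
    array = motif * copies
    return site in array or site in reverse_complement(array)


for motif in ("TTAGGG", "ATGG"):
    print(motif, site_in_tandem_repeat(motif, ALUI_SITE))  # both False
```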

In another embodiment, the methods and compositions of the invention can be used to assess, i.e., quantify, the degree of methylation on a genome-wide scale. For example, a methylation-sensitive restriction enzyme can be added to the DNA sample in the pre-digestion step. The results of two reactions on the same sample are then compared. One reaction is performed in the presence of the methylation-sensitive restriction enzyme, whose ability to cleave DNA depends on the methylation status of its recognition site. The other is performed in the presence of a methylation-insensitive restriction enzyme that recognizes the same restriction site (an isoschizomer). For example, the enzyme pair HpaII (methylation-sensitive) and MspI can be used. For this approach, additional MIPs designed to target CpG islands can be used together with a genomic reference MIP (such as the 4-bp (ATGG) MIP described herein).
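The contrast between the two enzymes can be illustrated with a toy digest. HpaII and MspI both recognize CCGG and cut C^CGG. Methylation of the internal C (the CpG) blocks HpaII but not MspI. This simplified single-strand sketch ignores other methylation states, such as outer-C methylation, which affects MspI. It uses an invented sequence and hypothetical function names. Under these assumptions, a methylated CCGG site stays intact only in the HpaII reaction, so it would remain available as a MIP template there.

```python
SITE = "CCGG"  # shared recognition sequence of HpaII and MspI (C^CGG)


def ccgg_sites(seq: str) -> list:
    return [i for i in range(len(seq) - len(SITE) + 1)
            if seq[i:i + len(SITE)] == SITE]


def digest(seq: str, methylated: set, enzyme: str) -> list:
    """Fragments after cutting C^CGG; HpaII skips CpG-methylated sites."""
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError(f"unsupported enzyme: {enzyme}")
    cuts = [i + 1 for i in ccgg_sites(seq)
            if not (enzyme == "HpaII" and i in methylated)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


seq = "AAACCGGTTTCCGGAAA"   # hypothetical sequence with CCGG at indices 3 and 10
methylated = {10}           # the site starting at index 10 carries CpG methylation
print(digest(seq, methylated, "MspI"))   # ['AAAC', 'CGGTTTC', 'CGGAAA']
print(digest(seq, methylated, "HpaII"))  # ['AAAC', 'CGGTTTCCGGAAA']
```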

In some embodiments, the method of the present invention also includes denaturing the genomic DNA to render it single-stranded for hybridization with the MIP probes. Denaturation can result from the fragmentation method selected as described above. For example, those skilled in the art will recognize that genomic nucleic acids can be denatured during fragmentation by pH-based shearing or by a nick-generating endonuclease. Denaturation can occur before, during, or after fragmentation. Applying pH changes or heat during the fragmentation step can also yield denatured nucleic acid fragments. See, for example, McDonnell, "Antisepsis, disinfection, and sterilization: types, action, and resistance", p. 239 (2007).

Thermal denaturation is the process by which double-stranded DNA unwinds and separates into single strands as the hydrogen bonds between bases break. Thermal denaturation of nucleic acids of unknown sequence typically uses a sufficiently high temperature, such as 95-98°C. This ensures that even nucleic acids with very high GC content are denatured without any chemical denaturant. Optimizing the denaturation conditions (e.g., time and temperature) is within the ability of those of ordinary skill in the art.

Denaturing nucleic acids by pH is also well known in the art. It can be accomplished by any method known in the art that disrupts base pairing and causes the double helix to dissociate into single strands, such as exposing the nucleic acids to high or low pH, low ionic strength, and/or heat. For pH-based denaturation methods, see, for example, Dore et al., Biophys J. 1969 Nov; 9(11):1281-1311; A. M. Michelson, The Chemistry of Nucleosides and Nucleotides, Academic Press, London and New York (1963). Nucleic acids can also be denatured electrochemically, for example by applying a voltage to nucleic acids in solution via electrodes.

Hybridization, extension and ligation

It should be understood that aspects of the present invention may involve varying the amount of genomic nucleic acid and the amount of MIP probes to achieve customized results. In some embodiments, the amount of genomic nucleic acid used per subject ranges from 1 ng to 10 μg (e.g., 500 ng to 5 μg). Higher or lower amounts can also be used (e.g., less than 1 ng, more than 10 μg, 10-50 μg, 50-100 μg, or more). In some embodiments, the amount of probe used in each assay can be optimized for a specific application for each tandem repeat of interest. In some embodiments, the ratio of MIP probe to genome equivalents (e.g., diploid genome equivalents) ranges from 1 × 10⁶ to 1 × 10¹². This is a molar ratio, measured for example as a concentration ratio. Lower, higher, or intermediate ratios can also be used, in particular as a function of the abundance of the repeat motif in each genome.

In the methods of this invention, the MIPs are typically present in excess relative to the genomic DNA, to ensure maximal coverage of every repeat within a repeat array (e.g., at telomeres). The MIP-to-genome-equivalent ratios given above are higher than those used in conventional MIP assays, for example 1000-fold or more higher. The purpose is to saturate the potential binding sites (e.g., repeat sequences) in the genome and to maximize the number of ligation products formed, whether circular or ligated linear products. In one embodiment, 50 ng of genomic DNA (e.g., 5 μl of a 10 ng/μl solution) is used, and 2 μl of a 10 nM stock solution of each MIP is added. In the presence of DNA polymerase activity, the appropriate dNTPs, and ligase activity, a large number of ligation products is then generated, reflecting the number of repeats present in the genome. Using an excess of MIP relative to genomic DNA contrasts with the conventional teaching for classic MIP assays. There, high MIP levels relative to genomic DNA only increase the likelihood of generating nonspecific (undesired) linear products and decrease the likelihood of generating the (desired) circularized products.
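As a worked check of the example amounts, the number of MIP molecules added per diploid genome equivalent can be estimated. This sketch assumes roughly 6.6 pg of DNA per diploid human genome, a commonly used approximation (other sources use about 6 pg). It also assumes the full 2 μl of 10 nM stock counts toward the ratio for each MIP.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
PG_PER_DIPLOID_GENOME = 6.6    # assumed mass of one diploid human genome (pg)


def mip_per_genome(dna_ng: float, mip_ul: float, mip_stock_nm: float) -> float:
    """Molecules of one MIP added per diploid genome equivalent."""
    genomes = dna_ng * 1_000 / PG_PER_DIPLOID_GENOME                 # ng -> pg
    mip_molecules = mip_stock_nm * 1e-9 * mip_ul * 1e-6 * AVOGADRO   # mol/L * L
    return mip_molecules / genomes


ratio = mip_per_genome(dna_ng=50, mip_ul=2, mip_stock_nm=10)
print(f"{ratio:.2e}")  # about 1.59e+06
```

With these assumptions, the 50 ng / 2 μl / 10 nM example gives roughly 1.6 × 10⁶ MIP molecules per genome equivalent. That falls at the low end of the 1 × 10⁶ to 1 × 10¹² range stated earlier.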

The MIPs and genomic DNA are incubated in the presence of reagents and enzymes that allow the 3' end of each hybridized MIP to be extended. The extended 3' end is then ligated to a 5' end bound to the same template within the same tandem repeat unit. In one embodiment, a first hybridization step is performed with the following components:

- genomic DNA;
- MIPs targeting the tandem sequences of interest (e.g., a telomere-targeting MIP and an ATGG-repeat-targeting MIP);
- an appropriate buffer (e.g., 10x Amp ligase buffer); and
- water to the desired final volume.

Hybridization can be performed according to standard methods, for example in a thermal cycler with the following program:

1. 95°C for 10 minutes;
2. 72°C for 1 minute, with a slow ramp rate (e.g., 0.1°C/s);
3. 56°C for 5 minutes, with a slow ramp rate (e.g., 0.1°C/s);
4. return to step 2, 10x; and
5. 56°C for 16 hours.
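The hybridization program can be encoded as data to estimate its run time. This sketch sums hold times only and ignores ramp times. It also assumes that "10x" means steps 2-3 are run 10 times in total, which is one possible reading of the text. The `Step` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float
    hold_min: float


def total_hold_min(pre, cycle, n_cycles, post):
    """Sum of hold times in minutes; ramp times are not included."""
    return (sum(s.hold_min for s in pre)
            + n_cycles * sum(s.hold_min for s in cycle)
            + sum(s.hold_min for s in post))


PRE = [Step(95, 10)]                  # 1. denaturation
CYCLE = [Step(72, 1), Step(56, 5)]    # 2-3. slow ramp, e.g. 0.1 °C/s
POST = [Step(56, 16 * 60)]            # 5. extended hybridization

minutes = total_hold_min(PRE, CYCLE, n_cycles=10, post=POST)
print(minutes, round(minutes / 60, 1))  # 1030 17.2
```

At 0.1 °C/s, each cooling ramp from 72 to 56 °C alone takes about 160 s, so the actual run is noticeably longer than the hold-time total.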

Following hybridization, a gap-filling and ligation step is performed. For example, the following reagents are added to the hybridization mixture:

- a ligase (e.g., 5 U of Amp ligase, e.g., from Lucigen);
- a DNA polymerase (e.g., 0.6 U of T4 DNA polymerase, e.g., from New England Biolabs);
- the appropriate deoxyribonucleoside triphosphate(s) (e.g., dGTP, e.g., 2.5 μl of a 2.5 mM stock solution);
- BSA (e.g., 1X);
- buffer (e.g., Amp ligase buffer);
- water to the appropriate final buffer concentration and total volume.

Gap filling and ligation are then performed according to standard methods, e.g., in a thermal cycler with the following temperature program: 37°C for 30 minutes, then 75°C for 20 minutes.

The above experimental conditions are provided for illustrative purposes. The conditions used, and the nature, amount, presence, or absence of the various reagents, can vary in all reaction steps of the present invention. This includes, for example, the temperature, duration, or number of cycles of the hybridization, gap-filling/ligation, and amplification steps, as well as the reagents used. Such optimization of reaction conditions is well within the abilities of those of ordinary skill in the art. In some embodiments, higher concentrations of MIPs and/or dNTPs than those described above are used.

As described elsewhere herein, the method of the present invention typically omits the exonuclease step used in conventional MIP-based assays. A brief exonuclease treatment may nevertheless be included to remove unligated MIPs, or for quality control or comparison purposes.

5. Products generated with the new experimental design, and controls

As described above, hybridization, extension, and ligation can be performed in the presence of only those deoxynucleotides needed to fill the gap between 3' and 5' ends bound to the same template within a single repeat. Under these conditions, two types of product can be generated:

- The first product forms when the 5' and 3' ends belong to the same MIP. It is a circularized ligation product and corresponds to the desired product of earlier MIP assays.
- The second product forms when the 5' and 3' ends belong to different MIPs. It is a ligated linear product (or "LL-MIP").

In classic MIP assays, all linear products are removed (e.g., by exonuclease activity), and only the circular products are quantified, detected, or analyzed.

The method of this invention produces several types of products. These include the circularized MIPs described in various classic MIP assays and novel ligation products, termed LL-MIPs, that comprise two or more MIPs. Long tandem repeats contain a large number of MIP binding sites, and the method uses a high MIP concentration relative to genomic DNA. As a result, multimeric forms are produced in which 3, 4, 5, 6, 7, 8, 9, 10, or more MIPs are ligated together. Together, these products reflect the copy number of the tandem repeat in the genome.

A circularized probe reflects one ligation event (the ligation junction in Figure 1). Likewise, an LL-MIP containing two MIPs reflects one ligation junction, and each represents one length unit of the tandem repeat. LL-MIPs containing more than two MIPs reflect multiple ligation events, and thus multiple length units of the target tandem repeat (e.g., telomeres). In general, a multimeric LL-MIP containing n MIPs contains n - 1 ligation junctions.

Therefore, in many embodiments of the invention, all ligation products obtained by the method are retained and quantified to determine the overall size (i.e., copy number) of a tandem repeat such as a telomere. These include both the circular products and the linear products comprising two or more individual MIPs ligated together. The overall abundance of ligation junctions, or the copy number of the motif, indicates the length of the telomeres or other tandem repeats (Figure 8).
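The counting rule above (one junction per circularized MIP, n - 1 junctions per LL-MIP of n MIPs) can be sketched in Python as follows. The product categories and the tally are hypothetical illustrations, not data from this document.

```python
from collections import Counter


def junctions_in_product(kind: str, n_mips: int) -> int:
    """Ligation junctions in a single product molecule.

    A circularized single MIP contains 1 junction; a linear product of n MIPs
    contains n - 1 (so an unligated linear monomer contributes 0).
    """
    if kind == "circular":
        if n_mips != 1:
            raise ValueError("this sketch only models circularized single MIPs")
        return 1
    if kind == "linear":
        if n_mips < 1:
            raise ValueError("a linear product needs at least one MIP")
        return n_mips - 1
    raise ValueError(f"unknown product kind: {kind!r}")


def total_junctions(tally: Counter) -> int:
    """Sum junctions over a tally keyed by (kind, n_mips) -> number of molecules."""
    return sum(count * junctions_in_product(kind, n) for (kind, n), count in tally.items())


# Hypothetical molecule counts, e.g. after classifying sequencing reads.
tally = Counter({("circular", 1): 40, ("linear", 1): 25, ("linear", 2): 30, ("linear", 5): 4})
circular_only = tally[("circular", 1)] * junctions_in_product("circular", 1)
print("all products:", total_junctions(tally), "| circular products only:", circular_only)
```

Discarding the linear products, as a classic exonuclease clean-up would, keeps only the circular junctions (40 of 86 in this made-up tally) and so undercounts the repeat.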

The telomere repeat motif [GGGTTA] consists of three types of nucleotide: G, T, and A. The 5' and 3' ends of the MIPs bind at randomly distributed sites. As described above, some MIPs bind within the same motif, while others span multiple repeats. To quantify telomere length, only MIP ligations that occur within the same individual motif should proceed through the subsequent reaction steps and be counted. This can be achieved by limiting the gap-filling reaction to at most two dNTPs. In one embodiment, only one dNTP (dGTP) is used in the gap-filling reaction. This requirement is not critical, however, for other reference (stable) tandem repeats. For example, when the [AAGG] tandem repeat is used as the reference in a telomere length quantification reaction, both dATP and dGTP can be provided in the gap-filling reaction.

6. Separation, detection, and/or quantification of products

As described elsewhere herein, previous MIP-based assays removed all linear products. This was done either enzymatically (e.g., using a combination of exonucleases, including exonuclease III) or by affinity binding (e.g., using biotin-labeled MIPs with an open 5' end). That approach is inappropriate for measuring telomere length or other tandem repeats, because it falsely reduces the calculated size (i.e., copy number) of the telomeres or other repeats. Telomere length is proportional to the total number of ligation junctions, regardless of whether those junctions lie within circularized products or within LL-MIPs.

Therefore, in one embodiment, no exonuclease is used when the method of the present invention quantifies the size of telomeres or other tandem repeats. In some embodiments, however, exonuclease I alone is used to remove free (i.e., unligated) MIP probes without degrading the ligated linear MIPs. It is used only at a limited level (e.g., less than 1, 2, 3, 4, 5, 10, 15, or 20 units) and for a limited time (e.g., up to 1 hour, or 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55 minutes).

The method of this invention yields ligation products, both circularized and linear (LL-MIP). The abundance of ligation junctions in these products can be quantified in any of a variety of ways, including:

- relative quantification by qPCR, with or without a reference sample and/or efficiency correction;
- absolute quantification by qPCR, e.g., using a precisely measured amount of cloned MIP product as a calibrator sample;
- digital PCR, e.g., droplet digital PCR;
- sequencing, e.g., high-throughput sequencing, to count the numbers of the various MIP products.

In one embodiment, PCR is used to quantify the abundance of ligation junctions in the circularized and ligated linear MIPs, e.g., qPCR using TaqMan probes or other standard methods. For example, qPCR can be performed with a primer common to all MIPs in the reaction (e.g., a universal forward primer) together with MIP-specific primers (e.g., a reverse primer specific to each MIP).

In one embodiment, the primers and probes are as follows:

- Universal forward primer: longGC_M13_F (ggcgcatggcTCA CAC AGG AAA CAG CTA TGA C).
- Telomere-MIP-specific reverse primer, for determining telomere length: longGC_Link_R_teltail (gcgcatgtgaATC GGG AAG CTG AAG TAA CC).
- ATGG-MIP-specific reverse primer: longGC_Link_R_teltail (gcgcatgtgaATC GGG AAG CTG AAG TAA CC).
- When the assay is performed in TaqMan format, a probe specific to each MIP can be used, such as:
  - Taq-3-Telo, for the telomere MIP: /56-FAM/TA GGG TTA G/ZEN/G GTT AGG GTT/3IABkFQ
  - Taq-3-ATGG, for the ATGG MIP: /5HEX/GG ATG GAT G/ZEN/G ATG GAT GGA T/3IABkFQ

Such reactions can be performed according to standard methods, for example with the following reagents:

- MIP product (e.g., 5 μl);
- PCR master mix (e.g., 7.5 μl of a 2X stock);
- universal forward primer (e.g., primer M13_F; 0.375 μl of a 20 μM stock);
- reverse primer for MIP 1 (e.g., the telomere reverse primer; 0.375 μl of a 20 μM stock);
- reverse primer for MIP 2 (e.g., the ATGG reverse primer; 0.375 μl of a 20 μM stock);
- TaqMan probe for MIP 1 (e.g., Taqman-telo; 0.3 μl of a 10 μM stock);
- TaqMan probe for MIP 2 (e.g., Taqman-4bp; 0.3 μl of a 10 μM stock);
- water to a final volume of 15 μl.
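As a check on the recipe above, this Python sketch computes the water volume and the final concentration of each component, using C_final = C_stock × V_added / V_final. The dictionary layout and function names are illustrative assumptions.

```python
# Per-reaction recipe from the paragraph above: (volume in ul, stock concentration, unit).
# Component labels are descriptive only.
RECIPE = {
    "MIP product": (5.0, None, None),
    "PCR master mix": (7.5, 2.0, "X"),
    "universal forward primer": (0.375, 20.0, "uM"),
    "MIP 1 reverse primer": (0.375, 20.0, "uM"),
    "MIP 2 reverse primer": (0.375, 20.0, "uM"),
    "MIP 1 TaqMan probe": (0.3, 10.0, "uM"),
    "MIP 2 TaqMan probe": (0.3, 10.0, "uM"),
}
FINAL_VOLUME_UL = 15.0


def water_volume(recipe: dict, final_ul: float) -> float:
    """Water needed to bring the listed components up to final_ul."""
    used = sum(volume for volume, _, _ in recipe.values())
    if used > final_ul:
        raise ValueError("components exceed the final volume")
    return round(final_ul - used, 6)


def final_concentrations(recipe: dict, final_ul: float) -> dict:
    """C_final = C_stock * V_added / V_final for every component with a known stock."""
    return {
        name: (round(stock * volume / final_ul, 6), unit)
        for name, (volume, stock, unit) in recipe.items()
        if stock is not None
    }


water = water_volume(RECIPE, FINAL_VOLUME_UL)
concs = final_concentrations(RECIPE, FINAL_VOLUME_UL)
print("water:", water, "ul")
for name, (conc, unit) in concs.items():
    print(f"{name}: {conc} {unit}")
```

For the example volumes, this gives 0.775 μl of water, 1X master mix, 0.5 μM of each primer, and 0.2 μM of each probe.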

Quantitative PCR can be performed on a thermal cycler, such as the Roche LC480, using a standard temperature program, for example:

1. 95°C for 10 minutes;
2. 95°C for 10 seconds;
3. 65°C for 30 seconds;
4. 72°C for 10 seconds;
5. return to step 2, 40 times.

The amount of each reagent (e.g., each primer) and other reaction conditions (e.g., the temperature program) can, however, be varied to obtain optimal results. Such optimization is well within the abilities of those of ordinary skill in the art.

In some embodiments, the MIP products are evaluated by next-generation sequencing (NGS), which determines nucleic acid sequences (e.g., of probes or probe products) in a massively parallel manner. Exemplary NGS technologies include:

- sequencing by synthesis (Illumina, Inc., San Diego, Calif.);
- single-molecule real-time sequencing using zero-mode waveguides (Pacific Biosciences of California, Inc., Menlo Park, Calif.);
- pyrosequencing;
- ion semiconductor sequencing (Thermo Fisher Scientific Corporation, Carlsbad, Calif.);
- sequencing by ligation (Thermo Fisher Scientific Corporation, Carlsbad, Calif.).

Typically, for genome sequencing by NGS, full-length nucleic acids (e.g., gDNA) are fragmented into "template" fragments. The templates are then captured or immobilized, typically at spatially separated detector locations, so that hundreds to billions of sequencing reactions can be performed simultaneously. In some cases, the templates must be amplified before sequencing to provide sufficient signal.

7. Determining tandem repeat copy number

Once the ligated MIP products have been quantified (e.g., by qPCR), the length of the variable tandem repeat region of the genome is determined. This determination can be made in any of a variety of ways. For example, a parameter can be calculated from the abundance of the target (e.g., telomeres) relative to that of a stable control tandem repeat, as measured by qPCR, digital PCR, or high-throughput sequencing of the ligated MIP products. Such methods are outlined and illustrated in the Examples below.

In one embodiment, the determination is made by calculating the Δ-Δ-Ct from two qPCRs (i.e., a qPCR for the target MIP and a qPCR for the control MIP) and using a reference sample. The calculation proceeds as follows:

1. Determine the Ct of each qPCR.
2. For each sample, determine the Δ-Ct, i.e., the difference between the Ct of the variable tandem repeat and the Ct of the stable control repeat.
3. Average the Δ-Cts of each sample, if qPCR is performed in duplicate, triplicate, etc.
4. Calculate the Δ-Δ-Ct as the difference between the Δ-Ct of the sample and the Δ-Ct of the reference sample.

The reference sample can be any suitable sample in which the size of the variable tandem repeat region is known, and/or that suits the purpose of the assay in question. For example, in a method of determining telomere length in a patient who has or is suspected of having cancer, the reference sample can come from a healthy individual without cancer.

Once the Δ-Δ-Ct of each sample is known, the total length of the tandem repeat in that sample can be determined. For example, assume an amplification efficiency value of 2. The ratio of ligation junctions in the variable tandem repeat (e.g., telomeres) to those in the control tandem repeat (e.g., an ATGG-containing tandem repeat) is then 2^(Δ-Δ-Ct). The Δ-Δ-Ct and the ratio of each sample can each serve as a biomarker of the tandem repeat, e.g., as a new index parameter of telomere length.
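The Δ-Δ-Ct calculation above can be sketched in Python as follows. The telomere and ATGG assays use different reporter dyes (FAM and HEX), so the sketch assumes both Cts are read from the same duplex well. It also assumes the sign convention Δ-Ct = Ct(control) - Ct(target). Under that convention, the stated ratio 2^(Δ-Δ-Ct) increases as the number of target junctions increases; the text does not spell out the order of subtraction. The Ct values are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Well = Tuple[float, float]  # (Ct of target MIP, Ct of control MIP) from one duplex well


def mean_delta_ct(wells: List[Well]) -> float:
    """Average per-well delta-Ct, using the assumed convention Ct(control) - Ct(target)."""
    return mean(ct_control - ct_target for ct_target, ct_control in wells)


def delta_delta_ct(sample: List[Well], reference: List[Well], efficiency: float = 2.0):
    """Return (delta-delta-Ct, ratio) of a sample relative to a reference sample."""
    ddct = mean_delta_ct(sample) - mean_delta_ct(reference)
    return ddct, efficiency ** ddct


# Hypothetical duplicate wells: the sample's telomere signal comes up one cycle earlier.
sample_wells = [(18.0, 22.0), (18.2, 22.2)]
reference_wells = [(19.0, 22.0), (19.1, 22.1)]
ddct, ratio = delta_delta_ct(sample_wells, reference_wells)
print(f"delta-delta-Ct = {ddct:.2f}, ratio = {ratio:.2f}")
```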

Various other methods for quantifying PCR products in qPCR reactions, including relative and absolute quantification methods, can also be used. In one embodiment, serial dilutions of a reference sample are quantified, and the resulting Ct values are used to calculate the overall assay efficiency. In addition, the same reference sample can be included in every qPCR batch. Comparing each sample against it removes batch effects caused by plate-to-plate variation in qPCR Ct values. These approaches, referred to as the Δ-Δ-Ct method and the efficiency-corrected Δ-Δ-Ct method, can be applied here to quantify the ligation junctions of the two MIPs.

Absolute quantification by qPCR is also possible. It requires a calibrator with a known (given) number of ligation junctions, run in the same qPCR as the test samples; this approach is referred to as absolute quantification of qPCR results. In other embodiments, ligation junction abundance is quantified by other methods, including digital PCR and sequencing.

In addition, as described in the Examples below, further assays can determine the overall efficiency of the qPCR with the different MIPs. They can also determine the linearity (e.g., across different starting DNA concentrations) and the specificity of the reaction. The products can also be amplified (e.g., by conventional PCR) and then cloned and sequenced, or separated by gel electrophoresis, for example to assess the size and nature of the LL-MIP products obtained. Such procedures can be performed, e.g., by a kit manufacturer during kit development.

The overall efficiency of the assay with a given MIP can be calculated by combining the efficiencies of two reactions: the MIP hybridization/gap-filling/ligation reaction and the qPCR reaction. The calculated overall efficiency can correct for differences in the slopes of the Ct values of the two (or more) MIPs used in the assay. This yields an index value corresponding to the efficiency (slope)-corrected relative ratio, which can express the length of the tandem repeat as a ratio relative to a reference sample. For example, given samples of known telomere length, such as the cancer cell lines shown in Figure 5, other samples can be placed on the regression line in Figure 5. Their ratio is the independent variable (x-axis), and their telomere length value is read off as the dependent variable (y-axis).

Examples

The invention is described in more detail by way of specific examples. The following examples are provided for illustrative purposes only and are not intended to limit the invention in any way. Those skilled in the art will readily recognize a variety of non-critical parameters that could be changed or modified to yield essentially the same results.

Example 1: A novel MIP-based method for determining the length of telomeres or other variable tandem repeats

In this example, telomere repeat length was investigated using a telomere-motif MIP probe. Both its 5' and 3' ends contain telomere repeat sequence (GGGTTA), separated by a linker sequence. The telomere-repeat targeting sequence at each end of the telomere MIP can range in length, for example, from 15 to 25 base pairs.

To normalize for the amount of input DNA in the sample, a second MIP is used to quantify the number of diploid genome equivalents in the sample DNA. In other embodiments, other methods can represent the diploid genome equivalents in the sample DNA. For example, a single-copy gene can be quantified by qPCR as an indicator for this purpose.

The preferred MIPs of the invention provide exact matching/complementarity, e.g., 100% complementarity between the 5' and 3' homology regions and the target sequence of the template DNA. This makes it feasible to detect repeat motifs (such as telomeres). It also allows a common tandem repeat, such as the common 4-base-pair sequence ATGG, to be used as a normalization control. ATGG, or another similar 4-bp tandem repeat, is a suitable control repeat for such normalization for two reasons: it is abundant in the human genome, and it is present at roughly similar (stable) levels per diploid genome equivalent in samples taken from different individuals.

The MIP used to quantify this 4-base-pair tandem repeat carries 4-base-pair repeat motifs (ATGG in this example) at its 5' and 3' ends, separated by a linker sequence. As with the telomere-targeting MIP, the ATGG-repeat targeting sequence at each end (i.e., the 5' and 3' homology regions) can range in length, for example, from 15 to 25 base pairs.

Telomeres and 4-base-pair repeat motifs are both tandem repeats, in which the same 6-base-pair or 4-base-pair motif is repeated in tandem. This particular sequence configuration poses a new challenge to conventional MIP assays, all of which rely on exclusive detection of circularized MIPs after ligation.

The 3' and 5' ends of a MIP can hybridize to any copy of the same repeated target motif. Conventional MIP assays therefore cannot control the distance between the hybridization sites of the two ends of the same MIP molecule. The gap between the 3' and 5' hybridization positions is random and cannot be predetermined: it may lie within a single repeat motif, or it may span multiple repeats.

This randomness is a problem when ligation products are used to quantify the length of telomeres or other tandem repeats. If the gap distance between the 3' and 5' ends of a MIP hybridized to template DNA is random, some ligation products represent one motif unit, while others represent 2, 3, 6, or even more motif units. For example, if the two ends hybridize to motifs several repeat units apart, multiple repeat units are counted as one. Quantification then underestimates the abundance of telomeric repeats, and hence telomere length. The ideal assay therefore provides one ligation signal per motif repeat (Figure 2). The methods and compositions of the present invention achieve this in various ways, including in particular:

1. specially designed MIP probes; and
2. a limited set of deoxyribonucleoside triphosphates (dNTPs) in the gap-filling/extension step.

In one embodiment, the gap between the 3' and 5' ends of the telomere MIP is "GG". Similarly, in one embodiment, the gap between the 3' and 5' ends of the 4-base-pair (ATGG) tandem repeat MIP is "GG".

In this MIP design, the gaps of both the telomere MIP and the 4-base-pair repeat MIP are GG. This allows dGTP alone to be used in the MIP extension step, without any other dNTP. Because only dGTP is present in the reaction mixture during extension, only MIP molecules whose two ends hybridize adjacent to each other are ligated, i.e., those with a 2-nucleotide GG gap separating their ends. By contrast, consider a telomere MIP whose 5' and 3' ends are separated by more than one repeat motif unit. The gap is then longer than 2 nucleotides, e.g., "GGGTTAGG", "GGGTTAGGGTTAGG", or in general (GGGTTA)nGG, where n is any positive integer. In that case, the ends cannot be fully extended and therefore cannot be ligated.

The same applies to the 4-base-pair (ATGG) tandem repeat MIP. If its 3' and 5' ends hybridize to separate units, they are separated by more than one motif (repeat) unit. The gap is then longer than 2 nucleotides, e.g., "GGATGG", "GGATGGATGG", or in general (GGAT)nGG, where n is any positive integer. Such a MIP cannot be fully extended and cannot be ligated.
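The selection logic of the two paragraphs above can be illustrated with a toy model in Python. The polymerase copies the gap one base at a time and stalls at the first base whose dNTP is absent, and ligation requires the gap to be filled completely. The gap strings follow the forms given above: "GG" within one unit, and (GGGTTA)nGG or (GGAT)nGG across units. The model ignores misincorporation and assumes that extension stops at the downstream 5' end.

```python
from typing import FrozenSet


def bases_added(gap: str, dntps: FrozenSet[str]) -> int:
    """Number of gap positions filled before the polymerase stalls on a missing dNTP."""
    added = 0
    for base in gap:
        if base not in dntps:
            break
        added += 1
    return added


def can_ligate(gap: str, dntps: FrozenSet[str]) -> bool:
    """Ligation is possible only when the whole gap has been filled."""
    return bases_added(gap, dntps) == len(gap)


def telomere_gap(extra_units: int) -> str:
    return "GGGTTA" * extra_units + "GG"    # (GGGTTA)nGG


def atgg_gap(extra_units: int) -> str:
    return "GGAT" * extra_units + "GG"      # (GGAT)nGG


ONLY_G = frozenset("G")
for n in range(3):
    print(n, can_ligate(telomere_gap(n), ONLY_G), can_ligate(atgg_gap(n), ONLY_G))
print("AGG gap with dATP + dGTP:", can_ligate("AGG", frozenset("AG")))
```

With dGTP alone, a multi-unit gap such as GGGTTAGG receives three G residues. Extension then stalls at the first T, so the nick is never sealed; only the single-unit (n = 0) gaps ligate.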

In one embodiment, a different telomere MIP, termed MIP-TelF, is used, in which the gap between the 3' and 5' ends is "AGG". In this case, the MIP extension reaction includes only two deoxyribonucleoside triphosphates, dATP and dGTP. This restricts ligation products to MIPs whose 3' and 5' ends hybridize within a single repeat unit, and prevents extension across multiple repeat units.

The sequence of MIP-TelF: 5'/Phos/GTT AGG GTT AGG GTT AGG GTT ACT TCA GCT TCC CGA TCC GAC GGT AGT GTT CAC ACA GGA AAC AGC TAT GAC TAG GGT TAG GGT TAG GGT T

Because the motif units repeat, ligation also forms ligated linear MIP products, termed LL-MIPs (Figures 1-3). In previous MIP assays, all such linear MIPs would be removed. This was done by enzymatic methods (e.g., a combination of exonucleases, including exonuclease III, which targets linear double-stranded DNA) or by affinity-based methods (e.g., biotin-labeled MIPs with an open 5' end).

For measuring telomere length, however, this is inappropriate, because it would falsely reduce the motif count (i.e., the calculated copy number) of telomeres or other tandem repeats. Indeed, we found that a substantial proportion of telomere MIPs formed LL-MIPs after binding to the telomere motifs of the sample template (Figures 1-3). These LL-MIPs were cloned and sequenced. Sequencing confirmed that they contained multiple MIPs with their targeting sequences ligated together (Figure 1C).

In a preferred embodiment of the method of the present invention, two kinds of product are quantified simultaneously:

- circularized MIPs, the conventional MIP products; and
- LL-MIPs, a new form of MIP product specifically retained herein, including LL-MIPs comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, or more individual MIPs ligated together.

Each ligation junction, or signal, thus represents a specific unit of telomere length (i.e., one repeat unit within the tandem repeat).

Quantification can be performed by a variety of methods, including but not limited to:

1. Relative quantification by qPCR, with or without a reference sample and/or efficiency correction;

2. Absolute quantification by qPCR, using a precisely measured amount of cloned MIP product as a calibrator sample;

3. Digital PCR, e.g., droplet digital PCR;

4. Sequencing, e.g., high-throughput sequencing, to count the numbers of the various MIP products.

Example 2: Hybridization, extension, and ligation reactions

Restriction enzyme pretreatment of DNA samples

Up to 10 units of a restriction enzyme, e.g., AluI (New England Biolabs), are added to a 50 ng DNA sample, and the sample is incubated at 37°C for 1 hour. The enzyme is then inactivated at 80°C for 20 min. The sample is centrifuged at 7,000 rpm for 10 min and the supernatant collected, after which the digested sample can be stored at 4°C.

MIP hybridization step

Two MIP probes were used: one targeting the human telomere repeat, with a GG gap, and a second targeting the ATGG repeat motif, also with a GG gap. All MIP probes require 5' phosphorylation.

Probe targeting the telomere repeat motif (gap "GG"):

MIP-TelF+A: /5Phos/GT TAG GGT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA CTA GGG TTA GGG TTA GGG TTA

Probe targeting the 4-bp (ATGG) repeat motif (gap "GG"):

MIP-ATGG: /5Phos/AT GGA TGG ATG GAT GGA TGG ATG GCT TCA GCT TCC CGA TCC GAC GGT TAG GTT CAC ACA GGA AAC AGC TAT GAC GGA TGG ATG GAT GGA TGG AT

Both probes are designed so that, when bound adjacently on the template DNA, they leave a two-nucleotide "GG" gap between the 3' and 5' ends. DNA polymerase must fill this gap before template-dependent ligation can occur.
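The designed gaps can be checked directly against the probe sequences listed above, together with MIP-TelF given earlier. The Python sketch below works in three steps:

1. Take the longest 5' prefix and 3' suffix of each probe that occur in a perfect tandem array of the motif as the two targeting arms.
2. Place the 3' arm on that array.
3. Report the shortest stretch of array between the end of the 3' arm and the start of the 5' arm.

Treating the arms greedily in this way is an assumption of the sketch. The /5Phos/ modification is omitted, and spaces are stripped from the sequences.

```python
from typing import Tuple

PROBES = {
    # name: (sequence as written above, without the 5' modification; repeat motif)
    "MIP-TelF+A": ("GT TAG GGT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA "
                   "CAC AGG AAA CAG CTA TGA CTA GGG TTA GGG TTA GGG TTA", "GGGTTA"),
    "MIP-ATGG": ("AT GGA TGG ATG GAT GGA TGG ATG GCT TCA GCT TCC CGA TCC GAC GGT TAG GTT "
                 "CAC ACA GGA AAC AGC TAT GAC GGA TGG ATG GAT GGA TGG AT", "ATGG"),
    "MIP-TelF": ("GTT AGG GTT AGG GTT AGG GTT ACT TCA GCT TCC CGA TCC GAC GGT AGT GTT CAC "
                 "ACA GGA AAC AGC TAT GAC TAG GGT TAG GGT TAG GGT T", "GGGTTA"),
}


def targeting_arms(seq: str, motif: str) -> Tuple[str, str, str]:
    """Greedy 5' and 3' arms: longest prefix/suffix that occur in a tandem array of the motif."""
    repeat = motif * (len(seq) // len(motif) + 3)
    five = max(k for k in range(len(seq) + 1) if seq[:k] in repeat)
    three = max(k for k in range(len(seq) + 1) if seq[len(seq) - k:] in repeat)
    return seq[:five], seq[len(seq) - three:], repeat


def designed_gap(seq: str, motif: str) -> str:
    """Bases the polymerase must add between the 3' arm and the 5' arm within one repeat unit."""
    five_arm, three_arm, repeat = targeting_arms(seq, motif)
    end = repeat.index(three_arm) + len(three_arm)
    for offset in range(len(motif)):
        if repeat.startswith(five_arm, end + offset):
            return repeat[end:end + offset]
    raise ValueError("arms do not fit within a single repeat unit")


gaps = {name: designed_gap(spaced.replace(" ", ""), motif)
        for name, (spaced, motif) in PROBES.items()}
for name, gap in gaps.items():
    print(f"{name}: gap {gap!r}")
```

Run on the listed sequences, this recovers a "GG" gap for MIP-TelF+A and MIP-ATGG and an "AGG" gap for MIP-TelF. These match the dGTP-only and dATP + dGTP fill conditions described above.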

The hybridization procedure is performed overnight as follows (values in parentheses are stock concentrations):

The hybridization protocol is run in a thermal cycler with the following temperature program, although other suitable equipment well known in the art can equally be used:

After the hybridization procedure, add the following reagents to the above 25 μl to give a total of 30 μl (values in parentheses are stock concentrations):

The gap-filling and ligation temperature protocol is as follows:

37°C for 30 minutes, then

75°C for 20 minutes.

In this example, no exonuclease treatment is performed, although it may be added as an optional step. As emphasized elsewhere in this disclosure, the method is specifically designed to detect both circularized MIPs (also referred to as typical or conventional MIP products) and the novel LL-MIP products. Exonuclease treatment is therefore not required.

However, an exonuclease digestion step may be added for quality control or comparative purposes, for example by a kit manufacturer. In some instances this step is also performed to show that the use of exonucleases adversely affects assay performance. In some embodiments, a limited amount of exonuclease I (e.g., 20 U for 50 ng of genomic DNA) may be added for a limited time, e.g., up to 1 hour, to remove unligated probes without eliminating the ligated linear products.

Example 3: Quantification of circularized MIP and LL-MIP products

The total number (or abundance) of ligation junctions in the circularized MIP and LL-MIP products is quantified to calculate a new index of telomere length. Quantification can be performed in any of a variety of ways, including, for example, (1) quantitative PCR (qPCR) using relative or absolute quantification methods, (2) digital PCR, (3) sequencing, or (4) any other suitable quantification method.

In a typical embodiment, three or four primers are used to amplify and quantify the abundance of the junctions of the telomere MIP and the genomic reference MIP (e.g., the 4-bp MIP-ATGG). In this example, three primers are used in total: the forward primer (longGC_M13_F) is common to the telomere MIP and the genomic reference (control) 4bp-MIP-ATGG, and two additional primers are specific to the telomere probe and to the control ATGG probe, respectively. A 5' GC-rich tail is added to all three primers.

Universal and MIP-specific primers used:

longGC_M13_F:ggcgcatggcTCA CAC AGG AAA CAG CTA TGA C

longGC_Link_R_tel tail:gcgcatgtgaATC GGG AAG CTG AAG TAA CC

longGC_Link_R_ATGG:gcatggcgcaATC GGG AAG CTG AAG CCA tccat

In this embodiment, two TaqMan probes are used in the qPCR to distinguish the ligation products formed by the two types of MIP probe.

Taq-3-Telo:/56-FAM/TA GGG TTA G/ZEN/G GTT AGG GTT/3IABkFQ/

Taq-3-ATGG:/5HEX/GG ATG GAT G/ZEN/G ATG GAT GGA T/3IABkFQ/
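To show how the shared forward primer, the two MIP-specific reverse primers and the two TaqMan probes fit together, the Python sketch below builds the circularized (single-junction) product of each MIP in silico and extracts the region between the primer sites. It is an illustration, not the authors' software. Only the 3' gene-specific parts of the primers are used (the lowercase 5' GC-rich tails are set aside), a "GG" gap fill is assumed as described for these probes, and the function names are hypothetical.

```python
# Sketch: in-silico PCR across the single junction of a circularized MIP.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MIP_TEL = ("GTTAGGGTTAGGGTTAGGGTTACTTCAGCTTCCCGATCCGACGGTAGTGTTCACACAGG"
           "AAACAGCTATGACTAGGGTTAGGGTTAGGGTTA")
MIP_ATGG = ("ATGGATGGATGGATGGATGGATGGCTTCAGCTTCCCGATCCGACGGTTAGGTTCACACAGG"
            "AAACAGCTATGACGGATGGATGGATGGATGGAT")

# 3' gene-specific parts of the primers; the 10-nt 5' GC-rich tails are omitted.
FWD = "TCACACAGGAAACAGCTATGAC"        # longGC_M13_F
REV_TEL = "ATCGGGAAGCTGAAGTAACC"      # longGC_Link_R_tel tail
REV_ATGG = "ATCGGGAAGCTGAAGCCATCCAT"  # longGC_Link_R_ATGG
TAIL_LEN = 10

# TaqMan probe sequences with the dye, ZEN and quencher labels stripped.
PROBE_TEL = "TAGGGTTAGGGTTAGGGTT"     # Taq-3-Telo
PROBE_ATGG = "GGATGGATGGATGGATGGAT"   # Taq-3-ATGG


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def circularize(mip: str, gap: str = "GG") -> str:
    """Gap fill plus ligation joins the 3' end (and gap) back onto the 5' end."""
    return mip + gap


def amplicon(circle: str, fwd: str, rev: str) -> str:
    """Sense-strand product from the forward primer to the reverse-primer site."""
    doubled = circle + circle  # unroll the circle so a product can span the junction
    start = doubled.index(fwd)
    site = revcomp(rev)
    end = doubled.index(site, start + len(fwd)) + len(site)
    return doubled[start:end]


if __name__ == "__main__":
    tel = amplicon(circularize(MIP_TEL), FWD, REV_TEL)
    atgg = amplicon(circularize(MIP_ATGG), FWD, REV_ATGG)
    print(len(tel), len(tel) + 2 * TAIL_LEN, PROBE_TEL in tel)     # 81 101 True
    print(len(atgg), len(atgg) + 2 * TAIL_LEN, PROBE_ATGG in atgg)  # 83 103 True
```

With the tails included, each single-junction product is roughly 100 bp. Pairing the telomere reverse primer with the ATGG circle raises ValueError, because that primer site is absent from the ATGG probe, which illustrates how the MIP-specific reverse primers keep the two products apart.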

The PCR reaction tubes (total volume 15 μl) were prepared as follows:

qPCR was performed in a Roche LC480 thermal cycler using the following PCR temperature cycling protocol:

Variations and extensions of the typical experimental procedure

Depending on the characteristics of the sample template or the targeted motif, variations of the typical experimental procedure presented above can be implemented to improve assay precision and specificity.

Such variations include, but are not limited to:

1. Further increasing the MIP concentration, e.g., to 1.5×, 2×, 4×, etc. of the concentration given above.

2. Further increasing the concentration of dGTP, or of whichever specific nucleoside triphosphate is supplied, e.g., to 2×, 4×, etc. of the concentration given above.

3. In other embodiments, other methods can be used instead of a second MIP that interrogates a control tandem repeat as an indicator of the number of diploid genome equivalents present in the sample. For example, qPCR of a single-copy gene or another genetic marker can be performed directly on the input sample to normalize the telomere MIP results.

Example 4: Results

The present index of telomere length is derived from the number (abundance) of junctions formed by the telomere MIP and the number (abundance) of junctions formed by the genomic reference MIP (i.e., the 4-bp (ATGG) MIP used in this example). In this example, the Δ-Δ-Ct of the two qPCRs was used. The qPCR Ct values of a sample from an adult male were used as the reference (the last three samples in the table). Three cancer cell lines were studied: MCF-7, HepG2, and K562. A control sample from the Roche telomere length assay kit (catalog number 1220913600, Roche Diagnostics GmbH, Germany) and a female sample were also included. All reactions were performed in triplicate (three separate samples), and one reaction failed for the K562 cell line. ΔCt (the difference between the FAM Ct value of the telomere repeat and the HEX Ct value of the 4-bp repeat) was calculated, and the mean ΔCt of each sample was obtained. The ΔCt of each sample was then subtracted from the ΔCt of the reference sample from the male subject to give the Δ-Δ-Ct value. Assuming an efficiency value of 2, the ratio of telomere motif junctions to genomic 4-bp (ATGG) motif junctions can be calculated as 2^(Δ-ΔCt).

The Δ-ΔCt, or the ratio for each sample (column 7 of Table I below), can be used as a new index of telomere length.
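The Δ-Δ-Ct arithmetic described above can be written out as a short Python sketch. The function names and the Ct values in the usage lines are hypothetical (the Table I data are not reproduced here). The sketch follows the conventions stated above: ΔCt is the telomere (FAM) Ct minus the 4-bp (HEX) Ct, averaged over replicates; Δ-Δ-Ct is the reference ΔCt minus the sample ΔCt; and the ratio assumes an efficiency of 2.

```python
from statistics import mean


def mean_delta_ct(ct_fam, ct_hex):
    """Mean over replicates of telomere (FAM) Ct minus 4-bp ATGG (HEX) Ct."""
    return mean(f - h for f, h in zip(ct_fam, ct_hex))


def ddct_ratio(sample, reference, efficiency=2.0):
    """sample, reference: (FAM Ct list, HEX Ct list). Returns (ddCt, ratio)."""
    ddct = mean_delta_ct(*reference) - mean_delta_ct(*sample)
    return ddct, efficiency ** ddct


if __name__ == "__main__":
    # Hypothetical Ct values, not taken from Table I.
    reference = ([18.0, 18.1, 17.9], [20.0, 20.0, 20.0])
    sample = ([17.0, 17.2], [20.1, 19.9])  # e.g. a sample with one failed well
    ddct, ratio = ddct_ratio(sample, reference)
    print(round(ddct, 2), round(ratio, 2))
```

Averaging over whatever replicates are present lets a sample with a failed reaction (as with K562 above) still be scored.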

Telomere lengths of several cancer cell lines, as measured by terminal restriction fragment (TRF) assays, have been reported in the literature. Figure 5 shows a high correlation between TRF and the relative quantification ratio. The regression line in Figure 5 is used to read the telomere length (y-axis) from any given ratio (x-axis). Based on the regression line shown in Figure 5, the female sample has a ratio of 1.39 and an estimated telomere length of 8.98 kbp.

Table I

Example 5: Confirmation of the linear response of the present invention

The same reference sample from the male subject used in Examples 2-4 was diluted to a series of DNA concentrations, so that the samples contained different amounts (abundance) of telomere repeats. Samples were prepared with the following total amounts of DNA: 50 ng, 37.5 ng, 25 ng, 12.5 ng, and 6.25 ng. Each of these five amounts was assayed in triplicate with the telomere assay of the present invention, i.e., 15 separate reactions in total. The results are shown in Table II below.

The input DNA amounts were log-transformed and are shown on the X-axis of Figure 6. In this relative quantification analysis, the qPCR threshold cycles (Ct values, also called Cq in other references) for the two MIP targets were plotted against the amount of input DNA.

Quantification by qPCR of the abundance of the two targets (the junctions of the telomere MIP and of the ATGG MIP) showed good linearity between the amount of input DNA and the corresponding threshold cycle. The coefficients of determination (r²) were 0.96 and 0.98.

The slopes of the two regression lines were -2.96 and -2.75, respectively. The overall efficiency of the assay can be calculated in the same way that qPCR efficiency is calculated from serially diluted samples. However, this overall combined efficiency of the MIP assay combines the efficiencies of two separate reactions (i.e., the MIP hybridization, gap-filling and ligation reaction, and the qPCR quantification reaction).

The overall efficiency was calculated as 10^(-1/slope). The efficiencies of the MIP assays for telomere length and for the 4-bp (ATGG) repeat motif were therefore 2.2 and 2.3, respectively.

Table II. Results for different amounts of input DNA obtained by serially diluting a single sample

Example 6: Confirmation of the specificity of the assay for the targeted motif

A water blank control and DNA from Caenorhabditis elegans (C. elegans, commonly known as a roundworm), whose telomere repeat motif differs from that of humans, were both used to show that the present invention can distinguish the target motif from non-specific DNA (C. elegans) or a water blank.

Table III

Three C. elegans DNA samples (50 ng each) were used in the assay, following the standard protocol of Examples 2-3.

Table IV

The results showed that the abundance of junctions in the water blank control was only about 1/32 of that in a typical human sample (50 ng) used for this telomere assay (a Ct difference of more than 5 cycles, e.g., 24.32 - 18.88 = 5.44). The method therefore showed about 30× specificity for the targeted genomic motif. C. elegans DNA gave results similar to those of the water blank control.
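The fold figures follow from the Ct gap. If the amplification efficiency of 2 assumed elsewhere in these examples is used, each cycle of Ct difference corresponds to a two-fold difference in junction abundance. A 5-cycle gap is therefore 2^5 = 32-fold, and the example gap of 5.44 cycles is about 43-fold, so "about 1/32" is the conservative, 5-cycle reading. A minimal Python sketch (the function name is hypothetical):

```python
def fold_difference(ct_high, ct_low, efficiency=2.0):
    """Abundance ratio implied by a Ct gap: efficiency ** (ct_high - ct_low)."""
    return efficiency ** (ct_high - ct_low)


if __name__ == "__main__":
    print(fold_difference(5, 0))                    # 32.0 (a 5-cycle gap)
    print(round(fold_difference(24.32, 18.88), 1))  # 43.4 (the 5.44-cycle example)
```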

Example 7: Confirmation of ligated and linear MIP (LL-MIP) products

Six samples were processed with the telomere MIP protocol of Examples 2-3, except that the TaqMan probe was not added in step 5 and conventional PCR was performed instead. A higher annealing temperature (69.3°C) was used, and only one primer pair (longGC_M13_F and longGC_Link_R_tel tail), which targets both types of telomere MIP ligation product, was present in this conventional PCR. The PCR products were then separated by size on a 4% agarose gel. The smallest band (~100 bp) represents the typical circularized MIP product with one junction. The ~200 bp band represents an LL-MIP with two MIPs ligated together, and so on.

The gel results are shown in Figure 3, with size markers at 100 bp intervals on the left. The figure shows the effects of omitting the exonuclease digestion step and of using different amounts of exonuclease I (New England Biolabs). The results show that the exonuclease digestion step can be omitted. Similar results were obtained when less than 20 units of exonuclease I were used for no more than 1 hour. When this amount and/or duration was exceeded, degradation of LL-MIPs (PCR products longer than ~100 bp) was observed. Adding other exonucleases (such as exonuclease III or exonuclease VII, as used in conventional MIP assays) caused degradation of the LL-MIP products and adversely affected the performance of the assay of the present invention.

Table V

Example 8: Cloning of PCR products from Example 4 to confirm that the LL-MIPs represent the target genomic motif

PCR products obtained by conventional PCR similar to that of Example 7 were cloned and sequenced to confirm that the LL-MIP products are specific to the target genomic motif. A TA cloning kit for direct cloning of PCR products (Invitrogen, USA) was used to clone the PCR products generated with the primers longGC_M13_F and longGC_Link_R_tel tail.

Positive clones containing inserts were selected for sequencing. The clones with LL-MIP inserts were longer than the ~100 bp insert expected for conventional circularized MIP products, and two of the longer clones had inserts of more than 400 bp. One insert represents an LL-MIP with six MIPs ligated together, as shown schematically in Figure 1B. The target genomic motif appears as TAACCC, the exact reverse complement of the GGGTTA telomere motif.

The sequencing results for this clone insert are shown in Figure 1C.

Example 9: Alternative MIP probe design with fewer bases (17 bp) hybridizing to the target motif (Figure 7)

Another set of MIPs targeting the telomere and 4-bp motifs was also designed and successfully tested. The sequences of the two MIPs are:

ShortMIPtelo:/5Phos/G GT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA C GG TTA GGG TTA GGG TTA

ShortMIP4bp:/5Phos/G ATG GAT GGA TGG ATG GCT TCA GCT TCC CGA TCC GAC GGT TAG GTT CAC ACA GGA AAC AGC TAT GAC TGG ATG GAT GGA TGG AT

Both MIPs hybridize to their respective target repeats through hybridizing ends 17 bases long. The other conditions were the same as in the typical experimental procedure, except that a lower hybridization temperature of 50°C was used. A reference human DNA sample was used as the template. Different input amounts of template DNA were used as test samples to examine the linearity and precision of junction measurement as an indicator of the amount of telomere motif present in the sample. Four samples with 50 ng, 37.5 ng, 25 ng, and 12.5 ng of input DNA were evaluated, and a water blank control was also included. All samples were run in triplicate to evaluate assay precision.

This pair of short MIPs markedly increased the separation between the typical amount of DNA used in the assay (50 ng) and water as a no-template control. The Ct differences for the telomere motif and the 4-bp (ATGG) motif were >8.5 cycles and ~6 cycles, respectively. These differences correspond to >60-fold specificity.

Table VI. Results for different amounts of input DNA obtained by serially diluting a single sample, using the short MIP design (results plotted in Figure 7)

As described in the examples above, the overall efficiency of the two MIP assays can be calculated. As in the relative quantification method of efficiency-corrected qPCR, the overall efficiency is used in the MIP assay to correct for the difference in slope between the Ct values of the two motifs. An efficiency (slope) corrected relative ratio can therefore be analyzed. First, the efficiencies (slopes) of the two regression lines in Figure 7 were calculated. The slopes for the telomere motif and the 4-bp (ATGG) motif were -3.06 and -2.46, respectively.

Then, based on the two regression lines, the Ct values of the 37.5 ng sample were used as the reference sample. The efficiency was calculated as 10^(-1/slope).

The efficiency (slope) corrected relative ratio of the abundance of telomere junctions to 4-bp (ATGG) motif junctions, relative to the reference sample, is:

[Efficiency of the telomere motif^(ΔCt between reference and test samples)] / [Efficiency of the 4-bp motif^(ΔCt between reference and test samples)], where each ΔCt is the Ct of the reference sample minus the Ct of the test sample for that motif

Using this index value, i.e., the efficiency (slope) corrected relative ratio, telomere length is expressed as a ratio to a reference sample. The means of the triplicates ranged from 0.985 to 1.038. This is an example of the new telomere length index generated by this method; it represents the telomere length relative to a reference sample. For example, the telomere length of the 50 ng sample is 0.99× that of the reference sample (the 37.5 ng sample). The expected value is 1, because all samples were derived from the same human subject, which also served as the reference. The results show that the efficiency (slope) corrected relative ratio eliminates any bias caused by differences in the amount of input DNA.
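The efficiency (slope) corrected ratio above can be sketched in Python, and the sketch also shows why it cancels differences in input DNA. The helper names are hypothetical. The Ct values are generated from idealized straight dilution lines with the reported slopes (-3.06 and -2.46) and arbitrary intercepts, not from Table VI. Under those assumptions, E^(ΔCt) with E = 10^(-1/slope) equals the input ratio for each motif, so the two factors cancel and the corrected ratio is 1. If an efficiency of 2 is assumed for both motifs instead, the ratio comes out at about 1.05.

```python
import math


def efficiency_from_slope(slope: float) -> float:
    """Overall assay efficiency from a dilution-curve slope."""
    return 10 ** (-1 / slope)


def efficiency_corrected_ratio(e_tel, ct_ref_tel, ct_test_tel,
                               e_ctl, ct_ref_ctl, ct_test_ctl):
    """[E_tel^(Ct_ref - Ct_test)] / [E_ctl^(Ct_ref - Ct_test)] for one test sample."""
    return e_tel ** (ct_ref_tel - ct_test_tel) / e_ctl ** (ct_ref_ctl - ct_test_ctl)


def ideal_ct(intercept, slope, ng):
    """Ct predicted by a perfectly linear dilution curve (illustration only)."""
    return intercept + slope * math.log10(ng)


if __name__ == "__main__":
    s_tel, s_ctl = -3.06, -2.46   # slopes reported for Figure 7
    ref_ng, test_ng = 37.5, 50.0  # reference and test inputs
    tel = (ideal_ct(30.0, s_tel, ref_ng), ideal_ct(30.0, s_tel, test_ng))  # hypothetical intercept
    ctl = (ideal_ct(28.0, s_ctl, ref_ng), ideal_ct(28.0, s_ctl, test_ng))  # hypothetical intercept
    corrected = efficiency_corrected_ratio(
        efficiency_from_slope(s_tel), *tel, efficiency_from_slope(s_ctl), *ctl)
    naive = efficiency_corrected_ratio(2.0, *tel, 2.0, *ctl)
    print(round(corrected, 3), round(naive, 3))
```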

Example 10: Other methods of calculating telomere length index values from the junction abundance of the telomere MIP and the genomic reference MIP

Various methods can be used to quantify the exact amount of PCR product in qPCR, including relative and absolute quantification methods. In another embodiment (see the examples above), serial dilutions of a reference sample can be quantified to obtain Ct values for calculating the overall assay efficiency. The same reference sample can be included in every qPCR batch, and all samples can be compared with it to remove batch effects caused by inter-plate variation in qPCR Ct values. These methods are referred to as the Δ-Δ-Ct method and the efficiency-corrected Δ-Δ-Ct method. They are used herein to quantify the junctions of the two MIPs.

Furthermore, qPCR results can be quantified absolutely if a calibrator with a known (given) number of junctions is run in the same qPCR together with the test samples. This approach is referred to as absolute quantification of qPCR results.

In other embodiments, other methods of quantifying junction abundance can be used, including digital PCR and sequencing.

Example 11: Further extension to the quantification of epigenetic markers

In the DNA sample pre-digestion step, adding a methylation-sensitive restriction enzyme allows the degree of methylation to be quantified on a genome-wide scale. The results of two reactions on the same sample are compared: one with a methylation-sensitive restriction enzyme (whose ability to cleave DNA depends on the presence of methylated nucleotides) and one with a methylation-insensitive restriction enzyme that recognizes the same restriction site (an isoschizomer). For example, the enzyme pair HpaII (methylation-sensitive) and MspI can be used. Novel MIPs designed to target CpG islands are used together with a genomic reference MIP, such as the 4-bp (ATGG) MIP used herein.

In one embodiment, a specific target genomic region of interest is first isolated by a target enrichment method, many of which are known in the art. For example, array-based capture and in-solution capture are useful methods for isolating the target genomic region of interest. After target enrichment, the sample can be processed to quantify repeat motifs and thereby detect expansion (lengthening) of the motif. This approach can be used, for example, to detect trinucleotide repeat expansions associated with various diseases.

Example 12: Alternative embodiments

The experimental procedure, MIP targets, probes, primers, and enzymes used to estimate telomere length may vary. Such variations include:

(a) A shorter hybridization step, e.g., 2 hours of hybridization

(b) A simpler hybridization temperature protocol

(c) Using other stable short tandem repeats as the target of the second MIP. For example, the [AAGG]n tandem repeat can be used as a target. The corresponding MIP interrogating [AAGG]n is: 5’-GAA GGAA GGAA GGAA GGAA GGGGCGCTTCAGCTTCCCGATCCGACGGTAGTGTTCA CAC AGG AAA CAG CTA TGA CAA GGAA GGAA GGAA GGAA GG-3’. The terminal phosphorylation and hydroxylation requirements are the same as in the typical protocol.

(d) Other enzymes with properties similar to those used in the typical protocol can be used.

(e) In experiments performed at the scale-up production stage of the assay, the duration of the hybridization step can be shortened to 16 hours.

(f) Using the additional probes and primers shown in Table VII.

Table VII. Additional probes and primers

Example 13: Protocol for an alternative embodiment of the hybridization, extension, and ligation reactions

Restriction enzyme pretreatment of DNA samples

50-500 ng of DNA can be used per sample to quantify telomere length (TL). Add up to 10 units of two restriction enzymes, such as AluI and DdeI (New England Biolabs), to the DNA sample and incubate the sample at 37°C for 1 hour. Then inactivate the enzymes at 80°C for 20 min. After centrifuging at 7,000 rpm for 10 min and collecting the supernatant, the digested sample can be kept at 4°C.

MIP hybridization step

The same MIP probe targeting human telomere repeats as in Example 2 is used, together with a second MIP probe targeting the (AAGG) repeat motif.

MIP-TelF+A:/5Phos/GT TAG GGT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA CTA GGG TTA GGG TTA GGG TTA

Probe targeting the (AAGG) repeat motif (gap "GAA"):

MIP-AAGGpure:/5Phos/GG AAGG AAGG AAGG AAGG AAGG TCG ATC CGA CAG CTT CCG TAG CGG TTT CAC ACA GGA AAC AGC TAT GAC TCA CAG AAGG AAGG AAGG AAGG AAGG AAG

The hybridization procedure was performed overnight in the following hybridization mixture (values in parentheses are stock concentrations):

The hybridization protocol was performed in a thermal cycler with the following temperature cycles, but other suitable equipment known in the art can also be used:

After the hybridization procedure, 5 μl of the hybridization mixture, prewarmed to 50°C, is added to 15 μl of the gap-filling and ligation reagent (shown below).

Gap-filling and ligation step

Prepare the gap-filling and ligation reagent containing dGTP and dATP at 37°C, or prewarm it after preparation and keep it at 37°C (values in parentheses are stock concentrations):
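The choice of dGTP and dATP follows from the "GAA" gap of MIP-AAGGpure. The Python sketch below is illustrative only. It assumes that the probe backbone runs from TCGATCCGAC to TCACAG (the arm boundaries are not stated explicitly above), and the function names are hypothetical. It checks that filling "GAA" rebuilds an unbroken (AAGG)n stretch and lists the dNTPs that such a fill requires.

```python
MIP_AAGG = ("GGAAGGAAGGAAGGAAGGAAGGTCGATCCGACAGCTTCCGTAGCGGTTTCACACAGG"
            "AAACAGCTATGACTCACAGAAGGAAGGAAGGAAGGAAGGAAG")

# Assumed backbone boundaries for MIP-AAGGpure.
BACKBONE_START, BACKBONE_END = "TCGATCCGAC", "TCACAG"


def arms(mip):
    """Return (5' arm, 3' arm) of the probe, written 5'->3'."""
    arm5 = mip[: mip.index(BACKBONE_START)]
    arm3 = mip[mip.rindex(BACKBONE_END) + len(BACKBONE_END):]
    return arm5, arm3


def is_tandem_repeat(seq, unit):
    """True if seq is a contiguous stretch of `unit` repeated head to tail."""
    tiled = unit * (len(seq) // len(unit) + 2)
    return seq in tiled


def required_dntps(gap):
    """dNTPs a polymerase needs to fill the gap, e.g. 'GAA' -> dATP and dGTP."""
    return sorted(f"d{base}TP" for base in set(gap))


if __name__ == "__main__":
    arm5, arm3 = arms(MIP_AAGG)
    for gap in ("GG", "GAA"):
        print(gap, is_tandem_repeat(arm3 + gap + arm5, "AAGG"), required_dntps(gap))
```

For comparison, the "GG" gap of the telomere probe needs only dGTP.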

Prewarm 5 μl of the hybridization mixture at 50°C for 30 minutes, then add 15 μl of the prewarmed gap-filling and ligation reagent.

The gap-filling and ligation reaction is then performed as follows to produce MIP products containing the junctions of the two MIP probes:

37°C for 30 minutes, then

45°C for 5 minutes, then

95°C for 10 minutes, then

4°C, hold.

Junction quantification step

The gap-filled and ligated MIP products can be used to quantify the junctions as a measure of the number of genomic motifs. qPCR was used in this embodiment, but other techniques can readily be used instead.

Four primers and two TaqMan probes were used in the qPCR.

Long GC_Telo_F:GGTCCGAGCC AGC TAT GAC TAG GGT TAG GG

LongGC_Link_R_tel tail:GCGCATGTGA ATC GGG AAG CTG AAG TAA CC

LongGC_M13F+6:GCTGCCTCGC AGG AAA CAG CTA TGA CTC ACA G

LongGC_AAGG_R:GGTGCGTCGC GCT GTC GGA TCG ACC TTC CTT

Taqman_long_Telo:/56-FAM/TA GGG TTA GGG TTA GGG TTA GGGT/3IABkFQ/

Taqman_AAGG:/5HEX/AAG GAA GGA AGG AAG GAA GG/3IABkFQ/

Before qPCR, the gap-filled and ligated products of all samples, standards, and calibrators were diluted 100-fold with water, so that all samples, standards, and calibrators had the same dilution factor.

The qPCR reaction tubes (total volume 15 μl) were prepared as follows:

qPCR was performed in a Roche LC480 thermal cycler using the following PCR temperature cycling protocol:

5. Go to step 2, 40 cycles.

Example 14: Confirmation of the linear response, and that the telomere length index value is not affected by different starting amounts of DNA used for hybridization

The reference sample was diluted to a series of DNA concentrations so that the samples contained different amounts (abundance) of telomere repeats and AAGG repeats. Samples were prepared so that the following total amounts of DNA were added to the hybridization reaction: 160 ng, 80 ng, 40 ng, and 20 ng. Each of these four amounts was tested in replicate with the telomere assay protocol described in Example 13. The results are shown in Table VIII below. The mean telomere length index value obtained with the Δ-Ct method (the difference between the Ct values of the two MIP probes) was 0.945, with a coefficient of variation of 6.32%.
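The mean and coefficient of variation quoted above summarize replicate index values. The Python sketch below is hypothetical. It assumes that the CV uses the sample standard deviation (n - 1), which the text does not specify, and the values in the usage line are invented rather than taken from Table VIII.

```python
from statistics import mean, stdev


def index_summary(index_values):
    """Mean and coefficient of variation (%) of replicate index values.

    Uses the sample standard deviation (n - 1); the source does not say which.
    """
    m = mean(index_values)
    return m, 100 * stdev(index_values) / m


if __name__ == "__main__":
    # Hypothetical per-reaction index values, not the Table VIII data.
    print(index_summary([0.90, 0.98, 1.01, 0.89]))
```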

The input DNA amounts were log-transformed and are shown on the X-axis of Figure 9. In this relative quantification analysis, the qPCR threshold cycles (Ct values, also called Cq in other references) for the two MIP targets (MIP-TelF+A and MIP-AAGGpure) were plotted against the amount of input DNA (log scale).

Quantification by qPCR of the abundance of the two targets (the junctions formed by MIP-TelF+A and by MIP-AAGGpure, respectively) showed good linearity between the amount of input DNA and the corresponding threshold cycle (Ct value). The coefficients of determination (r²) were 0.999 and 0.997, respectively.

The slopes of the two regression lines were -3.05 and -3.021, respectively. The overall efficiency of the assay can be calculated from serially diluted samples in the same way as qPCR efficiency. However, the overall combined efficiency of the MIP assay incorporates the efficiencies of all the reaction steps described in Example 13, namely the MIP hybridization step, the gap-filling and ligation step, and the junction quantification step. By contrast, the qPCR efficiency usually cited for a typical qPCR assay describes only the efficiency of the qPCR itself.

The overall efficiency was calculated from the results in Figure 9 as 10^(-1/slope). The MIP assay efficiencies for MIP-TelF+A (telomere probe) and MIP-AAGGpure (AAGG probe) were therefore both 2.1, close to the ideal efficiency value of 2.

With these near-ideal overall efficiency values, telomere length can easily be calibrated against samples with given or known telomere lengths (in kbp). These reference samples and samples of unknown TL are assayed together by the method of the present invention, and the index values obtained by quantifying the junctions of the reference and unknown samples can then be converted to TL in kbp.

如先前的实施例所示，可以通过以下计算方法中的一种获得指标值：Δ-Ct方法、Δ-Δ-Ct方法和效率校正的Δ-ΔCt方法。As shown in the previous embodiments, the index value can be obtained by one of the following calculation methods: the Δ-Ct method, the Δ-Δ-Ct method, and the efficiency-corrected Δ-ΔCt method.
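The first two methods can be sketched in Python as below. This is an illustration only: it assumes the conventional ideal efficiency of 2 per cycle, treats the junction of the stable-copy-number motif (for example, AAGG) as the reference, and uses hypothetical function names. The efficiency-corrected method differs in using a separately measured efficiency for each assay.

```python
def delta_ct_index(ct_telomere, ct_reference, efficiency=2.0):
    """Delta-Ct method: telomere-junction abundance relative to reference-motif
    junctions in the same sample, E^-(Ct_telomere - Ct_reference)."""
    return efficiency ** (ct_reference - ct_telomere)


def delta_delta_ct_index(sample, calibrator, efficiency=2.0):
    """Delta-Delta-Ct method: the sample's Delta-Ct index divided by that of a
    calibrator (reference) sample assayed in the same batch.

    `sample` and `calibrator` are (ct_telomere, ct_reference) pairs.
    """
    return (delta_ct_index(*sample, efficiency=efficiency)
            / delta_ct_index(*calibrator, efficiency=efficiency))
```

A lower telomere Ct relative to the reference Ct means more telomere junctions, so the index rises as telomere content rises.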

Table VIII. Results of the Example 13 protocol with different amounts of input DNA

Example 15: Converting the junction index values to TL in kbp using reference samples

By using one or more reference samples of known TL, the index values generated by the assay of this invention can be converted to TL in kbp. In an experiment following a procedure similar to that of Example 13, four samples of known TL were assayed. Their index values and the raw qPCR data for the junctions of the two probes are shown in Table IX.

The TL index value is the efficiency-corrected relative ratio of the abundance of telomere (MIP-TelF+A) junctions to that of AAGG-motif (MIP-AAGGpure) junctions, as given in Example V. The reference sample in this batch had Ct values of 23.7 and 24.2 for the two junction assays, respectively, and the reaction efficiencies were 1.88 and 1.85, respectively.
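An efficiency-corrected ratio of this kind can be sketched with the Pfaffl-style model, in which each assay's relative quantity is its own efficiency raised to the Ct difference from the calibrator. The exact formula of Example V is not reproduced in this section, so the model is an assumption, as is the mapping of the quoted values (23.7 and 1.88 to the telomere assay, 24.2 and 1.85 to the AAGG assay, following the order in which the probes are named). Class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunctionAssay:
    ct_calibrator: float  # Ct of the reference (calibrator) sample in this batch
    efficiency: float     # fold amplification per cycle (2.0 is ideal)

    def relative_quantity(self, ct_sample: float) -> float:
        """Junction abundance in a sample relative to the calibrator."""
        return self.efficiency ** (self.ct_calibrator - ct_sample)


# Batch values quoted above; which value belongs to which assay is assumed.
TELOMERE = JunctionAssay(ct_calibrator=23.7, efficiency=1.88)  # MIP-TelF+A
AAGG = JunctionAssay(ct_calibrator=24.2, efficiency=1.85)      # MIP-AAGGpure


def tl_index(ct_telomere: float, ct_aagg: float) -> float:
    """Efficiency-corrected telomere/AAGG junction ratio, calibrator = 1.0."""
    return TELOMERE.relative_quantity(ct_telomere) / AAGG.relative_quantity(ct_aagg)
```

By construction the calibrator itself scores 1.0, and a sample whose telomere junction crosses the threshold one cycle earlier than the calibrator (with an unchanged AAGG Ct) scores 1.88.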

Figure 10 shows the correlation between the TL of these reference samples and their index values; the coefficient of determination (r²) was close to 1. The TLs of the reference samples span the range typical of human samples.

Using the regression equation shown in Figure 10, the TL of other samples of unknown TL can be derived by linear regression.
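The calibration step amounts to an ordinary least-squares fit of known TL against index value, followed by applying the fitted line to unknown samples from the same batch. A minimal sketch follows; the function names are hypothetical, TL is assumed to be regressed on the index (the orientation of the Figure 10 axes is not stated here), and no values from Table IX are reproduced.

```python
def fit_calibration(index_values, tl_kbp):
    """Least-squares line TL = slope * index + intercept, plus r^2."""
    n = len(index_values)
    mean_x = sum(index_values) / n
    mean_y = sum(tl_kbp) / n
    sxx = sum((x - mean_x) ** 2 for x in index_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(index_values, tl_kbp))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(index_values, tl_kbp))
    ss_tot = sum((y - mean_y) ** 2 for y in tl_kbp)
    return slope, intercept, 1.0 - ss_res / ss_tot


def index_to_tl(index, slope, intercept):
    """Convert an index value from the same batch to TL in kbp."""
    return slope * index + intercept
```

Because the fit is constrained only over the range covered by the reference samples, converting indices far outside that range is extrapolation and should be treated with caution.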

Table IX. Quantification of junctions in the reference samples

Example 16: Quantifying the junctions of two MIP probes by droplet digital PCR (ddPCR)

The junctions in the MIP products formed after the gap-filling and ligation steps of Example 2, Example 13, or similar embodiments can be quantified by droplet digital PCR to determine the index value used to assess telomere length.

This embodiment uses MIP products generated with the following MIP probes:

Probe targeting the telomeric repeat motif:

MIP-TelF: /5Phos/GT TAG GGT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA CTA GGG TTA GGG TTA GGG TT

Probe targeting the centromeric α-satellite sequence:

MIP-Centv2: 5'/Phos/gtc TAG GTT TGA TGT GAA GAT Ata ccc gCT TCA GCT TCC CGA TCC GAC GGT agg ttT CAC ACA GGA AAC AGC TAT GAC tca cag aaA ACG TTC TGA GAA TGC

After the gap-filling and ligation steps, the MIP products were diluted 50-fold, and 5 μl of the dilution was used in each ddPCR reaction.

Four primers and two TaqMan probes were used in the ddPCR:

M13_MIP_L2: GCGGGCAGGGCGGCtctagaTCACACAGGAAACAGCTATGAC

MIP_LinkC-2Ls: GGCCCTACCGTCGGATCGGGAAGC

M13_Cent_v2-s: GGCCTATGACTCACAGAAAACGTTCTGAG

Linker_v2-s: CTACCGTCGGATCGGGAAG

TaqMan probes:

Taqman-3-Telo: /56-FAM/TAG GGT TAG GGT TAG GGT T/3IABkFQ/

Taq-cent: /5HEX/GTC TAG GTT TGA TGT GAA GAT ATA CCC G CTT/3IABkFQ/
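In this duplex readout the FAM-labelled Taqman-3-Telo probe reports telomere junctions and the HEX-labelled Taq-cent probe reports centromeric α-satellite junctions, so each channel is scored independently. Droplet readers convert the fraction of positive droplets into a concentration using Poisson statistics; a minimal sketch of that conversion follows. The 0.85 nl droplet volume and the function name are assumptions, and instrument software normally performs this step itself.

```python
import math

ASSUMED_DROPLET_VOLUME_UL = 0.00085  # 0.85 nl; replace with the instrument's value


def copies_per_ul(positive: int, total: int,
                  droplet_volume_ul: float = ASSUMED_DROPLET_VOLUME_UL) -> float:
    """Poisson-corrected target concentration (copies/ul of reaction) for one channel.

    A droplet may hold more than one template, so the mean number of copies per
    droplet is lambda = -ln(1 - p), where p is the fraction of positive droplets.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; a saturated channel cannot be quantified")
    p = positive / total
    mean_copies_per_droplet = -math.log1p(-p)
    return mean_copies_per_droplet / droplet_volume_ul
```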

Digital PCR (ddPCR) was performed in a total reaction volume of 20 μl with the following reagents (values in parentheses are stock concentrations):

The ddPCR thermal cycling protocol was as follows:

The digital counts of telomere junctions formed by MIP-TelF (Figure 11) were 551 and 567 copies/μl in duplicate reactions of one sample, and those of centromeric α-satellite junctions formed by MIP-Centv2 were 95 and 97 copies/μl in the same duplicate reactions. The index values (telomere-junction copies divided by centromere-junction copies) for the two replicates were therefore 5.8 and 5.85, respectively.
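The index values follow directly from the ratio of the two counts. The short sketch below (hypothetical function name) reproduces them from the reported copy numbers.

```python
from statistics import mean


def ddpcr_index(telomere_copies_per_ul: float, centromere_copies_per_ul: float) -> float:
    """Index value: telomere-junction copies per centromere-junction copy.

    Both junctions are read in the same reaction from the same diluted MIP
    product, so the 50-fold dilution and the 5 ul input cancel in the ratio.
    """
    return telomere_copies_per_ul / centromere_copies_per_ul


# Duplicate reactions reported above: (telomere, centromere) copies/ul.
replicates = [(551, 95), (567, 97)]
indices = [ddpcr_index(t, c) for t, c in replicates]
print([round(i, 2) for i in indices])  # [5.8, 5.85]
print(round(mean(indices), 2))         # 5.82
```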

Sequence information

SEQ ID NO:1 (MIP-TelF)

5'/Phos/GT TAG GGT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA CTA GGG TTA GGG TTA GGG TT

SEQ ID NO:2 (MIP-TelF+A)

5'/Phos/GT TAG GGT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA CTA GGG TTA GGG TTA GGG TTA

SEQ ID NO:3 (ShortMIPtelo)

5'/Phos/G GT TAG GGT TAG GGT TAC TTC AGC TTC CCG ATC CGA CGG TAG TGT TCA CAC AGG AAA CAG CTA TGA C GG TTA GGG TTA GGG TTA

SEQ ID NO:4 (5' homologous region of MIP-TelF and MIP-TelF+A)

GT TAG GGT TAG GGT TAG GGT TA

SEQ ID NO:5 (5' homologous region of ShortMIPtelo)

G GT TAG GGT TAG GGT TA

SEQ ID NO:6 (3' homologous region of MIP-TelF)

TA GGG TTA GGG TTA GGG TT

SEQ ID NO:7 (3' homologous region of MIP-TelF+A)

TA GGG TTA GGG TTA GGG TTA

SEQ ID NO:8 (3' homologous region of ShortMIPtelo)

GG TTA GGG TTA GGG TTA
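For these telomere probes, the nucleotide gap left between the two homologous regions can be derived from the arm sequences: after gap filling, the 3' homologous region, the filled-in bases and the 5' homologous region must read as one uninterrupted run of the repeat. The sketch below (hypothetical helper names) searches for the shortest such fill, assuming arms written in the probe's own 5'→3' sense as listed above; it reports AGG for MIP-TelF, GG for MIP-TelF+A and G for ShortMIPtelo.

```python
def repeat_stream(unit: str, start: int, length: int) -> str:
    """`length` bases of the infinite tandem repeat of `unit`, starting at phase `start`."""
    return "".join(unit[(start + i) % len(unit)] for i in range(length))


def gap_fill(arm3: str, arm5: str, unit: str):
    """Shortest bases that join arm3 to arm5 within one continuous repeat run.

    Returns None if the arms cannot be joined by less than one repeat unit.
    """
    n = len(unit)
    for phase in range(n):
        if repeat_stream(unit, phase, len(arm3)) != arm3:
            continue
        end = phase + len(arm3)
        for k in range(n):  # candidate gap lengths shorter than one repeat unit
            if repeat_stream(unit, end + k, len(arm5)) == arm5:
                return repeat_stream(unit, end, k)
    return None


TELO = "TTAGGG"
probes = {
    # name: (3' arm, 5' arm)
    "MIP-TelF": ("TAGGGTTAGGGTTAGGGTT", "GTTAGGGTTAGGGTTAGGGTTA"),     # SEQ ID NOs: 6, 4
    "MIP-TelF+A": ("TAGGGTTAGGGTTAGGGTTA", "GTTAGGGTTAGGGTTAGGGTTA"),  # SEQ ID NOs: 7, 4
    "ShortMIPtelo": ("GGTTAGGGTTAGGGTTA", "GGTTAGGGTTAGGGTTA"),        # SEQ ID NOs: 8, 5
}
for name, (arm3, arm5) in probes.items():
    gap = gap_fill(arm3, arm5, TELO)
    print(name, gap, "distinct bases:", len(set(gap)))
```

Applied to the arms of MIP-AAGGpure (GGAAGGAAGGAAGGAAGGAAGG at the 5' end and AAGGAAGGAAGGAAGGAAGGAAG at the 3' end, read from SEQ ID NO:23) with the unit AAGG, the same function gives GAA, so the gap fills of MIP-TelF+A (GG) and MIP-AAGGpure (GAA) draw only on A and G.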

MIP probe targeting the 4-bp motif of the (AAGG)n sequence

SEQ ID NO:9 (AAGG: MIP-2019AAGGpure)

/5Phos/ggaaggaaggaaggaaggaaggTCGATCCGACAGCTTCCGTagCGgttTCACACAGGAAACAGCTATGACtcacagaaggaaggaaggaaggaaggaag

Additional primers used in qPCR

SEQ ID NO:10 (Long GC_Telo_F G)

GGTCCGAGCCAGC TAT GAC TAG GGT TAG GG

SEQ ID NO:11 (longGC_Link_R_tel tail)

gcgcatgtgaATC GGG AAG CTG AAG TAA CC

SEQ ID NO:12 (longGC_M13F+6)

GCTGCCTCGC AGG AAA CAG CTA TGA CTC ACA G

SEQ ID NO:13 (longGC_AAGG_R)

GGTGCGTCGC GCT GTC GGA TCG ACC TTC CTT

TaqMan probes

SEQ ID NO:14 (Taqman_long_Telo(FAM))

5'FAM-TA GGG TTA GGG TTA GGG TTA GGGT

SEQ ID NO:15 (Taqman_AAGG(HEX))

5'HEX-AAG GAA GGA AGG AAG GAA GG

SEQ ID NO:16 (longGC_M13_F)

ggcgcatggcTCA CAC AGG AAA CAG CTA TGA C

SEQ ID NO:17 (longGC_Link_R_ATGG)

gcatggcgacaATC GGG AAG CTG AAG CCA tccat

SEQ ID NO:18 (Taq-3-Telo, for the telomere MIP)

/56-FAM/TA GGG TTA G/ZEN/G GTT AGG GTT/3IABkFQ/

SEQ ID NO:19 (Taq-3-ATGG, for the ATGG MIP)

/5HEX/GG ATG GAT G/ZEN/G ATG GAT GGA T/3IABkFQ/

SEQ ID NO:20 (MIP-ATGG)

/5Phos/AT GGA TGG ATG GAT GGA TGG ATG GCT TCA GCT TCC CGA TCC GAC GGT TAG GTT CAC ACA GGA AAC AGC TAT GAC GGA TGG ATG GAT GGA TGG AT

SEQ ID NO:21 (ShortMIP4bp)

/5Phos/G ATG GAT GGA TGG ATG GCT TCA GCT TCC CGA TCC GAC GGT TAG GTT CAC ACA GGA AAC AGC TAT GAC TGG ATG GAT GGA TGG AT

SEQ ID NO:22 (MIP querying [AAGG]n)

5'-GAA GGAA GGAA GGAA GGAA GGGGCGCTTCAGCTTCCCGATCCGACGGTAGTGTTCA CACAGG AAA CAG CTA TGA CAA GGAA GGAA GGAA GGAA GG-3'

SEQ ID NO:23 (MIP-AAGGpure)

/5Phos/GG AAGG AAGG AAGG AAGG AAGG TCG ATC CGA CAG CTT CCG TAG CGG TTT CAC ACA GGA AAC AGC TAT GAC TCA CAG AAGG AAGG AAGG AAGG AAGG AAG

SEQ ID NO:24 (MIP-Centv2)

5'/Phos/gtc TAG GTT TGA TGT GAA GAT Ata ccc gCT TCA GCT TCC CGA TCC GAC GGT agg ttT CAC ACA GGA AAC AGC TAT GAC tca cag aaA ACG TTC TGA GAA TGC

SEQ ID NO:25 (M13_MIP_L2)

GCGGGCAGGGCGGCtctagaTCACACAGGAAACAGCTATGAC

SEQ ID NO:26 (MIP_LinkC-2Ls)

GGCCCTACCGTCGGATCGGGAAGC

SEQ ID NO:27 (M13_Cent_v2-s)

GGCCTATGACTCACAGAAAACGTTCTGAG

SEQ ID NO:28 (Linker_v2-s)

CTACCGTCGGATCGGGAAG

SEQ ID NO:29 (Taq-cent)

/5HEX/GTC TAG GTT TGA TGT GAA GAT ATA CCC G CTT/3IABkFQ/

SEQ ID NO:30 (example LL-MIP product sequence shown in Figure 1)

GCGCATGTGAATCGGGAAGCTGAAGTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGTCATAGCTGTTTCCTGTGTGAACACTACCGTCGGATCGGGAAGCTGAAGTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGTCATAGCTGTTTCCTGTGTGAACACTACCGTCGGATCGGGAAGCTGAAGTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGTCATAGCTGTTTCCCGTGTGAACACTACCGTCGGATCGGGAAGCGAAGTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGTCATAGCTGTTTCCTGTGTGAACACTACCGTCGGATCGGAAGCTGAATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGTCATAGCTGTTTCCTGTGTGAACACTACCGTCGGATCGGGAAGCTGAAGTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGTCATAGCTGTTTCCTGTGTGAGCCATGCGCC

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, those skilled in the art will appreciate that certain changes and modifications may be practiced within the scope of the appended claims. In addition, each reference provided herein is incorporated by reference in its entirety to the same extent as if each reference were individually incorporated by reference.

References cited

Absalan, Farnaz, and Mostafa Ronaghi. 2007. "Molecular Inversion Probe Assay." Methods in Molecular Biology. doi.org/10.1385/1-59745-515-6:315.

Cawthon, Richard. 2012. Methods of predicting mortality risk by determining telomere length. EP2474822A1.

Cawthon, Richard M. 2009. "Telomere Length Measurement by a Novel Monochrome Multiplex Quantitative PCR Method." Nucleic Acids Research. doi.org/10.1093/nar/gkn1027.

Cawthon, Richard M. 2010. Monochrome multiplex quantitative PCR. WO2010075413A1.

Cawthon, Richard M. 2010. Reducing non-target nucleic acid dependent amplifications: amplifying repetitive nucleic acid sequences. US7695904.

Cawthon, Richard M., Richard A. Kerber, Sandra J. Hasstedt, and Elizabeth O'Brien. 2011. Methods and kits for determining biological age and longevity based on gene expression profiles. US20110207128A1.

Cawthon, R. M. 2002. "Telomere Measurement by Quantitative PCR." Nucleic Acids Research. doi.org/10.1093/nar/30.10.e47.

Hardenbol, Paul, Johan Banér, Maneesh Jain, Mats Nilsson, Eugeni A. Namsaraev, George A. Karlin-Neumann, Hossein Fakhrai-Rad, et al. 2003. "Multiplexed Genotyping with Sequence-Tagged Molecular Inversion Probes." Nature Biotechnology. doi.org/10.1038/nbt821.

Harley, Calvin. 2014. Saliva-derived measures of telomere abundance and sample collection device. US20140370505A1.

Harley, Calvin, Jue Lin, and Karl Guegler. 2018. Multiplex quantitative PCR. US9944978.

Keefe, David, Sherman Weissman, Lin Liu, Fang Wang, and Xinghua Pan. 2016. A method for a single cell analysis of telomere length. US20160032360A1.

Litterst, Claudia, Austin P. So, and Duc Do. 2014. Digital assays with a reporter for amplicon length. WO2014031908A1.

Litterst, Claudia, and Luis A. Ugozzoli. 2016. Digital assay for telomere length. US9347094B2.

Martin-Ruiz, Carmen M., Duncan Baird, Laureline Roger, Petra Boukamp, Damir Krunic, Richard Cawthon, Martin M. Dokter, et al. 2015. "Reproducibility of Telomere Length Assessment: An International Collaborative Study." International Journal of Epidemiology. doi.org/10.1093/ije/dyu191.

Nilsson, Mats, Helena Malmgren, Martina Samiotaki, Marek Kwiatkowski, Bhanu P. Chowdhary, and Ulf Landegren. 1994. "Padlock Probes: Circularizing Oligonucleotides for Localized DNA Detection." Science. doi.org/10.1126/science.7522346.

Syvanen, Ann Christine. 2005. "Toward Genome-Wide SNP Genotyping." Nature Genetics. doi.org/10.1038/ng1558.

Sequence listing

<110> The Chinese University of Hong Kong

<120> Methods for Quantifying Telomere Length and Genomic Motifs

<150> 62/856,449

<151> 2019-06-03

<160> 30

<170> SIPOSequenceListing 1.0

<210> 1

<211> 91

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 1

gttagggtta gggttagggt tacttcagct tcccgatccg acggtagtgt tcacacagga 60

aacagctatg actagggtta gggttagggt t 91

<210> 2

<211> 92

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 2

gttagggtta gggttagggt tacttcagct tcccgatccg acggtagtgt tcacacagga 60

aacagctatg actagggtta gggttagggt ta 92

<210> 3

<211> 84

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 3

ggttagggtt agggttactt cagcttcccg atccgacggt agtgttcaca caggaaacag 60

ctatgacggt tagggttagg gtta 84

<210> 4

<211> 22

<212> DNA/RNA

<213> Artificial Sequence

<400> 4

gttagggtta gggttagggt ta 22

<210> 5

<211> 17

<212> DNA/RNA

<213> Artificial Sequence

<400> 5

ggttagggtt agggtta 17

<210> 6

<211> 19

<212> DNA/RNA

<213> Artificial Sequence

<400> 6

tagggttagg gttagggtt 19

<210> 7

<211> 20

<212> DNA/RNA

<213> Artificial Sequence

<400> 7

tagggttagg gttagggtta 20

<210> 8

<211> 17

<212> DNA/RNA

<213> Artificial Sequence

<400> 8

ggttagggtt agggtta 17

<210> 9

<211> 99

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 9

ggaaggaagg aaggaaggaa ggtcgatccg acagcttccg tagcggtttc acacaggaaa 60

cagctatgac tcacagaagg aaggaaggaa ggaaggaag 99

<210> 10

<211> 30

<212> DNA/RNA

<213> Artificial Sequence

<400> 10

ggtccgagcc agctatgact agggttaggg 30

<210> 11

<211> 30

<212> DNA/RNA

<213> Artificial Sequence

<400> 11

gcgcatgtga atcgggaagc tgaagtaacc 30

<210> 12

<211> 32

<212> DNA/RNA

<213> Artificial Sequence

<400> 12

gctgcctcgc aggaaacagc tatgactcac ag 32

<210> 13

<211> 31

<212> DNA/RNA

<213> Artificial Sequence

<400> 13

ggtgcgtcgc gctgtcggat cgaccttcct t 31

<210> 14

<211> 24

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> FAM

<400> 14

tagggttagg gttagggtta gggt 24

<210> 15

<211> 20

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> HEX

<400> 15

aaggaaggaa ggaaggaagg 20

<210> 16

<211> 32

<212> DNA/RNA

<213> Artificial Sequence

<400> 16

ggcgcatggc tcacacagga aacagctatg ac 32

<210> 17

<211> 34

<212> DNA/RNA

<213> Artificial Sequence

<400> 17

gcatggcgac aatcgggaag ctgaagccat ccat 34

<210> 18

<211> 19

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> 6-FAM

<220>

<221> misc_feature

<222> (9)..(9)

<223> ZEN

<220>

<221> misc_feature

<222> (19)..(19)

<223> IABkFQ

<400> 18

tagggttagg gttagggtt 19

<210> 19

<211> 20

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> HEX

<220>

<221> misc_feature

<222> (9)..(9)

<223> ZEN

<220>

<221> misc_feature

<222> (20)..(20)

<223> IABkFQ

<400> 19

ggatggatgg atggatggat 20

<210> 20

<211> 94

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 20

atggatggat ggatggatgg atggcttcag cttcccgatc cgacggttag gttcacacag 60

gaaacagcta tgacggatgg atggatggat ggat 94

<210> 21

<211> 84

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 21

gatggatgga tggatggctt cagcttcccg atccgacggt taggttcaca caggaaacag 60

ctatgactgg atggatggat ggat 84

<210> 22

<211> 95

<212> DNA/RNA

<213> Artificial Sequence

<400> 22

gaaggaagga aggaaggaag gggcgcttca gcttcccgat ccgacggtag tgttcacaca 60

ggaaacagct atgacaagga aggaaggaag gaagg 95

<210> 23

<211> 99

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 23

ggaaggaagg aaggaaggaa ggtcgatccg acagcttccg tagcggtttc acacaggaaa 60

cagctatgac tcacagaagg aaggaaggaa ggaaggaag 99

<210> 24

<211> 102

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> Phosphorylation

<400> 24

gtctaggttt gatgtgaaga tatacccgct tcagcttccc gatccgacgg taggtttcac 60

acaggaaaca gctatgactc acagaaaacg ttctgagaat gc 102

<210> 25

<211> 42

<212> DNA/RNA

<213> Artificial Sequence

<400> 25

gcgggcaggg cggctctaga tcacacagga aacagctatg ac 42

<210> 26

<211> 24

<212> DNA/RNA

<213> Artificial Sequence

<400> 26

ggccctaccg tcggatcggg aagc 24

<210> 27

<211> 29

<212> DNA/RNA

<213> Artificial Sequence

<400> 27

ggcctatgac tcacagaaaa cgttctgag 29

<210> 28

<211> 19

<212> DNA/RNA

<213> Artificial Sequence

<400> 28

ctaccgtcgg atcgggaag 19

<210> 29

<211> 31

<212> DNA/RNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(1)

<223> HEX

<220>

<221> misc_feature

<222> (31)..(31)

<223> IABkFQ

<400> 29

gtctaggttt gatgtgaaga tatacccgct t 31

<210> 30

<211> 556

<212> DNA/RNA

<213> Artificial Sequence

<400> 30

gcgcatgtga atcgggaagc tgaagtaacc ctaaccctaa ccctaaccct aaccctaacc 60

ctaaccctag tcatagctgt ttcctgtgtg aacactaccg tcggatcggg aagctgaagt 120

aaccctaacc ctaaccctaa ccctaaccct aaccctagtc atagctgttt cctgtgtgaa 180

cactaccgtc ggatcgggaa gctgaagtaa ccctaaccct aaccctaacc ctaaccctaa 240

ccctaaccct agtcatagct gtttcccgtg tgaacactac cgtcggatcg ggaagcgaag 300

taaccctaac cctaacccta accctaaccc taaccctagt catagctgtt tcctgtgtga 360

acactaccgt cggatcggaa gctgaataac cctaacccta accctaaccc taaccctaac 420

cctaacccta gtcatagctg tttcctgtgt gaacactacc gtcggatcgg gaagctgaag 480

taaccctaac cctaacccta accctaaccc taaccctaac cctagtcata gctgtttcct 540

gtgtgagcca tgcgcc 556

Claims

1. A single-stranded DNA probe for determining telomere length and/or copy number of tandem repeat sequences, said probe comprising:

i) a 5' homologous region that extends to the 5' end of the probe and contains a nucleotide sequence complementary to the tandem repeat sequence;

ii) an adapter region; and

iii) a 3' homologous region that extends to the 3' end of the probe and contains a nucleotide sequence complementary to the tandem repeat sequence, such that the 5' and 3' homologous regions can bind to the same strand of template DNA containing the tandem repeat sequence; wherein

when the 5' homologous region and the 3' homologous region bind to the same strand of the template DNA, the 3' homologous region is immediately adjacent to the 3' end of the 5' homologous region within a single repeat unit on the template DNA, such that the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated on the template DNA by a nucleotide gap shorter than one complete repeat unit; and

the nucleotide gap shorter than one complete repeat unit contains at most two different bases.

2. The probe of claim 1, wherein the tandem repeat sequence is a telomere sequence.

3. The probe of claim 2, wherein the telomere is a human telomere.

4. The probe of claim 1, wherein the nucleotide sequences of the 5' homologous region and the 3' homologous region are 100% complementary to the tandem repeat sequence.

5. The probe of claim 1, wherein the 5' homologous region and the 3' homologous region are each 15-25 nucleotides in length.

6. The probe of claim 1, wherein each repeat unit of the tandem repeat sequence is 2-10 nucleotides in length.

7. The probe of claim 1, wherein the adapter region comprises one or more sequence elements selected from universal primer sequences, probe-specific primer sequences, TaqMan probe sequences, and tag sequences.

8. The probe of claim 1, wherein when bound to a single repeat unit on the template DNA, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated by one or two nucleotides.

9. The probe of claim 8, wherein the one or two nucleotides comprise the base G.

10. The probe of claim 1, wherein when bound to a single repeat unit on the template DNA, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated by three nucleotides.

11. The probe of claim 10, wherein the three nucleotides comprise two different bases, and wherein the two different bases are A and G.

12. The probe of claim 1, wherein the 5' homologous region comprises the nucleotide sequence of SEQ ID NO:4 or SEQ ID NO:5.

13. The probe of claim 1, wherein the 3' homologous region comprises the nucleotide sequence of any one of SEQ ID NOs: 6-8.

14. The probe of claim 1, wherein the probe comprises the nucleotide sequence of any one of SEQ ID NOs:1-3.

15. A kit for determining the copy number of variable tandem repeat sequences in the genome, comprising first and second single-stranded DNA probes, wherein each of the first and second single-stranded DNA probes comprises:

i) A 5’ homologous region that extends to the 5’ end of the DNA probe and contains a nucleotide sequence complementary to the tandem repeat sequence;

ii) Joint area; and

iii) A 3’ homologous region extending to the 3’ end of the DNA probe and containing a nucleotide sequence complementary to the tandem repeat sequence, such that the 5’ homologous region and the 3’ homologous region can bind to the same strand of template DNA containing the tandem repeat sequence;

When the 5' homologous region and the 3' homologous region bind to the same strand of the template DNA, the 3' homologous region is immediately adjacent to the 3' end of the 5' homologous region within a single repeat unit on the template DNA. Therefore, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region are separated by a nucleotide gap on the template DNA smaller than one complete repeat unit, the nucleotide gap containing at most two different bases.

The 5' and 3' homologous regions of the first single-stranded DNA probe are complementary to the first tandem repeat sequence whose copy number varies among individuals.

The 5' and 3' homologous regions of the second single-stranded DNA probe are complementary to its second tandem repeat sequence, the copy number of which remains unchanged between individuals.

The nucleotide gap of the first single-stranded DNA probe contains at most two bases that are also contained in the nucleotide gap of the second single-stranded DNA probe.

16. The kit of claim 15, further comprising a reaction mixture comprising a DNA polymerase, an enzyme having ligase activity, and deoxyribonucleoside triphosphates of at most two bases, corresponding to the bases in the nucleotide gaps of the first and second single-stranded DNA probes.
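The dNTP set in the reaction mixture follows from the two probes' gaps. One reading of the last condition of claim 15 is that every base in the first probe's gap also occurs in the second probe's gap, so the second probe's gap bases (at most two) define the mix. The sketch below encodes that reading only; `dntp_mix` is a hypothetical helper, and gaps are written in probe sense as the bases to be added by the polymerase.

```python
DNTP_FOR_BASE = {"A": "dATP", "C": "dCTP", "G": "dGTP", "T": "dTTP"}


def dntp_mix(gap_first: str, gap_second: str) -> list[str]:
    """dNTPs needed to fill both probes' gaps (hypothetical helper).

    Encodes one reading of claim 15: every base in the first probe's gap also
    occurs in the second probe's gap, and neither gap uses more than two bases.
    """
    bases_first, bases_second = set(gap_first), set(gap_second)
    if len(bases_second) > 2:
        raise ValueError("second probe's gap uses more than two different bases")
    if not bases_first <= bases_second:  # subset test on the base sets
        raise ValueError("first probe's gap has a base missing from the second probe's gap")
    return sorted(DNTP_FOR_BASE[b] for b in bases_second)
```

Under this reading, `dntp_mix("GG", "G")` returns `["dGTP"]`, while a first-probe gap containing A paired with a G-only second-probe gap is rejected, because a G-only mix could not complete the first probe's extension.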

17. The kit of claim 16, wherein the DNA polymerase is T4 DNA polymerase.

18. The kit of claim 16, wherein the ligase activity is provided by Ampligase.

19. The kit of claim 16, wherein the reaction mixture does not contain exonuclease.

20. The kit of claim 15, further comprising genomic DNA from a test sample.

21. The kit of claim 15, further comprising universal primers, two or more probe-specific primers, or two or more TaqMan probes that are complementary to sequences within the adapter region of the first and/or second single-stranded DNA probe.

22. The kit of claim 15, wherein the tandem repeat sequence identified by the 5' and 3' homologous regions of the first single-stranded DNA probe is a telomere sequence.

23. The kit of claim 22, wherein the telomere sequence is a human telomere sequence.

24. The kit of claim 15, wherein the tandem repeat sequence recognized by the 5' and 3' homologous regions of the second single-stranded DNA probe is a short tandem repeat with a 4-bp repeat unit.

25. The kit of claim 15, wherein the nucleotide sequences of the 5' and 3' homologous regions of the first single-stranded DNA probe are 100% complementary to tandem repeat sequences in the genome whose copy number varies between individuals.

26. The kit of claim 15, wherein the nucleotide sequences of the 5' and 3' homologous regions of the second single-stranded DNA probe are 100% complementary to a tandem repeat sequence in the genome whose copy number is stable between individuals.

27. The kit of claim 15, wherein the length of the 5' and/or 3' homologous region of the first and/or second single-stranded DNA probe is 15 to 25 nucleotides.

28. The kit of claim 15, wherein the repeat unit of the tandem repeat sequence recognized by the first and/or second single-stranded DNA probe is 3-6 nucleotides in length.
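Claims 25 to 28 together constrain probe design: perfectly complementary arms of 15 to 25 nucleotides on a repeat whose unit is 3 to 6 nucleotides long. A hedged sketch of a design check follows. It models "100% complementary" as each arm being an exact stretch of the repeat when written 5'-to-3' in probe sense; `check_probe_design` and the example arms are hypothetical.

```python
def check_probe_design(arm5: str, arm3: str, unit: str) -> list[str]:
    """Return a list of problems; an empty list means the design passes.

    Hypothetical helper. "100% complementary" is modelled as each arm being an
    exact stretch of the repeat when written 5'->3' in probe sense.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    problems = []
    if not 3 <= len(unit) <= 6:                      # claim 28: repeat unit length
        problems.append(f"repeat unit is {len(unit)} nt, outside 3-6 nt")
    for name, arm in (("5' arm", arm5), ("3' arm", arm3)):
        if not 15 <= len(arm) <= 25:                 # claim 27: arm length
            problems.append(f"{name} is {len(arm)} nt, outside 15-25 nt")
        tiled = unit * (len(arm) // len(unit) + 2)
        # claims 25-26: arm must match the repeat at some phase with no mismatches
        if not any(tiled[o:o + len(arm)] == arm for o in range(len(unit))):
            problems.append(f"{name} is not a perfect match to the {unit} repeat")
    return problems
```

Returning a list of messages rather than a single boolean lets a designer see every violated constraint at once.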

29. A non-diagnostic method for determining the length of variable-length tandem repeat regions in an individual's genome, the method comprising:

i) Providing a first single-stranded DNA probe and a second single-stranded DNA probe, each of which comprises:

(1) A 5' homologous region that extends to the 5' end of the DNA probe and contains a nucleotide sequence complementary to the tandem repeat sequence;

(2) An adapter region; and

(3) A 3' homologous region that extends to the 3' end of the DNA probe and contains a nucleotide sequence complementary to the tandem repeat sequence, such that the 5' homologous region and the 3' homologous region can bind to the same strand of a template DNA containing the tandem repeat sequence.

The 5' and 3' homologous regions of the first single-stranded DNA probe are complementary to the first tandem repeat sequence whose copy number varies across individual genomes.

The 5' and 3' homologous regions of the second single-stranded DNA probe are complementary to a second tandem repeat sequence whose copy number is stable across individual genomes.

The nucleotide gap of the first single-stranded DNA probe contains at most two bases that are also contained in the nucleotide gap of the second single-stranded DNA probe;

ii) Contacting a biological sample from the individual with a plurality of the first single-stranded DNA probes and a plurality of the second single-stranded DNA probes in the presence of a DNA polymerase, ligase activity, and at most two deoxyribonucleoside triphosphates corresponding to the bases contained in the nucleotide gaps of the first and second single-stranded DNA probes, under conditions conducive to extension of the 3' end of a DNA probe and its ligation to the 5' end of a DNA probe bound to the same template, so that the 3' ends of the first and second single-stranded DNA probes are extended and ligated to the 5' ends of the first and second single-stranded DNA probes, respectively, thereby producing first and second circularized and ligated linear probe products;

iii) Quantifying the first and second circularized and ligated linear probe products produced in step ii); and

iv) Normalizing the amount of the first circularized and ligated linear probe product to the amount of the second circularized and ligated linear probe product to determine the copy number of the first tandem repeat sequence in the individual's genome.
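The claims do not give a formula for the normalization in step iv). If the products are quantified by quantitative PCR (claim 45), one common approach is a delta-Ct style ratio. The sketch below assumes both products amplify with the same, known per-cycle amplification factor (2.0, i.e. 100% efficiency, is an assumption), and that results may optionally be expressed relative to a calibrator sample; the function names and Ct values are hypothetical.

```python
def target_to_reference_ratio(ct_target: float, ct_reference: float,
                              amplification_factor: float = 2.0) -> float:
    """Amount of the variable-repeat product relative to the stable-repeat product.

    Each cycle multiplies product by `amplification_factor` (assumed equal for both
    products), so every cycle of Ct difference is one factor of starting template.
    """
    return amplification_factor ** (ct_reference - ct_target)


def relative_to_calibrator(sample: tuple[float, float], calibrator: tuple[float, float],
                           amplification_factor: float = 2.0) -> float:
    """Sample ratio divided by a calibrator sample's ratio; tuples are (ct_target, ct_reference)."""
    return (target_to_reference_ratio(*sample, amplification_factor)
            / target_to_reference_ratio(*calibrator, amplification_factor))
```

With hypothetical Ct values of 20.0 (target) and 23.0 (reference), the ratio is 2**3 = 8.0; the calibrator step makes ratios comparable across runs, provided the efficiency assumption holds.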

30. The method of claim 29, wherein the nucleotide sequences of the 5' and 3' homologous regions of the first and second single-stranded DNA probes are 100% complementary to the first and second tandem repeat sequences, respectively.

31. The method of claim 29, wherein the length of the 5' and 3' homologous regions of the probe is 15-25 nucleotides.

32. The method of claim 29, wherein the first tandem repeat sequence region is a telomere.

33. The method of claim 32, wherein the telomere is a human telomere.

34. The method of claim 29, wherein the adapter region of each probe comprises one or more sequence elements selected from universal primer sequences, probe-specific primer sequences, TaqMan probe sequences, and tag sequences.

35. The method of claim 29, wherein when bound within a single repeat unit on the tandem repeat sequence, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region of the probe are separated by a gap of one or two nucleotides, wherein the one or two nucleotides comprise a single base, and wherein only one deoxyribonucleoside triphosphate corresponding to the single base is provided in step ii).

36. The method of claim 35, wherein the single base is G.

37. The method of claim 29, wherein when bound within a single repeat unit on the tandem repeat sequence, the 3' end of the 3' homologous region and the 5' end of the 5' homologous region of the probe in step i) are separated by a 3-nucleotide gap, wherein the 3 nucleotides comprise 2 bases, and wherein in step ii) two deoxyribonucleoside triphosphates corresponding to the 2 bases are provided.

38. The method of claim 37, wherein the two bases are A and G.
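Claims 35 to 38 describe two gap types: a one- or two-nucleotide gap made of a single base (G), and a three-nucleotide gap made of two bases (A and G). Which positions in a repeat unit can produce such gaps can be enumerated. The sketch assumes the unit is written 5'-to-3' in probe sense (for the human telomere, TTAGGG, which matches the G and A/G fill-ins named in the claims); `candidate_gaps` is a hypothetical helper.

```python
def candidate_gaps(unit: str, bases: set[str], lengths: set[int]) -> list[tuple[int, str]]:
    """List (start offset, gap) pairs in one repeat unit whose bases are exactly `bases`.

    Hypothetical helper; `unit` is written 5'->3' in probe sense, and gaps may wrap
    around the end of the unit because the repeat is tandem.
    """
    tiled = unit * 2
    hits = []
    for start in range(len(unit)):
        for k in sorted(lengths):
            if 0 < k < len(unit):                    # a gap is shorter than one unit
                gap = tiled[start:start + k]
                if set(gap) == bases:
                    hits.append((start, gap))
    return hits
```

For TTAGGG, `candidate_gaps("TTAGGG", {"A", "G"}, {3})` returns `[(2, "AGG")]`, the only three-nucleotide stretch in that repeat made of exactly A and G, which is consistent with the dATP plus dGTP fill-in of claims 37 and 38.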

39. The method of claim 29, wherein the 5' homologous region of the probe in step i) comprises the nucleotide sequence of SEQ ID NO:4 or SEQ ID NO:5.

40. The method of claim 29, wherein the 3' homologous region of the probe in step i) comprises the nucleotide sequence of any one of SEQ ID NOs: 6-8.

41. The method of claim 29, wherein the probe in step i) comprises the nucleotide sequence of any one of SEQ ID NOs:1-3.

42. The method of claim 29, wherein a large excess of the probe in step i) is provided relative to the amount of genomic DNA in the biological sample.

43. The method of claim 29, wherein no exonuclease is added during steps i) to iv).

44. The method of claim 29, wherein the only exonuclease added during steps i) to iv) is exonuclease I, wherein exonuclease I is used at no more than 20 units per 50 ng of genomic DNA, and wherein it is present for no more than 1 hour.
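The exonuclease I limit in claim 44 is a ratio plus a time cap. The sketch below assumes, as an interpretation rather than something the claim states, that the unit limit scales linearly with the amount of genomic DNA; the function names are hypothetical.

```python
def max_exo_i_units(genomic_dna_ng: float) -> float:
    """Upper limit on exonuclease I units, scaling 20 U per 50 ng linearly (assumption)."""
    return 20.0 * genomic_dna_ng / 50.0


def exo_i_within_limit(units: float, genomic_dna_ng: float, minutes: float) -> bool:
    """True if both the dose and the exposure time stay within claim 44's limits."""
    # Cross-multiplied form avoids floating-point rounding exactly at the boundary.
    return units * 50.0 <= 20.0 * genomic_dna_ng and minutes <= 60.0
```

Under this reading, 100 ng of genomic DNA allows at most `max_exo_i_units(100)` = 40.0 units, for no longer than 60 minutes.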

45. The method of claim 29, wherein the amount of circularized and ligated linear probe product of the probe is determined using a method selected from quantitative PCR, digital PCR, and sequencing.
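When sequencing is the readout, claim 34 allows a tag sequence in each probe's adapter region, so one way to quantify the two product types is to count reads carrying each probe's tag and take their ratio. The sketch below is a simplified illustration under stated assumptions: the tag sequences are hypothetical, matching is exact substring search with no error tolerance, and each read is assigned to at most one probe.

```python
from collections import Counter


def count_probe_tags(reads: list[str], tags: dict[str, str]) -> Counter:
    """Count reads carrying each probe's tag (exact substring match, no mismatches)."""
    counts: Counter = Counter()
    for read in reads:
        for probe, tag in tags.items():
            if tag in read:
                counts[probe] += 1
                break  # assign each read to at most one probe
    return counts


def tag_ratio(reads: list[str], tags: dict[str, str], target: str, reference: str) -> float:
    """Reads for the variable-repeat probe divided by reads for the stable-repeat probe."""
    counts = count_probe_tags(reads, tags)
    if counts[reference] == 0:
        raise ValueError(f"no reads carry the {reference!r} tag")
    return counts[target] / counts[reference]
```

A production pipeline would normally tolerate sequencing errors in the tag and deduplicate reads; the exact-match version above only shows the counting and normalization logic.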