HK1223660B - Sequencing library construction method, and kit and application thereof - Google Patents
- Publication number
- HK1223660B HK16112031.8A HK16112031A
- Authority
- HK
- Hong Kong
- Prior art keywords
- sequence
- sequencing
- transposase
- tag
- kit
- Prior art date
Description
Technical Field
The present invention relates to the field of sequencing technology, and in particular to a method for constructing a sequencing library, a kit, and applications thereof.
Background Art
Next-generation sequencing has been widely adopted since its introduction because of its high throughput and low cost. As the technology has developed, it has been applied in many areas of scientific research and clinical testing.
Many scientific and clinical applications now need to be carried out at the single-cell level or on trace amounts of material. Analyzing DNA genetic variation at the single-cell level to determine whether a cell, embryo, or individual is diseased or carries a disease gene is a common research approach. For example, Preimplantation Genetic Diagnosis (PGD) in assisted reproductive technology involves genetic testing of DNA from gametes, single blastomeres, or embryonic cells to determine whether the genotype of the fertilized egg or embryo is normal, so that normal embryos can be selected for implantation.
As for sequencers, the Ion Proton sequencer developed by Life Technologies uses next-generation semiconductor sequencing technology and has been widely adopted because it is compact and fast.
When the Ion Proton sequencer is used in practice for single-cell detection of chromosome number abnormalities (CNV), turnaround time is often critical. For example, in preimplantation diagnosis in assisted reproductive technology, if the embryos are not cryopreserved, the test result must be available within 24 hours, so every step of the test must be made as short as possible.
Construction of a sequencing library is an essential step in detecting single-cell chromosome number abnormalities by sequencing. Traditional library construction methods mainly fragment target DNA, such as genomic DNA, mechanically with a fragmentation instrument (such as the Covaris), followed by end repair, adapter addition, and related steps (as shown in the PF library construction process on the left of Figure 1). Mechanical fragmentation yields fragments with good randomness, but throughput depends on having many Covaris instruments, and separate end processing, adapter addition, PCR, and various purification operations are still required afterwards.
Methods that build a sequencing library by using a transposase to fragment DNA and add adapters at the same time have been reported. For example, International Patent Application WO2010/048605 discloses transposon end compositions and methods that can be used for sequencing library construction and that reduce sample processing time. However, transposase-mediated fragmentation and adapter addition can only tag the 5' end of the target DNA. A further step, catalyzed by a nucleic acid-modifying enzyme such as a DNA polymerase, is needed to add a tag sequence to the 3' end before a double-adapter DNA library is obtained, which inevitably lengthens library construction.
Summary of the Invention
The present invention provides a method for constructing a sequencing library, a kit, and applications thereof. The method uses a transposase to fragment DNA in one step and add sequencing adapters to the 5' and 3' ends, respectively. The sequencing adapters contain sample tag information, so samples from different sources can be sequenced together and chromosome number abnormalities can then be detected. Compared with conventional library construction methods, this method saves time, is simple to operate, and places lower demands on laboratory equipment and reaction conditions, which facilitates wider use of next-generation sequencing for single cells or trace amounts of DNA.
According to a first aspect of the present invention, a method for constructing a sequencing library is provided, comprising: incubating target DNA with a transposase-embedded complex under transposition reaction conditions to generate a DNA library with double adapters at both ends; wherein the transposase-embedded complex comprises a transposase, a sequence complementary to a transposase recognition sequence, a first sequencing adapter sequence, and a second sequencing adapter sequence; the first sequencing adapter sequence comprises a first sequencing tag sequence at the 5' end and a transposase recognition sequence at the 3' end; and the second sequencing adapter sequence comprises a second sequencing tag sequence at the 5' end, a sample tag sequence, and a transposase recognition sequence at the 3' end.
The target DNA used in the present invention can be genomic DNA or amplified DNA, such as whole-genome-amplified DNA. The genomic DNA can come from a single human cell, a small number of cells, or a trace DNA sample. The sample type can be embryonic cells for preimplantation genetic testing, single tumor cells for cancer research, nucleated red blood cells from maternal peripheral blood, plasma, or amniotic fluid for prenatal diagnosis, tissue sections for pathological studies, and so on.
In the present invention, whole genome amplification refers to genome-wide amplification of a single cell, several cells, or a trace nucleic acid sample. The method can be any of partial random primer amplification (Degenerate Oligonucleotide Primer PCR, DOP-PCR), complete random primer amplification (Primer Extension Preamplification PCR, PEP-PCR), Multiple Displacement Amplification (MDA), OmniPlex WGA, and similar methods. Any of the commercial kits such as Qiagen's REPLI-g, Sigma-Aldrich's GenomePlex WGA, New England Biolabs' Sureplex, Rubicon Genomics' PicoPlex WGA, or GE Healthcare's illustra GenomiPhi V2 can also be used.
The method of the present invention can perform chromosome copy number analysis on reads generated by next-generation high-throughput semiconductor sequencing platforms, including but not limited to the Ion Torrent™ and Ion Proton™ sequencing platforms.
In the present invention, the sample tag sequence is a random sequence, preferably a random sequence of 6 to 14 bases, more preferably a random sequence of 10 bases. Because each position of a random sequence can be any of A, T, C, and G, a random sequence of N bases can in theory produce 4^N different sample tag sequences. A random sequence of 10 bases is therefore sufficient to label the sequencing samples.
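The 4^N estimate can be checked with a few lines of Python. This is only an illustrative sketch; the function name `tag_space` is an assumption and does not appear in the patent.

```python
def tag_space(n_bases: int) -> int:
    """Number of distinct tags of length n_bases over the alphabet A, T, C, G."""
    return 4 ** n_bases


# Lengths named in the text: the preferred range is 6 to 14 bases,
# with 10 bases most preferred.
for n in (6, 10, 14):
    print(n, tag_space(n))
```

For N = 10 this gives 1,048,576 possible tags, far more than the eight samples pooled in the example.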
As a preferred technical solution of the present invention, the second sequencing adapter sequence further includes the sequencing-specific bases "GAT" between the sample tag sequence and the transposase recognition sequence at the 3' end. Adding the three bases "GAT" after the sample tag sequence avoids two consecutive Cs, which could otherwise cause tag recognition errors in the subsequent analysis.
As a preferred technical solution of the present invention, the first sequencing tag sequence and/or the second sequencing tag sequence are selected from the tag sequences of the Ion Torrent™ or Ion Proton™ sequencing platforms; the method of the present invention is therefore applicable to the Ion Torrent™ or Ion Proton™ sequencing platforms.
As a preferred technical solution of the present invention, the transposase recognition sequence is the 19 bp mosaic end (ME) transposon end recognized by transposase Tn5.
As a preferred technical solution of the present invention, the sequence complementary to the transposase recognition sequence has the base sequence shown in SEQ ID NO: 1; the first sequencing adapter sequence has the base sequence shown in SEQ ID NO: 2; and the second sequencing adapter sequence has the base sequence shown in SEQ ID NO: 3.
SEQ ID NO: 1 is 5'-CTGTCTCTTATACACATCT-3'. It should be noted that the sequence complementary to the transposase recognition sequence is not limited to the base sequence shown in SEQ ID NO: 1; additional bases may be present at both its 5' end and its 3' end. SEQ ID NO: 2 is:
5'-CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT AGATGTGTATAAGAGACAG-3'; the 3'-terminal AGATGTGTATAAGAGACAG (underlined in the original) is the transposase recognition sequence, and the remainder is the first sequencing tag sequence. It should be noted that the first sequencing adapter sequence is not limited to the base sequence shown in SEQ ID NO: 2; additional bases or linker sequences may be present before, after, and between the transposase recognition sequence and the first sequencing tag sequence. SEQ ID NO: 3 is:
5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGNNNNNNNNNNGAT AGATGTGTATAAGAGACAG-3'; the 3'-terminal AGATGTGTATAAGAGACAG (underlined in the original) is the transposase recognition sequence, NNNNNNNNNN is the sample tag sequence, in which each N can be any of A, T, C, and G, the sequence before NNNNNNNNNN is the second sequencing tag sequence, and the GAT after it is the sequencing-specific bases. It should be noted that the second sequencing adapter sequence is not limited to the base sequence shown in SEQ ID NO: 3; additional bases or linker sequences may be present before and/or after the transposase recognition sequence, and additional bases may be present before the second sequencing tag sequence.
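The layout of the second sequencing adapter (second sequencing tag, sample tag, "GAT", transposase recognition sequence) can be sketched in Python as follows. The sequences are copied from SEQ ID NO: 1 and SEQ ID NO: 3; the function and constant names are illustrative assumptions, and the sample tag CTAAGGTAAC is the one carried by adapter PA_1 in the example.

```python
# Fixed parts copied from the text: SEQ ID NO: 1 (ME-r) and the portions of
# SEQ ID NO: 3 that surround the sample tag. The names are hypothetical.
ME_R = "CTGTCTCTTATACACATCT"
SECOND_SEQUENCING_TAG = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"
SPACER = "GAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def second_adapter(sample_tag: str) -> str:
    """Assemble 5'-[second sequencing tag][sample tag][GAT][recognition sequence]-3'."""
    if not sample_tag or set(sample_tag) - set("ACGT"):
        raise ValueError("sample tag must be non-empty and contain only A, C, G, T")
    # The recognition sequence is the reverse complement of ME-r (SEQ ID NO: 1).
    return SECOND_SEQUENCING_TAG + sample_tag + SPACER + reverse_complement(ME_R)


print(second_adapter("CTAAGGTAAC"))
```

The check that the recognition sequence is the reverse complement of SEQ ID NO: 1 is what allows ME-r to anneal to the 3' end of either adapter.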
According to a second aspect of the present invention, a kit for constructing a sequencing library is provided. The kit comprises a sequence complementary to a transposase recognition sequence, a first sequencing adapter sequence, and a second sequencing adapter sequence; the first sequencing adapter sequence comprises a first sequencing tag sequence at the 5' end and a transposase recognition sequence at the 3' end; and the second sequencing adapter sequence comprises a second sequencing tag sequence at the 5' end, a sample tag sequence, and a transposase recognition sequence at the 3' end.
As a preferred technical solution of the present invention, the sample tag sequence is a random sequence, preferably a random sequence of 6 to 14 bases, more preferably a random sequence of 10 bases.
As a preferred technical solution of the present invention, the second sequencing adapter sequence further includes the sequencing-specific bases "GAT" between the sample tag sequence and the transposase recognition sequence at the 3' end.
As a preferred technical solution of the present invention, the first sequencing tag sequence and/or the second sequencing tag sequence are selected from the tag sequences of the Ion Torrent™ or Ion Proton™ sequencing platforms.
As a preferred technical solution of the present invention, the transposase recognition sequence is the 19 bp mosaic end (ME) transposon end recognized by transposase Tn5.
As a preferred technical solution of the present invention, the sequence complementary to the transposase recognition sequence has the base sequence shown in SEQ ID NO: 1; the first sequencing adapter sequence has the base sequence shown in SEQ ID NO: 2; and the second sequencing adapter sequence has the base sequence shown in SEQ ID NO: 3.
As a preferred technical solution of the present invention, the kit further comprises a transposase, preferably transposase Tn5. One specific embodiment of the present invention uses Tagment Enzyme from Vazyme, but other transposases of this type are also suitable for the present invention.
As a preferred technical solution of the present invention, the kit further comprises a DNA polymerase for the nick translation reaction. One specific embodiment uses Platinum Pfx DNA polymerase from Life Technologies, but other DNA polymerases of this type are also suitable. Through nick translation, the DNA polymerase fills in the nicks left after the transposase fragments the DNA, which facilitates subsequent sequencing.
The description given for the first aspect also applies to the second aspect. There is no substantive difference between the two, so the details are not repeated here.
It should be noted that terms such as "first" and "second" in the present invention are used only to distinguish the objects being described, and are not to be understood as having a technical meaning or as implying an order.
According to a third aspect of the present invention, the present invention provides the use of the kit described in the second aspect for constructing a sequencing library and detecting chromosome number abnormalities by sequencing, preferably for detecting chromosome number abnormalities in single cells.
Compared with the prior art, the advantages of the present invention are as follows. The library construction method uses a sequence complementary to the transposase recognition sequence, a first sequencing adapter sequence, and a second sequencing adapter sequence, where the second sequencing adapter sequence contains a dedicated sample tag sequence that carries the sample's tag information. With the transposase, DNA can be fragmented in one step while different sequencing adapters are added to the 5' and 3' ends at the same time, so there is no need to add a 3' tag sequence with a DNA polymerase or other nucleic acid-modifying enzyme, as existing transposase-based fragmentation methods require. Because the sequencing adapters contain sample tag information, samples from different sources can be sequenced together and chromosome number abnormalities can then be detected. The method saves nearly 5 hours of library construction time compared with conventional methods, is simple to operate, and places lower demands on laboratory equipment and reaction conditions, which facilitates wider use of next-generation sequencing for single cells or trace amounts of DNA.
Brief Description of the Drawings
FIG. 1 compares the traditional PF library construction process (left) with the transposase library construction process of the present invention (right).
FIG. 2 is a schematic diagram of the principle by which the present invention uses a transposase to complete DNA fragmentation and sequencing adapter attachment in one step.
FIG. 3 compares the karyotype plots (a and b) and result peak plots (c and d) obtained for sample S1 by library construction, sequencing, and analysis using the method of the present invention (a and c) and the conventional library construction method (b and d).
FIG. 4 compares the karyotype plots (a and b) and result peak plots (c and d) obtained for sample S2 by library construction, sequencing, and analysis using the conventional library construction method (a and c) and the method of the present invention (b and d).
Detailed Description
The present invention is described in further detail below through specific embodiments with reference to the accompanying drawings.
As shown in FIG. 1, the traditional PF library construction process (left) comprises DNA fragmentation on a Covaris instrument, end repair, adapter addition, quality control, pooling, nick translation, a second quality control, and loading onto the sequencer. The transposase library construction process of the present invention (right) comprises preparation of the transposition reaction mixture, the transposition reaction (DNA fragmentation with simultaneous adapter addition), quality control, pooling, nick translation, a second quality control, and loading onto the sequencer. The single transposition reaction of the present invention thus replaces three traditional steps (Covaris fragmentation, end repair, and adapter addition), which saves considerable time.
As shown in FIG. 2, the principle by which the present invention uses a transposase to complete DNA fragmentation and sequencing adapter attachment in one step is as follows. The reverse transposase recognition sequence (ME-r, i.e., the sequence complementary to the transposase recognition sequence) is annealed separately with sequencing adapter sequence 1 (the first sequencing adapter sequence) and with tagged sequencing adapter sequence 2 (the second sequencing adapter sequence, where the tag is the sample tag sequence) to form adapters. The adapters are then embedded with the transposase to form a transposase-embedded complex. This complex is incubated with genomic DNA or amplification products to carry out transposition and fragmentation, giving DNA fragments with double adapters at both ends. A DNA library is obtained by extension (the nick translation reaction), and single strands are then generated by emulsion PCR for sequencing.
The present invention is described in detail below through a specific example.
1. Sample selection and whole genome amplification
Eight human lymphocyte cell line samples with known karyotypes were selected, including aneuploid samples and samples with deletions/duplications of different sizes (the smallest being approximately 1.9 Mb). Once the cells had been cultured to optimal condition, single cells or cell clusters were picked, whole genome amplification was performed, and the DNA was quantified with a Nanodrop spectrophotometer. Two whole genome amplification kits, GenomePlex WGA from Sigma-Aldrich and Sureplex from New England Biolabs, were used in parallel, and a single-cell group and a multi-cell group were set up for each cell line, giving 32 whole genome amplification product samples in total (8 cell lines × 2 kits × 2 groups).
From each sample, 100 ng of DNA was used for transposase library construction according to the present invention.
A further 100 ng of DNA from each sample was used to build a control library with the standard library construction method published on the official Life Technologies website. For the detailed procedure, see the Life Technologies website (http://www.lifetechnologies.com).
2. Adapter preparation
The following adapters were synthesized:
ME-r: 5'-CTGTCTCTTATACACATCT-3' (SEQ ID NO: 4);
P1: 5'-CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 5);
PA_1: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG CTAAGGTAAC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 6);
PA_2: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG TAAGGAGAAC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 7);
PA_3: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG AAGAGGATTC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 8);
PA_4: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG TACCAAGATC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 9);
PA_5: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG CAGAAGGAAC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 10);
PA_6: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG CTGCAAGTTC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 11);
PA_7: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG TTCGTGATTC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 12);
PA_8: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG TTCCGATAAC GATAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 13).
Note: The sample tag sequence (underlined in the original) is set off by spaces in each PA_N sequence above. In the reaction, a PA_N carrying a different tag is used for each sample so that the samples can be distinguished.
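The way the eight tags distinguish samples can be sketched in Python. The tags are copied from PA_1 to PA_8; the function name `assign_adapter` is hypothetical, and the sketch assumes that the read passed in already starts at the first base of the sample tag (that is, the upstream second sequencing tag sequence has been removed), which the patent does not specify.

```python
# Sample tags copied from adapters PA_1 to PA_8 (SEQ ID NO: 6 to 13).
SAMPLE_TAGS = {
    "PA_1": "CTAAGGTAAC",
    "PA_2": "TAAGGAGAAC",
    "PA_3": "AAGAGGATTC",
    "PA_4": "TACCAAGATC",
    "PA_5": "CAGAAGGAAC",
    "PA_6": "CTGCAAGTTC",
    "PA_7": "TTCGTGATTC",
    "PA_8": "TTCCGATAAC",
}
SPACER = "GAT"  # the sequencing-specific bases that follow every tag


def assign_adapter(read_start: str):
    """Return the adapter whose tag + GAT prefixes the read, or None.

    Assumption: read_start begins at the first base of the sample tag.
    Only exact matches are accepted; no mismatch tolerance is modelled.
    """
    for name, tag in SAMPLE_TAGS.items():
        if read_start.startswith(tag + SPACER):
            return name
    return None


print(assign_adapter("TTCGTGATTCGATAGATGTGTATAAGAGACAG"))
```

Exact matching keeps the sketch short; a production demultiplexer would normally tolerate sequencing errors in the tag.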
ME-r, P1, and PA_1 to PA_8 were each dissolved to 100 μM in annealing buffer.
Note: The annealing buffer was prepared as follows. Accurately weigh 1.21 g Tris base (100 mM), 5.844 g NaCl (1000 mM), and 0.372 g EDTA-2Na (10 mM), add ultrapure water to a final volume of 100 mL, dissolve completely, and mix well to obtain 10× annealing buffer.
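The stated concentrations can be cross-checked from the weighed masses. The sketch below is illustrative only; the molar masses are standard values that are not given in the patent, and it assumes that "EDTA-2Na" refers to disodium EDTA dihydrate (about 372.24 g/mol), which is the form consistent with 0.372 g giving 10 mM.

```python
def millimolar(mass_g: float, molar_mass_g_per_mol: float, volume_ml: float) -> float:
    """Concentration in mM for mass_g of solute made up to volume_ml of solution."""
    moles = mass_g / molar_mass_g_per_mol
    litres = volume_ml / 1000.0
    return moles / litres * 1000.0


# (mass weighed in g, molar mass in g/mol); molar masses are assumptions
# taken from standard tables, not from the patent.
RECIPE = {
    "Tris base": (1.21, 121.14),
    "NaCl": (5.844, 58.44),
    "EDTA-2Na (assumed dihydrate)": (0.372, 372.24),
}

for name, (mass_g, molar_mass) in RECIPE.items():
    print(f"{name}: {millimolar(mass_g, molar_mass, 100.0):.1f} mM")
```

The computed values agree with the 100 mM, 1000 mM, and 10 mM figures in the note to within rounding.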
Prepare the reaction systems in 200 μL PCR tubes according to the following table (Table 1):
Table 1
Vortex each of the nine prepared reactions (reaction 1 and reactions 1 to 8 of reaction 2) until thoroughly mixed, then centrifuge briefly. Place them in a PCR instrument and run the following program (Table 2):
Table 2
After the reaction is complete, mix reaction 1 with each of reactions 1 to 8 of reaction 2 in equal volumes, mix well, name the mixtures "annealed adapter mixtures 1 to 8", and store at -20°C.
3. Embedding the annealed adapter mixtures with transposase
Prepare the reaction systems in eight 200 μL PCR tubes according to the following table (Table 3):
Table 3
Note: The transposase used in the example is Tagment Enzyme from Vazyme (10 U/μL); the embedding buffer is the companion reagent for the transposase, also from Vazyme.
Mix thoroughly by gently pipetting up and down at least 20 times.
Incubate the prepared reaction systems in a PCR instrument at 30°C for 1 hour. Name the reaction products "transposition reaction mixtures 1 to 8" and store at -20°C.
4. DNA fragmentation and addition of sequencing adapters
Thaw the transposition reaction buffer at room temperature and mix by inverting the tube before use.
Prepare the following reaction system in each of eight PCR tubes (Table 4):
Table 4
Note: The transposition reaction buffer is the companion reagent for the transposase, also from Vazyme.
Mix thoroughly by gently pipetting up and down at least 20 times.
Place the mixed reaction systems in a PCR instrument and run the following program (Table 5):
Table 5
After the reaction is complete, remove the PCR tubes, purify with 1.5 volumes of Ampure XP Beads, and elute in 25 μL of EB.
5. Library pooling
Take 3 μL of each of the eight products from the previous step and combine them to obtain 24 μL of pooled library.
6. Nick translation
Prepare the following reaction system in a PCR tube (Table 6):
Table 6
Note: The amplification enzyme is Platinum Pfx DNA polymerase from Life Technologies, and the amplification buffer is its companion reagent.
Mix thoroughly by gently pipetting up and down 10 times.
Place the mixed reaction system in a PCR instrument and incubate at a constant 72°C for 20 min.
After the reaction is complete, remove the PCR tube, purify with 1.2 volumes of Ampure XP Beads, and elute in 16 μL of EB.
7. Sequencing
After the product passed library quality control, it was sequenced on the Ion Proton™ sequencing platform.
8、测序后信息分析8. Post-sequencing information analysis
将8个样本上述流程测序得到的数据,连同常规建库得到的8份测序数据,同时按照如下流程进行信息分析:The data obtained from sequencing the 8 samples using the above process, together with the 8 sequencing data obtained from conventional library construction, were analyzed according to the following process:
1)提取有效数据:将bam格式的下机数据转换为比对软件所需的fastQ格式,并从读段(reads)的5’端截取50bp用于后续分析,在此基础上,再从其5’端切除20bp,以排除全基因组扩增(WGA)时引入的接头对后续分析的影响;1) Extract valid data: Convert the bam format off-machine data to the fastQ format required by the alignment software, and cut off 50 bp from the 5' end of the read for subsequent analysis. On this basis, cut off another 20 bp from the 5' end to eliminate the influence of the adapter introduced during whole genome amplification (WGA) on subsequent analysis;
2)序列比对:将截取后的reads与NCBI数据库中版本37.3(hg19;NCBIBuild37.3)的人类基因组参考序列用SOAPaligner/soap2进行比对;2) Sequence alignment: The extracted reads were aligned with the human genome reference sequence version 37.3 (hg19; NCBIBuild37.3) in the NCBI database using SOAPaligner/soap2;
3)Y染色体判断:根据Y染色体特异基因的支持数判断Y染色体是否存在;3) Y chromosome determination: Determine whether the Y chromosome exists based on the support number of Y chromosome-specific genes;
4)窗口划分:将人类基因组参考序列划分为100kb左右的窗口,并上下滑动20 kb;4) Window division: The human genome reference sequence is divided into windows of approximately 100 kb, and the windows are slid up and down by 20 kb;
5)GC含量校正:统计各窗口内的unique reads(即去重后的序列中在参考基因组上只有唯一比对位置的序列)数,并计算其GC含量(GC%),以各窗口中reads的GC%的中位数作为该窗口的GC%。分别将样本序列和参考序列上的各窗口按GC%(梯度为0.05)划分为不同校正单元,并计算各校正单元内不同窗口reads数的中位数(Mi),以此计算出各校正单元的校正系数,再算出各窗口校正后的Ratio值用于后续分析;5) GC content correction: Count the number of unique reads in each window (i.e., sequences with only one alignment position on the reference genome in the deduplicated sequence) and calculate their GC content (GC%). The median GC% of the reads in each window is used as the GC% of the window. Each window on the sample sequence and the reference sequence is divided into different correction units according to GC% (with a gradient of 0.05), and the median (Mi) of the number of reads in different windows within each correction unit is calculated. This is used to calculate the correction coefficient of each correction unit, and then the corrected Ratio value of each window is calculated for subsequent analysis;
6) Breakpoint screening: treat each window as a point and perform a runs test at each point to obtain a preliminary breakpoint set; then screen the points in this set over multiple rounds to determine the final breakpoint set;
7) Data filtering and visualization: in the present invention, a positive signal (CNV) must satisfy three conditions: a) the CNV fragment is no smaller than 1M; b) P ≤ 1e-10; c) Ratio ≤ 0.7 (deletion) or Ratio ≥ 1.3 (duplication). CNVs are called according to these conditions, and the karyotype plot and the peak plot of the per-window Ratio values are drawn.
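As an illustration only, steps 1, 4, 5 and 7 above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the invention: all function names are hypothetical, "1M" is read as 1,000,000 bp, the GC gradient of 0.05 is read as a fraction, and the correction coefficient is assumed to be the overall median read count divided by Mi, since the text does not give the formula.

```python
from statistics import median


def trim_read(seq, keep=50, clip=20):
    """Step 1: keep the first `keep` bases from the 5' end, then drop the first `clip`."""
    return seq[:keep][clip:]


def sliding_windows(chrom_len, size=100_000, step=20_000):
    """Step 4: half-open (start, end) windows of `size` bp, advanced by `step` bp.

    Trailing bases that do not fill a further step are left uncovered.
    """
    last_start = max(chrom_len - size, 0)
    return [(s, min(s + size, chrom_len)) for s in range(0, last_start + 1, step)]


def gc_correct(counts, gc, bin_width=0.05):
    """Step 5, ASSUMED form: scale each window by overall_median / Mi of its GC bin."""
    bins = {}
    for count, frac in zip(counts, gc):
        bins.setdefault(int(frac / bin_width), []).append(count)
    overall = median(counts)
    coef = {}
    for b, values in bins.items():
        m_i = median(values)  # Mi: median read count of the correction unit
        coef[b] = overall / m_i if m_i > 0 else 0.0
    return [count * coef[int(frac / bin_width)] for count, frac in zip(counts, gc)]


def call_cnv(length_bp, p_value, ratio, min_len=1_000_000, max_p=1e-10):
    """Step 7: apply the three positive-signal conditions; None means no call."""
    if length_bp < min_len or p_value > max_p:
        return None
    if ratio <= 0.7:
        return "deletion"
    if ratio >= 1.3:
        return "duplication"
    return None


if __name__ == "__main__":
    print(len(trim_read("A" * 60)))
    print(sliding_windows(200_000)[:2])
    print(call_cnv(2_000_000, 1e-12, 0.5))
```

With the default parameters, each read contributes 30 bp to alignment (50 bp retained, 20 bp trimmed), and consecutive windows overlap by 80 kb. The runs test of step 6 is omitted because the text does not specify how its P value is computed.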
9. Results Analysis
The results obtained with the above analysis are shown in Table 7 below. A total of 8 samples were tested. The results obtained with the method of the present invention were compared with the known results and with the results obtained by conventional library construction, and were fully consistent with both.
Table 7
Figure 3 compares the karyotype plots and result peak plots obtained for sample S1 by library construction and sequencing analysis with the method of the present invention and with the conventional library construction method. Figure 3a is the karyotype plot obtained with the method of the present invention; Figure 3b is the karyotype plot obtained with the conventional library construction method; Figure 3c is the result peak plot obtained with the method of the present invention; and Figure 3d is the result peak plot obtained with the conventional library construction method.
Figure 4 compares the karyotype plots and result peak plots obtained for sample S2 by library construction and sequencing analysis with the conventional library construction method and with the method of the present invention. Figure 4a is the karyotype plot obtained with the conventional library construction method; Figure 4b is the karyotype plot obtained with the method of the present invention; Figure 4c is the result peak plot obtained with the conventional library construction method; and Figure 4d is the result peak plot obtained with the method of the present invention.
The results in Figures 3 and 4 show that the method of the present invention gives results fully consistent with those of the conventional library construction method. This indicates that the method of the present invention can greatly simplify the library construction procedure and shorten library construction time while preserving the reliability of the results.
The above is a further detailed description of the present invention in conjunction with specific embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the present invention belongs may make a number of simple deductions or substitutions without departing from the concept of the present invention.
Claims (21)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410525219.3A CN105525357B (en) | 2014-09-30 | 2014-09-30 | The construction method and kit of a kind of sequencing library and application |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1223660A1 (en) | 2017-08-04 |
| HK1223660B (en) | 2019-09-06 |