WO2019090482A1 - A second-generation high-throughput sequencing library construction method - Google Patents
A second-generation high-throughput sequencing library construction method
- Publication number
- WO2019090482A1 (PCT/CN2017/109770; CN2017109770W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polynucleotide
- tail
- substrate
- tailing
- dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
Definitions
- the present invention relates to methods and kits for the construction of second generation high throughput sequencing libraries, and more particularly to methods and kits for the construction of high throughput sequencing libraries based on terminal transferases.
- the second-generation sequencing technology has a faster sequencing speed and higher throughput, which is in line with the current scientific and technological development requirements for sequencing.
- the second-generation sequencing technology platforms include Illumina's Hiseq, Miseq, Nextseq, Novaseq, and Life Technologies' SOLID system, PGM, Proton, and others.
- the technical principle of second-generation sequencing is sequencing by synthesis, that is, determining the DNA sequence from the signal changes produced as different bases are newly incorporated.
- the Illumina sequencing platform detects light signals.
- the Life sequencing platform detects the current changes caused by acid-base changes.
- as second-generation sequencing technologies become more mature, their clinical applications will become more widespread.
- Circulating DNA, also known as free DNA, is DNA that exists outside cells in the blood.
- the main source of free DNA is apoptotic cells or bone marrow cells, and the DNA released by these cells is cleaved by nuclease in vivo to produce a small fragment DNA of about 166 bp in length (Y.M. Dennis Lo et al. Science Translational Medicine. 2010. 10: 61ra91).
- Free DNA is in a state of dynamic equilibrium in the body, so free DNA can be an important parameter for health assessment. Events such as tumorigenesis and organ transplantation can change the properties of free DNA in peripheral blood. These properties include the length of free DNA, base information, and epigenetic modification. Therefore, free DNA can serve as an important non-invasive marker for the early diagnosis, monitoring, and prognostic evaluation of disease.
- Methylation sequencing libraries need to be constructed prior to methylation sequencing of free DNA using second-generation high-throughput sequencing technology.
- conventional second-generation high-throughput methylation sequencing libraries are constructed by pre-library construction, in which the library is first built through end fill-in, 5'-end phosphorylation, 3'-end dA addition and linker ligation steps and is then treated with bisulfite; the bisulfite treatment causes a large amount of DNA damage, and the templates that can finally be sequenced account for less than 10% of the original templates (Masahiko Shiraishi et al. 2004. 10: 409-415).
- this construction process has the following drawbacks: 1) each step requires purification, so the operation is cumbersome; 2) the fill-in step artificially introduces nucleotides and changes the true methylation state; 3) a large amount of DNA template is destroyed during bisulfite treatment and is lost after PCR amplification. Therefore, it is necessary to develop a better library construction method that reduces the loss caused by bisulfite-induced DNA damage.
- the present invention provides a method of tailing a deoxypolynucleotide substrate, and further provides a second-generation high-throughput sequencing library construction method.
- the method of the present invention is applicable not only to normal DNA, but also to samples with severe damage such as FFPE samples, ancient DNA, and DNA samples after bisulfite treatment.
- the present invention provides a method of tailing a deoxypolynucleotide substrate, the method comprising the steps of: (1) mixing the deoxypolynucleotide substrate with the following substances to form a first mixture: a) dGTP or dCTP nucleotide; b) terminal deoxynucleotidyl transferase; c) a tail-control component comprising a tail-control region that is a polynucleotide homopolymer of 5 to 20 nucleotides in length (abbreviated as 5b-20b), wherein the polynucleotide homopolymer is complementary to the nucleotide of a); (2) incubating the first mixture so that a tailing reaction occurs at the 3' end of the deoxypolynucleotide substrate, in which dGTP or dCTP nucleotides are added to the 3' end of the substrate to form a 3' tailing region of the substrate.
- the deoxypolynucleotide substrate is a double-stranded or single-stranded deoxynucleotide sequence, preferably a single-stranded deoxypolynucleotide substrate; preferably, the polynucleotide homopolymer of the tail-control region is a poly(dC) homopolymer.
- the polynucleotide of the tail-control region is a heteropolymer sequence composed of dC and rC bases.
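The constraints that step (1) places on a homopolymer tail-control region (5 to 20 nucleotides, complementary to the nucleotide being added) can be checked with a minimal Python sketch. The function name and the single-letter encoding of dGTP/dCTP as "G"/"C" are illustrative assumptions, not part of the patent text, and the dC/rC heteropolymer variant is not modelled.

```python
# Hypothetical helper: only the 5-20 nt range and the dGTP/dCTP
# complementarity rule are taken from the text.
COMPLEMENT = {"G": "C", "C": "G"}


def is_valid_tail_control_region(region: str, tailing_nucleotide: str) -> bool:
    """Return True if `region` is a homopolymer of 5-20 nt that is
    complementary to the nucleotide being added ("G" for dGTP, "C" for dCTP)."""
    if tailing_nucleotide not in COMPLEMENT:
        raise ValueError("tailing nucleotide must be 'G' (dGTP) or 'C' (dCTP)")
    region = region.upper()
    if not 5 <= len(region) <= 20:
        return False
    # A homopolymer contains exactly one distinct base, and that base
    # must pair with the nucleotide used for tailing.
    return set(region) == {COMPLEMENT[tailing_nucleotide]}
```

For example, an 8-nt poly(dC) region passes the check for dGTP tailing, while a 4-nt region or one containing a different base does not.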
- the present invention provides a method for directly performing complementary strand synthesis on a tailed deoxypolynucleotide substrate, the method comprising: step (3): after the tailing reaction of step (2), adding ribonuclease RNase HII to degrade the ribonucleotides in the polynucleotide homopolymer or tail-control molecule of the tail-control component, generating a free 3'-end hydroxyl group so that the tail-control component can be extended with the substrate having the 3' tailing region as the template, and adding DNA polymerase and deoxynucleotides (including dATP, dTTP, dCTP and dGTP) to the first mixture to form a second mixture; and step (4): incubating the second mixture so that a nucleotide polymerization reaction occurs at the 3' end of the tail-control component complementary to the tailed substrate, synthesizing a complementary strand.
- a further embodiment of the present invention provides a method for synthesizing a complementary strand (hereinafter also referred to as "two-chain synthesis") by strand displacement extension for a tailed deoxypolynucleotide substrate: step (3-1): after the tailing reaction of step (2), adding an extension primer complementary to the substrate 3' tailing region, a DNA polymerase and deoxynucleotides to the first mixture to form a second mixture;
- step (4): incubating the second mixture so that nucleotide polymerization occurs at the 3' end of the extension primer, synthesizing a complementary strand of the substrate to obtain a double-stranded deoxypolynucleotide; and step (5): isolating the double-stranded deoxypolynucleotide from the second mixture.
- the two-chain synthesis is carried out by a degradative extension method, which first degrades the tail-control molecules in the tail-control component and then adds the extension primer complementary to the 3' tailing region of the substrate, DNA polymerase and deoxynucleotides to form a second mixture.
- the method further comprises a ligation step of adding a 5' sequencing linker to the double-stranded polynucleotide substrate, step (6): adding a 5' sequencing linker and a ligase to the isolated double-stranded deoxypolynucleotide to form a third mixture, and incubating the third mixture so that the double-stranded deoxypolynucleotide is ligated to the 5' sequencing linker.
- the present invention provides kits comprising: a deoxynucleotide substrate, a dGTP or dCTP nucleotide, a terminal deoxynucleotidyl transferase, and a tail-control component, wherein the tail-control component comprises a polynucleotide homopolymer of 5 to 20 nucleotides in length that is complementary to the dGTP or dCTP nucleotide.
- the kit can be used to control the tailing reaction of substrate nucleotides.
- the kit of the present invention further comprises a DNA ligase or an RNA ligase.
- the kit of the present invention further comprises a DNA polymerase, and deoxynucleotides comprising dATP, dTTP, dCTP and dGTP.
- the kit of the present invention further comprises an extension primer.
- the kit of the present invention further comprises at least one of RNase, USER enzyme and nicking enzyme.
- the kit of the invention further comprises a 5' sequencing linker.
- the present invention effectively controls the length of the tail added by the terminal transferase at the 3' end of the polynucleotide substrate by designing a tail-control component. Further, the inventors have found that the polynucleotide tail of the substrate can be ligated to the linker while forming a double-stranded polynucleotide structure by annealing, at a certain temperature, to the poly region provided in the tail-control component. By this method, a sequence of interest can be added very efficiently at the 3' end of the polynucleotide substrate; for example, the sequence of interest can be a priming sequence for next-generation sequencing.
- the present invention also finds that when a terminal transferase adds dGTP or dCTP nucleotides to a polynucleotide substrate to form a poly(dG) or poly(dC) tail, the substrate utilization efficiency is significantly higher than when dATP or dTTP nucleotides are added to form a poly(dA) or poly(dT) tail.
- the nucleic acid is denatured into single strands; after the substrate is tailed, the linker is ligated, complementary strand extension of the polynucleotide substrate is completed, a dA tail is optionally added to the 3' end of the complementary strand, the 5' linker is ligated, and the product is enriched, resulting in a library for next-generation sequencing.
- the library construction process can construct a whole-genome methylation sequencing library from as little as 10 ng of genomic DNA from cultured human cells and obtain efficient sequencing results.
- FIG. 4 Schematic diagram of 5' sequencing linker
- Figure 5 Results of the addition of nucleotide homo-tails to a mixed single-stranded DNA polynucleotide substrate with different bases at the 3' end using TdT enzyme
- Figure 6 Results of the addition of nucleotide homo-tails to a mixed single-stranded DNA polynucleotide substrate containing a partial random sequence and having different bases at the 3' end using the TdT enzyme
- Figure 8 Fragment distribution results of methylation sequencing libraries constructed by TdT enzyme tailing of bisulfite-treated lambda DNA controlled by a polynucleotide homopolymer
- Figure 9 Experimental results of using a tail-controlling molecule to control the length of a poly(dG) tail added to a substrate by a TdT enzyme and attaching the poly(dG) tail of the substrate to the linker under the action of a ligase
- Figure 10 Experimental results of studying the effect of reaction time on the attachment of a substrate to a tail after the addition of a poly(dG) tail
- Figure 11 Fragment distribution results of methylation sequencing libraries constructed by two-chain synthesis by degradation and by strand displacement; Figure 11A and Figure 11B show the fragment distributions of the methylation sequencing libraries prepared by degradation and by strand displacement, respectively.
- Figure 12 Fragment distribution results of a methylation sequencing library constructed by controlled tailing of bisulfite-treated λ-DNA by TdT enzyme using a tail-control component with a stem-loop structure
- Figure 13 Fragment distribution results of methylation sequencing libraries constructed by controlled tailing of bisulfite-treated λ-DNA by TdT enzyme using tail-control components with tail-control regions of different lengths
- Figure 13A shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 5b poly(dC) tail-control region.
- Figure 13B shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 6b poly(dC) tail-control region.
- Figure 13C shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 7b poly(dC) tail-control region.
- Figure 13D shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having an 8b poly(dC) tail-control region.
- Figure 13E shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 9b poly(dC) tail-control region.
- Figure 13F shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 10b poly(dC) tail-control region.
- Figure 13G shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having an 11b poly(dC) tail-control region.
- Figure 13H shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 12b poly(dC) tail-control region.
- Figure 13I shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 13b poly(dC) tail-control region.
- Figure 13J shows the fragment distribution results of a methylation sequencing library prepared with a tail-control component having a 20b poly(dC) tail-control region.
- Figure 14 Fragment distribution results of human genome methylation sequencing libraries constructed by the method of the present invention and by a conventional method; Fig. 14A and Fig. 14B show the fragment distributions of the libraries constructed by the method of the present invention and by the conventional method, respectively.
- Polynucleotide substrates are polynucleotide substrate fragments that require tailing reactions and/or library construction.
- the polynucleotide substrate is single stranded or double stranded DNA.
- the polynucleotide substrate is a chemically treated nucleotide sequence including, but not limited to, a bisulfite treated polynucleotide.
- the polynucleotide substrate can be of natural origin or synthetic.
- a natural source is a polynucleotide sequence from a prokaryote or eukaryote, such as a human, mouse, virus, plant or bacterium.
- the polynucleotide substrate of the present invention may also be a severely damaged sample such as FFPE sample, ancient DNA, or bisulfite-treated DNA.
- the polynucleotide substrate is tailed and can be used in assays involving microarrays and to generate libraries for next generation nucleic acid sequencing.
- the tailed polynucleotide substrate can also be used for efficient cloning of polynucleotide sequences.
- the polynucleotide substrate is single stranded or double stranded and comprises a 3' terminal free hydroxyl group. In some aspects, the polynucleotide substrate is double stranded and comprises blunt ends. In other aspects, the double stranded polynucleotide substrate comprises a 3' recessed end.
- the length of the protruding or recessed end of the polynucleotide substrate can vary. In various aspects, the length of the protruding or recessed end of the polynucleotide substrate is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides.
- the polynucleotide substrate is between about 10 and about 5000 nucleotides in length, or between about 40 and about 2000 nucleotides, or between about 50 and about 1000 nucleotides, or between about 100 and about 500 nucleotides. In a further aspect, the polynucleotide substrate is at least 3 up to about 50, 100 or 1000 nucleotides in length.
- the present invention controls the tail length and tailing efficiency of the polynucleotide substrate by the addition of a tail-control component.
- the tail-control component comprises a tail-control region consisting of a polynucleotide homopolymer of dGTP or dCTP, for example, the tail-control region may be a polynucleotide homopolymer of dGTP or dCTP of 5-13 nucleotides in length.
- Polynucleotide homopolymers are polynucleotide chains joined by the same nucleotide.
- the tail-control region of the invention is preferably a poly(dC) or poly(dG) nucleotide homopolymer sequence, or a heteropolymer sequence comprising: (i) dC and rC nucleotides, or (ii) dG and rG nucleotides.
- the polynucleotide homopolymer of the tail-control region of the invention has a length of 5-20 nucleotides, preferably 7-20, 9-20 nucleotides, further preferably 5-10 nucleotides, 7-10 nucleotides, more preferably 7-9 nucleotides.
- a certain length of dGTP or dCTP polynucleotide homopolymer can effectively control the tailing of the polynucleotide substrate to about 20 nucleotides.
- in some aspects, the tail-control component is a polynucleotide homopolymer, that is, the tail-control component comprises only the tail-control region.
- in other aspects, the tail-control component is a tail-control molecule sequence formed by joining a tail-control region and an X region (also referred to as an X-region sequence, or the sequencing linker sequence of the tail-control component), such as SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19 and SEQ ID NO: 20 of the invention.
- an "X region sequence" provides a priming sequence for amplification or sequencing of a nucleic acid fragment, or a marker sequence for distinguishing between different substrate molecules, and in some aspects is used for next generation sequencing applications.
- the X region sequence may be, but is not limited to, a Next Generation Sequencing (NGS) linker sequence compatible with Illumina, Ion Torrent, Roche 454 or SOLiD sequencing platforms, such as the Illumina Truseq library sequence shown in Table 7.
- the X region sequence can be a DNA sequence, an RNA sequence or a hybrid sequence comprising DNA and RNA.
- the tail-control component comprises a tail-control region, an X region, and a linker sequence that is complementary to the X region (see Figure 1); such a tail-control component is also referred to as a "tail-control linker".
- a linker having only a sequence complementary to the X region is called a "short linker", as shown in Table 2.
- a linker sequence including an extension primer binding region is referred to as a "long linker", as shown in Table 3.
- the tail-control component may be a single polynucleotide molecule that forms a stem-loop structure, generating a partially double-stranded polynucleotide in which the X region is complementary to the linker sequence. In this stem-loop configuration, the polynucleotide homopolymer moiety is part of that single molecule; the tail-control component can also be two separate polynucleotide molecules that hybridize to each other, i.e., the linker sequence is a separate polynucleotide molecule complementary to the X region sequence.
- the tail-control component of the present invention is preferably a double-stranded tail-control component selected from Table 2 or Table 3.
- the polynucleotide of the tailing component comprises a peptide nucleic acid, a Schizophyllan polysaccharide, a locked nucleic acid, and combinations thereof.
- the invention provides compositions wherein the tailing component is single stranded or at least partially double stranded.
- “Partially double-stranded” means that the tail-control component comprises a single-stranded portion and a double-stranded portion.
- the partially double-stranded tail-control component is produced by hybridizing the X region sequence of the tail-control molecule to the linker molecule (in some embodiments, the linker sequence is an NGS linker sequence), with a partial sequence of the tail-control molecule complementary to the linker molecule.
- alternatively, hybridization occurs within a single tail-control molecule that forms a hairpin structure, so that the partially double-stranded tail-control component can be either a single molecule or a multi-molecular structure.
- the tail-control component comprises a blocking group.
- a blocking group as used herein is a moiety that prevents extension by an enzyme. Without a blocking group, the enzyme is able to extend the polynucleotide by adding nucleotides. Blocking groups include, but are not limited to, phosphate groups, carbon 3 arms (C3 Spacer), dideoxynucleotides, ribonucleotides, amino groups, and inverted deoxythymidine.
- the 5' end of the linker of the tail-control component is phosphorylated, with a blocking group at its 3' end, and the tail-control region also carries a 3' blocking group.
- the term "tailing” as used herein may be interchanged with the term “controlled tailing.”
- the present invention provides a method of tailing a deoxypolynucleotide substrate for adding a desired amount of dGTP or dCTP nucleotide to the 3' end of the polynucleotide substrate in a controlled manner.
- the tail-control component comprises a polynucleotide homopolymer of 5 to 20 nucleotides in length; the tail-control component and the poly tail sequence formed by the nucleotides newly added to the substrate form a double-stranded structure, which reduces the rate of polymerization and allows the tail length of the polynucleotide substrate to be controlled within a certain range (see Figure 2).
- a tail-control component comprising a poly(dC) nucleotide homopolymer sequence is used to control the TdT enzyme as it adds a poly(dG) tail (also known as a nucleotide (dG) homopolymeric tail) at the 3' end of the polynucleotide substrate. Further, the poly(dG) tail of the polynucleotide substrate is ligated to the linker of the tail-control component, completing tailing and linker addition at the 3' end of the substrate.
- the tail-control component comprises a poly(dC) nucleotide homopolymer sequence of 5-20, preferably 5-13, further preferably 7-10, more preferably 7-9 nucleotides.
- the molar concentration ratio of the polynucleotide substrate to the tail-control component ranges from 1:1 to 1:100, preferably from 1:5 to 1:50.
- the pH of the tailing reaction ranges from about 5.0 to about 9.0; the molar ratio of polynucleotide substrate to mononucleotide ranges from 1:10 to 1:20000, preferably from 1:100 to 1:2000; incubation time is from 1 minute to 120 minutes, preferably from 0.5 to 60 minutes, from 0.5 to 30 minutes, from 1 to 20 minutes, from 1 to 15 minutes or from 1 to 10 minutes.
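The molar ratios stated above translate directly into reagent amounts for a given substrate input. The following Python sketch does that arithmetic; the function and dictionary names are illustrative assumptions, and only the ratio ranges come from the text.

```python
# Ratio ranges from the text, expressed as substrate : reagent = 1 : x.
TAIL_CONTROL_RATIO = {"full": (1, 100), "preferred": (5, 50)}
NUCLEOTIDE_RATIO = {"full": (10, 20000), "preferred": (100, 2000)}


def reagent_range_pmol(substrate_pmol, ratio_range):
    """Return the (minimum, maximum) reagent amount in pmol for a
    substrate amount in pmol and a (low, high) molar ratio range."""
    low, high = ratio_range
    return substrate_pmol * low, substrate_pmol * high


if __name__ == "__main__":
    substrate = 2  # pmol of polynucleotide substrate (hypothetical input)
    print(reagent_range_pmol(substrate, TAIL_CONTROL_RATIO["preferred"]))
    print(reagent_range_pmol(substrate, NUCLEOTIDE_RATIO["preferred"]))
```

With 2 pmol of substrate, the preferred ranges correspond to 10 to 100 pmol of tail-control component and 200 to 4000 pmol of dGTP or dCTP.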
- the extension reaction is further carried out using the deoxypolynucleotide substrate as a template (see the library construction of FIG. 2 and the two-chain synthesis of FIG. 3).
- RNase is added to degrade the polynucleotide homopolymer of the tail-control component to produce a 3' hydroxyl group; DNA polymerase and deoxynucleotides are added, and the deoxypolynucleotide substrate is incubated with the tail-control component.
- This method generates a nucleotide extension reaction at the 3' end of the tail-control component to synthesize a complementary strand of the deoxypolynucleotide substrate to obtain a double-stranded deoxypolynucleotide.
- the incubation time for the extension reaction is from 1 minute to 60 minutes, preferably from 1 to 30 minutes, from 1 to 20 minutes, from 1 to 15 minutes, or from 1 to 10 minutes.
- the present invention first degrades the tail-controlling molecule, and then adds the extension primer to synthesize the double-stranded deoxypolynucleotide using the polynucleotide substrate as a template.
- the tail-control molecule comprises ribonucleotides and is incubated with a ribonuclease (RNase) under conditions in which the enzyme is sufficiently active, followed by incubation at a temperature above 80 °C to separate the tail-control molecule from the substrate.
- the ribonuclease is selected from at least one of RNase H, RNase HII, RNase A, and RNase T1.
- the tail-control molecule comprises dU nucleotides and can be degraded by incubation with dU glycosylase followed by incubation at a temperature above 80 °C, or by incubation with a mixture of dU glycosylase and apurinic/apyrimidinic endonuclease, such as the USER enzyme.
- the double-stranded region of the tail-control component comprises a specific sequence recognizable by a nicking endonuclease and can be nicked by a nicking enzyme such as Nt.BspQI, followed by incubation at a temperature of 80 °C or higher to separate the tail-control molecule from the substrate.
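The three degradation-based ways of removing the tail-control molecule described above depend on what the molecule contains. The Python sketch below summarizes that decision; the function name and boolean flags are illustrative assumptions, and falling back to strand displacement when no degradable feature is present is this sketch's reading of the strand-displacement alternative the text describes, not a rule stated in the patent.

```python
from typing import List


def separation_routes(has_ribonucleotides: bool, has_dU: bool,
                      has_nicking_site: bool) -> List[str]:
    """List the ways a tail-control molecule with the given features
    could be removed from the tailed substrate."""
    routes = []
    if has_ribonucleotides:
        routes.append("RNase digestion, then incubation above 80 °C")
    if has_dU:
        routes.append("dU glycosylase (or USER enzyme) treatment")
    if has_nicking_site:
        routes.append("nicking enzyme (e.g. Nt.BspQI), then incubation at 80 °C or higher")
    if not routes:
        # Assumption of this sketch: with nothing to degrade, rely on a
        # strand-displacing DNA polymerase during primer extension.
        routes.append("strand displacement during primer extension")
    return routes
```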
- an extension primer is added, the tail-control molecule is separated from the polynucleotide substrate by strand displacement, and the extension reaction is carried out using the deoxypolynucleotide substrate as a template.
- the method comprises: after the tailing reaction, adding an extension primer complementary to the 3' tailing region of the substrate, a DNA polymerase and deoxynucleotides, and incubating them with the polynucleotide substrate; a nucleotide extension reaction occurs at the 3' end of the extension primer to synthesize a complementary strand of the substrate, giving a double-stranded deoxypolynucleotide.
- the DNA polymerase has strand displacement activity; under conditions in which the strand displacement activity and the DNA polymerization activity of the DNA polymerase are sufficient, the tail-control molecule can be separated from the polynucleotide substrate by strand displacement while the extension reaction is completed.
- DNA polymerases with strand displacement activity useful in the practice of this patent include, but are not limited to, large fragment Bst DNA polymerase, Bst 3.0 DNA polymerase, Klenow large fragment, phi29 DNA polymerase.
- the tail-control molecule polynucleotide in the tail-control component is first degraded, and the extension primer is then added for the extension reaction.
- the method comprises: after the tailing reaction, adding RNase to degrade the tail-control polynucleotide in the tail-control component, then adding an extension primer complementary to the 3' tailing region of the substrate, DNA polymerase and deoxynucleotides, and incubating them with the polynucleotide substrate; a nucleotide extension reaction occurs at the 3' end of the extension primer to synthesize a complementary strand of the substrate, giving a double-stranded deoxypolynucleotide.
- in some aspects, the DNA polymerase in the extension reaction has 3'-5' exonuclease activity (proofreading activity), resulting in a blunt-ended double-stranded structure upon extension; in other aspects, the DNA polymerase lacks 3'-5' exonuclease activity (proofreading activity), and a double-stranded structure with a dA overhang is obtained after the extension is completed.
- DNA polymerases useful in the present patent include, but are not limited to, the following species or combinations thereof: full length Bst DNA polymerase, KAPA high fidelity hot start DNA polymerase, KAPA high fidelity hot start Uracil+ DNA polymerase, Bst 3.0 DNA polymerase, Phusion™ High-Fidelity DNA polymerase, Hot Start Taq DNA polymerase, Ex Taq DNA polymerase, Deep VentR™ DNA polymerase, T4 DNA polymerase, Klenow large fragment.
- the 5' sequencing linker provides priming sequences for amplification or sequencing of nucleic acid fragments and is used in some aspects for next generation sequencing applications (see Figure 4).
- the 5' sequencing linker is formed by annealing two polynucleotide strands.
- the 5' linker sequence is selected from, but not limited to, a Next Generation Sequencing (NGS) linker sequence compatible with Illumina, Ion Torrent, Roche 454 or SOLiD sequencing platforms, such as the Illumina Truseq library sequence shown in Table 7.
- the linker sequence may be a DNA sequence, an RNA sequence or a heteropolymer sequence comprising DNA and RNA, such as the 5' sequencing linker shown in Table 4.
- the 5' sequencing linker is ligated with the double-stranded polynucleotide substrate obtained after the extension reaction of step (4), and the structure of the terminus can be either a blunt end or a sticky end.
- Sticky ends include, but are not limited to, dT sticky ends.
- the double-stranded deoxypolynucleotide undergoes a ligation reaction with the 5'-sequencing linker.
- the 5' linker is only joined to the 3' end of the complementary strand of the polynucleotide substrate.
- the 5' linker is only joined to the 5' end of the polynucleotide substrate.
- the 5' linker is joined both to the 5' end of the polynucleotide substrate and to the 3' end of the complementary strand of the polynucleotide substrate.
- a dA cohesive end is added to the 3' end of the polynucleotide substrate.
- a DNA polymerase lacking 3'-5' exonuclease activity (proofreading activity) continues to add the dA cohesive end after completing the strand extension reaction.
- when a DNA polymerase having 3'-5' exonuclease activity is used, a blunt end is generated on completion of the strand extension reaction; the substrate is then separated from that DNA polymerase, and a DNA polymerase lacking 3'-5' exonuclease activity (proofreading activity) and nucleotides are added for tailing, which adds a sticky end.
- a DNA polymerase lacking 3'-5' exonuclease activity (proofreading activity) can be, but is not limited to, Klenow fragment (lacking 3'-5' exonuclease activity) or Taq DNA polymerase.
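The relationship between polymerase proofreading activity, the end structure of the extended duplex, and the matching end of the 5' sequencing linker can be written as a small lookup. This is a minimal Python sketch with hypothetical function names; pairing a dA overhang with a dT sticky end is inferred from the dT sticky ends mentioned in the text and from standard complementary base pairing.

```python
def duplex_end(has_proofreading: bool) -> str:
    """End structure of the duplex after extension: proofreading
    (3'-5' exonuclease) polymerases leave a blunt end, polymerases
    lacking that activity leave a dA overhang."""
    return "blunt" if has_proofreading else "dA overhang"


def matching_linker_end(end: str) -> str:
    """End structure of the 5' sequencing linker that can be ligated
    to a duplex with the given end."""
    compatible = {"blunt": "blunt", "dA overhang": "dT sticky end"}
    return compatible[end]
```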
- Ligase enzymes useful in the methods of the invention may be DNA ligases and RNA ligases including, but not limited to, T4 DNA ligase, E. coli DNA ligase, T7 DNA ligase, and T4 RNA ligase.
- the ligase of the invention links the linker in the tail-control component to the substrate-tailed dGTP or dCTP polynucleotide.
- the ligase of the invention links a 5' sequencing linker to a synthetic double stranded deoxy polynucleotide.
- the polynucleotide substrate is purified. Purification of the polynucleotide substrate is carried out by any method known and understood by those skilled in the art.
- Purification of the polynucleotide substrate of the present invention can be carried out by adding magnetic beads whose surface is carboxyl modified. In other embodiments, purification of the polynucleotide substrate is carried out by column purification and precipitation.
- Phos: phosphate group
- C3 Spacer: carbon 3 arm
- rC: cytosine ribonucleotide
- rG: guanine ribonucleotide
- 5mC: 5-methyl-cytosine deoxynucleotide
- D: dA, dT or dG nucleotide
- N: dA, dT, dC or dG nucleotide
- Phos phosphoric acid
- C3 Spacer carbon 3 arm
- rC cytosine ribonucleotide
- Phos phosphoric acid
- C3 Spacer carbon 3 arm
- rC cytosine ribonucleotide
- Phos phosphoric acid
- 5mC 5-methyl-cytosine deoxynucleotide
- adding nucleotide homopolymer tails with TdT enzyme to a mixed single-stranded DNA polynucleotide substrate having different bases at the 3' end
- TdT enzyme (Enzymatics, catalog number P7070L, 20 U/μL)
- dATP (Takara, catalog number 4026, 100 mM)
- dGTP (Takara, catalog number 4027, 100 mM)
- dTTP (Takara, catalog number 4029, 100 mM)
- Substrate preparation: DNA polynucleotide substrates 001 (dA at the 3' end), 002 (dG at the 3' end), 003 (dT at the 3' end), 004 (dU at the 3' end) and 005 (dC at the 3' end) were mixed in equimolar amounts to obtain a mixed single-stranded DNA polynucleotide substrate with different nucleotides at the 3' end, mimicking the 3' terminal nucleotide composition of DNA polynucleotides after bisulfite treatment or damage.
- Substrate denaturation: the mixed single-stranded DNA polynucleotide substrate was incubated at 95 °C for 2 minutes and then rapidly placed on ice to keep the substrate single-stranded.
- the nucleotide homopolymer tailing reaction was carried out in the reaction solution; the reaction solution was incubated at 37 °C for 5 minutes, 15 minutes or 30 minutes, and then incubated at 70 °C for 10 minutes to inactivate the TdT enzyme.
- Lane 1 is a 20-500 bp DNA marker
- Lane 2 is a pre-tailed substrate
- Lanes 3-6, 7-10 and 11-14 are the products of TdT enzyme adding poly(dG), poly(dC), poly(dA) and poly(dT) tails to the substrate after reaction for 5 minutes, 15 minutes and 30 minutes, respectively.
- TdT enzyme consumed the substrate more rapidly when adding poly(dG) and poly(dC) tails, clearly faster than when adding poly(dA) and poly(dT) tails (lanes 3-14).
- the complex that TdT enzyme forms when adding a poly(dG) or poly(dC) tail to the substrate has a higher dissociation constant, whereas under the same conditions the complex formed when adding a poly(dA) or poly(dT) tail has a lower dissociation constant.
- after binding the substrate, TdT enzyme can dissociate after adding only a short poly(dG) or poly(dC) tail, which gives the enzyme molecule the opportunity to bind the next substrate, so the substrate is used more fully.
- the tailed products of the substrate were diffuse, indicating that the lengths of the four polynucleotide homopolymer tails added were uncontrolled (lanes 3-14).
- Conclusion: TdT enzyme uses the substrate more fully when adding poly(dG) and poly(dC) tails to the mixed single-stranded DNA polynucleotide substrate with different nucleotides at the 3' end. Under the conditions of Example 1, the length of the tail added to the substrate was not controllable.
- adding nucleotide homopolymer tails with TdT enzyme to a mixed single-stranded DNA polynucleotide substrate containing a partially random sequence and having different nucleotides at the 3' end
- TdT enzyme (Enzymatics, catalog number P7070L, 20 U/μL)
- TdT enzyme (New England Biolabs, catalog number M0315L, 20 U/μL)
- TdT enzyme (ThermoFisher Scientific, catalog number EP0161, 20 U/μL)
- Substrate preparation: DNA polynucleotide substrates 006 (dA at the 3' end), 007 (dT at the 3' end), 008 (dC at the 3' end), 009 (dG at the 3' end) and 010 (dU at the 3' end) were mixed in equimolar amounts to give a mixed single-stranded DNA polynucleotide substrate having a partially random sequence and different nucleotides at the 3' end.
- Reaction: a 5 μL reaction solution containing 1 pmol of denatured substrate (0.25 pmol each of polynucleotides 006-010), 1x TdT reaction buffer, 200 μM dGTP, dCTP, dATP or dTTP, and 10 units of TdT enzyme from Enzymatics, New England Biolabs or ThermoFisher Scientific was prepared, incubated at 37 °C for 30 minutes, and then incubated at 70 °C for 10 minutes to inactivate the TdT enzyme.
- Lane 1 is a 20-500 bp DNA marker and Lane 2 is a pre-tailed substrate.
- compared with adding poly(dA) and poly(dT) tails, TdT enzyme used the substrate more fully when adding poly(dG) and poly(dC) tails. This result is consistent with the results of Example 1 (lanes 3-6), and the TdT enzymes from the three manufacturers Enzymatics, New England Biolabs and ThermoFisher Scientific showed similar substrate utilization (lanes 3-6, 7-10 and 11-14).
- the residual substrate after the reaction was slightly higher in this example, possibly because of complex secondary structure formed by the partially random sequence (lane 3).
- Conclusion: TdT enzyme uses the substrate more efficiently when adding poly(dG) and poly(dC) tails to the mixed single-stranded DNA polynucleotide substrate with different nucleotides at the 3' end. This property does not depend on the sequence characteristics of the substrate or on the manufacturer of the TdT enzyme.
- DNA markers for indicating substrate tail length: Nos. 021, 022 (Table 1)
- 10x green buffer (Enzymatics, catalog number B0120, 20 mM Tris-acetate, 50 mM potassium acetate, 10 mM magnesium acetate, pH 7.9)
- TdT enzyme, dGTP, 2x RNA loading buffer and 20-500 bp DNA marker (manufacturer and catalog number as in Example 1)
- Methods:
- Reaction: 5 μL reaction solutions for the tailing reaction were prepared as shown in Table 3-1 below, incubated at 25 °C, 37 °C or 45 °C for 15 minutes, and then incubated at 70 °C for 10 minutes to inactivate the TdT enzyme.
- Lane 1 is a 20-500 bp DNA marker
- Lane 2 is a pre-tailing substrate
- Lane 3 is a reaction product of a TdT enzyme adding a poly(dG) tail to a substrate at 25 ° C
- Lanes 4-13 are the reaction products with tail-control molecules 011-020 added, respectively, and tailed at 25 °C
- Lanes 14, 26 and 38 are the 60b and 70b polynucleotide markers (021 and 022) used to indicate the length of the poly(dG) tail added to the substrate.
- Lane 15 is the reaction product of TdT enzyme adding a poly(dG) tail to the substrate at 37 °C; lanes 16-25 are the reaction products with tail-control molecules 011-020 added, respectively, and tailed at 37 °C; lane 27 is the reaction product of TdT enzyme adding a poly(dG) tail to the substrate at 45 °C; lanes 28-37 are the reaction products with tail-control molecules 011-020 added, respectively, and tailed at 45 °C.
- in the absence of a tail-control molecule, the length of the tail added by TdT enzyme was not fixed at 25 °C, 37 °C or 45 °C, and the tail length was concentrated near 100b (lanes 3, 15 and 27).
- the tailing efficiency of TdT enzyme on DNA polynucleotide substrates with a hidden 3' end (e.g., double-stranded DNA having a 3' recessed or blunt end) is markedly lower than on substrates with an exposed 3' end (e.g., single-stranded DNA or double-stranded DNA with a 3' overhang).
- the tail-control molecule can anneal, within a certain temperature range, to the poly(dG) tail added to the substrate by TdT enzyme, forming a duplex with a hidden 3' end, thereby reducing tailing efficiency and limiting tail length.
- in the presence of a tail-control molecule, tail-control molecules with 7-20b poly(dC) can control the tail length of the substrate; tail-control molecules with 7-20b, 8-20b and 9-20b poly(dC) held the poly(dG) tail added by TdT enzyme at about 20b under reaction conditions of 25 °C, 37 °C and 45 °C, respectively (lanes 6-13, 19-24 and 32-37).
- the controlled polynucleotide added to the polynucleotide substrate by the TdT enzyme is used as the complementary binding region of the extended primer to synthesize the second strand, thereby constructing a methylation sequencing library.
- dNTP (Takara, catalog number 4030, each 2.5 mM)
- RNase A (Takara, Cat. No. 2158, 10 mg/mL)
- a tailing reaction mixture was prepared as shown in Table 4-1, incubated at 37 °C for 15 minutes, treated at 95 °C for 2 minutes, and then held at 4 °C.
- the λ-DNA was treated with bisulfite, and the degradable tail-control molecule polynucleotide (025) was used so that TdT enzyme added a controlled poly(dG) tail to the treated λ-DNA, and then added
- the P5 region, the sample tag region and the P7 region (Table 7) of the Illumina Truseq library were amplified by PCR to obtain a structurally complete final sequencing library.
- fragment distribution analysis showed that the average library fragment size was 328 bp with no adapter dimers, indicating that the library was of high purity (as shown in Figure 8); qPCR showed that the library had a molar concentration of 191.3 nM, demonstrating that the bisulfite-treated λ-DNA had been joined to the sequencing adapters and had become a valid sequencing library.
- the tail-control molecule polynucleotide allows TdT enzyme to add a controlled polynucleotide homopolymer tail to the substrate, which can serve as the binding region for the extension primer; after second-strand synthesis, ligation of the "5' sequencing adapter" and PCR amplification, a valid sequencing library can be obtained.
- the tail-controlling molecule is used to control the length of the poly(dG) tail added to the substrate by the TdT enzyme and the poly(dG) tail of the substrate is linked to the linker by the action of the ligase.
- Short linker polynucleotide No. 024 (Table 1)
- E. coli DNA ligase (New England Biolabs, catalog number M0205S, 10 U/μL)
- Lane 1 is the substrate before tailing
- Lane 2 is the reaction product of TdT enzyme adding poly(dG) tail to the substrate at 25 °C
- Lanes 3-12 are the reaction products with tail-control adapters 011/024-020/024 (as shown in Table 2) added, respectively, and incubated at 25 °C
- Lane 13 is the reaction product of TdT enzyme adding a poly(dG) tail to the substrate at 37 °C
- Lanes 14-23 are the reaction products with tail-control adapters 011/024-020/024 (as shown in Table 2) added, respectively, and incubated at 37 °C
- Lane 24 is the reaction product of TdT enzyme adding a poly(dG) tail to the substrate at 45 °C
- Lanes 25-34 are the reaction products with tail-control adapters 011/024-020/024 (as shown in Table 2) added, respectively, and incubated at 45 °C.
- in the absence of a tail-control adapter, the poly(dG) tail added to the substrate by TdT enzyme was uncontrolled (lanes 2, 13 and 24); after the tail-control adapter and ligase were added, the ligation product of the substrate's tail sequence with the short linker (024) was obtained (lanes 3-12, 14-23 and 25-34); in reactions at 25 °C, 37 °C and 45 °C, tail-control adapters with different poly(dC) lengths (as shown in Table 2) together with ligase all yielded the product of tailed substrate ligated to the short linker (024) (as shown in lanes 3-12, 14-23 and 25-34);
- tail-control adapters with a poly(dC) tract of 7b or more gave better tail control.
- TdT enzyme and ligase together can complete the controlled poly(dG) tailing of the mixed single-stranded DNA polynucleotide substrate and the ligation of the tail to the linker; when the reaction is carried out between 25 and 45 °C, tail-control adapters with a 5-20b poly(dC) tract all yield the product of tailed substrate ligated to the linker.
- TdT enzyme adds a poly(dG) tail of indefinite length to the substrate (lane 2); when the tail-control adapter (016/024) is present, the length of the tail that TdT enzyme adds to the substrate is controlled (lane 3).
- Simultaneous addition of TdT enzyme, tail-control adapter and E. coli DNA ligase for 1 minute gave a ligation product of the substrate with a fixed-length poly(dG) tail and the short linker (024) (lane 4); the amount of ligation product then remained unchanged (lanes 5-12), indicating that the tailing of the substrate and the ligation reaction were completed quickly.
- Non-degradable tail-controlling polynucleotide 016 (Table 1)
- dNTP (Takara, catalog number 4030, each 2.5 mM)
- RNase A (Takara, Cat. No. 2158, 10 mg/mL)
- the bisulfite-treated λ-DNA was denatured into single strands; in the presence of TdT enzyme, E. coli DNA ligase and a tail-control adapter (025/026 or 016/026), a controlled poly(dG) tail was added to the substrate and ligated to the long linker; the tail-control molecule polynucleotide (025 or 016) was removed by degradation or strand displacement and second-strand synthesis was completed; the substrate DNA with its second strand synthesized was ligated to the "5' sequencing adapter" and amplified by PCR to obtain a structurally complete sequencing library.
- As shown in FIG. 11A and FIG. 11B, the fragments of the libraries obtained by the above two methods were distributed in the range of 150-1000 bp with no adapter dimers, and the libraries were of high purity.
- the concentrations of the sequencing libraries obtained by the two second-strand synthesis methods were similar, 588.9 nM and 707.4 nM, respectively, which meets the sequencing requirements of Illumina sequencers (Illumina Nextseq: library volume greater than or equal to 1.3, library concentration greater than or equal to 1.8 nM; Illumina xTen: library volume greater than or equal to 5 μl, library concentration greater than or equal to 3 nM).
- λ-DNA, bisulfite treatment kit, 5x annealing buffer, 10x green buffer, TdT enzyme, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide, 10x isothermal amplification buffer II, dNTP, Bst 3.0 DNA polymerase, RNase A, 2x T4 DNA rapid ligation buffer, T4 DNA ligase, 2x high-fidelity hot-start methylation PCR mix, Beckman Ampure XP magnetic beads and enzyme-free water (manufacturer and catalog number as in Example 7)
- Methylation sequencing library preparation: a methylation sequencing library was constructed according to the method described in Example 7 using 10 ng of λ-DNA, with second-strand synthesis carried out by the degradation method.
- the experimental results are shown in Figure 12.
- the methylation sequencing library was constructed using a single-stranded polynucleotide tail-control adapter with a stem-loop structure (030).
- the average fragment size of the final sequencing library was 396 bp, with no adapter dimers, and the library purity was high; qPCR showed that the library concentration was 15.2 nM, indicating that the DNA substrate had been joined to the sequencing adapters and a valid sequencing library was obtained.
- λ-DNA, bisulfite treatment kit, 5x annealing buffer, 10x green buffer, TdT enzyme, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide, 10x isothermal amplification buffer II, dNTP, Bst 3.0 DNA polymerase, T4 DNA ligase, 2x high-fidelity hot-start methylation PCR mix, Beckman Ampure XP magnetic beads and enzyme-free water (manufacturer and catalog number as in Example 7)
- non-degradable tail-control adapters (011/026, 012/026, 013/026, 014/026, 015/026, 016/026, 017/026, 018/026, 019/026 and 020/026, as shown in Table 3) and the dT-overhang "5' sequencing adapter" (031/032, as shown in Table 4) were prepared according to the adapter preparation method of Example 4.
- Methylation sequencing library preparation: methylation sequencing libraries were constructed according to the method described in Example 7 using 6.75 ng of λ-DNA; the tailing and ligation reactions were carried out with tail-control adapters having poly(dC) lengths of 5-20b (011/026, 012/026, 013/026, 014/026, 015/026, 016/026, 017/026, 018/026, 019/026 and 020/026), respectively, and second-strand synthesis was carried out by strand-displacement extension.
- the experimental results are shown in Fig. 13A to Fig. 13J: the fragments of the methylation sequencing libraries constructed from bisulfite-treated λ-DNA with tail-control adapters having 5-20b poly(dC) were distributed in the range of 200-1000 bp, with no adapter dimers and high library purity.
- as shown in Table 9-1, the library constructed with the 5b poly(dC) tail-control adapter had the lowest concentration, 6.2 nM, and the library constructed with the 20b poly(dC) tail-control adapter had the highest concentration, 31.7 nM.
- tail-control adapters with a 5-20b poly(dC) tract can effectively construct methylation sequencing libraries from bisulfite-treated DNA.
- 10x End Repair Buffer (New England Biolabs, Cat. No. B6052S, 50 mM Tris-HCl, 10 mM Magnesium Chloride, 10 mM Dithiothreitol, 1 mM Adenosine Triphosphate, 0.4 mM dATP, 0.4 mM dCTP, 0.4 mM dGTP, 0.4 mM dTTP, pH 7.5 )
- T4 DNA polymerase (Enzymatics, catalog number P7080L, 3 U/μL)
- 10x dA tailing buffer (New England Biolabs, Cat. No. B6059S, 10 mM Tris-HCl, 10 mM magnesium chloride, 50 mM sodium chloride, 1 mM dithiothreitol, 0.2 mM dATP, pH 7.9)
- Bisulfite treatment kit, 5x annealing buffer, 10x green buffer, TdT enzyme, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide, 10x isothermal amplification buffer II, dNTP, Bst 3.0 DNA polymerase, T4 DNA rapid ligation buffer, T4 DNA rapid ligase, 2x high-fidelity hot-start methylation PCR mix, Beckman Ampure XP magnetic beads and enzyme-free water (manufacturer and catalog number as in Example 7)
- the end-repair reaction mixture was prepared as shown in Table 10-1 and reacted at 20 °C for 30 minutes; 45 μl of Beckman Ampure XP magnetic beads were then added to recover the repaired DNA, which was eluted with 26 μl of enzyme-free water.
- the PCR amplification reaction mixture was prepared, except that the P7 PCR tag primer was the polynucleotide numbered 033; amplification then followed the PCR program shown in Table 4-5, except that the number of amplification cycles was 18; after the reaction, the PCR product was recovered using 75 μl of Beckman Ampure XP magnetic beads and eluted with 20 μl of enzyme-free water to obtain the final sequencing library.
- the experimental results are shown in Table 10-4 below.
- the concentration of the methylation sequencing library constructed from 10 ng of human genomic DNA by the method based on TdT enzyme and the tail-control adapter was 151.8 nM, while the library concentration of the conventional method was 99.6 nM.
- with 19.4 Gb of sequencing data, the library constructed by the single-strand tail-control ligation method had a redundancy of 13.4%, covered 89.6% of CpG regions, and had an average sequencing depth of 4.8x.
- with 22.0 Gb of sequencing data, the library constructed by the conventional method had a redundancy of 70.4%, covered only 13.8% of CpG regions, and had an average sequencing depth of 0.3x.
- the method based on TdT enzyme and the tail-control adapter can efficiently construct methylation sequencing libraries from small amounts of human genomic DNA.
- its library construction efficiency and sequencing data are far superior to those of the conventional method.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Immunology (AREA)
- Analytical Chemistry (AREA)
- Medicinal Chemistry (AREA)
- General Chemical & Material Sciences (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Crystallography & Structural Chemistry (AREA)
- Plant Pathology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
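Provided are a method and a kit for constructing second-generation high-throughput sequencing libraries. They improve the utilization efficiency of nucleic acid templates and simplify the library construction workflow, so that sequencing results are more accurate and coverage is more uniform.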
Description
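The present invention relates to a method and a kit for constructing second-generation high-throughput sequencing libraries; more specifically, it relates to a terminal-transferase-based method and kit for high-throughput sequencing library construction.
Compared with first-generation sequencing, second-generation sequencing is faster and has higher throughput, which meets the demands that current scientific and technological development places on sequencing. At present, second-generation sequencing platforms mainly include Illumina's Hiseq, Miseq, Nextseq and Novaseq, and Life Technologies' SOLID system, PGM and Proton. The technical idea of second-generation sequencing is sequencing by synthesis, that is, determining the DNA sequence from the signal change produced by each newly incorporated base; for example, Illumina platforms detect optical signals, whereas Life platforms detect current changes caused by pH changes. As second-generation sequencing matures, its clinical applications will become increasingly widespread.
Circulating DNA, also called cell-free DNA, is DNA present outside cells in the blood. Its main source is apoptotic cells or bone marrow cells; the DNA released by these cells is cleaved by nucleases in the body, producing small DNA fragments about 166 bp long (Y.M. Dennis Lo et al. Science Translational Medicine. 2010.10:61ra91). Cell-free DNA is in dynamic equilibrium in the body, so it can serve as an important parameter for health assessment. Changes such as tumorigenesis and organ transplantation alter the properties of cell-free DNA in peripheral blood, including its length, base information and epigenetic modifications; cell-free DNA can therefore serve as an important marker for non-invasive early diagnosis, monitoring and prognosis assessment of disease.
At present, non-invasive prenatal diagnosis using cell-free DNA as a molecular marker has gained broad clinical acceptance, and many countries are rolling out the technology. Besides base information, the length of cell-free DNA is also a very important molecular marker. Studies have found that nucleosomes, transcription factors or DNA-binding proteins from different tissues, or from cells in different states, bind to different regions of DNA, ultimately changing cell-free DNA length and sequencing coverage; these differences can be used to trace the origin of the cell-free DNA, which opens new prospects for early cancer diagnosis, organ transplantation, monitoring and related fields (Matthew W. Snyder et al. Cell 2016.1:57-68). In addition, high-throughput sequencing studies of tumor methylation have found that, once methylation sequencing has been used to analyze the DNA methylation differences between tumor and normal tissue, these differences can be used for early cancer diagnosis; combined with tissue-specific methylation signals, the tumor can also be localized, which is of great significance for diagnosis and treatment after early cancer screening (Kun Sun et al. 2015.5:5503-12; Shicheng Guo et al. 2017.3:635–642).
Before cell-free DNA can be methylation-sequenced with second-generation high-throughput sequencing, a methylation sequencing library must be constructed. The current workflow first builds a pre-library, comprising end filling, 5' phosphorylation, 3' dA-tailing and adapter ligation; after the pre-library is complete, bisulfite treatment is performed. Bisulfite treatment causes extensive DNA damage, and the templates that can finally be sequenced amount to less than 10% of the original templates (Masahiko Shiraishi et al. 2004.10:409-415). This workflow has the following drawbacks: 1) every step requires purification, which makes the procedure laborious; 2) the fill-in step artificially introduces nucleotides and alters the true methylation state; 3) a large amount of DNA template is destroyed during bisulfite treatment and lost after PCR amplification. A better library construction method is therefore needed, one that reduces the losses caused by bisulfite damage to DNA and builds libraries efficiently.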
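Summary of the Invention
In view of the problems encountered in constructing bisulfite-based DNA methylation sequencing libraries, the present invention provides a method for tailing a deoxypolynucleotide substrate and, further, a method for constructing second-generation high-throughput sequencing libraries. The method of the invention is suitable not only for normal DNA but also for severely damaged samples such as FFPE samples, ancient DNA and bisulfite-treated DNA.
The present invention provides a method for tailing a deoxypolynucleotide substrate, comprising the following steps: (1) mixing the deoxypolynucleotide substrate with the following to form a first mixture: a) dGTP or dCTP nucleotides; b) terminal deoxynucleotidyl transferase; c) a tail-control component comprising a tail-control region that consists of a polynucleotide homopolymer 5 to 20 nucleotides long (abbreviated 5b-20b), wherein the polynucleotide homopolymer is complementary to the nucleotide of a); (2) incubating the first mixture so that a tailing reaction occurs at the 3' end of the deoxypolynucleotide substrate, adding a dG or dC polynucleotide (from dGTP or dCTP) to the 3' end of the substrate and forming the 3' tailed region of the substrate.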
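The constraints that step (1) places on the tail-control region (a homopolymer of 5 to 20 nucleotides, complementary to the tailing nucleotide) can be written as a small check. The following is a minimal Python sketch under stated assumptions: the helper name `validate_tail_control_region` and the plain-string encoding of the region are hypothetical and not part of the patent, and the dC/rC heteropolymeric variants are not modeled.

```python
# Hypothetical helper (not from the patent): checks a tail-control region
# against the constraints of step (1).

# Tailing nucleotide -> base of the complementary homopolymer.
COMPLEMENT = {"dGTP": "C", "dCTP": "G"}


def validate_tail_control_region(region: str, tailing_nucleotide: str) -> bool:
    """Return True if `region` is a 5-20 nt homopolymer complementary to the tailing nucleotide."""
    if tailing_nucleotide not in COMPLEMENT:
        raise ValueError("the method tails with dGTP or dCTP only")
    base = COMPLEMENT[tailing_nucleotide]
    # Length must be 5-20 nt and every position must be the complementary base.
    return 5 <= len(region) <= 20 and set(region.upper()) == {base}


# A 7-nt poly(dC) region paired with dGTP tailing.
print(validate_tail_control_region("CCCCCCC", "dGTP"))
```

A region that is too short, too long, or mixed in composition is rejected, which mirrors the claim language rather than any measured behavior.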
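Further, the deoxypolynucleotide substrate is a double-stranded or single-stranded deoxypolynucleotide sequence, preferably a single-stranded deoxypolynucleotide substrate; preferably, the polynucleotide homopolymer of the tail-control region is a poly(dC) homopolymer.
Further, the polynucleotide of the tail-control region is a heteropolymeric sequence composed of dC and rC bases.
Further, the present invention provides a method for synthesizing the complementary strand directly on the tailed deoxypolynucleotide substrate. The method comprises: after the tailing reaction of step (2), adding ribonuclease RNase HII to degrade the ribonucleotides in the polynucleotide homopolymer of the tail-control component or in the tail-control molecule, producing a free 3' hydroxyl, and then extending the complementary strand from the tail-control component using the substrate with its 3' tailed region as template. Step (3): DNA polymerase and deoxynucleotides (comprising dATP, dTTP, dCTP and dGTP) are added to the first mixture to form a second mixture. Step (4): the second mixture is incubated; nucleotide polymerization occurs at the 3' end of the tail-control component that is complementary to the tailed substrate, synthesizing the complementary strand of the single-stranded deoxypolynucleotide substrate and yielding a double-stranded deoxypolynucleotide. Step (5): the double-stranded deoxypolynucleotide is separated from the second mixture.
In another embodiment, the present invention provides a method for synthesizing the complementary strand of the tailed deoxypolynucleotide substrate by strand-displacement extension (hereinafter also called "second-strand synthesis"). Step (3-1): after the tailing reaction of step (2), an extension primer complementary to the 3' tailed region of the substrate, DNA polymerase and deoxynucleotides are added to the first mixture to form a second mixture. Step (4): the second mixture is incubated; nucleotide polymerization occurs at the 3' end of the extension primer, synthesizing the complementary strand of the substrate and yielding a double-stranded deoxypolynucleotide. Step (5): the double-stranded deoxypolynucleotide is separated from the second mixture.
Further, in another embodiment, second-strand synthesis is carried out by degradation-type extension: the tail-control molecule in the tail-control component is first degraded, and an extension primer complementary to the 3' tailed region of the substrate, DNA polymerase and deoxynucleotides are then added to form the second mixture.
Further, another embodiment of the present invention provides a ligation step that adds a 5' sequencing adapter to the double-stranded nucleotide substrate. Step (6): a 5' sequencing adapter and a ligase are added to the separated double-stranded deoxypolynucleotide to form a third mixture, and the third mixture is incubated so that the double-stranded deoxypolynucleotide is ligated to the 5' sequencing adapter.
In another aspect, the present invention relates to a kit comprising: a deoxypolynucleotide substrate, dGTP or dCTP nucleotides, terminal deoxynucleotidyl transferase, and a tail-control component, wherein the tail-control component comprises a polynucleotide homopolymer 5 to 20 nucleotides long that is complementary to the dGTP or dCTP nucleotides. The kit can be used to control the tailing reaction of the substrate.
Further, the kit of the invention also includes a DNA ligase or an RNA ligase.
Further, the kit of the invention also includes a DNA polymerase and deoxynucleotides comprising dATP, dTTP, dCTP and dGTP.
Further, the kit of the invention also includes an extension primer.
Further, the kit of the invention also includes at least one of an RNase, USER enzyme and a nicking endonuclease.
Further, the kit of the invention also includes a 5' sequencing adapter.
By designing a tail-control component, the present invention effectively controls the length of the tail that terminal transferase adds to the 3' end of a polynucleotide substrate. The inventors further found that the polynucleotide tail of the substrate can anneal, at a suitable temperature, to the poly region of the tail-control component added beforehand, forming a double-stranded polynucleotide structure and at the same time being ligated to the linker. In this way a sequence of interest, for example a priming sequence for next-generation sequencing, can be added to the 3' end of the polynucleotide substrate very efficiently. The invention also found that when terminal transferase adds dGTP or dCTP nucleotides to a polynucleotide substrate to form a poly(dG) or poly(dC) tail, substrate utilization is markedly higher than when dATP or dTTP nucleotides are added to form a poly(dA) or poly(dT) tail. Furthermore, the nucleic acid is denatured into single strands, the substrate is tailed and ligated to the linker, complementary-strand extension of the polynucleotide substrate is then completed, a dA overhang is optionally added to the 3' end of the complementary strand, a 5' adapter is ligated, and the product is enriched by PCR to give a library ready for next-generation sequencing. According to the invention, this workflow can construct whole-genome methylation sequencing libraries from as little as 10 ng of genomic DNA from cultured human cells and yields efficient sequencing results.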
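Figure 1: Structure of the tail-control component
Figure 2: Flow chart of DNA library construction
Figure 3: Flow chart of second-strand synthesis
Figure 4: Schematic of 5' sequencing adapter ligation
Figure 5: Results of using TdT enzyme to add nucleotide homopolymer tails to a mixed single-stranded DNA polynucleotide substrate with different bases at the 3' end
Figure 6: Results of using TdT enzyme to add nucleotide homopolymer tails to a mixed single-stranded DNA polynucleotide substrate containing a partially random sequence and having different bases at the 3' end
Figure 7: Results of using tail-control molecule polynucleotides to control the length of the poly(dG) tail that TdT enzyme adds to the substrate
Figure 8: Fragment distribution of the methylation sequencing library constructed after controlled tailing of bisulfite-treated λ-DNA by TdT enzyme using a polynucleotide homopolymer
Figure 9: Results of using tail-control molecules to control the length of the poly(dG) tail that TdT enzyme adds to the substrate and ligating the substrate's poly(dG) tail to the linker with ligase
Figure 10: Results of studying the effect of reaction time on ligation of the poly(dG)-tailed substrate to the tail-control component
Figure 11: Fragment distributions of methylation sequencing libraries constructed with second-strand synthesis by degradation or by strand displacement; Figures 11A and 11B show the libraries prepared by degradation and by strand displacement, respectively.
Figure 12: Fragment distribution of the methylation sequencing library constructed after controlled tailing of bisulfite-treated λ-DNA by TdT enzyme using a tail-control component with a stem-loop structure
Figure 13: Fragment distributions of methylation sequencing libraries constructed after controlled tailing of bisulfite-treated λ-DNA by TdT enzyme using tail-control components with tail-control regions of different lengths
Figure 13A: library prepared with a tail-control component whose tail-control region is 5b poly(dC)
Figure 13B: library prepared with a tail-control component whose tail-control region is 6b poly(dC)
Figure 13C: library prepared with a tail-control component whose tail-control region is 7b poly(dC)
Figure 13D: library prepared with a tail-control component whose tail-control region is 8b poly(dC)
Figure 13E: library prepared with a tail-control component whose tail-control region is 9b poly(dC)
Figure 13F: library prepared with a tail-control component whose tail-control region is 10b poly(dC)
Figure 13G: library prepared with a tail-control component whose tail-control region is 11b poly(dC)
Figure 13H: library prepared with a tail-control component whose tail-control region is 12b poly(dC)
Figure 13I: library prepared with a tail-control component whose tail-control region is 13b poly(dC)
Figure 13J: library prepared with a tail-control component whose tail-control region is 20b poly(dC)
Figure 14: Fragment distributions of human genome methylation sequencing libraries constructed by the method of the invention and by the conventional method; Figures 14A and 14B show the libraries constructed by the method of the invention and by the conventional method, respectively.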
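Detailed Description of the Invention
In this specification, 3', 3' end and 3' terminus have the same meaning, as do 5', 5' end and 5' terminus; they refer to the 3' end or the 5' end of a nucleotide sequence, respectively.
Polynucleotide substrate
The polynucleotide substrate is the polynucleotide fragment that is to undergo the tailing reaction and/or library construction. In various embodiments, the polynucleotide substrate is single-stranded or double-stranded DNA. In other embodiments, the polynucleotide substrate is a chemically treated nucleotide sequence, including but not limited to a bisulfite-treated polynucleotide.
The polynucleotide substrate may be of natural origin or synthetic. Natural sources are polynucleotide sequences from prokaryotes or eukaryotes, such as humans, mice, viruses, plants or bacteria. The polynucleotide substrate of the invention may also come from severely damaged samples such as FFPE samples, ancient DNA or bisulfite-treated DNA. Tailed polynucleotide substrates can be used in microarray-based assays and to generate libraries for next-generation nucleic acid sequencing. Tailed polynucleotide substrates can also be used for efficient cloning of polynucleotide sequences.
In some embodiments, the polynucleotide substrate is single-stranded or double-stranded and comprises a free 3' hydroxyl. In some aspects, the polynucleotide substrate is double-stranded and has blunt ends. In other aspects, the double-stranded polynucleotide substrate has a 3' recessed end. The length of the overhanging or recessed end of the polynucleotide substrate can vary; in various aspects it is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides.
In some aspects, the polynucleotide substrate is between about 10 and about 5000 nucleotides long, or between about 40 and about 2000 nucleotides, or between about 50 and about 1000 nucleotides, or between about 100 and about 500 nucleotides. In other aspects, the polynucleotide substrate is at least 3 and at most about 50, 100 or 1000 nucleotides long.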
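Tail-control component
The present invention controls the tail length and tailing efficiency of the polynucleotide substrate by adding a tail-control component. The tail-control component comprises a tail-control region consisting of a dG or dC polynucleotide homopolymer; for example, the tail-control region may be a dG or dC polynucleotide homopolymer 5-13 nucleotides long.
A polynucleotide homopolymer, also called a "poly region", is a polynucleotide chain formed by linking identical nucleotides. The tail-control region of the invention is preferably a poly(dC) or poly(dG) homopolymer sequence, or a heteropolymeric sequence comprising one of the following combinations: (i) dC and rC nucleotides, or (ii) dG and rG nucleotides. Preferably, the polynucleotide homopolymer of the tail-control region is 5-20 nucleotides long, preferably 7-20 or 9-20 nucleotides, further preferably 5-10 or 7-10 nucleotides, and more preferably 7-9 nucleotides. A dG or dC polynucleotide homopolymer of suitable length can effectively limit the tail added to the polynucleotide substrate to about 20 nucleotides. In some embodiments, the tail-control component is itself a polynucleotide homopolymer, that is, the tail-control component consists of the tail-control region only.
In some embodiments of the invention, the tail-control component is a tail-control molecule sequence formed by joining the tail-control region to an X region (also called the X-region sequence, or the sequencing adapter sequence of the tail-control component), for example SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19 and SEQ ID NO:20 listed in the invention.
The "X-region sequence" provides a priming sequence for amplification or sequencing of nucleic acid fragments, or a tag sequence for distinguishing different substrate molecules, and in some aspects is used in next-generation sequencing applications. In some embodiments of the invention, the X-region sequence may be, but is not limited to, a next-generation sequencing (NGS) adapter sequence compatible with the Illumina, Ion Torrent, Roche 454 or SOLiD sequencing platforms, such as the Illumina Truseq library sequences shown in Table 7. The X-region sequence may be a DNA sequence, an RNA sequence, or a heteropolymeric sequence comprising DNA and RNA.
In some embodiments of the invention, the tail-control component consists of the tail-control region, the X region and a linker sequence complementary to the X region (see Figure 1); such a tail-control component is also called a "tail-control adapter". A linker that has only the sequence complementary to the X region is called a "short linker", as shown in Table 2. A linker that, in addition to the sequence complementary to the X region, also includes an extension-primer binding region is called a "long linker", as shown in Table 3. The tail-control component may be a single polynucleotide molecule that forms a stem-loop structure, producing a partially double-stranded polynucleotide in which the X region pairs with the linker sequence; in this stem-loop configuration, the polynucleotide homopolymer portion is part of the single molecule. The tail-control component may also be two separate polynucleotide molecules hybridized to each other, that is, the linker sequence is a polynucleotide molecule complementary to the X-region sequence. The tail-control component of the invention is preferably a double-stranded tail-control component selected from Table 2 or Table 3.
In some embodiments of the invention, the polynucleotide of the tail-control component comprises peptide nucleic acid, schizophyllan polysaccharide, locked nucleic acid, or combinations thereof.
The invention provides compositions in which the tail-control component is single-stranded or at least partially double-stranded. "Partially double-stranded" means that the tail-control component comprises a single-stranded portion and a double-stranded portion. Where the tail-control component is at least partially double-stranded, it is produced by hybridizing the X-region sequence of the tail-control molecule to a linker molecule (in some embodiments the linker sequence is an NGS adapter sequence), part of the sequence of the tail-control molecule being complementary to the linker molecule. In other embodiments, hybridization occurs within a single tail-control molecule that forms a hairpin structure; a partially double-stranded tail-control component can therefore be either a unimolecular or a multimolecular structure.
In some embodiments of the invention, the tail-control component comprises a blocking group. As used herein, a blocking group is a moiety that prevents extension by an enzyme; without a blocking group, the enzyme could synthesize polynucleotide by adding nucleotides. Blocking groups include, but are not limited to, a phosphate group, a C3 spacer, dideoxynucleotides, ribonucleotides, an amino group and inverted deoxythymidine.
In some embodiments of the invention, the linker of the tail-control component carries a 5' phosphate modification and a 3' blocking group, and the 3' end of the tail-control region carries a blocking group.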
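Tailing reaction
As used herein, the term "tailing" is interchangeable with the term "controlled tailing". The present invention provides a method for tailing a deoxypolynucleotide substrate, in which a desired number of dG or dC nucleotides is added to the 3' end of the polynucleotide substrate in a controlled manner. By way of non-limiting example, a tail-control component comprising a polynucleotide homopolymer 5 to 20 nucleotides long is added; it forms a duplex with the homopolymer tail newly added to the substrate, which lowers the rate of polymerization and keeps the tail of the polynucleotide substrate within a defined length range (see Figure 2).
In one specific embodiment, a tail-control component containing a poly(dC) homopolymer sequence is used to control TdT enzyme as it adds a poly(dG) tail (also called a (dG) homopolymer tail) to the 3' end of the polynucleotide substrate. Further, the poly(dG) tail of the polynucleotide substrate is ligated to the linker of the tail-control component, forming the 3' tailed region of the substrate.
In one specific embodiment, the tail-control component comprises a poly(dC) homopolymer sequence of 5-20, preferably 5-13, further preferably 7-10, and more preferably 7-9 nucleotides. The molar ratio of polynucleotide substrate to tail-control component ranges from 1:1 to 1:100, preferably from 1:5 to 1:50.
In one specific embodiment, the pH of the tailing reaction ranges from about 5.0 to about 9.0; the molar ratio of polynucleotide substrate to mononucleotide ranges from 1:10 to 1:20000, preferably from 1:100 to 1:2000; the incubation time is 1 minute to 120 minutes, preferably 0.5-60 minutes, 0.5-30 minutes, 1-20 minutes, 1-15 minutes or 1-10 minutes.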
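Extension reaction
After tailing the deoxypolynucleotide substrate, the present invention further carries out an extension reaction using the deoxypolynucleotide substrate as template (see library construction in Figure 2 and second-strand synthesis in Figure 3).
In one embodiment, after the tailing reaction, an RNase is added to degrade the polynucleotide homopolymer of the tail-control component to produce a 3' hydroxyl; DNA polymerase and deoxynucleotides are added and incubated with the deoxypolynucleotide substrate and the tail-control component. Nucleotide extension occurs at the 3' end of the tail-control component, synthesizing the complementary strand of the deoxypolynucleotide substrate and yielding a double-stranded deoxypolynucleotide. In general, the extension reaction is incubated for 1 minute to 60 minutes, preferably 1-30 minutes, 1-20 minutes, 1-15 minutes or 1-10 minutes.
In another specific embodiment, the tail-control molecule is degraded first, and an extension primer is then added to synthesize the double-stranded deoxypolynucleotide using the polynucleotide substrate as template. In some embodiments, the tail-control molecule contains ribonucleotides and is separated from the substrate by incubation with a ribonuclease (RNase) under conditions sufficient for its activity, followed by incubation at a temperature above 80 °C. In related aspects, the ribonuclease is at least one selected from RNase H, RNase HII, RNase A and RNase T1. In other aspects, the tail-control molecule contains dU nucleotides and can be degraded by incubation with dU glycosylase followed by incubation at a temperature above 80 °C, or by incubation with a mixture of dU glycosylase and apurinic/apyrimidinic endonuclease, such as USER enzyme. In still other aspects, the double-stranded region of the tail-control component contains a specific sequence recognized by a nicking endonuclease and can be nicked with a nicking endonuclease such as Nt.BspQI, followed by incubation at a temperature above 80 °C to separate the tail-control molecule from the substrate.
In one specific embodiment, after the tailing reaction, an extension primer is added and the tail-control molecule is separated from the polynucleotide substrate by strand displacement, and the extension reaction then proceeds using the deoxypolynucleotide substrate as template. The method comprises: after the tailing reaction, adding an extension primer complementary to the 3' tailed region of the substrate, DNA polymerase and deoxynucleotides, and incubating them with the polynucleotide substrate; nucleotide extension occurs at the 3' end of the extension primer, synthesizing the complementary strand of the substrate and yielding a double-stranded deoxypolynucleotide. In some aspects, the DNA polymerase has strand-displacement activity; under conditions sufficient for both its strand-displacement activity and its DNA polymerization activity, it separates the tail-control molecule from the polynucleotide substrate by strand displacement while completing the extension reaction. DNA polymerases with strand-displacement activity that can be used to practice this patent include, but are not limited to: Bst DNA polymerase large fragment, Bst 3.0 DNA polymerase, Klenow large fragment and phi29 DNA polymerase.
In another embodiment, after the tailing reaction, the tail-control molecule polynucleotide in the tail-control component is degraded first, and an extension primer is then added for the extension reaction. The method comprises: after the tailing reaction, adding an RNase to degrade the tail-control molecule polynucleotide in the tail-control component, then adding an extension primer complementary to the 3' tailed region of the substrate, DNA polymerase and deoxynucleotides, and incubating them with the polynucleotide substrate; nucleotide extension occurs at the 3' end of the extension primer, synthesizing the complementary strand of the substrate and yielding a double-stranded deoxypolynucleotide.
In some aspects, the DNA polymerase used in the extension reaction has 3'-5' exonuclease (proofreading) activity and gives a blunt-ended duplex when extension is complete; in other aspects, the DNA polymerase lacks 3'-5' exonuclease (proofreading) activity and gives a duplex with a dA overhang when extension is complete. Accordingly, in related aspects, DNA polymerases that can be used in this patent include, but are not limited to, the following or combinations thereof: full-length Bst DNA polymerase, KAPA high-fidelity hot-start DNA polymerase, KAPA high-fidelity hot-start Uracil+ DNA polymerase, Bst 3.0 DNA polymerase, Phusion™ high-fidelity DNA polymerase, hot-start Taq DNA polymerase, Ex Taq DNA polymerase, Deep VentR™ DNA polymerase, T4 DNA polymerase and Klenow large fragment.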
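Ligation of the 5' sequencing adapter
The 5' sequencing adapter provides a priming sequence for amplification or sequencing of nucleic acid fragments and in some aspects is used in next-generation sequencing applications (see Figure 4). In some embodiments of the invention, the 5' sequencing adapter is formed by annealing two polynucleotide strands. In some aspects, the 5' adapter sequence is selected from, but not limited to, next-generation sequencing (NGS) adapter sequences compatible with the Illumina, Ion Torrent, Roche 454 or SOLiD sequencing platforms, such as the Illumina Truseq library sequences shown in Table 7. The adapter sequence may be a DNA sequence, an RNA sequence, or a heteropolymeric sequence comprising DNA and RNA, for example the 5' sequencing adapter shown in Table 4.
In some embodiments of the invention, the 5' sequencing adapter is ligated to the double-stranded polynucleotide substrate obtained after the extension reaction of step (4); the ligating end may be either blunt or cohesive. Cohesive ends include, but are not limited to, dT-overhang cohesive ends.
After the extension reaction of step (4) and the purification of step (5), the double-stranded deoxypolynucleotide is ligated to the 5' sequencing adapter. In some embodiments of the invention, the 5' adapter is ligated only to the 3' end of the complementary strand of the polynucleotide substrate.
In other specific embodiments, the 5' adapter is ligated only to the 5' end of the polynucleotide substrate. In still other specific embodiments, the 5' adapter is ligated both to the 5' end of the polynucleotide substrate and to the 3' end of the complementary strand of the polynucleotide substrate.
After the extension reaction of step (4), a dA cohesive end is added to the 3' end of the polynucleotide substrate. In one embodiment, a DNA polymerase lacking 3'-5' exonuclease (proofreading) activity continues to add the dA cohesive end after completing the second-strand extension reaction.
In another embodiment, after a DNA polymerase having 3'-5' exonuclease (proofreading) activity has completed the second-strand extension reaction and produced blunt ends, the substrate is separated from that DNA polymerase, and a DNA polymerase lacking 3'-5' exonuclease (proofreading) activity and nucleotides are then added for a tailing reaction that adds the cohesive end. In related aspects, the DNA polymerase lacking 3'-5' exonuclease (proofreading) activity may be, but is not limited to, Klenow (3'-5' exonuclease deficient) or Taq DNA polymerase.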
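The relationship described in this section between the polymerase used for extension, the duplex end it leaves, and the adapter end that can be ligated to it can be summarized in a few lines. This is an illustrative Python sketch only; the function names are hypothetical, and the mapping simply restates the text (proofreading polymerase gives a blunt end; a polymerase lacking proofreading gives a dA overhang, which pairs with a dT-overhang adapter).

```python
# Hypothetical helpers restating the end-compatibility logic of this section.

def duplex_end_after_extension(has_proofreading: bool) -> str:
    """End left on the duplex once second-strand extension is complete."""
    # 3'-5' exonuclease (proofreading) activity -> blunt end; otherwise a dA overhang.
    return "blunt" if has_proofreading else "dA overhang"


def compatible_adapter_end(duplex_end: str) -> str:
    """Adapter end that matches a given duplex end."""
    matches = {"blunt": "blunt", "dA overhang": "dT overhang"}
    return matches[duplex_end]


for proofreading in (True, False):
    end = duplex_end_after_extension(proofreading)
    print(f"proofreading={proofreading}: duplex end = {end}, adapter end = {compatible_adapter_end(end)}")
```

When a proofreading polymerase is used but a dT-overhang adapter is wanted, the text's second embodiment applies: the duplex is separated from that polymerase and dA-tailed in a separate step.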
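Ligase
Ligases that can be used in the method of the invention may be DNA ligases or RNA ligases, including but not limited to T4 DNA ligase, E. coli DNA ligase, T7 DNA ligase and T4 RNA ligase.
The ligase of the invention joins the linker in the tail-control component to the dG or dC polynucleotide tail added to the substrate. In other embodiments, the ligase of the invention joins the 5' sequencing adapter to the synthesized double-stranded deoxypolynucleotide.
Separation step
In some embodiments, the polynucleotide substrate is purified. Purification of the polynucleotide substrate is carried out by any method known and understood by those skilled in the art.
In the present invention, the polynucleotide substrate can be purified by adding magnetic beads with a carboxyl-modified surface. In other specific embodiments, the polynucleotide substrate is purified by column purification and precipitation.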
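The sequences used in the examples are given in Tables 1 to 7 below.
Table 1. DNA polynucleotides used in the examples
Phos: phosphate; C3Spacer: C3 spacer; rC: cytosine ribonucleotide; rG: guanine ribonucleotide; 5mC: 5-methylcytosine deoxynucleotide; D: dA, dT or dG nucleotide; N: dA, dT, dC or dG nucleotide
Table 2. Tail-control adapters with a poly(dC) overhang used in Example 5
Phos: phosphate; C3Spacer: C3 spacer
Table 3. Poly(dC)-overhang tail-control components used for methylation sequencing library construction
Phos: phosphate; C3 Spacer: C3 spacer; rC: cytosine ribonucleotide
Table 4. dT-overhang "5' sequencing adapter" used for methylation sequencing library construction
Phos: phosphate
Table 5. "Stem-loop" tail-control adapter used in Example 8
Phos: phosphate; C3 Spacer: C3 spacer; rC: cytosine ribonucleotide
Table 6. "Conventional methylation sequencing adapter" used for methylation sequencing library construction
Phos: phosphate; 5mC: 5-methylcytosine deoxynucleotide
Table 7. Structure of the Illumina Truseq library
N: dA, dT, dC or dG nucleotide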
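Example 1
Adding nucleotide homopolymer tails with TdT enzyme to a mixed single-stranded DNA polynucleotide substrate with different bases at the 3' end
Materials:
DNA polynucleotide substrates: Nos. 001-005 (Table 1)
TdT enzyme (Enzymatics, catalog number P7070L, 20 U/μL)
1x TdT reaction buffer: 20 mM Tris-acetate, 50 mM potassium acetate, 10 mM magnesium acetate, 0.25 mM cobalt chloride, pH 7.9
dATP (Takara, catalog number 4026, 100 mM)
dGTP (Takara, catalog number 4027, 100 mM)
dTTP (Takara, catalog number 4029, 100 mM)
dCTP (Takara, catalog number 4028, 100 mM)
2x RNA loading buffer (Solarbio, catalog number R1055)
20-500 bp DNA marker (Takara, catalog number 3420A)
Methods:
(1) Substrate preparation: DNA polynucleotide substrates 001 (dA at the 3' end), 002 (dG at the 3' end), 003 (dT at the 3' end), 004 (dU at the 3' end) and 005 (dC at the 3' end) were mixed in equimolar amounts to obtain a mixed single-stranded DNA polynucleotide substrate with different nucleotides at the 3' end, mimicking the 3' terminal nucleotide composition of DNA polynucleotides after bisulfite treatment or damage.
(2) Substrate denaturation: the mixed single-stranded DNA polynucleotide substrate was incubated at 95 °C for 2 minutes and then rapidly placed on ice to keep the substrate single-stranded.
(3) Reaction: the nucleotide homopolymer tailing reaction was carried out in a 5 μL reaction solution containing 1x TdT reaction buffer, 1 pmol of the denatured mixed single-stranded DNA polynucleotide substrate (0.25 pmol each of polynucleotides 001-005), 200 μM dGTP or 200 μM dCTP or 40 μM dATP or 40 μM dTTP {where the ratio of the Michaelis constants of TdT enzyme for adding poly(dG) or poly(dC) tails versus poly(dA) or poly(dT) tails is 5 to 1}, and 10 units of TdT enzyme; the reaction solution was incubated at 37 °C for 5 minutes, 15 minutes or 30 minutes, and then at 70 °C for 10 minutes to inactivate the TdT enzyme.
(4) Detection of reaction products: 5 μL of 2x RNA loading buffer was added; after incubation at 70 °C for 10 minutes, the samples were run on a 10% TBE-urea polyacrylamide gel, stained with GelRed nucleic acid dye (Biotium, catalog number 41003) and photographed on a gel imager (Tanon, catalog number Tanon-2500).
The results are shown in Figure 5. Lane 1 is the 20-500 bp DNA marker; lane 2 is the substrate before tailing; lanes 3-6, 7-10 and 11-14 are the products of TdT enzyme adding poly(dG), poly(dC), poly(dA) and poly(dT) tails to the substrate after reaction for 5 minutes, 15 minutes and 30 minutes, respectively. TdT enzyme consumed the substrate more rapidly when adding poly(dG) and poly(dC) tails, clearly faster than when adding poly(dA) and poly(dT) tails (lanes 3-14). The complex that TdT enzyme forms when adding a poly(dG) or poly(dC) tail to the substrate has a higher dissociation constant, whereas under the same conditions the complex formed when adding a poly(dA) or poly(dT) tail has a lower dissociation constant; in the reaction, TdT enzyme can dissociate after adding only a short poly(dG) or poly(dC) tail, which gives the enzyme molecule the opportunity to bind the next substrate, so the substrate is used more fully. In addition, the tailed products of the substrate were diffuse, indicating that the lengths of the four polynucleotide homopolymer tails added were uncontrolled (lanes 3-14).
Conclusion: TdT enzyme uses the substrate more fully when adding poly(dG) and poly(dC) tails to a mixed single-stranded DNA polynucleotide substrate with different nucleotides at the 3' end. Under the conditions of Example 1, the length of the tail added to the substrate was not controllable.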
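The reaction composition of Example 1 can be related to the substrate-to-mononucleotide molar ratio given in the tailing-reaction section (1:10 to 1:20000, preferably 1:100 to 1:2000). The short Python sketch below does that arithmetic; the function names are hypothetical, and it assumes the nominal 1 pmol of substrate and the 5 μL reaction volume stated in step (3).

```python
# Hypothetical helpers for the molar-ratio arithmetic of Example 1.

def pmol_in_reaction(conc_uM: float, volume_uL: float) -> float:
    """Amount in pmol: 1 uM is 1 pmol/uL, so pmol = uM x uL."""
    return conc_uM * volume_uL


def nucleotides_per_substrate(substrate_pmol: float, dntp_uM: float, volume_uL: float) -> float:
    """Molar excess of mononucleotide over substrate in the reaction."""
    return pmol_in_reaction(dntp_uM, volume_uL) / substrate_pmol


# Concentrations from step (3): 200 uM dGTP/dCTP, 40 uM dATP/dTTP, 5 uL, 1 pmol substrate.
for name, conc_uM in [("dGTP", 200), ("dCTP", 200), ("dATP", 40), ("dTTP", 40)]:
    ratio = nucleotides_per_substrate(1.0, conc_uM, 5.0)
    print(f"{name}: substrate:nucleotide = 1:{ratio:.0f}")
```

Under these assumptions the dGTP and dCTP reactions come out at 1:1000 and the dATP and dTTP reactions at 1:200, all inside the preferred 1:100 to 1:2000 window.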
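Example 2
Adding nucleotide homopolymer tails with TdT enzyme to a mixed single-stranded DNA polynucleotide substrate that contains a partially random sequence and has different nucleotides at the 3' end
Materials:
DNA polynucleotide substrates: Nos. 006-010 (Table 1)
TdT enzyme (Enzymatics, catalog number P7070L, 20 U/μL)
TdT enzyme (New England Biolabs, catalog number M0315L, 20 U/μL)
TdT enzyme (ThermoFisher Scientific, catalog number EP0161, 20 U/μL)
1x TdT reaction buffer, dATP, dGTP, dTTP, dCTP, 2x RNA loading buffer and 20-500 bp DNA marker (manufacturer and catalog number as in Example 1)
Methods:
(1) Substrate preparation: DNA polynucleotide substrates 006 (dA at the 3' end), 007 (dT at the 3' end), 008 (dC at the 3' end), 009 (dG at the 3' end) and 010 (dU at the 3' end) were mixed in equimolar amounts to obtain a mixed single-stranded DNA polynucleotide substrate having a partially random sequence and different nucleotides at the 3' end.
(2) Substrate denaturation: as in Example 1.
(3) Reaction: a 5 μL reaction solution containing 1 pmol of denatured substrate (0.25 pmol each of polynucleotides 006-010), 1x TdT reaction buffer, 200 μM dGTP, dCTP, dATP or dTTP, and 10 units of TdT enzyme from Enzymatics, New England Biolabs or ThermoFisher Scientific was prepared, incubated at 37 °C for 30 minutes, and then incubated at 70 °C for 10 minutes to inactivate the TdT enzyme.
(4) Detection of reaction products: as in Example 1.
The results are shown in Figure 6. Lane 1 is the 20-500 bp DNA marker and lane 2 is the substrate before tailing. Compared with adding poly(dA) and poly(dT) tails, TdT enzyme used the substrate more fully when adding poly(dG) and poly(dC) tails, which is consistent with the results of Example 1 (lanes 3-6), and the TdT enzymes from the three manufacturers Enzymatics, New England Biolabs and ThermoFisher Scientific showed similar substrate utilization (lanes 3-6, 7-10 and 11-14). The residual substrate after the reaction was slightly higher in this example, possibly because of complex secondary structure formed by the partially random sequence (lane 3).
Conclusion: TdT enzyme uses the substrate more efficiently when adding poly(dG) and poly(dC) tails to a mixed single-stranded DNA polynucleotide substrate with different nucleotides at the 3' end, and this property does not depend on the sequence characteristics of the substrate or on the manufacturer of the TdT enzyme.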
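Example 3
Controlling the length of the poly(dG) tail added to the substrate by TdT with tail-control molecule polynucleotides
Materials:
DNA polynucleotide substrates: Nos. 001-005 (Table 1)
Tail-control molecule polynucleotides with tail-control regions of different lengths: Nos. 011-020 (Table 1)
DNA markers for indicating the tail length added to the substrate: Nos. 021 and 022 (Table 1)
10x Green Buffer (Enzymatics, Cat. No. B0120; 20 mM Tris-acetate, 50 mM potassium acetate, 10 mM magnesium acetate, pH 7.9)
TdT, dGTP, 2x RNA loading buffer and 20-500 bp DNA marker (manufacturers and catalog numbers as in Example 1)
Methods:
(1) The substrate was prepared and denatured as in Example 1.
(2) Reaction: 5 μL tailing reactions were prepared as shown in Table 3-1 below and incubated at 25 °C, 37 °C or 45 °C for 15 min, then at 70 °C for 10 min to inactivate TdT.
Table 3-1
(3) Reaction products were detected as in Example 1, except that a 15% TBE-urea polyacrylamide gel was used.
The results are shown in Figure 7. Lane 1 is the 20-500 bp DNA marker; lane 2 is the substrate before tailing; lane 3 is the product of TdT adding a poly(dG) tail to the substrate at 25 °C; lanes 4-13 are the products of tailing at 25 °C with tail-control molecules 011-020 added, respectively; lanes 14, 26 and 38 are the 60 nt and 70 nt polynucleotide markers (021 and 022) used to indicate the length of the poly(dG) tail added to the substrate. Lane 15 is the product of TdT adding a poly(dG) tail to the substrate at 37 °C; lanes 16-25 are the products of tailing at 37 °C with tail-control molecules 011-020 added, respectively; lane 27 is the product of TdT adding a poly(dG) tail to the substrate at 45 °C; lanes 28-37 are the products of tailing at 45 °C with tail-control molecules 011-020 added, respectively.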
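Because the Figure 7 gel has 38 lanes, it can help to tabulate the lane layout described above. The following sketch simply rebuilds that layout as a dictionary; the function name and label strings are hypothetical, and the lane assignments are taken directly from the description of Figure 7.

```python
# Rebuild the Figure 7 lane layout from the text (names and labels are hypothetical).
CONTROL_MOLECULES = [f"{n:03d}" for n in range(11, 21)]  # 011 .. 020
TEMPERATURES_C = (25, 37, 45)


def figure7_lane_map() -> dict:
    lanes = {1: "20-500 bp DNA marker", 2: "substrate before tailing"}
    first = 3  # first lane of the 25 C block
    for temp in TEMPERATURES_C:
        lanes[first] = f"poly(dG) tailing at {temp} C, no tail-control molecule"
        for offset, molecule in enumerate(CONTROL_MOLECULES, start=1):
            lanes[first + offset] = (
                f"poly(dG) tailing at {temp} C, tail-control molecule {molecule}"
            )
        lanes[first + 11] = "60 nt and 70 nt markers (021 and 022)"
        first += 12  # each temperature block occupies 12 lanes
    return lanes


if __name__ == "__main__":
    for lane, label in sorted(figure7_lane_map().items()):
        print(lane, label)
```

Looking up a lane number in the returned dictionary gives the temperature and tail-control molecule for that lane, which makes the lane ranges quoted in the results easier to check.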
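When no tail-control molecule was present, the tail length added by TdT at 25 °C, 37 °C and 45 °C was not fixed and clustered around 100 nt (lanes 3, 15 and 27). TdT tails DNA polynucleotide substrates with a hidden 3' end (for example, double-stranded DNA with a 3' recessed or blunt end) far less efficiently than substrates with an exposed 3' end (for example, single-stranded DNA or double-stranded DNA with a 3' overhang). Within a certain temperature range, a tail-control molecule can anneal to the poly(dG) tail that TdT adds to the substrate, forming a double-stranded structure with a hidden 3' end, which lowers the tailing efficiency and limits the tail length. When tail-control molecules were present, those with 7-20 nt of poly(dC) controlled the tail length of the substrate: tail-control molecules with 7-20 nt, 8-20 nt and 9-20 nt of poly(dC) kept the poly(dG) tail added by TdT at about 20 nt at 25 °C, 37 °C and 45 °C, respectively (lanes 6-13, 19-24 and 32-37).
Conclusion: adding a tail-control molecule polynucleotide complementary to the homopolymeric tail that TdT adds to the substrate effectively controls the tail length added by TdT.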
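Example 4
Synthesizing the second strand using the controlled homopolymeric tail added to a polynucleotide substrate by TdT as the complementary binding region for an extension primer, and thereby constructing a methylation sequencing library
Materials:
Degradable tail-control molecule polynucleotide: No. 025 (Table 1)
Extension primer polynucleotide: No. 023 (Table 1)
Polynucleotide pair for preparing the "5' sequencing adapter": Nos. 028 and 029 (Table 1)
P5 PCR primer polynucleotide: No. 031 (Table 1)
P7 PCR index primer polynucleotide: No. 032 (Table 1)
100 bp DNA ladder (Takara, Cat. No. 3422A)
λ-DNA (Takara, Cat. No. 3010)
Bisulfite treatment kit (Zymo Research, Cat. No. D5005)
10x Isothermal Amplification Buffer II (New England Biolabs, Cat. No. B9005S)
dNTP (Takara, Cat. No. 4030, 2.5 mM each)
Bst 3.0 DNA polymerase (New England Biolabs, Cat. No. M0374S, 8 U/μL)
RNase A (Takara, Cat. No. 2158, 10 mg/mL)
Beckman Ampure XP magnetic beads (Beckman, Cat. No. A63882)
2x T4 DNA rapid ligation buffer (Enzymatics, Cat. No. B1010)
T4 DNA rapid ligase (Enzymatics, Cat. No. L6030-HC-L, 600 U/μL)
2x high-fidelity hot-start methylation PCR mix (KAPA Biosystems, Cat. No. KK2802)
Nuclease-free water (Solarbio, Cat. No. R1600-100)
5x annealing buffer (Beyotime, Cat. No. D0251)
10x Green Buffer, TdT, dGTP (manufacturers and catalog numbers as in Example 3)
Methods: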
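(1) Adapter preparation: the polynucleotide pair (028/029) was mixed in equimolar amounts, incubated in 1x annealing buffer at 95 °C for 2 min, and then cooled slowly to room temperature to obtain the dT-overhang "5' sequencing adapter" (as shown in Table 4).
(2) 10 ng of λ-DNA was bisulfite-treated with the bisulfite treatment kit.
(3) The product of the previous step was fragmented to 300 bp with a focused ultrasonicator (Covaris, Cat. No. S220).
(4) The λ-DNA from the previous step was treated at 95 °C for 2 min, then immediately placed on ice and incubated for 2 min before use.
(5) The tailing reaction mix was prepared as shown in Table 4-1, incubated at 37 °C for 15 min, treated at 95 °C for 2 min, and then held at 4 °C.
Table 4-1
(6) The reaction mix for second-strand synthesis was prepared as shown in Table 4-2 and incubated at 37 °C for 15 min to degrade the tail-control molecule, then at 95 °C for 2 min, followed by 16 cycles of 47 °C (1 min) and 65 °C (2 min), and then held at 65 °C (so that the extension primer hybridizes fully to the poly(dG) tail sequence of the substrate). Next, 1 μl of Bst 3.0 DNA polymerase was added and the reaction was run at 65 °C for 30 min, then held at 4 °C. The DNA sample after second-strand synthesis was recovered with 75 μl of Beckman Ampure XP magnetic beads and eluted in 12 μl of nuclease-free water.
Table 4-2
(7) The reaction mix for ligating the DNA after second-strand synthesis to the dT-overhang "5' sequencing adapter" was prepared as shown in Table 4-3, incubated at 25 °C for 15 min and at 65 °C for 10 min to inactivate the ligase, and then held at 4 °C. The ligated DNA was recovered with 21 μl of Beckman Ampure XP magnetic beads and eluted in 23 μl of nuclease-free water.
Table 4-3
(8) The PCR amplification reaction mix was prepared as shown in Table 4-4 and run according to the PCR program shown in Table 4-5. The PCR product was recovered with 75 μl of Beckman Ampure XP magnetic beads and eluted in 20 μl of nuclease-free water to obtain the final sequencing library.
Table 4-4
Table 4-5
(9) The library fragment distribution was measured with an Agilent 2100 Bioanalyzer (Agilent Technologies, Cat. No. G2939BA) and the Agilent High Sensitivity DNA kit (Agilent Technologies, Cat. No. 5067-4626). The library molar concentration was measured with a library quantification kit (KAPA Biosystems, Cat. No. KK4824) and a DNA quantification standards and primer premix kit (KAPA Biosystems, Cat. No. KK4808).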
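The incubation program of step (6) has several stages, including a 16-cycle block, so a compact representation can make it easier to review. The sketch below is an illustration rather than an instrument protocol: the data layout and names are hypothetical, holds at 65 °C and 4 °C and ramp times are ignored, and the temperatures and durations are those given in step (6).

```python
# Step (6) of Example 4 as data: (label, [(temperature C, minutes), ...], cycles).
# Layout and names are hypothetical; holds and ramp times are not modeled.
SECOND_STRAND_PROGRAM = [
    ("degrade tail-control molecule", [(37, 15)], 1),
    ("denature", [(95, 2)], 1),
    ("anneal extension primer to poly(dG) tail", [(47, 1), (65, 2)], 16),
    ("extend with Bst 3.0 DNA polymerase", [(65, 30)], 1),
]


def total_minutes(program) -> int:
    """Sum the programmed incubation time over all stages and cycles."""
    return sum(
        cycles * sum(minutes for _temp, minutes in segments)
        for _label, segments, cycles in program
    )


if __name__ == "__main__":
    for label, segments, cycles in SECOND_STRAND_PROGRAM:
        print(f"{label}: {segments} x {cycles}")
    print("total programmed time (min):", total_minutes(SECOND_STRAND_PROGRAM))
```

Note that the polymerase is added manually before the final stage, so in practice the program would be paused at the 65 °C hold.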
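Results: λ-DNA was bisulfite-treated, and the degradable tail-control molecule polynucleotide (025) was used so that TdT added a controlled poly(dG) tail to the treated λ-DNA. An extension primer complementary to the poly(dG) tail sequence {023, which contains a region complementary to the poly(dG) tail and the Read 2 region of an Illumina Truseq library (Table 7)} was then added for second-strand synthesis. The "5' sequencing adapter" was ligated, and PCR amplification introduced the P5 region, sample index region and P7 region of the Illumina Truseq library (Table 7), giving a structurally complete final sequencing library. The fragment distribution showed an average library fragment size of 328 bp with no adapter dimers, indicating high library purity (Figure 8). qPCR showed a library molar concentration of 191.3 nM, demonstrating that the bisulfite-treated λ-DNA was captured by the sequencing adapters and became a valid sequencing library.
Conclusion: a tail-control molecule polynucleotide lets TdT add a controlled homopolymeric tail to the substrate, which can serve as the binding region for an extension primer. Ligating the "5' sequencing adapter" after second-strand synthesis and then performing PCR amplification yields a valid sequencing library.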
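The molar concentration and average fragment size reported above can be related to a mass concentration with the usual double-stranded DNA approximation. The sketch below assumes an average mass of 660 g/mol per base pair, which is a common rule of thumb and not a value from this document; the function name is hypothetical.

```python
# Convert a library molar concentration to an approximate mass concentration.
# Assumption (not from the source): dsDNA averages about 660 g/mol per base pair.
AVG_G_PER_MOL_PER_BP = 660.0


def nm_to_ng_per_ul(conc_nm: float, mean_fragment_bp: float) -> float:
    """ng/uL = nM x mean fragment length (bp) x 660 / 1e6."""
    return conc_nm * mean_fragment_bp * AVG_G_PER_MOL_PER_BP / 1e6


if __name__ == "__main__":
    # Example 4 library: 191.3 nM at an average fragment size of 328 bp.
    print(f"{nm_to_ng_per_ul(191.3, 328):.2f} ng/uL")
```

The same helper can be applied to the libraries in the later examples wherever both a qPCR concentration and an average fragment size are reported.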
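Example 5
Using tail-control molecules to control the length of the poly(dG) tail added to the substrate by TdT, and ligating the poly(dG) tail of the substrate to a linker with a ligase
Materials:
DNA polynucleotide substrates: Nos. 001-005 (Table 1)
Tail-control molecule polynucleotides with tail-control regions of different lengths: Nos. 011-020 (Table 1)
Short linker polynucleotide: No. 024 (Table 1)
5x annealing buffer (Beyotime, Cat. No. D0251)
E. coli DNA ligase (New England Biolabs, Cat. No. M0205S, 10 U/μL)
β-Nicotinamide adenine dinucleotide (New England Biolabs, Cat. No. B9007S, 50 mM)
10x Green Buffer, TdT, dGTP and 2x RNA loading buffer (manufacturers and catalog numbers as in Example 3)
Methods:
(1) Adapter preparation: tail-control components with poly(dC) tails of different lengths (011/024, 012/024, 013/024, 014/024, 015/024, 016/024, 017/024, 018/024, 019/024 and 020/024, as shown in Table 2) were prepared following the adapter preparation method described in Example 4.
(2) The substrate was prepared and denatured as in Example 1.
(3) Reaction: 10 μL tailing and ligation reactions were prepared as shown in Table 5-1 below and incubated at 25 °C, 37 °C or 45 °C for 15 min, then at 70 °C for 10 min to inactivate TdT and E. coli DNA ligase.
Table 5-1
(4) Reaction products were detected as in Example 3.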
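The results are shown in Figure 9. Lane 1 is the substrate before tailing; lane 2 is the product of TdT adding a poly(dG) tail to the substrate at 25 °C; lanes 3-12 are the products of incubation at 25 °C with tail-control adapters 011/024-020/024 added, respectively (as shown in Table 2); lane 13 is the product of TdT adding a poly(dG) tail to the substrate at 37 °C; lanes 14-23 are the products of incubation at 37 °C with tail-control adapters 011/024-020/024 added, respectively (as shown in Table 2); lane 24 is the product of TdT adding a poly(dG) tail to the substrate at 45 °C; lanes 25-34 are the products of incubation at 45 °C with tail-control adapters 011/024-020/024 added, respectively (as shown in Table 2).
When no tail-control adapter was present, the poly(dG) tail added to the substrate by TdT was not controlled (lanes 2, 13 and 24). After the tail-control adapter and ligase were added, the product of ligation between the tail sequence of the substrate and the short linker (024) was obtained (lanes 3-12, 14-23 and 25-34). At 25 °C, 37 °C and 45 °C, tail-control adapters with poly(dC) tails of different lengths (as shown in Table 2) together with ligase all gave the product of the tailed substrate ligated to the short linker (024) (lanes 3-12, 14-23 and 25-34). Tail-control adapters with poly(dC) tails of 7 nt or longer gave better tail control.
Conclusion: in the presence of a tail-control adapter with a poly(dC) tail, TdT and ligase carry out, respectively, the controlled poly(dG) tailing of the mixed single-stranded DNA polynucleotide substrate and the ligation of that tail to the linker. When these reactions are run between 25 °C and 45 °C, tail-control adapters with 5-20 nt poly(dC) tails all give the product of the tailed substrate ligated to the linker.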
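Example 6
Effect of reaction time on ligation of the substrate to the tail-control adapter after poly(dG) tailing
Materials:
DNA polynucleotide substrates 001-005 (Table 1)
Tail-control molecule polynucleotide 016 (Table 1)
Short linker polynucleotide 024 (Table 1)
5x annealing buffer, TdT, 10x Green Buffer, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide and 2x RNA loading buffer (manufacturers and catalog numbers as in Example 5)
Methods:
(1) Adapter preparation: prepared as in Example 4 to obtain a tail-control adapter with a 10 nt poly(dC) tail (016/024, as shown in Table 2).
(2) The substrate was prepared and denatured as in Example 1.
(3) Reaction: a 10 μL reaction was prepared containing 0.5 pmol of denatured substrate, 26 μM β-nicotinamide adenine dinucleotide, 5 pmol of tail-control adapter (016/024), 10 units of TdT, 5 units of E. coli DNA ligase and 100 μM dGTP. It was incubated at 37 °C for 1, 5, 10, 15, 20, 30, 45, 60 or 120 min, then at 70 °C for 10 min to inactivate TdT and E. coli DNA ligase.
(4) Reaction products were detected as in Example 3.
The results are shown in Figure 10. When no tail-control adapter was present, TdT added poly(dG) tails of variable length to the substrate (lane 2); when the tail-control adapter (016/024) was present, the tail length added by TdT was controlled (lane 3). When TdT, tail-control adapter and E. coli DNA ligase were added together and reacted for 1 min, the ligation product of the substrate carrying a fixed-length poly(dG) tail and the short linker (024) was obtained (lane 4). The amount of ligation product remained unchanged with longer reaction times (lanes 5-12), indicating that tailing and ligation of the substrate are completed rapidly.
Conclusion: in the presence of a tail-control adapter with a poly(dC) tail, tailing and linker ligation are very fast; the product of the substrate ligated to the linker after poly(dG) tailing is already detectable after 1 min of reaction.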
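Example 7
Verifying how two second-strand synthesis approaches, degradation or strand displacement of the tail-control molecule, affect methylation sequencing library construction after bisulfite-treated DNA is tailed and ligated to a tail-control adapter
Materials:
Non-degradable tail-control molecule polynucleotide 016 (Table 1)
Degradable tail-control molecule polynucleotide 025 (Table 1)
Long linker polynucleotide 026 (Table 1)
Extension primer polynucleotide 027 (Table 1)
Polynucleotide pair 028 and 029 for preparing the "5' sequencing adapter" (Table 1)
P5 PCR primer polynucleotide 031 (Table 1)
P7 PCR index primer polynucleotide 032 (Table 1)
λ-DNA (Takara, Cat. No. 3010)
Bisulfite treatment kit (Zymo Research, Cat. No. D5005)
10x Isothermal Amplification Buffer II (New England Biolabs, Cat. No. B9005S)
dNTP (Takara, Cat. No. 4030, 2.5 mM each)
Bst 3.0 DNA polymerase (New England Biolabs, Cat. No. M0374S, 8 U/μL)
RNase A (Takara, Cat. No. 2158, 10 mg/mL)
Beckman Ampure XP magnetic beads (Beckman, Cat. No. A63882)
2x T4 DNA rapid ligation buffer (Enzymatics, Cat. No. B1010)
T4 DNA rapid ligase (Enzymatics, Cat. No. L6030-HC-L, 600 U/μL)
2x high-fidelity hot-start methylation PCR mix (KAPA Biosystems, Cat. No. KK2802)
Nuclease-free water (Solarbio, Cat. No. R1600-100)
5x annealing buffer, 10x Green Buffer, TdT, E. coli DNA ligase, dGTP and β-nicotinamide adenine dinucleotide (manufacturers and catalog numbers as in Example 5)
Methods: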
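(1) Adapter preparation: the degradable and non-degradable tail-control adapters (025/026 and 016/026, as shown in Table 2) and the dT-overhang "5' sequencing adapter" (028/029, as shown in Table 4) were prepared following the adapter preparation method of Example 4. The degradable tail-control adapter contains RNA nucleotides and can be degraded by RNase, whereas the non-degradable tail-control adapter contains no RNA nucleotides and cannot be degraded by RNase.
(2) λ-DNA was bisulfite-treated, fragmented and denatured (as in Example 4).
(3) Two tailing and ligation reaction mixes were prepared as shown in Table 7-1, using tail-control adapters 025/026 and 016/026, respectively. The reaction mixes were incubated at 37 °C for 15 min, treated at 95 °C for 2 min, and then held at 4 °C. The tailed and ligated DNA samples were recovered with 36 μl of Beckman Ampure XP magnetic beads and eluted in 20 μl of nuclease-free water. In the next step, the second strand was synthesized from the products obtained with the two tail-control adapters in two ways: extension after degradation of the tail-control molecule (025), and extension by strand displacement of the tail-control molecule (016).
Table 7-1
(4) The two second-strand synthesis reaction mixes were prepared as shown in Table 7-2 and incubated at 37 °C for 5 min, at 95 °C for 2 min, and then at 65 °C for 2 min. Next, 1 μl of Bst 3.0 DNA polymerase was added, and the reactions were incubated at 65 °C for 30 min and then held at 4 °C. The DNA after second-strand synthesis was recovered with 75 μl of Beckman Ampure XP magnetic beads and eluted in 12 μl of nuclease-free water.
Table 7-2
(5) The reaction mix for ligating the DNA after second-strand synthesis to the dT-overhang "5' sequencing adapter" was prepared as shown in Table 4-3, incubated at 25 °C for 15 min and then at 65 °C for 10 min to inactivate the ligase, and then held at 4 °C. The DNA ligated to the "5' sequencing adapter" was recovered with 21 μl of Beckman Ampure XP magnetic beads and eluted in 23 μl of nuclease-free water.
(6) The PCR amplification reaction mix was prepared as shown in Table 4-4 and run according to the PCR program (Table 4-5). The PCR products were recovered with 75 μl of Beckman Ampure XP magnetic beads and eluted in 20 μl of nuclease-free water, giving two sequencing libraries whose second strand was synthesized by degradation-based extension or by strand-displacement extension.
(7) Library fragment distribution and concentration were measured as in Example 4.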
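Results: bisulfite-treated λ-DNA was denatured to the single-stranded state. In the simultaneous presence of TdT, E. coli DNA ligase and a tail-control adapter (025/026 or 016/026), a controlled poly(dG) tail was added to the substrate, which was then ligated to the long linker. The second strand was synthesized by degrading or strand-displacing the tail-control molecule polynucleotide (025 or 016). The substrate DNA after second-strand synthesis was then ligated to the "5' sequencing adapter" and amplified by PCR to give a structurally complete sequencing library. As shown in Figures 11A and 11B, the library fragment distributions from both approaches fell within 150-1000 bp with no adapter dimers, indicating high library purity. As shown in the table below, the two second-strand synthesis methods gave sequencing libraries of similar concentration, 588.9 nM and 707.4 nM, both of which meet the sequencing requirements of Illumina sequencers (Illumina Nextseq: library volume of at least 1.3 μl and library concentration of at least 1.8 nM; Illumina xTen: library volume of at least 5 μl and library concentration of at least 3 nM).
| Second-strand synthesis method | Degradation-based extension | Strand-displacement extension |
| Library concentration | 588.9 nM | 707.4 nM |
Conclusion: the library concentrations measured by qPCR show that the DNA fragments were captured by the method of the invention. Both second-strand synthesis approaches, degradation and strand displacement of the tail-control molecule, complete second-strand synthesis effectively and allow a methylation sequencing library to be constructed from bisulfite-treated DNA.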
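The comparison against the sequencer loading requirements quoted in the results can be expressed as a simple threshold check. In the sketch below, the requirement values and library concentrations come from the text above; the names are hypothetical, and the 20 μl library volume is an assumption based on the final elution volume in step (6), which is an upper bound on what remains after quality control.

```python
# Check library concentrations against the sequencer requirements quoted above.
# Names are hypothetical; the 20 uL volume is assumed from the final elution step.
REQUIREMENTS = {
    "Illumina Nextseq": {"min_volume_ul": 1.3, "min_conc_nm": 1.8},
    "Illumina xTen": {"min_volume_ul": 5.0, "min_conc_nm": 3.0},
}
LIBRARIES_NM = {
    "degradation-based extension": 588.9,
    "strand-displacement extension": 707.4,
}
ASSUMED_VOLUME_UL = 20.0


def meets_requirement(volume_ul: float, conc_nm: float, sequencer: str) -> bool:
    req = REQUIREMENTS[sequencer]
    return volume_ul >= req["min_volume_ul"] and conc_nm >= req["min_conc_nm"]


if __name__ == "__main__":
    for library, conc in LIBRARIES_NM.items():
        for sequencer in REQUIREMENTS:
            ok = meets_requirement(ASSUMED_VOLUME_UL, conc, sequencer)
            print(f"{library} on {sequencer}: {'pass' if ok else 'fail'}")
```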
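Example 8
Constructing a methylation sequencing library from bisulfite-treated DNA using a tail-control adapter with a stem-loop structure
Materials:
Polynucleotide 030 for preparing the stem-loop tail-control adapter (Table 1)
Extension primer polynucleotide 027 (Table 1)
Polynucleotide pair 028 and 029 for preparing the "5' sequencing adapter" (Table 1)
P5 PCR primer polynucleotide 031 (Table 1)
P7 PCR index primer polynucleotide 032 (Table 1)
λ-DNA, bisulfite treatment kit, 5x annealing buffer, 10x Green Buffer, TdT, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide, 10x Isothermal Amplification Buffer II, dNTP, Bst 3.0 DNA polymerase, RNase A, 2x T4 DNA rapid ligation buffer, T4 DNA rapid ligase, 2x high-fidelity hot-start methylation PCR mix, Beckman Ampure XP magnetic beads and nuclease-free water (manufacturers and catalog numbers as in Example 7)
Methods:
(1) Adapter preparation: the stem-loop tail-control adapter (030) and the dT-overhang "5' sequencing adapter" (028/029) were prepared following the adapter preparation method of Example 4.
(2) Methylation sequencing library preparation: a methylation sequencing library was constructed from 10 ng of λ-DNA following the method described in Example 7, with second-strand synthesis performed by the degradation-based method.
(3) Library concentration was measured as in Example 4.
The results are shown in Figure 12. The methylation sequencing library constructed with the stem-loop tail-control adapter (030), which is prepared from a single polynucleotide strand, had an average fragment size of 396 bp with no adapter dimers, indicating high library purity. qPCR showed a library concentration of 15.2 nM, indicating that the DNA substrate was captured by the sequencing adapters and a valid sequencing library was obtained.
Conclusion: a stem-loop tail-control adapter prepared from a single polynucleotide strand can be used effectively for library construction.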
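Example 9
Testing the effect of tail-control region length on methylation sequencing library construction
Materials:
Tail-control molecule polynucleotides 011-020 with poly(dC) of different lengths (Table 1)
Long linker polynucleotide 026 (Table 1)
Extension primer polynucleotide 027 (Table 1)
Polynucleotide pair 028 and 029 for preparing the "5' sequencing adapter" (Table 1)
P5 PCR primer polynucleotide 031 (Table 1)
P7 PCR index primer polynucleotide 032 (Table 1)
λ-DNA, bisulfite treatment kit, 5x annealing buffer, 10x Green Buffer, TdT, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide, 10x Isothermal Amplification Buffer II, dNTP, Bst 3.0 DNA polymerase, T4 DNA rapid ligase, 2x high-fidelity hot-start methylation PCR mix, Beckman Ampure XP magnetic beads and nuclease-free water (manufacturers and catalog numbers as in Example 7)
Methods:
(1) Adapter preparation: the non-degradable tail-control adapters (011/026, 012/026, 013/026, 014/026, 015/026, 016/026, 017/026, 018/026, 019/026 and 020/026, as shown in Table 3) and the dT-overhang "5' sequencing adapter" (028/029, as shown in Table 4) were prepared following the adapter preparation method of Example 4.
(2) Methylation sequencing library preparation: methylation sequencing libraries were constructed from 6.75 ng of λ-DNA following the method described in Example 7. The tailing and ligation reactions used the ten tail-control adapters with poly(dC) tails of 5-20 nt (011/026, 012/026, 013/026, 014/026, 015/026, 016/026, 017/026, 018/026, 019/026 and 020/026), and second-strand synthesis was performed by strand-displacement extension.
(3) Library concentration was measured as in Example 7.
The results are shown in Figures 13A-13J. The methylation sequencing libraries constructed from bisulfite-treated DNA with tail-control components having poly(dC) tails of 5-20 nt had fragment distributions in the 200-1000 bp range with no adapter dimers, indicating high library purity. As shown in Table 9-1, the library constructed with the tail-control adapter having a 5 nt poly(dC) tail had the lowest concentration, 6.2 nM, and the library constructed with the tail-control adapter having a 20 nt poly(dC) tail had the highest, 31.7 nM.
Conclusion: tail-control adapters with poly(dC) tails of 5-20 nt can all be used effectively to construct methylation sequencing libraries from bisulfite-treated DNA.
Table 9-1
| Poly(dC) overhang length (nt) | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 20 |
| Library concentration (nM) | 6.2 | 13.2 | 16.0 | 22.9 | 25.4 | 31.3 | 28.3 | 31.2 | 30.0 | 31.7 |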
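The trend in Table 9-1 is easier to see when each concentration is expressed relative to the highest one. The sketch below uses only the values in Table 9-1; the names are hypothetical, and the 90% threshold is an arbitrary illustrative cut-off, not a criterion from this document.

```python
# Library concentration (nM) by poly(dC) tail length (nt), from Table 9-1.
LIBRARY_CONC_NM = {
    5: 6.2, 6: 13.2, 7: 16.0, 8: 22.9, 9: 25.4,
    10: 31.3, 11: 28.3, 12: 31.2, 13: 30.0, 20: 31.7,
}


def extremes(data: dict) -> tuple:
    """Return (tail length with lowest yield, tail length with highest yield)."""
    return min(data, key=data.get), max(data, key=data.get)


def shortest_tail_reaching(data: dict, fraction: float) -> int:
    """Shortest tail whose yield is at least `fraction` of the best yield."""
    threshold = fraction * max(data.values())
    return min(length for length, conc in data.items() if conc >= threshold)


if __name__ == "__main__":
    lowest, highest = extremes(LIBRARY_CONC_NM)
    print("lowest yield at", lowest, "nt; highest yield at", highest, "nt")
    # The 0.9 cut-off is illustrative only.
    print("shortest tail reaching 90% of best:",
          shortest_tail_reaching(LIBRARY_CONC_NM, 0.9), "nt")
```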
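Example 10
Comparing the method of the invention with the conventional method for constructing human genome methylation sequencing libraries
Materials:
Tail-control molecule polynucleotide 016 (Table 1)
Long linker polynucleotide 026 (Table 1)
Extension primer polynucleotide 027 (Table 1)
Polynucleotide pair 028 and 029 for preparing the "5' sequencing adapter" (Table 1)
P5 PCR primer polynucleotide 031 (Table 1)
P7 PCR index primer polynucleotide 032 (Table 1)
P7 PCR index primer polynucleotide 033 (Table 1)
Conventional methylation sequencing adapter, forward polynucleotide 034 (Table 1)
Conventional methylation sequencing adapter, reverse polynucleotide 035 (Table 1)
Human genomic DNA (Coriell, Cat. No. NA12878)
Unmethylated λ-DNA (Takara, Cat. No. 3019)
10x end repair buffer (New England Biolabs, Cat. No. B6052S; 50 mM Tris-HCl, 10 mM magnesium chloride, 10 mM dithiothreitol, 1 mM adenosine triphosphate, 0.4 mM dATP, 0.4 mM dCTP, 0.4 mM dGTP, 0.4 mM dTTP, pH 7.5)
T4 DNA polymerase (Enzymatics, Cat. No. P7080L, 3 U/μL)
T4 polynucleotide kinase (Enzymatics, Cat. No. Y9040L, 10 U/μL)
10x dA-tailing buffer (New England Biolabs, Cat. No. B6059S; 10 mM Tris-HCl, 10 mM magnesium chloride, 50 mM sodium chloride, 1 mM dithiothreitol, 0.2 mM dATP, pH 7.9)
Klenow large fragment (Exo-) (Enzymatics, Cat. No. P7010-LC-L, 10 U/μL)
Bisulfite treatment kit, 5x annealing buffer, 10x Green Buffer, TdT, E. coli DNA ligase, dGTP, β-nicotinamide adenine dinucleotide, 10x Isothermal Amplification Buffer II, dNTP, Bst 3.0 DNA polymerase, T4 DNA rapid ligation buffer, T4 DNA rapid ligase, 2x high-fidelity hot-start methylation PCR mix, Beckman Ampure XP magnetic beads and nuclease-free water (manufacturers and catalog numbers as in Example 7)
Methods: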
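(I) Constructing a human genome methylation sequencing library with the method of the invention
(1-1) Adapter preparation: the non-degradable tail-control adapter (016/026, as shown in Table 2) and the dT-overhang "5' sequencing adapter" (028/029, as shown in Table 4) were prepared following the adapter preparation method of Example 4.
(1-2) Sample preparation: 50 pg of unmethylated λ-DNA was spiked into 10 ng of human genomic DNA (NA12878) as a reference for calculating the bisulfite conversion efficiency.
(1-3) Methylation sequencing library preparation: a methylation sequencing library was constructed from the 10 ng of human genomic DNA spiked with λ-DNA following the method described in Example 7. Before second-strand synthesis, the tailed and ligated DNA was recovered with 36 μl of Beckman Ampure XP magnetic beads and eluted in 20 μl of nuclease-free water; second-strand synthesis was performed by strand-displacement extension.
(1-4) Library concentration was measured as in Example 4.
(1-5) The library from the previous step was sequenced on an Illumina-xTen sequencer in 150 bp paired-end (150PE) mode. Adapter sequences were removed with Cutadapt (v1.12); the methylation sequencing reads were aligned to the genome with Bwa-Meth (v0.2.0); duplicate reads were marked with Sambamba (v0.5.4); finally, sequencing depth was computed with Samtools (v0.1.19).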
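The unmethylated λ-DNA spike-in of step (1-2) serves as a conversion control: every cytosine in it should read as thymine after bisulfite treatment. The sketch below shows the two calculations this implies. It is an assumption-laden illustration: the function names are hypothetical, the conversion formula is the commonly used definition and is not spelled out in this document, and the read counts in the usage example are made up for demonstration.

```python
# Spike-in fraction and bisulfite conversion efficiency (hypothetical helper names).

def spike_in_mass_fraction(spike_pg: float, sample_ng: float) -> float:
    """Mass fraction of spike-in DNA in the combined input."""
    spike_ng = spike_pg / 1000.0
    return spike_ng / (sample_ng + spike_ng)


def conversion_efficiency(converted_c: int, unconverted_c: int) -> float:
    """Fraction of cytosine positions in unmethylated control DNA read as converted.

    Assumed definition: converted / (converted + unconverted).
    """
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no cytosine observations")
    return converted_c / total


if __name__ == "__main__":
    # 50 pg of lambda DNA in 10 ng of human genomic DNA, as in step (1-2).
    print(f"spike-in fraction: {spike_in_mass_fraction(50, 10):.4%}")
    # Made-up counts, for demonstration only.
    print(f"conversion efficiency: {conversion_efficiency(9940, 60):.1%}")
```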
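(II) Constructing a human genome methylation sequencing library with the conventional method
(2-1) Adapter preparation: the "conventional methylation sequencing adapter" (034/035, as shown in Table 6) was prepared following the adapter preparation method of Example 4.
(2-2) 50 pg of unmethylated λ-DNA was spiked into 10 ng of human genomic DNA (NA12878), and the mixed DNA was fragmented to 300 bp with a focused ultrasonicator (Covaris, Cat. No. S220).
(2-3) The end repair reaction mix was prepared as shown in Table 10-1 and reacted at 20 °C for 30 min. The repaired DNA was recovered with 45 μl of Beckman Ampure XP magnetic beads and eluted in 26 μl of nuclease-free water.
Table 10-1
(2-4) The dA-tailing reaction mix was prepared as shown in Table 10-2 and reacted at 37 °C for 30 min. The dA-tailed DNA was recovered with 45 μl of Beckman Ampure XP magnetic beads and eluted in 12 μl of nuclease-free water.
Table 10-2
(2-5) The adapter ligation reaction mix was prepared as shown in Table 10-3, reacted at 25 °C for 15 min, and then incubated at 65 °C for 10 min to inactivate the ligase. The adapter-ligated DNA was recovered with 21 μl of Beckman Ampure XP magnetic beads and eluted in 20 μl of nuclease-free water.
Table 10-3
(2-6) The DNA from the previous step was bisulfite-treated with the bisulfite treatment kit and eluted in 23 μl of nuclease-free water.
(2-7) The PCR amplification reaction mix was prepared as shown in Table 4-4, except that the P7 PCR index primer was polynucleotide No. 033. It was then run according to the PCR program shown in Table 4-5, except that 18 amplification cycles were used. After the reaction, the PCR product was recovered with 75 μl of Beckman Ampure XP magnetic beads and eluted in 20 μl of nuclease-free water to obtain the final sequencing library.
(2-8) Library concentration was measured as in Example 4.
(2-9) Sequencing and data analysis were performed as in (1-5).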
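The results are shown in Table 10-4 below. The methylation sequencing library constructed from 10 ng of human genomic DNA by the tailing-ligation method based on TdT and a tail-control adapter had a concentration of 151.8 nM, whereas the library constructed by the conventional method had a concentration of 99.6 nM. With 19.4 Gb of sequencing data, the library constructed by the single-strand tailing-ligation method had a duplication rate of 13.4%, covered 89.6% of CpG regions, and had an average sequencing depth of 4.8x. With 22.0 Gb of sequencing data, the library constructed by the conventional method had a duplication rate of 70.4%, covered only 13.8% of CpG regions, and had an average sequencing depth of 0.3x. The bisulfite conversion efficiency and mapping rate of the two libraries did not differ appreciably. In addition, the fragment distributions showed no adapter dimers in either library, indicating high library purity (Figures 14A and 14B).
Table 10-4
| Library construction method | Method of the invention | Conventional method |
| DNA type | Human genomic DNA | Human genomic DNA |
| DNA input | 10 ng | 10 ng |
| Timing of bisulfite treatment | Before library construction | After library construction |
| Library concentration | 151.8 nM | 99.6 nM |
| Sequencer | Illumina-xTen | Illumina-xTen |
| Sequencing mode | 150PE | 150PE |
| Sequencing data volume | 19.4 Gb | 22.0 Gb |
| Bisulfite conversion efficiency | 99.4% | 99.6% |
| Mapping rate | 99.3% | 99.9% |
| Duplication rate | 13.4% | 70.4% |
| CpG region coverage | 89.6% | 13.8% |
| Average sequencing depth of CpG regions | 4.8x | 0.3x |
Conclusion: the tailing-ligation method based on TdT and a tail-control adapter constructs methylation sequencing libraries efficiently from small amounts of human genomic DNA, and its library construction efficiency and sequencing data quality are far better than those of the conventional method.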
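The gap between the two methods in Table 10-4 can be quantified with a few ratios. The sketch below uses only values from Table 10-4; the names are hypothetical, and treating non-duplicate data as total data multiplied by (1 - duplication rate) is a simplifying assumption.

```python
# Derived comparisons from Table 10-4 (names hypothetical; values from the table).
RESULTS = {
    "invention": {
        "data_gb": 19.4, "dup_rate": 0.134, "cpg_coverage": 0.896, "cpg_depth": 4.8,
    },
    "conventional": {
        "data_gb": 22.0, "dup_rate": 0.704, "cpg_coverage": 0.138, "cpg_depth": 0.3,
    },
}


def non_duplicate_gb(method: str) -> float:
    """Approximate non-duplicate data, assuming duplicates scale with total data."""
    r = RESULTS[method]
    return r["data_gb"] * (1.0 - r["dup_rate"])


def fold_difference(metric: str) -> float:
    """Ratio of the invention's value to the conventional method's value."""
    return RESULTS["invention"][metric] / RESULTS["conventional"][metric]


if __name__ == "__main__":
    for method in RESULTS:
        print(f"{method}: about {non_duplicate_gb(method):.1f} Gb non-duplicate data")
    print(f"CpG coverage ratio: {fold_difference('cpg_coverage'):.1f}x")
    print(f"CpG depth ratio: {fold_difference('cpg_depth'):.0f}x")
```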
Claims (35)
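- A method for tailing a deoxypolynucleotide substrate, the method comprising the steps of: (1) mixing the deoxypolynucleotide substrate with the following to form a first mixture: a) dGTP or dCTP nucleotides; b) terminal deoxynucleotidyl transferase; c) a tail-control component comprising a polynucleotide homopolymer 5 to 20 nucleotides in length, wherein the polynucleotide homopolymer is complementary to the nucleotides of a); (2) incubating the first mixture, whereby a tailing reaction occurs at the 3' end of the deoxypolynucleotide substrate and a dGTP or dCTP polynucleotide is added to the 3' end of the substrate, forming a 3' tailed region of the substrate.
- The method of claim 1, wherein the deoxypolynucleotide substrate is a single-stranded deoxypolynucleotide substrate.
- The method of claim 1 or 2, wherein in the tailing reaction of step (2), a linker of the tail-control component is further ligated to the 3' end of the added dGTP or dCTP polynucleotide, forming the 3' tailed region of the substrate.
- The method of any one of claims 1-3, further comprising: step (3): after the tailing reaction of step (2), degrading the 3' end of the polynucleotide homopolymer of the tail-control component to generate a free hydroxyl group, and adding a DNA polymerase and deoxynucleotides to the first mixture to form a second mixture; (4) incubating the second mixture, whereby a nucleotide extension reaction occurs at the 3' end of the tail-control component complementary to the tailed substrate, synthesizing the complementary strand of the single-stranded deoxypolynucleotide substrate to obtain a double-stranded deoxypolynucleotide; (5) isolating the double-stranded deoxypolynucleotide from the second mixture.
- The method of any one of claims 1-3, further comprising: step (3-1): after the tailing reaction of step (2), adding to the first mixture an extension primer complementary to the 3' tailed region of the substrate, a DNA polymerase and deoxynucleotides to form a second mixture; (4) incubating the second mixture, whereby a nucleotide extension reaction occurs at the 3' end of the extension primer, synthesizing the complementary strand of the substrate to obtain a double-stranded deoxypolynucleotide; (5) isolating the double-stranded deoxypolynucleotide from the second mixture.
- The method of claim 4, wherein the degradation in step (3) uses RNase HII ribonuclease.
- The method of claim 5, comprising: in step (3-1), first degrading the tail-control molecule polynucleotide in the tail-control component that is complementary to the tailed nucleotide substrate, and then adding the extension primer complementary to the 3' tailed region of the substrate, the DNA polymerase and the deoxynucleotides to form the second mixture.
- The method of any one of claims 4-7, further comprising: step (6), adding a 5' sequencing adapter and a ligase to the isolated double-stranded deoxypolynucleotide to form a third mixture, and incubating the third mixture so that the double-stranded deoxypolynucleotide is ligated to the 5' sequencing adapter.
- The method of claim 7, wherein in step (3-1) the tail-control molecule sequence is degraded by an RNase, USER enzyme or a DNA-specific nuclease.
- The method of claim 4 or 5, wherein the extension reaction of step (4) adds a dA sticky end to the 3' end of the extended strand.
- The method of any preceding claim, wherein the tail-control component in step (1) is i) a polynucleotide homopolymer; ii) a tail-control molecule polynucleotide consisting of a polynucleotide homopolymer and an X region; or iii) a partially double-stranded nucleotide molecule consisting of a polynucleotide homopolymer with a linked X region, and a linker polynucleotide complementary to the X region.
- The method of claim 11, wherein the linker comprises a 5' phosphate.
- The method of any preceding claim, wherein the tail-control component in step (1) comprises a polynucleotide homopolymer 5 to 20, preferably 5 to 13 or 7 to 10, more preferably 7 to 9 nucleotides in length.
- The method of any preceding claim, wherein the polynucleotide homopolymer is a polynucleotide dC homopolymer.
- The method of claim 14, wherein the polynucleotide homopolymer is a polynucleotide dC homopolymer 7 to 9 nucleotides in length.
- The component of any preceding claim, characterized in that the polynucleotide homopolymer of the tail-control component comprises a 3' blocking group.
- The component according to claim 16, wherein the blocking group is selected from one or more of the following: a ribonucleotide, a C3 spacer, a phosphate, a dideoxynucleotide, an amino group and an inverted deoxythymidine.
- The method of claim 11, wherein the linker polynucleotide in the tail-control component comprises a 5' phosphate and a 3' blocking group.
- The method of claim 11, wherein the tail-control component that is a partially double-stranded nucleotide molecule is unimolecular or multimolecular.
- The method of claim 19, wherein the unimolecular form is a single molecule with a stem-loop structure.
- The method of claim 19, wherein the first mixture further comprises a ligase.
- The method of claim 8 or 21, wherein the ligase is a DNA ligase or an RNA ligase.
- The method of any preceding claim, wherein the tailing reaction of step (2) is carried out at 20 °C-50 °C, preferably 25 °C-45 °C, more preferably 25 °C-37 °C.
- The method of any preceding claim, wherein the tail-control component is a peptide nucleic acid, a locked nucleic acid, or a combination thereof.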
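- A kit comprising: a deoxypolynucleotide substrate, dGTP or dCTP nucleotides, terminal deoxynucleotidyl transferase, and a tail-control component, wherein the tail-control component comprises a polynucleotide homopolymer 5 to 20 nucleotides in length, and the polynucleotide homopolymer is complementary to the dGTP or dCTP nucleotides.
- The kit of claim 25, wherein the deoxypolynucleotide substrate is a single-stranded deoxypolynucleotide substrate.
- The method of claim 25 or 26, wherein the tail-control component is i) a polynucleotide homopolymer; ii) a tail-control molecule polynucleotide consisting of a polynucleotide homopolymer and an X region; or iii) a partially double-stranded nucleotide molecule consisting of a polynucleotide homopolymer with a linked X region, and a linker polynucleotide complementary to the X region.
- The kit of any one of claims 25-27, further comprising a DNA ligase or an RNA ligase.
- The kit of any one of claims 25-28, further comprising a DNA polymerase and deoxynucleotides.
- The kit of claim 29, further comprising an extension primer.
- The kit of claim 30, further comprising at least one of an RNase, USER enzyme and a nicking endonuclease.
- The kit of any one of claims 29-31, further comprising a 5' sequencing adapter.
- The kit of any one of claims 25-32, wherein the tail-control component is a partially double-stranded nucleotide molecule.
- The kit of any one of claims 25-33, wherein the tail-control component comprises a polynucleotide homopolymer 5 to 13, preferably 7 to 10, more preferably 7 to 9 nucleotides in length.
- The kit of claim 34, wherein the polynucleotide homopolymer is a polynucleotide dC homopolymer.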
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2017/109770 WO2019090482A1 (zh) | 2017-11-07 | 2017-11-07 | A method for constructing a second-generation high-throughput sequencing library |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019090482A1 true WO2019090482A1 (zh) | 2019-05-16 |
Family
ID=66437471
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/109770 Ceased WO2019090482A1 (zh) | A method for constructing a second-generation high-throughput sequencing library | 2017-11-07 | 2017-11-07 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2019090482A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113249441A (zh) * | 2021-07-06 | 2021-08-13 | 广州赛哲生物科技股份有限公司 | 用于血流感染病原微生物检测的参考品及其制备方法 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6114154A (en) * | 1997-09-29 | 2000-09-05 | Li; Huiwu | Method of constructing full-length target cDNA molecules |
| CN102732629A (zh) * | 2012-08-01 | 2012-10-17 | 复旦大学 | 利用高通量测序同时测定基因表达量和多聚腺苷酸加尾的方法 |
| CN104395480A (zh) * | 2012-03-13 | 2015-03-04 | 斯威夫特生物科学公司 | 用于通过核酸聚合酶对衬底多核苷酸进行大小受控的同聚物加尾的方法和组合物 |
| CN106661612A (zh) * | 2014-01-27 | 2017-05-10 | 通用医疗公司 | 制备用于测序的核酸的方法 |
| CN106636074A (zh) * | 2017-02-23 | 2017-05-10 | 厦门大学 | 针对3’端带有a 重复序列的情况下获得完整3’末端序列的3’ race 方法 |
Non-Patent Citations (2)
| Title |
|---|
| CHAN, K. C. A. ET AL.: "Cancer Genome Scanning in Plasma: Detection of Tumor-Associated Copy Number Aberrations, Single-Nucleotide Variants, and Tumoral Heterogeneity by Massively Parallel Sequencing", CLINICAL CHEMISTRY, vol. 59, 31 December 2013 (2013-12-31), pages 211 - 224, XP055181307, DOI: doi:10.1373/clinchem.2012.196014 * |
| SUN, K. ET AL.: "Plasma DNA Tissue Mapping by Genome-wide Methylation Sequencing for Noninvasive Prenatal, Cancer, and Transplantation Assessments", PNAS, 21 September 2015 (2015-09-21), pages E5503 - E5512, XP055374200 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113249441A (zh) * | 2021-07-06 | 2021-08-13 | 广州赛哲生物科技股份有限公司 | 用于血流感染病原微生物检测的参考品及其制备方法 |
| CN113249441B (zh) * | 2021-07-06 | 2021-12-14 | 湖南赛哲智造科技有限公司 | 用于血流感染病原微生物检测的参考品及其制备方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17931264 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18/01/2021) |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 17931264 Country of ref document: EP Kind code of ref document: A1 |