WO2014086037A1 - Method for constructing a nucleic acid sequencing library and use thereof - Google Patents
- Publication number
- WO2014086037A1 (PCT/CN2012/086164)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequencing
- nucleic acid
- library
- primer
- tag
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
Definitions
- the present invention relates to the field of biotechnology, and in particular, to a method of constructing a nucleic acid sequencing library and its use, and more particularly to a method of constructing a nucleic acid sequencing library, a nucleic acid sequencing library, a nucleic acid sequencing method, and a method of determining nucleic acid sequence information.
- Background technique
- Second-generation sequencing technologies, represented by Illumina Solexa, AB SOLiD, and Roche 454, have greatly reduced the cost of sequencing, have grown rapidly in recent years, and have become an important tool for genomics research. Compared with Sanger sequencing, which uses the chain-termination method, second-generation sequencing adopts a sequencing-by-synthesis strategy.
- Second-generation sequencing is characterized by high throughput: hundreds of millions of DNA fragments can be sequenced simultaneously. Currently, a high-throughput sequencer can generate up to 200 Gb of data in a single run, equivalent to sequencing one person's whole genome about 65 times.
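- The coverage figure above can be checked with simple arithmetic. The sketch below assumes a human genome size of about 3.1 Gb, a value that is not stated in the text and is used here only for illustration.

```python
# Rough coverage arithmetic. The ~3.1 Gb human genome size is an assumption
# for illustration; only the 200 Gb run output comes from the text.
RUN_OUTPUT_BASES = 200e9      # 200 Gb per run, as stated above
HUMAN_GENOME_BASES = 3.1e9    # assumed genome size


def fold_coverage(output_bases: float, genome_bases: float) -> float:
    """Average number of times each genome position is sequenced."""
    return output_bases / genome_bases


coverage = fold_coverage(RUN_OUTPUT_BASES, HUMAN_GENOME_BASES)
print(f"approximate coverage: {coverage:.1f}x")
```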
- This high-throughput sequencing technique breaks the genome into a series of small fragments by ultrasound or other methods, adds a linker to both ends of each fragment, and then amplifies the fragments by bridge PCR or emulsion PCR using primers that bind the linkers.
- the present invention aims to solve at least one of the technical problems existing in the prior art.
- the invention proposes an isolated nucleic acid tag consisting of an oligonucleotide having the sequence set forth in SEQ ID NO: 1-6.
- The present invention provides six nucleic acid tags, namely: ACTCTTAC (SEQ ID NO: 1), GATGGACT (SEQ ID NO: 2), TATGGTAG (SEQ ID NO: 3), CCATATCC (SEQ ID NO: 4), CTAGCGCT (SEQ ID NO: 5) and ATATAGA (SEQ ID NO: 6).
- ACTCTTAC SEQ ID NO: 1
- GATGGACT SEQ ID NO: 2
- TATGGTAG SEQ ID NO: 3
- CCATATCC SEQ ID NO: 4
- CTAGCGCT SEQ ID NO: 5
- ATATAGA SEQ ID NO: 6
- By ligating a nucleic acid tag to a DNA fragment of a nucleic acid sample, or an equivalent thereof, a tagged nucleic acid sequencing library can be obtained. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, so the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
- Sequencing libraries of a plurality of nucleic acid samples can therefore be constructed simultaneously. By mixing the sequencing libraries derived from different samples, sequencing them together, and classifying the resulting nucleic acid sequences by nucleic acid tag, the nucleic acid sequence information of the plurality of samples can be obtained.
- This makes full use of high-throughput sequencing technologies, for example by using Solexa sequencing to sequence nucleic acids from multiple samples simultaneously, thereby increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of nucleic acid samples.
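- The classification of pooled reads by tag described above can be sketched as follows. This is a minimal illustration, not the procedure of the invention: the sample names, the example insert reads, and the exact-match rule are all assumptions; only the three tag sequences (SEQ ID NOs: 1-3) come from the text.

```python
from collections import defaultdict

# Tag -> sample mapping. The sample names are hypothetical; the tags are
# SEQ ID NOs: 1-3 from the text.
TAG_TO_SAMPLE = {
    "ACTCTTAC": "sample_A",
    "GATGGACT": "sample_B",
    "TATGGTAG": "sample_C",
}


def demultiplex(reads, tag_to_sample):
    """Group (tag_read, insert_read) pairs by sample using exact tag matches."""
    bins = defaultdict(list)
    for tag_read, insert_read in reads:
        # Reads whose tag is not in the mapping go to an "undetermined" bin.
        sample = tag_to_sample.get(tag_read, "undetermined")
        bins[sample].append(insert_read)
    return dict(bins)


# Made-up reads from a pooled run: (tag sequence, insert sequence).
pooled_reads = [
    ("ACTCTTAC", "GGCATTAGC"),
    ("TATGGTAG", "TTGACCGTA"),
    ("ACTCTTAC", "CCGTAAGTC"),
    ("NNNNNNNN", "AAGTCCGTA"),
]
by_sample = demultiplex(pooled_reads, TAG_TO_SAMPLE)
for sample, inserts in sorted(by_sample.items()):
    print(sample, inserts)
```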
- the invention proposes a set of isolated PCR primers consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 7-12.
- When these PCR primers are used as the 5' primer, the tag can be introduced at the 5' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent and yielding a tagged nucleic acid sequencing library. Sequencing this library yields the sequence of the nucleic acid sample and the sequence of the tag, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
- a sequencing library of a plurality of nucleic acid samples can be simultaneously constructed, and the nucleic acid sequences of the samples can be classified based on nucleic acid tags by mixing and sequencing the nucleic acid sample sequencing libraries derived from different samples.
- These PCR primers are sometimes referred to herein as "tag primers" or "PCR tag primers".
- the invention proposes a set of isolated PCR primers consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 13-18.
- When these tag primers are used as the 3' primer, the tag can be introduced at the 3' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent and yielding a tagged nucleic acid sequencing library. Sequencing this library yields the sequence of the nucleic acid sample and the sequence of the tag, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
- a sequencing library of a plurality of nucleic acid samples can be simultaneously constructed, and the nucleic acid sequences of the samples can be classified based on nucleic acid tags by mixing and sequencing the nucleic acid sample sequencing libraries derived from different samples.
- the invention proposes a method of constructing a nucleic acid sequencing library.
- The method comprises: fragmenting a nucleic acid sample to obtain DNA fragments; end-repairing the DNA fragments to obtain end-repaired DNA fragments; adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A; ligating the two ends of the DNA fragments having the sticky end A to a first linker and a second linker, respectively, to obtain a ligation product having linkers; amplifying the ligation product having the linkers using a 5' primer and a 3' primer to obtain an amplification product; and isolating the amplification product, the amplification product constituting the nucleic acid sequencing library, wherein at least one of the first linker, the second linker, the 5' primer, and the 3' primer comprises a nucleic acid tag, such that the amplification product contains at least one nucleic acid tag.
- the nucleic acid tag is at least one selected from the group consisting of the oligonucleotides having the sequences shown in SEQ ID NOs: 1-6. According to an embodiment of the present invention, it is preferred to simultaneously introduce the tags shown in the aforementioned SEQ ID NOS: 1 to 6 into the 5' primer and the 3' primer.
- The 5' primer is at least one selected from the group consisting of the oligonucleotides having the sequences shown in SEQ ID NOs: 7-12.
- The 3' primer is at least one selected from the group consisting of the oligonucleotides having the sequences shown in SEQ ID NOs: 13-18.
- A nucleic acid sequencing library can be efficiently constructed using this method, and the tag primers described above can be efficiently introduced into at least one of the 3' end and the 5' end of the library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library.
- a sequencing library of a plurality of nucleic acid samples can be simultaneously constructed, and the nucleic acid sequences of the samples can be classified based on nucleic acid tags by mixing and sequencing the nucleic acid sample sequencing libraries derived from different samples.
- The inventors have surprisingly found that when nucleic acid sequencing libraries containing different nucleic acid tags are constructed for the same sample by the above method, using oligonucleotides carrying different tags, the stability and repeatability of the resulting sequencing data are very good, so that multiple samples can be sequenced in the same reaction system.
- the method of constructing a sequencing library may further have the following additional technical features:
- the nucleic acid sample is a genomic DNA sample.
- the nucleic acid sample is a human genomic DNA sample.
- the DNA fragment is between 100 and 800 bp in length.
- the fragmentation is carried out by at least one of atomization, ultrasonic fragmentation, HydroShear, and enzymatic digestion.
- The end repair of the DNA fragment is carried out using Klenow, T4 polymerase and T4 polynucleotide kinase.
- The addition of base A at the end of the end-repaired DNA fragment is carried out using Klenow Fragment (3'-5' exo-) polymerase.
- The amplification uses PCR primers in which the oligonucleotides shown in SEQ ID NOs: 7-12 serve as the 5' primer and the oligonucleotides shown in SEQ ID NOs: 13-18 serve as the 3' primer.
- The amplification product is isolated by electrophoresis on a 2% agarose gel followed by purification.
- the invention proposes a nucleic acid sequencing library constructed according to the method of constructing a sequencing library as described above.
- The inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing. The nucleic acid sequence information of a plurality of samples can therefore be obtained by mixing the sequencing libraries derived from different samples, sequencing them simultaneously, and classifying the nucleic acid sequences of the samples by nucleic acid tag.
- This makes full use of high-throughput sequencing technologies, for example by using Solexa sequencing to sequence nucleic acids from multiple samples simultaneously, thereby increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of nucleic acid samples.
- the invention proposes a nucleic acid sequencing method.
- the method comprises the steps of: constructing a sequencing library according to the method of constructing a sequencing library as described above for a nucleic acid sample; and performing sequencing on the sequencing library.
- The inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing.
- Using the tag library technology, the sequencing libraries derived from different samples can be mixed and sequenced simultaneously, and the nucleic acid sequences of the samples classified by nucleic acid tag to obtain the nucleic acid sequence information of the various samples.
- Sequencing is performed using a second-generation sequencing platform. A nucleic acid sequencing library having a tag at both the 3' end and the 5' end is sometimes referred to herein as a "dual-label library". A single-label library is one in which linker ligation or a PCR process introduces the tag at the 3' end of the library.
- With single-label libraries, N tags allow N libraries to be mixed and sequenced together; with the dual-label library of the present invention, the same N tags allow N × N libraries to be mixed and sequenced together.
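- The N versus N × N count can be made concrete by enumerating tag pairs. The sketch below is illustrative only: tags are referred to by their SEQ ID NO labels, and the (3' tag, 5' tag) ordering of each pair is an assumption.

```python
from itertools import product

# The six tags of the invention, referred to by label rather than sequence.
TAG_LABELS = [f"SEQ ID NO: {i}" for i in range(1, 7)]

# Single-label pooling: one library per tag.
single_label_capacity = len(TAG_LABELS)

# Dual-label pooling: one library per ordered (3' tag, 5' tag) pair.
dual_label_pairs = list(product(TAG_LABELS, repeat=2))
dual_label_capacity = len(dual_label_pairs)

print(single_label_capacity, dual_label_capacity)
```

- The same tag may appear at both ends of a library, which is why the pairs are drawn with repetition.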
- the specific steps for sequencing a dual-label library of the invention are different from common single-label libraries.
- The steps for sequencing a common single-label library are generally as follows. First, the tag is introduced at the 3' end of the library by linker ligation or a PCR process. The single-tag library is then subjected to SBS sequencing, for example on the Illumina Solexa/HiSeq platform using sequencing primers; the synthesis/reading direction is 5' end → 3' end, and the sequence synthesized is, in order, the 5' end sequence of the insert → the tag sequence → the 3' end sequence.
- Sequencing the sequencing library of the present invention (i.e., a dual-label library) further comprises: sequencing from the 5' end of the sequencing library to obtain, in order, the first sequencing data of the 5' end, the 3' end tag sequence, the 5' end tag sequence, and the second sequencing data of the 3' end.
- Sequencing the sequencing library of the present invention (i.e., the dual-label library) further comprises: sequencing using a first sequencing primer to obtain the first sequencing data of the 5' end; sequencing using a second sequencing primer to obtain the 3' end tag sequence; obtaining the 5' end tag sequence during synthesis of the library coding strand; and sequencing using a third sequencing primer to obtain the second sequencing data of the 3' end, wherein the first sequencing primer binds to the 3' end of the library template strand, the second sequencing primer binds to the 5' end of the library template strand, and the third sequencing primer binds to the 3' end of the library coding strand.
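- The four pieces of data obtained from a dual-label library, in the order described above, can be modelled as a small record and used to assign each read to a library. This is a sketch under stated assumptions: the class name, the sample sheet, the library names, and the insert sequences are hypothetical, and both tag reads are assumed to have already been converted to the same orientation as the listed tag sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualTagCluster:
    """Data from one cluster, in the order it is obtained."""
    read1: str   # first sequencing data (5' end)
    tag_3p: str  # 3' end tag sequence
    tag_5p: str  # 5' end tag sequence
    read2: str   # second sequencing data (3' end)


# Hypothetical sample sheet: (3' tag, 5' tag) -> library name.
SAMPLE_SHEET = {
    ("ACTCTTAC", "ACTCTTAC"): "library_1",
    ("ACTCTTAC", "GATGGACT"): "library_2",
    ("GATGGACT", "ACTCTTAC"): "library_3",
}


def assign_library(cluster: DualTagCluster, sample_sheet: dict) -> str:
    """Look up the library by its tag pair; unknown pairs are undetermined."""
    return sample_sheet.get((cluster.tag_3p, cluster.tag_5p), "undetermined")


clusters = [
    DualTagCluster("GGCATT", "ACTCTTAC", "GATGGACT", "TTAGCC"),
    DualTagCluster("CCGTAA", "GATGGACT", "ACTCTTAC", "AACGTT"),
    DualTagCluster("TTGACC", "CTAGCGCT", "CTAGCGCT", "GGATCC"),
]
assignments = [assign_library(c, SAMPLE_SHEET) for c in clusters]
print(assignments)
```

- Keying on the ordered pair means that swapping the two tags identifies a different library, which is what allows N tags to distinguish N × N libraries.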
- Sequencing the sequencing library of the present invention (i.e., the dual-label library) further comprises: first, sequencing using the first sequencing primer bound to the 3' end of the library template strand to obtain the first sequencing data Read1, i.e., the 5' end sequence information of the library coding strand.
- Next, sequencing using the second sequencing primer bound to the 5' end of the library template strand to obtain the first tag sequencing data, i.e., the sequence information of the 3' end tag of the library. After deblocking, an oligonucleotide on the sequencing chip that matches the 3' end of the library template strand is used to synthesize the library coding strand, and during this synthesis the second tag sequencing data, i.e., the sequence information of the 5' end tag of the library, is read.
- Finally, once the library coding strand has been synthesized, the third sequencing primer is bound to the 3' end of the library coding strand for sequencing to obtain the second sequencing data Read2, that is, the 3' end sequence
- the invention proposes a method of determining nucleic acid sequence information.
- the method comprises the steps of: sequencing a nucleic acid sample according to the method described above to obtain a sequencing result; and determining sequence information of the nucleic acid sample based on the sequencing result.
- the nucleic acid sequence information of a plurality of samples can be efficiently determined.
- the invention proposes a kit for constructing a nucleic acid sequencing library.
- The kit comprises: a first PCR primer, wherein the first PCR primer is an oligonucleotide represented by SEQ ID NOs: 7-12 or an oligonucleotide represented by SEQ ID NO: 19; and a second PCR primer, wherein the second PCR primer is an oligonucleotide represented by SEQ ID NOs: 13-18.
- With this kit, the aforementioned tag primers can be introduced into at least one of the 3' end and the 5' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library.
- By sequencing the nucleic acid sequencing library, the sequence of the nucleic acid sample and the sequence of the tag can be obtained, and the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
- a sequencing library of a plurality of nucleic acid samples can be simultaneously constructed, and the nucleic acid sequences of the samples can be classified based on nucleic acid tags by mixing and sequencing the nucleic acid sample sequencing libraries derived from different samples.
- FIG. 1 shows a schematic flow diagram of a method of constructing a sequencing library according to an embodiment of the present invention
- FIG. 2 shows a library using a 5'-end or P5-end tag primer sequence of the present invention and using a common label according to an embodiment of the present invention.
- Figure 3 is a graph showing the change in light intensity with the number of cycles for a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using a common tag, according to one embodiment of the present invention.
- Figure 4 is a graph showing the change in base distribution with the number of cycles for a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using a common tag, according to one embodiment of the present invention.
- Figure 5 is a graph showing the change in error rate with the number of cycles for a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using a common tag, according to one embodiment of the present invention.
- The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to.
- Accordingly, features defined by "first" or "second" may include one or more of those features, either explicitly or implicitly.
- the meaning of “plurality” is two or more unless otherwise stated.
- the invention proposes an isolated nucleic acid tag consisting of an oligonucleotide having the sequence set forth in SEQ ID NO: 1-6.
- the present invention provides six nucleic acid tags listed in Table 1, namely: ACTCTTAC (SEQ ID NO: 1), GATGGACT (SEQ ID NO: 2), TATGGTAG (SEQ ID NO: 3), CCATATCC (SEQ ID NO: 4), CTAGCGCT (SEQ ID NO: 5) and ATATAGA (SEQ ID NO: 6).
- By ligating a nucleic acid tag to a DNA fragment of a nucleic acid sample, or an equivalent thereof, a tagged nucleic acid sequencing library can be obtained. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, so the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
- a sequencing library of a plurality of nucleic acid samples can be simultaneously constructed, and the nucleic acid sequences of the samples can be classified based on nucleic acid tags by mixing and sequencing the nucleic acid sample sequencing libraries derived from different samples.
- The GT content at each base position across the mixed tags must be considered. Because bases G and T are excited by the same laser in Solexa/HiSeq sequencing, and bases A and C are likewise excited by the same laser, the "GT" content and the "AC" content at each position must be balanced. The accuracy and repeatability of the data output must also be considered. In designing the tags, the present invention takes these factors fully into account and avoids runs of 3 or more consecutive identical bases in the tag sequences, which reduces the error rate during synthesis and during sequencing.
- The tags of the present invention are 8 bp in length and differ from one another by more than 5 bases, so that a sequencing error or synthesis error in any one of the 8 bases does not affect the final identification of the tag.
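- The design criteria above (tag length, runs of identical bases, pairwise differences, and GT/AC balance per position) can each be computed directly from a tag set. The sketch below is one way to do so and is not the inventors' screening procedure; the function names are hypothetical, and SEQ ID NO: 6 is entered in the 8 bp form ATATAAGA that appears in the tag primer listing, which is an assumption.

```python
from itertools import combinations, groupby

TAGS = {
    "SEQ ID NO: 1": "ACTCTTAC",
    "SEQ ID NO: 2": "GATGGACT",
    "SEQ ID NO: 3": "TATGGTAG",
    "SEQ ID NO: 4": "CCATATCC",
    "SEQ ID NO: 5": "CTAGCGCT",
    "SEQ ID NO: 6": "ATATAAGA",  # assumed 8 bp form, see lead-in
}


def longest_run(seq: str) -> int:
    """Length of the longest stretch of consecutive identical bases."""
    return max(len(list(group)) for _, group in groupby(seq))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gt_fraction_per_position(tags: list) -> list:
    """Fraction of tags carrying G or T at each position (rest are A or C)."""
    n = len(tags)
    return [sum(t[i] in "GT" for t in tags) / n for i in range(len(tags[0]))]


runs = {name: longest_run(seq) for name, seq in TAGS.items()}
distances = {
    (a, b): hamming(TAGS[a], TAGS[b]) for a, b in combinations(TAGS, 2)
}
gt_profile = gt_fraction_per_position(list(TAGS.values()))

print("longest run per tag:", runs)
print("pairwise differences:", distances)
print("G/T fraction per position:", [round(x, 2) for x in gt_profile])
```

- Printing the three summaries lets a reader compare any candidate tag set against the stated criteria before synthesis.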
- the present invention proposes a PCR primer carrying the above-mentioned tag, which can efficiently introduce the above tag into the 5' end or the 3' end of the sequencing library by PCR reaction.
- the invention proposes a set of isolated PCR primers (also referred to as tag primers) consisting of oligonucleotides having the sequences set forth in SEQ ID NOS: 7-12.
- When these tag primers are used as the 5' primer, the tag can be introduced at the 5' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent and yielding a tagged nucleic acid sequencing library. Sequencing this library yields the sequence of the nucleic acid sample and the sequence of the tag, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
- Sequencing libraries of a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples and classifying the nucleic acid sequences of the samples by nucleic acid tag, nucleic acid sequence information for a variety of samples can be obtained.
- the invention proposes a set of isolated PCR primers consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 13-18.
- When these tag primers are used as the 3' primer, the tag can be introduced at the 3' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent and yielding a tagged nucleic acid sequencing library. Sequencing this library yields the sequence of the nucleic acid sample and the sequence of the tag, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
- Sequencing libraries of a plurality of nucleic acid samples can be constructed simultaneously, and the nucleic acid sequences of the samples can be classified by nucleic acid tag after mixing and sequencing the libraries derived from different samples. This improves the efficiency and throughput of the technology and reduces the cost of determining the sequence information of the nucleic acid sample.
- TCTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 7
- TTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 13) tag sequence (index2-2) TATGGTAG (SEQ ID NO: 2)
- CTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 8
- GTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 15) tag sequence (index2-4) CCATATCC (SEQ ID NO: 4)
- CTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 10
- TTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 16) tag sequence (index2-5) CTAGCGCT (SEQ ID NO: 5)
- CTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 11
- GTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 17) tag sequence (index2-6) ATATAAGA (SEQ ID NO: 6)
- the 5' end tag primer sequence is also referred to herein as the P5 end tag primer sequence
- the 3' end tag primer sequence is also referred to herein as the P7 end tag primer sequence.
- the invention proposes a kit for constructing a nucleic acid sequencing library.
- The kit may comprise: a first PCR primer, wherein the first PCR primer is an oligonucleotide represented by SEQ ID NOs: 7-12 or an oligonucleotide represented by SEQ ID NO: 19 (5'-AATGATACGGCGACCA…); and a second PCR primer, wherein the second PCR primer is an oligonucleotide shown in SEQ ID NOs: 13-18.
- With this kit, a tagged nucleic acid sequencing library can be obtained by linking the nucleic acid tag to a DNA fragment of the nucleic acid sample or an equivalent thereof. Sequencing this library yields the sequence of the nucleic acid sample and the sequence of the tag, and the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
- Sequencing libraries of a plurality of nucleic acid samples can thus be constructed simultaneously, mixed, and sequenced together, with the nucleic acid sequences of the samples classified by nucleic acid tag to obtain nucleic acid sequence information from a variety of samples.
- This makes full use of high-throughput sequencing technologies, such as Solexa sequencing, to sequence the nucleic acids of multiple samples simultaneously, thereby improving the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of the nucleic acid sample.
- The method of constructing a nucleic acid sequencing library of the present invention may comprise the following steps:
- S100: Fragmentation
- the nucleic acid sample is fragmented to obtain a DNA fragment.
- The type of nucleic acid sample that can be processed is not limited in any way; any common biological sample can be used, for example plants such as Arabidopsis thaliana and rice; animals such as humans and mice; and microorganisms such as E. coli.
- the nucleic acid sample is a genomic DNA sample.
- the nucleic acid sample is a human genomic DNA sample.
- The DNA fragment is between 100 and 800 bp in length, preferably 500 bp. This can further improve the efficiency of constructing the sequencing library and of subsequent sequencing.
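- The 100 to 800 bp window can be expressed as a simple length filter. The sketch below is purely illustrative: the fragments are placeholder strings and the function name is hypothetical; in the laboratory this selection is made physically rather than in software.

```python
MIN_LEN_BP = 100  # lower bound stated above
MAX_LEN_BP = 800  # upper bound stated above


def in_size_window(fragment: str, lo: int = MIN_LEN_BP, hi: int = MAX_LEN_BP) -> bool:
    """True if the fragment length falls within the inclusive size window."""
    return lo <= len(fragment) <= hi


# Placeholder fragments of various lengths (homopolymers for brevity only).
fragments = ["A" * 50, "C" * 100, "G" * 500, "T" * 800, "A" * 1200]
kept = [f for f in fragments if in_size_window(f)]
print([len(f) for f in kept])
```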
- genomic DNA may be disrupted by any known method.
- The fragmentation is carried out by at least one of atomization, ultrasonic fragmentation, HydroShear, and enzymatic digestion.
- It is preferred to fragment the genomic DNA by ultrasonication.
- The inventors have found that when genomic DNA is fragmented by ultrasonication, the resulting fragment length is easily controlled and subsequent sequencing operations are not affected.
- the DNA fragment is subjected to end repair to obtain a DNA fragment which has been repaired at the end.
- Those skilled in the art can end-repair the DNA fragments by any known method, and many commercial kits are available for this purpose.
- The end repair of the DNA fragment is carried out using Klenow, T4 polymerase and T4 polynucleotide kinase.
- a base A is added to the end of the end-repaired DNA fragment to obtain a DNA fragment having a sticky end A.
- The end-repaired random fragment has two oligonucleotide strands, and base A can be added at the 3' end of each of the two strands.
- The addition of A at the end of the end-repaired DNA fragment is carried out using Klenow Fragment (3'-5' exo-) polymerase.
- Both ends of the DNA fragment having the sticky end A are connected to the first linker and the second linker, respectively, to obtain a ligation product having a linker.
- For the linkers used herein, those skilled in the art can select the linker-addition procedure according to the sequencing platform used, and may also refer to the instructions provided by the manufacturer.
- the ligation product having the linker is amplified using a 5' primer and a 3' primer to obtain an amplification product.
- at least one of the first linker, the second linker, the 5' primer and the 3' primer comprises a nucleic acid tag such that the amplification product contains at least one nucleic acid tag.
- The nucleic acid tag is at least one selected from the group consisting of the oligonucleotides having the sequences set forth in SEQ ID NOs: 1-6. According to an embodiment of the present invention, it is preferred to introduce the aforementioned tags of SEQ ID NOs: 1-6 into the 5' primer and the 3' primer simultaneously.
- The 5' primer is at least one selected from the group consisting of the oligonucleotides having the sequences shown in SEQ ID NOs: 7-12, and the 3' primer is at least one selected from the group consisting of the oligonucleotides having the sequences shown in SEQ ID NOs: 13-18.
- The oligonucleotides shown in SEQ ID NOs: 7-12, or the oligonucleotide shown in SEQ ID NO: 19, are used as the 5' primer, and the PCR primers of the oligonucleotides shown in SEQ ID NOs: 13-18, which carry the nucleic acid tags described above, serve as the 3' primer.
- Amplification is performed using the oligonucleotides shown in SEQ ID NOs: 7-12 as the 5' primer and the oligonucleotides shown in SEQ ID NOs: 13-18 as the 3' primer. It should be noted that these tag primers were obtained by the inventors through extensive screening work and are significantly superior to other primer combinations.
- the amplification product is isolated, and the amplification product constitutes a nucleic acid sequencing library.
- The method for separating and recovering the amplification product is also not particularly limited; those skilled in the art can select an appropriate method and apparatus according to the characteristics of the amplification product, for example recovering target fragments of a specific length by electrophoresis.
- The amplification product is isolated by electrophoresis on a 2% agarose gel followed by purification.
- A nucleic acid sequencing library can be efficiently constructed using this method, and the tag primers described above can be efficiently introduced into at least one of the 3' end and the 5' end of the library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library.
- a sequencing library of a plurality of nucleic acid samples can be simultaneously constructed, and the nucleic acid sequences of the samples can be classified based on nucleic acid tags by mixing and sequencing the nucleic acid sample sequencing libraries derived from different samples.
- The inventors have surprisingly found that when nucleic acid sequencing libraries containing different nucleic acid tags are constructed for the same sample by the above method, using oligonucleotides carrying different tags, the stability and repeatability of the resulting sequencing data are very good, so that multiple samples can be sequenced in the same reaction system.
- the invention proposes a nucleic acid sequencing library constructed according to the method of constructing a sequencing library as described above.
- The inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing. The nucleic acid sequence information of a plurality of samples can therefore be obtained by mixing the sequencing libraries derived from different samples, sequencing them simultaneously, and classifying the nucleic acid sequences of the samples by nucleic acid tag.
- This makes full use of high-throughput sequencing technologies, for example by using Solexa sequencing to sequence nucleic acids from multiple samples simultaneously, thereby increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of nucleic acid samples.
- the invention provides a nucleic acid sequencing method.
- the method may comprise the steps of: constructing a sequencing library for the nucleic acid sample according to the method of constructing the sequencing library as described above; and sequencing the sequencing library.
- the inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing technology.
- the mixture is simultaneously sequenced, and the nucleic acid sequence of the sample is classified based on the nucleic acid tag to obtain nucleic acid sequence information of a plurality of samples.
- sequencing is performed using a second generation sequencing platform.
- the nucleic acid sequencing library of the present invention having tagged 3' and 5' ends can be sequenced as follows: the end containing the 3' tag is first sequenced using a sequencing primer to obtain the first-end sequencing data Read1, and the second end containing the 5' tag is then sequenced using a sequencing primer to obtain the second-end sequencing data Read2.
- the sequencing library can be efficiently sequenced.
- Sequencing primers can also be used for sequencing directly, depending on where the tag is set.
- in other words, in one embodiment of the present invention, sequencing the sequencing library further comprises: first sequencing from the 5' end using a sequencing primer to obtain the sequencing data Read1 of the 5' end; then obtaining the sequence of the 3' end tag, followed by the 5' end tag sequence; and finally sequencing the 3' end using a sequencing primer to obtain the sequencing data Read2.
- the invention proposes a method of determining nucleic acid sequence information.
- the method may comprise the steps of: sequencing a nucleic acid sample according to the method described above to obtain a sequencing result; and determining sequence information of the nucleic acid sample based on the sequencing result.
- the nucleic acid sequence information of a plurality of samples can be efficiently determined.
- Take 1 to 2 µl of human peripheral blood genomic DNA sample and use a NanoDrop 1000 to measure the sample concentration, the OD260/280 ratio, the OD260/230 ratio, etc.
- the samples were subjected to agarose gel electrophoresis.
- based on the electrophoresis result and the measured OD values, it is judged whether the total amount and quality of the sample are qualified and whether library preparation can proceed.
- Sample purity: the OD260/280 ratio should be between 1.8 and 2.0, with no protein, polysaccharide or RNA contamination; sample concentration: the concentration of the sample should be at least 100 ng/µl;
- Sample amount: to ensure the quality of the prepared library, the total amount of sample should be no less than 45 ⁇ ⁇.
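The purity and concentration criteria above amount to a simple threshold check, sketched below. The function name `sample_passes_qc` is hypothetical, and the concentration is assumed to be expressed in ng/µl; only the purity and concentration thresholds are checked.

```python
def sample_passes_qc(od_260_280, concentration_ng_per_ul):
    """Return True if a DNA sample meets the stated purity and concentration limits.

    Assumptions: OD260/280 must lie in [1.8, 2.0] and the concentration
    (taken to be in ng/µl) must be at least 100.
    """
    purity_ok = 1.8 <= od_260_280 <= 2.0
    concentration_ok = concentration_ng_per_ul >= 100
    return purity_ok and concentration_ok


# Hypothetical measurements, for illustration only.
good = sample_passes_qc(1.85, 150)
too_dilute = sample_passes_qc(1.85, 60)
impure = sample_passes_qc(1.60, 150)
```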
- Sample fragmentation methods: nebulization and Covaris, which can break the sample DNA into fragments ranging from 100 to 800 bp, with the main band at about 500 bp. If the sample is already fragmented DNA, this step can be skipped.
- The Adapter Oligo Mix is formed by annealing the commonly used sense sequence PE adapter F and the antisense sequence PE adapter R.
- The sample obtained in step 5 was electrophoresed on a 2% agarose gel at 100 V for 120 min;
- PC Primer 1 is an oligonucleotide shown in SEQ ID NOs: 7 to 12 (tag primers carrying the tag of the present invention) TTCCGATCT (SEQ ID NO: 19, primer without a tag), and
- the PCR product was electrophoresed on a 2% agarose gel at 100 V for 120 min, and the band at the n+120 bp position (where n is the insert size) was cut out and then recovered using the QIAquick Gel Extraction Kit (Qiagen).
- the DNA was dissolved in 40 µl of ⁇.
- Sequencing Primer 1: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 23). Sequencing was then performed on the HiSeq 2000 according to the manufacturer's protocol.
- the oligonucleotides shown in SEQ ID NOs: 7 to 12 were used as 5'-tag primers and the conventional P7-end primer sequences were used as 3'-tag primers, and sequencing libraries with the tags shown in Table 1 introduced at the 5' end and a conventional tag at the 3' end were constructed as described in the above general method. The libraries correspond to SEQ ID NOs: 1-6 and are named as follows: METqdiMBEPEI-46, METqdiMBGPEI-137, METqdiMBDPEI-37, METqdiMBFPEI-122, METqdiMBBPEI-169, METqdiMBBPEI-109.
- when sequencing the above six libraries, the first sequencing primer is used to bind to the 3' end of the library template strand for sequencing, and the first sequencing data Read1 obtained is the 5' end sequence information of the library. The second sequencing primer is then used to bind to the 5' end of the library template strand for sequencing to obtain the first tag sequencing data, that is, the sequence information of the 3' end tag of the library. Blocking is then performed and, by means of an oligonucleotide strand on the sequencing chip that matches the 3' end of the library template strand, synthesis of the library coding strand can be continued to obtain the second tag sequencing data, that is, the sequence information of the 5' end tag of the library. Finally, after the library coding strand has been synthesized, the third sequencing primer is used to bind to the 3' end of the library coding strand for sequencing by synthesis, and the second sequencing data Read2 obtained is the 3' end sequence information of the library.
- First sequencing primer: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 23); Second sequencing primer: 5'-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (SEQ ID NO: 24);
- Third sequencing primer 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 25).
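The second and third sequencing primers bind to opposite strands of the same region, which can be checked by computing a reverse complement. The sketch below uses the primer sequences exactly as printed above; the helper name `reverse_complement` is illustrative. As printed, the third primer is the reverse complement of the second primer plus one additional 3' T.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


second_primer = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
third_primer = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

rc_of_second = reverse_complement(second_primer)
# True when the third primer begins with the reverse complement of the second.
third_starts_with_rc = third_primer.startswith(rc_of_second)
```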
- the "5' end of the library" refers to the 5' end of the library coding strand
- the "3' end of the library" refers to the 3' end of the library coding strand
- unless specifically limited, the term "3' end" or "5' end" refers to the 3' end or 5' end of the coding strand of the library.
- the sequencing libraries constructed with no tag at the 5' end but with a conventional tag at the 3' end were named: CSZPE0120821001 and CSZPE0120821002.
- RawCluster/Tile: the number of all identifiable DNA clusters per tile throughout the sequencing reaction
- %PF: the percentage of DNA clusters that pass the filter in the entire sequencing reaction
- %Index 0 mismatch: the proportion of tag reads with 0 mismatches;
- %Index 1 mismatch: the proportion of tag reads with 1 mismatch.
- Filtered adaptor: filtered adapter.
- the quality value reflects the quality of the sequencing and lies between 0 and 40. Within this range, the higher the value, the better the quality.
- Q20 refers to the proportion of bases with quality values greater than 20 among all bases, which reflects the quality of the sequencing. The closer the value is to 1, the better the sequencing quality.
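A short sketch of how a Q20 value could be computed from per-base quality values. It assumes the standard Phred relation Q = -10·log10(p), which the text does not state, and counts bases at or above the threshold (the common convention, slightly broader than "greater than 20"). The function names are illustrative.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality value to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def q20_fraction(quality_values, threshold=20):
    """Fraction of bases whose quality value is at or above `threshold`."""
    if not quality_values:
        raise ValueError("quality_values must not be empty")
    passing = sum(q >= threshold for q in quality_values)
    return passing / len(quality_values)


# Hypothetical quality values for four bases.
example_q20 = q20_fraction([10, 20, 30, 40])
error_at_q20 = phred_to_error_probability(20)
```

Under this relation, a quality value of 20 corresponds to an estimated error probability of 1 in 100 and a value of 40 to 1 in 10,000.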
- the combined ratio of (Index 0 mismatch + Index 1 mismatch) for the 6 libraries (METqdiMBEPEI-46, METqdiMBGPEI-137, METqdiMBDPEI-37, METqdiMBFPEI-122, METqdiMBBPEI-169, METqdiMBBPEI-109) in which the tag was introduced at the 5' (P5) end was more than 90%, and the tag primer sequences were sequenced normally, demonstrating the usability of the nucleic acid tags, tag primers, library construction method and sequencing method provided by the present invention.
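The Index 0 mismatch and Index 1 mismatch proportions can be illustrated by comparing each observed tag read with the expected tag, position by position. This is a sketch under assumptions: the function names and the example tags are hypothetical, and the sequencer software may compute these figures differently.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def index_mismatch_rates(observed_tags, expected_tag):
    """Return (fraction with 0 mismatches, fraction with exactly 1 mismatch)."""
    if not observed_tags:
        raise ValueError("observed_tags must not be empty")
    distances = [hamming_distance(tag, expected_tag) for tag in observed_tags]
    total = len(distances)
    return distances.count(0) / total, distances.count(1) / total


# Hypothetical tag reads: two exact, one with 1 mismatch, one with 3 mismatches.
observed = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTTTACGT"]
zero_rate, one_rate = index_mismatch_rates(observed, "ACGTACGT")
combined = zero_rate + one_rate
```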
- Figures 2-5 show, taking the METqdiMBEPEI-46 and METqdiMBGPEI-137 libraries as representatives, quality comparisons between the libraries of the present embodiment, in which the tag of the present invention was introduced at the 5' end, and libraries in which tags were conventionally introduced at the 3' end (CSZPE0120821001 and CSZPE0120821002).
- Figure 2 shows the change in quality value with the number of cycles for the library using the 5'-end (P5-end) tag primer sequences of the present invention and for the library using a common tag, wherein the abscissa indicates the number of cycles and the ordinate indicates the quality value;
- Figure 3 shows the change in light intensity with the number of cycles for the library using the 5'-end (P5-end) tag primer sequences of the present invention and for the library using a common tag, wherein the abscissa indicates the number of cycles and the ordinate indicates the light intensity.
- Figure 4 shows the base distribution as a function of the number of cycles for the library using the 5'-end (P5-end) tag primer sequences of the present invention and for the library using a common tag, wherein the abscissa (Reads) indicates the number of cycles run and the ordinate (Percent) indicates the percentage of each base in that cycle.
- Figure 5 shows the change in error rate with the number of cycles for the library using the 5'-end (P5-end) tag primer sequences of the present invention and for the library using a common tag, wherein the abscissa (Position along reads) indicates the number of cycles run, the ordinate (% Error-rate) indicates the error rate, the solid line indicates the error rate (that is, the proportion of sequencing errors occurring in that cycle), and the broken line indicates the proportion of bases that could not be called (Blank rate); the figure thus shows the error rate along reads for the different libraries.
- Sample purity: the OD260/280 ratio should be between 1.8 and 2.0, with no protein, polysaccharide or RNA contamination;
- Sample amount: the minimum requirement is 50 ng.
- the sample DNA was enzymatically fragmented with transposomes into fragments of about 300 bp.
- the specific conditions are as follows:
- the size of the fragments was determined using a Bioanalyzer and was approximately 150 bp to 1 kb.
- the 50 ⁇ primer mixture includes the following:
- PC primer 1 is: 5'-AATGATACGGCGACCACCGA (SEQ ID NO: 20);
- PC primer 2 is: 5'-CAAGCAGAAGACGGCATACGA (SEQ ID NO: 21).
- wherein NNNNNNNN is at least one selected from the group consisting of SEQ ID NOs: 1 to 6.
- the method for constructing a nucleic acid sequencing library of the present invention can be effectively applied to the construction and sequencing of a DNA sequencing library of sample DNA, and the obtained library has good quality and accurate sequencing results.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Zoology (AREA)
- Biomedical Technology (AREA)
- Wood Science & Technology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Crystallography & Structural Chemistry (AREA)
- Plant Pathology (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Provided are a method of constructing a nucleic acid sequencing library and its use. In the method, a nucleic acid sample is fragmented to obtain DNA fragments; the DNA fragments are end-repaired to obtain end-repaired DNA fragments; a base A is added to the ends of the end-repaired DNA fragments to obtain DNA fragments with sticky ends; the DNA fragments with sticky ends are ligated to a first adapter and a second adapter, respectively, to obtain ligation products; the ligation products are amplified with a 5' primer and a 3' primer; and the amplification products are then separated to construct the nucleic acid sequencing library. At least one of the first adapter, the second adapter, the 5' primer and the 3' primer contains a nucleic acid tag, so that the amplification products carry at least one nucleic acid tag.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2012/086164 WO2014086037A1 (fr) | 2012-12-07 | 2012-12-07 | Method of constructing a nucleic acid sequencing library and its use |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2012/086164 WO2014086037A1 (fr) | 2012-12-07 | 2012-12-07 | Method of constructing a nucleic acid sequencing library and its use |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014086037A1 (fr) | 2014-06-12 |
Family
ID=50882790
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2012/086164 Ceased WO2014086037A1 (fr) | 2012-12-07 | 2012-12-07 | Method of constructing a nucleic acid sequencing library and its use |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2014086037A1 (fr) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
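| CN105506748A (zh) * | 2016-01-18 | 2016-04-20 | 北京百迈客生物科技有限公司 | A DNA high-throughput sequencing library construction method |
| CN106282332A (zh) * | 2016-08-08 | 2017-01-04 | 中国科学院北京基因组研究所 | Tags and primers for multiplex nucleic acid sequencing |
| CN107075513A (zh) * | 2014-09-12 | 2017-08-18 | 深圳华大基因科技有限公司 | Isolated oligonucleotides and their use in nucleic acid sequencing |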
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008093098A2 (fr) * | 2007-02-02 | 2008-08-07 | Illumina Cambridge Limited | Methods for indexing samples and sequencing multiple polynucleotide templates |
| CN102409047A (zh) * | 2010-09-21 | 2012-04-11 | 深圳华大基因科技有限公司 | A method for constructing a hybridization sequencing library |
2012
- 2012-12-07 WO PCT/CN2012/086164 patent/WO2014086037A1/fr not_active Ceased
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008093098A2 (fr) * | 2007-02-02 | 2008-08-07 | Illumina Cambridge Limited | Methods for indexing samples and sequencing multiple polynucleotide templates |
| CN102409047A (zh) * | 2010-09-21 | 2012-04-11 | 深圳华大基因科技有限公司 | A method for constructing a hybridization sequencing library |
Non-Patent Citations (1)
| Title |
|---|
| FAIRCLOTH, B.C. ET AL.: "Not All Sequence Tags Are Created Equal: Designing and Validating Sequence Identification Tags Robust to Indels.", PLOS ONE., vol. 7, no. 8, August 2012 (2012-08-01), pages 1 - 11 * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
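| CN107075513A (zh) * | 2014-09-12 | 2017-08-18 | 深圳华大基因科技有限公司 | Isolated oligonucleotides and their use in nucleic acid sequencing |
| CN107075513B (zh) * | 2014-09-12 | 2020-11-03 | 深圳华大智造科技有限公司 | Isolated oligonucleotides and their use in nucleic acid sequencing |
| CN105506748A (zh) * | 2016-01-18 | 2016-04-20 | 北京百迈客生物科技有限公司 | A DNA high-throughput sequencing library construction method |
| CN106282332A (zh) * | 2016-08-08 | 2017-01-04 | 中国科学院北京基因组研究所 | Tags and primers for multiplex nucleic acid sequencing |
| CN106282332B (zh) * | 2016-08-08 | 2019-11-15 | 中国科学院北京基因组研究所 | Tags and primers for multiplex nucleic acid sequencing |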
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240352507A1 (en) | | Method for increasing throughput of single molecule sequencing by concatenating short DNA fragments |
| CN105506125B (zh) | | A DNA sequencing method and a second-generation sequencing library |
| CN106795514B (zh) | | Bubble adapter and its application in nucleic acid library construction and sequencing |
| CN105400776B (zh) | | Oligonucleotide adapter and its application in constructing single-stranded circular libraries for nucleic acid sequencing |
| CN105463585B (zh) | | Method for constructing a sequencing library based on single-stranded DNA molecules and its application |
| CN110139931B (zh) | | Methods and compositions for phased sequencing |
| CN105925675B (zh) | | Method for amplifying DNA |
| CN107002292B (zh) | | Construction method and reagents for a double-adapter single-stranded circular nucleic acid library |
| WO2012116661A1 (fr) | | DNA tag and its use |
| WO2012037880A1 (fr) | | DNA index and its application |
| CN102839168A (zh) | | Nucleic acid probe, its preparation method and application |
| AU2016281758B2 (en) | | Reagents, kits and methods for molecular barcoding |
| WO2012068919A1 (fr) | | DNA library and method for preparing it, and method and device for detecting SNPs |
| CN103131754A (zh) | | A method for detecting nucleic acid hydroxymethylation modification and its application |
| CN109477245A (zh) | | Methods and kits for generating DNA libraries for massively parallel sequencing |
| CN108885649A (zh) | | Rapid sequencing of short DNA fragments using nanopore technology |
| WO2012037883A1 (fr) | | Nucleic acid tags and use thereof |
| WO2013192292A1 (fr) | | Massively parallel multiplex locus-specific nucleic acid sequence analysis |
| WO2012126398A1 (fr) | | DNA tag and its use |
| WO2012037875A1 (fr) | | DNA tags and their use |
| WO2013086964A1 (fr) | | Method for enrichment, library construction and single-nucleotide polymorphism analysis of gene regions in a complex genome of a higher plant |
| WO2012037881A1 (fr) | | Nucleic acid tags and their uses |
| CN114958997A (zh) | | Method for detecting partner genes |
| CN109750086B (zh) | | Method for constructing a single-stranded circular library |
| WO2014086037A1 (fr) | | Method of constructing a nucleic acid sequencing library and its use |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12889695; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 12889695; Country of ref document: EP; Kind code of ref document: A1 |