
WO2016049929A1 - Method for constructing a sequencing library and application thereof - Google Patents

Method for constructing a sequencing library and application thereof

Info

Publication number
WO2016049929A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing data
sequencing
strand
sequence
nucleic acid
Prior art date
Legal status
Ceased
Application number
PCT/CN2014/088059
Other languages
English (en)
Chinese (zh)
Inventor
吕小星
钱朝阳
管彦芳
常连鹏
易鑫
朱红梅
杨玲
吴仁花
Current Assignee
Bgi Tianjin
BGI Shenzhen Co Ltd
Original Assignee
Bgi Tianjin
BGI Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Bgi Tianjin, BGI Shenzhen Co Ltd
Priority to PCT/CN2014/088059
Publication of WO2016049929A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12M APPARATUS FOR ENZYMOLOGY OR MICROBIOLOGY; APPARATUS FOR CULTURING MICROORGANISMS FOR PRODUCING BIOMASS, FOR GROWING CELLS OR FOR OBTAINING FERMENTATION OR METABOLIC PRODUCTS, i.e. BIOREACTORS OR FERMENTERS
    • C12M1/00 Apparatus for enzymology or microbiology
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • the invention relates to the field of biomedicine.
  • the invention relates to methods of constructing sequencing libraries, sequencing methods, methods of determining nucleic acid sequences, devices for constructing sequencing libraries, sequencing devices, and systems for determining nucleic acid sequences.
  • High-throughput sequencing is receiving increasing attention, but its ability to detect low-frequency mutations still needs to be improved.
  • the present invention aims to solve at least one of the technical problems existing in the prior art. To this end, in accordance with embodiments of the present invention, the present invention proposes methods for constructing sequencing libraries and means for detecting low frequency mutations.
  • the invention proposes a method of constructing a sequencing library.
  • the method comprises: (a) ligating a linker to each end of a double-stranded DNA fragment to obtain a ligation product, wherein the linker comprises a first strand and a second strand, the first strand is partially complementary to the second strand, and the first strand comprises a first tag sequence, such that the linker defines a double-stranded region and two single-stranded tails, one of which contains the first tag sequence; (b) separating the ligation product into single-stranded DNA fragments; (c) performing a strand extension reaction on the single-stranded DNA fragments using a first primer to obtain a strand extension product, wherein the first primer comprises a second tag sequence and is capable of forming a double-stranded structure with the first strand of the linker, except that there is a mismatch between the first tag sequence and the second tag sequence; …
  • in this way, a sequencing library can be constructed efficiently, and for the strands of the same double-stranded DNA fragment (also referred to herein as the "source sequence"), amplification products carrying the first tag sequence and the second tag sequence, respectively, are obtained.
  • therefore, in the analysis of subsequent sequencing results, the sequencing results for the two tags can be used to correct each other, improving the reliability of the analysis results.
  • the double-stranded DNA fragment is obtained by end-repairing a nucleic acid sample to obtain a repaired nucleic acid sample, and adding a base A to the 3' ends of the repaired nucleic acid sample;
  • the resulting nucleic acid sample, which carries an A overhang (sticky end) at each end, constitutes the double-stranded DNA fragment.
  • a linker can be conveniently added to both ends of the double-stranded DNA fragment in a subsequent operation. Thereby, the efficiency of constructing a sequencing library is improved.
  • the nucleic acid sample is at least a portion of human genomic DNA, or human cell-free nucleic acid.
  • the human cell-free nucleic acid is extracted from the peripheral blood of a patient.
  • the patient has cancer, which is at least one selected from the group consisting of bladder cancer, prostate cancer, lung cancer, colorectal cancer, stomach cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer and liver cancer. Therefore, the method of the embodiment of the present invention can effectively analyze gene mutations in patients with these diseases and can be used for early diagnosis, individualized medication, and postoperative monitoring of common tumors.
  • At least a portion of the human genomic DNA is obtained by random fragmentation of human genomic DNA.
  • the linker has a 3' T overhang (sticky end).
  • the single-stranded DNA fragment is obtained by subjecting the ligation product to denaturation treatment.
  • the denaturation treatment may be a heat denaturation treatment or an alkali denaturation treatment.
  • the single-stranded DNA fragments are screened with a probe before the strand extension is performed, wherein the probe specifically recognizes a predetermined region.
  • in this way, the sequencing library can be constructed efficiently for the region of interest, improving the efficiency of library construction and of subsequent sequencing.
  • the predetermined region comprises one of the following:
  • the probe is provided in the form of a chip. Thereby, the efficiency of probe screening can be improved.
  • the strand extension reaction is carried out in the presence of a UDG enzyme/FPG enzyme.
  • the first tag sequence and the second tag sequence are each independently 4 to 10 nt in length. According to an embodiment of the invention, the first tag sequence and the second tag sequence are both 8 nt in length. According to an embodiment of the invention, there is a mismatch of at least 2 nt between the first tag sequence and the second tag sequence. The inventors have surprisingly found that with such an arrangement, the efficiency of correction using the first tag sequence and the second tag sequence in subsequent analysis can be effectively improved.
  • the first strand of the linker has the sequence set forth in SEQ ID NO: 1
  • the second strand of the linker has the sequence set forth in SEQ ID NO: 2
  • the first tag sequence has the sequence set forth in any one of SEQ ID NOs: 3-6
  • the second tag sequence has the sequence set forth in at least one of SEQ ID NOs: 7-10
  • the first primer has the sequence set forth in SEQ ID NO: 11
  • the primers suitable for simultaneously amplifying the first tag sequence and the second tag sequence have the sequences set forth in SEQ ID NO: 12 and SEQ ID NO: 13.
  • the tags include, but are not limited to, the four pairs described above; multiple tag pairs can be designed as needed for simultaneous detection of multiple samples.
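The tag constraints above (each tag 4 to 10 nt, at least 2 nt of mismatch within a pair) can be checked mechanically. The following is a minimal Python sketch, not the inventors' tooling: the function names and the example tags are hypothetical (the real tags are SEQ ID NOs: 3-10, not reproduced here), and the two tags of a pair are assumed to have equal length so that "mismatch" means Hamming distance. It also illustrates why a minimum mismatch of 2 nt helps when reads are assigned to strands by exact tag match: a single sequencing error can never turn one tag of the pair into the other, so the read is rejected rather than misassigned.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def tag_pair_ok(first_tag: str, second_tag: str,
                min_len: int = 4, max_len: int = 10,
                min_mismatch: int = 2) -> bool:
    """Check a (first, second) tag pair: each tag is 4-10 nt of A/C/G/T,
    both have the same length (assumption), and they differ at >= 2 positions."""
    for tag in (first_tag, second_tag):
        if (not min_len <= len(tag) <= max_len) or (set(tag) - set("ACGT")):
            return False
    if len(first_tag) != len(second_tag):
        return False
    return hamming(first_tag, second_tag) >= min_mismatch


def assign_strand(observed_tag: str, first_tag: str,
                  second_tag: str) -> Optional[str]:
    """Hypothetical strand assignment by exact tag match:
    first tag -> positive strand '+', second tag -> negative strand '-'.
    Anything else (e.g. a tag carrying a sequencing error) returns None."""
    if observed_tag == first_tag:
        return "+"
    if observed_tag == second_tag:
        return "-"
    return None


# Made-up 8-nt tags that differ at 4 positions.
print(tag_pair_ok("ACGTACGT", "ACGTTGCA"))  # True
print(tag_pair_ok("ACGTACGT", "ACGTACGA"))  # False: only 1 mismatch
print(assign_strand("ACGTACGA", "ACGTACGT", "ACGTTGCA"))  # None
```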
  • the invention proposes a sequencing method comprising: constructing a sequencing library according to the method described above; sequencing the sequencing library.
  • according to an embodiment of the invention, the sequencing is performed on a HiSeq 2000 or HiSeq 2500.
  • the invention provides a method of determining a nucleic acid sequence, comprising:
  • sequencing is performed according to the sequencing method described above to obtain a sequencing result consisting of multiple sequencing data;
  • at least one sequencing data subset is constructed based on the sequencing result, wherein all sequencing data in each subset correspond to the same source sequence on the nucleic acid sample;
  • the sequence of the nucleic acid sample is determined based on the corrected sequencing data.
  • in this way, correction can be performed effectively based on the positive-strand and negative-strand sequencing data, improving the reliability of the analysis results.
  • the sequencing is paired-end sequencing, and the sequencing result consists of multiple pairs of paired sequencing data.
  • constructing at least one sequencing data subset based on the sequencing result is performed by the following steps:
  • determining a paired sequencing data index for each pair of paired sequencing data, the index consisting of the first N bases of each read in the pair, wherein N is an integer from 10 to 20;
  • subdividing the at least one preliminary sequencing data subset based on the Hamming distance between the sequencing data in that subset to obtain a plurality of sequencing data subsets.
  • N is 12.
  • the Hamming distance between any two pairs of paired sequencing data in a subset does not exceed 20.
  • the positive-strand sequencing data and the negative-strand sequencing data each comprise at least two reads.
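The subset construction above (a paired index built from the first N bases of each read, preliminary grouping by that index, then subdivision by Hamming distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: reads are assumed to be fixed-length strings, the Hamming distance is taken over the concatenated read1 + read2 sequences, and the subdivision uses a greedy first-fit rule in which a read pair joins a sub-cluster only if it is within the threshold of every existing member, so the "any two pairs" condition holds by construction. The text does not say whether read1 and read2 are swapped for reads from the opposite strand, so the index is built exactly as described.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ReadPair = Tuple[str, str]  # (read1, read2), assumed fixed-length


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pair_index(pair: ReadPair, n: int = 12) -> str:
    """Paired index: first n bases of read1 + first n bases of read2."""
    read1, read2 = pair
    return read1[:n] + read2[:n]


def build_subsets(pairs: List[ReadPair], n: int = 12,
                  max_dist: int = 20) -> List[List[ReadPair]]:
    # Preliminary subsets: read pairs sharing the same paired index.
    preliminary: Dict[str, List[ReadPair]] = defaultdict(list)
    for pair in pairs:
        preliminary[pair_index(pair, n)].append(pair)

    # Split each preliminary subset so that any two members are within
    # max_dist of each other (greedy first-fit; result depends on input order).
    subsets: List[List[ReadPair]] = []
    for group in preliminary.values():
        clusters: List[List[ReadPair]] = []
        for pair in group:
            seq = pair[0] + pair[1]
            for cluster in clusters:
                if all(hamming(seq, m[0] + m[1]) <= max_dist for m in cluster):
                    cluster.append(pair)
                    break
            else:  # no compatible sub-cluster: start a new one
                clusters.append([pair])
        subsets.extend(clusters)
    return subsets
```

With `n=12` the index is 24 bp long; the distance threshold is 20 in the summary and 10 in the detailed steps, so it is left as a parameter.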
  • determining the corrected sequencing data based on the positive strand sequencing data and the negative strand sequencing data is based on the following principles:
  • Each base in the corrected sequencing data is simultaneously supported by at least 50% positive strand sequencing data and at least 50% negative strand sequencing data.
  • each base in the corrected sequencing data is simultaneously supported by at least 80% of the positive-strand sequencing data and at least 80% of the negative-strand sequencing data.
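The base-level correction rule above (each corrected base supported by at least 50%, or in another embodiment 80%, of both the positive-strand and the negative-strand reads, with at least two reads per strand) can be illustrated with a short sketch. It assumes the reads of one subset are already in the same orientation and of equal length. Emitting `N` where no single base meets the threshold on both strands is an assumption made for illustration; the text does not say how unsupported positions are reported.

```python
from typing import List, Optional

BASES = "ACGT"


def fraction(reads: List[str], pos: int, base: str) -> float:
    """Fraction of reads carrying `base` at position `pos`."""
    return sum(r[pos] == base for r in reads) / len(reads)


def duplex_consensus(plus_reads: List[str], minus_reads: List[str],
                     min_frac: float = 0.5,
                     min_reads: int = 2) -> Optional[str]:
    """Corrected read for one subset, or None if a strand has too few reads."""
    if len(plus_reads) < min_reads or len(minus_reads) < min_reads:
        return None
    out = []
    for pos in range(len(plus_reads[0])):
        # Bases reaching the threshold on BOTH strands at this position.
        supported = [b for b in BASES
                     if fraction(plus_reads, pos, b) >= min_frac
                     and fraction(minus_reads, pos, b) >= min_frac]
        # Keep a uniquely supported base; otherwise mark the position 'N'.
        out.append(supported[0] if len(supported) == 1 else "N")
    return "".join(out)


plus = ["ACGT", "ACGT", "ACGA"]
minus = ["ACGT", "ACTT"]
print(duplex_consensus(plus, minus))                # ACGT
print(duplex_consensus(plus, minus, min_frac=0.8))  # ACNN
```

Checking every candidate base, rather than taking each strand's majority separately, avoids tie-breaking artifacts when a base sits exactly at the 50% threshold.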
  • the method further includes:
  • aligning the corrected sequencing data to a reference sequence and removing sequencing data with a quality below 30.
  • SNV analysis or Indel analysis is performed based on the sequence of the nucleic acid sample.
  • the invention proposes an apparatus for constructing a sequencing library.
  • the apparatus comprises:
  • a linking unit for ligating a linker to each end of a double-stranded DNA fragment to obtain a ligation product, wherein the linker includes a first strand and a second strand, the first strand is partially complementary to the second strand, and the first strand comprises a first tag sequence, such that the linker defines a double-stranded region and two single-stranded tails, one of which contains the first tag sequence;
  • a strand-separation unit for separating the ligation product into single-stranded DNA fragments;
  • a strand extension unit for performing a strand extension reaction on the single-stranded DNA fragments with a first primer to obtain a strand extension product, wherein the first primer includes a second tag sequence and is capable of forming a double-stranded structure with the first strand of the linker, except that there is a mismatch between the first tag sequence and the second tag sequence;
  • an amplification unit for amplifying the strand extension product to obtain an amplification product constituting the sequencing library, wherein the amplification uses primers suitable for simultaneously amplifying the first tag sequence and the second tag sequence.
  • the above apparatus can effectively implement the method for constructing a sequencing library described above and can construct a sequencing library efficiently; for the strands of the same double-stranded DNA fragment (also referred to herein as the "source sequence"), amplification products carrying the first tag sequence and the second tag sequence, respectively, are obtained, so that in the analysis of subsequent sequencing results, the sequencing results of the two tags can be used to correct each other, improving the reliability of the analysis results.
  • the apparatus further includes:
  • An end repair unit for end-repairing a nucleic acid sample to obtain a repaired nucleic acid sample
  • a terminal modification unit for adding a base A to the 3' ends of the repaired nucleic acid sample to obtain a nucleic acid sample with an A overhang (sticky end) at each end, which constitutes the double-stranded DNA fragment.
  • a screening unit for screening the single-stranded DNA fragments with a probe before the strand extension is performed, wherein the probe specifically recognizes a predetermined region.
  • the predetermined region comprises one of the following:
  • the probe is provided in the form of a chip.
  • the strand extension reaction is carried out in the presence of a UDG enzyme/FPG enzyme.
  • the first tag sequence and the second tag sequence are each independently 4 to 10 nt in length.
  • the first tag sequence and the second tag sequence are both 8 nt in length.
  • the first strand of the linker has the sequence set forth in SEQ ID NO: 1
  • the second strand of the linker has the sequence set forth in SEQ ID NO: 2
  • the first tag sequence has the sequence set forth in any one of SEQ ID NOs: 3-6
  • the second tag sequence has the sequence set forth in at least one of SEQ ID NOs: 7-10
  • the first primer having the sequence set forth in SEQ ID NO:11
  • the primers suitable for simultaneously amplifying the first tag sequence and the second tag sequence have the sequences set forth in SEQ ID NO: 12 and SEQ ID NO: 13.
  • the tags include, but are not limited to, the four pairs described above; multiple tag pairs may be designed as needed for simultaneous detection of multiple samples.
  • the invention proposes a sequencing device.
  • the sequencing device comprises: the apparatus for constructing a sequencing library described above; and a sequencer for sequencing the sequencing library.
  • the sequencer is a HiSeq 2000 or HiSeq 2500.
  • the invention proposes a system for determining a nucleic acid sequence.
  • the system comprises:
  • a sequencing data subset construction device for constructing at least one subset of sequencing data based on the sequencing result, wherein all sequencing data in each subset of sequencing data corresponds to the same source sequence on the nucleic acid sample;
  • a sequencing data classification device configured to determine, for each sequencing data subset, the sequencing data corresponding to the first tag sequence as positive-strand sequencing data and the sequencing data corresponding to the second tag sequence as negative-strand sequencing data;
  • a sequencing data correction device for correcting the sequencing data of each sequencing data subset based on its positive-strand and negative-strand sequencing data, to determine corrected sequencing data;
  • a sequence determining device for determining a sequence of the nucleic acid sample based on the corrected sequencing data.
  • the method of determining a nucleic acid sequence described above can be carried out efficiently using the system for determining a nucleic acid sequence according to an embodiment of the present invention, so that correction can be performed effectively based on the positive-strand and negative-strand sequencing data, improving the reliability of the analysis results.
  • the sequencing is paired-end sequencing, and the sequencing result consists of multiple pairs of paired sequencing data.
  • the sequencing data subset construction device comprises:
  • a sequencing data index determining device for determining a paired sequencing data index for each pair of paired sequencing data, the index consisting of the first N bases of each read in the pair, wherein N is an integer from 10 to 20;
  • a preliminary screening device for constructing at least one preliminary sequencing data subset based on the paired sequencing data index, wherein all paired sequencing data in each preliminary subset share the same paired sequencing data index;
  • a secondary screening device for subdividing the at least one preliminary sequencing data subset based on a Hamming distance between the sequencing data in the preliminary sequencing data subset to obtain a plurality of the sequencing data subsets.
  • N is 12.
  • the Hamming distance between any two pairs of paired sequencing data in a subset does not exceed 20.
  • the positive-strand sequencing data and the negative-strand sequencing data each comprise at least two reads.
  • determining the corrected sequencing data based on the positive strand sequencing data and the negative strand sequencing data is based on the following principles:
  • Each base in the corrected sequencing data is simultaneously supported by at least 50% positive strand sequencing data and at least 50% negative strand sequencing data.
  • each base in the corrected sequencing data is simultaneously supported by at least 80% positive strand sequencing data and at least 80% negative strand sequencing data.
  • the system is further configured for:
  • aligning the corrected sequencing data to a reference sequence and removing sequencing data with a quality below 30.
  • a sequence analysis device for performing SNV analysis or indel analysis based on the sequence of the nucleic acid sample.
  • FIG. 1 shows a flow chart of a method of constructing a sequencing library according to an embodiment of the present invention.
  • Figure 3 shows the results of a mutation spectrum analysis according to an embodiment of the present invention.
  • Figure 5 shows the results of a mutation spectrum analysis according to an embodiment of the present invention.
  • Figure 7 shows the results of a mutation spectrum analysis according to an embodiment of the present invention.
  • Figure 8 shows the analysis results for clusters of reads sharing the same index according to an embodiment of the present invention.
  • Figure 9 shows the results of a mutation spectrum analysis according to an embodiment of the present invention.
  • Figure 10 shows the analysis results for clusters of reads sharing the same index according to an embodiment of the present invention.
  • Figure 11 shows the results of a mutation spectrum analysis according to an embodiment of the present invention.
  • the exon sequences of the relevant genes were retrieved.
  • the final chip covers only the CDS regions of the above genes, each extended by 20 bp on both sides.
  • the chip carries enrichment capture probes covering 98% of the target region, which enrich target DNA fragments from the complex genome and capture the genomic regions of interest with high specificity and high coverage on a single chip.
  • the tag, named index1, not only distinguishes different samples but is also used to label the positive strand in subsequent steps.
  • the ligation product was subjected to chip hybridization capture, and the eluted single-stranded template was amplified for a single cycle with a primer carrying index2, thereby labeling the negative strand.
  • UDG/FPG enzymes were added during this PCR and incubated to remove DNA damage from the template strand and reduce false positives.
  • the product carrying the double index on the positive and negative strands is purified and then enriched by a second round of PCR to complete library preparation.
  • sequencing is performed on a HiSeq 2000 or HiSeq 2500; the appropriate sequencing platform can be selected flexibly according to the amount of sequencing data and the number of samples.
  • the specific steps include:
  • the cfDNA extracted from the plasma was then subjected to a three-step enzymatic reaction according to the KAPA LTP Library Preparation Kit.
  • a cancer early-screening capture chip designed by the inventors was used, and hybridization capture was performed according to the chip manufacturer's instructions. Finally, the product was eluted in 21 μL of ddH2O, with the hybridization elution magnetic beads retained.
  • Double-index positive- and negative-strand tagging and enrichment
  • PCR1 was used for negative-strand labeling and template DNA damage repair.
  • PCR2 was used for amplification and enrichment to complete library preparation.
  • the hybridization elution magnetic beads were first removed, then 40 μL of Agencourt AMPure XP reagent was added for magnetic bead purification; the product was finally dissolved in 20 μL of ddH2O and carried, together with the magnetic beads, into the next reaction.
  • the magnetic beads from the previous step were first removed, then 50 μL of Agencourt AMPure XP reagent was added again for magnetic bead purification; the product was finally dissolved in 25 μL of ddH2O, followed by QC and sequencing.
  • for each read pair (paired sequencing data), the first 12 bases of read1 and the first 12 bases of read2 are concatenated into a 24-bp sequence, which is used as the index of the read pair; the positive and negative strands are labeled according to this index.
  • within each small cluster, any two read pairs have a Hamming distance of no more than 10, so that reads sharing the same index but originating from different DNA templates are separated.
  • step 4: the duplicate clusters of the same DNA template obtained in step 3 are screened; a cluster is kept for subsequent analysis only if it contains at least 2 read pairs from each of the positive and negative strands.
  • the new (corrected) reads are re-aligned to the genome using the bwa mem algorithm, and reads with a quality below 30 are filtered out.
  • a base type that differs from the mainstream base type is a mutated base type.
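The filtering and calling steps above (keeping only clusters with at least two read pairs on each strand, dropping re-aligned reads with quality below 30, and treating bases that differ from the mainstream base as mutated) can be sketched as below. This is an illustrative simplification, not the inventors' pipeline: the alignment itself (bwa mem) is not reproduced, each corrected read is reduced to hypothetical `(position, base, quality)` observations, and "mainstream" is assumed to mean the most frequent base at a position, with ties broken by first occurrence.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

VALID_BASES = {"A", "C", "G", "T"}


def keep_cluster(n_plus_pairs: int, n_minus_pairs: int,
                 min_pairs: int = 2) -> bool:
    """Keep a duplicate cluster only if both strands have enough read pairs."""
    return n_plus_pairs >= min_pairs and n_minus_pairs >= min_pairs


def call_mutated_bases(observations: Iterable[Tuple[int, str, int]],
                       min_quality: int = 30) -> Dict[int, Dict[str, int]]:
    """Return {position: {non-mainstream base: count}} from corrected reads.

    Each observation is (position, base, quality) for one corrected read;
    observations with quality below `min_quality` are discarded.
    """
    per_pos: Dict[int, Counter] = defaultdict(Counter)
    for pos, base, quality in observations:
        if quality >= min_quality and base in VALID_BASES:
            per_pos[pos][base] += 1

    mutated: Dict[int, Dict[str, int]] = {}
    for pos, counts in per_pos.items():
        mainstream, _ = counts.most_common(1)[0]  # most frequent base
        others = {b: n for b, n in counts.items() if b != mainstream}
        if others:
            mutated[pos] = others
    return mutated


obs = [(100, "A", 60), (100, "A", 60), (100, "G", 60),
       (100, "T", 10),  # dropped: quality below 30
       (101, "C", 60)]
print(call_mutated_bases(obs))  # {100: {'G': 1}}
print(keep_cluster(3, 1))       # False
```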
  • Example 1 Early screening of gynecological reproductive tract tumors
  • the WCNpan chip covers driver genes related to gynecological reproductive tract tumors (cervical cancer, endometrial cancer, and ovarian cancer), high-frequency mutated genes, and important genes in 12 cancer signaling pathways: 42 genes in total, covering 300 kb.
  • the specific design process of the chip: according to the human genome HG19, the exon sequences of the above 42 genes were retrieved. Considering the size and cost of the capture region, the final chip covers only the CDS regions of these genes, each extended by 20 bp on both sides, for a total of 300 kb. The chip carries enrichment capture probes covering 98% of the target region, which enrich target DNA fragments from the complex genome and capture approximately 300 kb of genomic region with high specificity and high coverage on a single chip.
  • the positive/negative strand complementarity rate is the ratio of clusters with at least 3 reads that contain reads from both the positive and negative strands to all clusters with at least 3 reads, and evaluates positive/negative strand complementarity in the usable data; the effective data utilization is the ratio of the number of reads in clusters that meet the error-correction requirement of at least 2 positive-strand and 2 negative-strand reads (2+/2-) to the total number of sequencing reads; the average sequencing depth is the average coverage of bases in the target region after error correction, based on the effective data.
  • the analysis results for clusters of reads sharing the same index are shown in Fig. 2, in which the abscissa indicates the duplication number (dup) of a cluster and the ordinate indicates the total number of reads in clusters with a given dup number. Fig. 2 shows that most clusters have a dup number of around 8, and a large proportion of clusters satisfy the condition of at least 2 positive-strand and 2 negative-strand reads (2+/2-).
  • the final effective data utilization rate is 5.14%, and the average sequencing depth is 1153.6X.
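The three quality metrics reported above can be computed from per-cluster read counts. The sketch below encodes one reading of the machine-translated definitions, not a confirmed formula: the complementarity rate is taken as the share of clusters with at least 3 reads that contain reads from both strands, effective utilization as the share of all reads that fall in clusters meeting the 2+/2- requirement, and average depth as corrected bases covering the target divided by the target size. The `Cluster` type and all counts are hypothetical, and reads are assumed to be counted in one consistent unit throughout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    n_plus: int   # positive-strand reads in the cluster
    n_minus: int  # negative-strand reads in the cluster


def complementarity_rate(clusters: List[Cluster], min_reads: int = 3) -> float:
    """Share of clusters with >= min_reads reads that cover both strands."""
    large = [c for c in clusters if c.n_plus + c.n_minus >= min_reads]
    if not large:
        return 0.0
    both = sum(1 for c in large if c.n_plus > 0 and c.n_minus > 0)
    return both / len(large)


def effective_utilization(clusters: List[Cluster], total_reads: int,
                          min_per_strand: int = 2) -> float:
    """Reads in clusters usable for error correction / all sequenced reads."""
    used = sum(c.n_plus + c.n_minus for c in clusters
               if c.n_plus >= min_per_strand and c.n_minus >= min_per_strand)
    return used / total_reads


def average_depth(corrected_target_bases: int, target_size: int) -> float:
    """Mean per-base coverage of the target region after error correction."""
    return corrected_target_bases / target_size


clusters = [Cluster(2, 2), Cluster(3, 0), Cluster(1, 1), Cluster(4, 3)]
print(complementarity_rate(clusters))       # 2 of the 3 clusters with >= 3 reads
print(effective_utilization(clusters, 20))  # 0.55
print(average_depth(600, 200))              # 3.0
```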
  • the mutation spectrum analysis results are shown in Fig. 3, in which complementary mutation types are treated as equivalent for a double-stranded (DNA) molecule.
  • the abscissa represents the type of base mutation; the ordinate represents the number of mutations.
  • the results in Fig. 3 show that the mutated base types are evenly distributed, and the mutation frequency (mutations per nucleotide) is 1.7 × 10⁻⁶.
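Because a substitution on one strand of a DNA duplex appears as its complement on the other strand (for example, G>A on one strand is C>T on the other), the twelve possible substitutions collapse into six classes, which is the sense in which complementary mutation types are equivalent for a double-stranded molecule. The sketch below folds substitutions onto a C or T reference base (a common convention, assumed here rather than taken from the text) and computes mutations per nucleotide; the denominator is assumed to be the number of corrected nucleotides analyzed, and the input counts are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fold_type(ref: str, alt: str) -> str:
    """Report a substitution with a pyrimidine (C/T) reference base."""
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(substitutions: Iterable[Tuple[str, str]]) -> Counter:
    """Count folded mutation types across (ref, alt) substitutions."""
    return Counter(fold_type(ref, alt) for ref, alt in substitutions)


def mutation_frequency(n_mutations: int, n_nucleotides: int) -> float:
    """Mutations per analyzed (corrected) nucleotide."""
    return n_mutations / n_nucleotides


print(spectrum([("G", "A"), ("C", "T"), ("T", "G")]))
# Counter({'C>T': 2, 'T>G': 1})
print(mutation_frequency(17, 10_000_000))  # 1.7e-06
```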
  • Example 2 Individualized medication for 12 common tumors
  • the CANPer-YY chip covers oncogenes, tumor suppressor genes, high-frequency mutated genes of 12 common cancers, important genes in 12 cancer signaling pathways, and genes related to targeted and chemotherapeutic drugs: 524 genes in total, covering 750 kb.
  • the main design process of the chip is divided into 4 steps:
  • the samples carrying mutations in the two selected intervals are used as the sample database, and a third interval is screened in the same way, and so on until the sample database includes all samples, yielding the set of exon regions; for genes none of whose intervals has been selected, all intervals of those genes are added to the chip regions.
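Only the last of the four design steps survives in the text, and it reads like a greedy coverage procedure: keep adding the exon interval that covers the most not-yet-covered database samples until every sample is covered, then add all intervals of any gene that contributed no interval. The sketch below implements that reading as a greedy set cover. It is an interpretation, not the inventors' procedure; the data structures, interval names and sample sets are hypothetical, and the omitted earlier steps (how intervals and the mutation database are prepared) are not reconstructed.

```python
from typing import Dict, List, Set


def select_chip_intervals(interval_samples: Dict[str, Set[str]],
                          interval_gene: Dict[str, str]) -> List[str]:
    """Greedy selection of capture intervals.

    interval_samples: interval -> samples (from a mutation database) mutated in it
    interval_gene:    interval -> gene, for every candidate interval
                      (must include every key of interval_samples)
    """
    all_samples: Set[str] = set().union(*interval_samples.values())
    covered: Set[str] = set()
    remaining = dict(interval_samples)
    selected: List[str] = []

    # Greedy set cover: each round takes the interval adding the most new
    # samples (ties go to the interval listed first).
    while covered != all_samples:
        best = max(remaining, key=lambda iv: len(remaining[iv] - covered))
        selected.append(best)
        covered |= remaining.pop(best)

    # Genes with no selected interval contribute all of their intervals.
    genes_hit = {interval_gene[iv] for iv in selected}
    for iv, gene in interval_gene.items():
        if gene not in genes_hit:
            selected.append(iv)
    return selected


samples = {"G1_ex1": {"s1", "s2", "s3"}, "G1_ex2": {"s3", "s4"},
           "G2_ex1": {"s4"}, "G3_ex1": {"s5"}}
genes = {"G1_ex1": "G1", "G1_ex2": "G1", "G2_ex1": "G2",
         "G3_ex1": "G3", "G4_ex1": "G4", "G4_ex2": "G4"}
print(select_chip_intervals(samples, genes))
# ['G1_ex1', 'G1_ex2', 'G3_ex1', 'G2_ex1', 'G4_ex1', 'G4_ex2']
```

Greedy selection does not guarantee the smallest possible interval set, but it is simple and matches the step-by-step wording of the surviving text.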
  • the killing effect of chemotherapeutic drugs on tumor cells is significantly correlated with the expression and/or polymorphism of a specific gene (or group of genes).
  • detecting the related genes makes it possible to predict the efficacy of chemotherapeutic drugs and to select appropriate drugs for individualized chemotherapy, which has become a rational way to improve efficacy and reduce ineffective treatment.
  • the PharmGKB database is used to integrate all current chemotherapeutic drugs, the genes related to their efficacy, and predictive assessments of therapeutic effect, forming a database for interpreting individualized chemotherapy medication.
  • the chemotherapy data were integrated into the tumor individualized-medication information pipeline to automate the interpretation of chemotherapeutic drugs.
  • Targeted drugs show marked efficacy and few side effects in tumor therapy, but they depend on their targets (including proteins, DNA, etc.), so patients must undergo target analysis before it can be determined whether they are suitable for the drug. Currently FDA-approved targeted drugs, as well as drugs in phase III and IV clinical trials, were integrated; following the NCCN clinical guidelines and clinical drug-gene studies, the relationships between drug target genes and targeted drugs were collated into a database for individualized targeted-drug interpretation.
  • a patient with advanced gastric cancer (one of the 12 common tumors) underwent individualized medication guidance testing according to the steps of the above method; the results are as follows:
  • the positive/negative strand complementarity rate is the ratio of clusters with at least 3 reads that contain reads from both the positive and negative strands to all clusters with at least 3 reads, and evaluates positive/negative strand complementarity in the usable data; the effective data utilization is the ratio of the number of reads in clusters that meet the error-correction requirement of at least 2 positive-strand and 2 negative-strand reads (2+/2-) to the total number of sequencing reads; the average sequencing depth is the average coverage of bases in the target region after error correction, based on the effective data.
  • the analysis results for clusters of reads sharing the same index are shown in Fig. 4, in which the abscissa represents the duplication number (dup) of a cluster and the ordinate represents the total number of reads in clusters with a given dup number.
  • the results in Fig. 4 show that most clusters have a dup number of around 5, and most clusters satisfy the condition of at least 2 positive-strand and 2 negative-strand reads (2+/2-).
  • the final effective data utilization rate is 3.5%, and the average sequencing depth is 667X.
  • the mutation spectrum analysis results are shown in Fig. 5, in which complementary mutation types are treated as equivalent for a double-stranded (DNA) molecule; the abscissa represents the type of base mutation, and the ordinate represents the number of mutations.
  • the results in Fig. 5 show that the mutated base types are essentially evenly distributed, and the mutation frequency (mutations per nucleotide) is 4.2 × 10⁻⁶.
  • the chemotherapy sites are shown in the following table:
  • the Colorectalpan chip covers driver genes, high-frequency mutated genes, and important genes in 12 cancer signaling pathways: 60 genes in total, covering 123 kb.
  • the main design process of the chip is divided into 4 steps:
  • the samples carrying mutations in the two selected intervals are used as the sample database, and a third interval is screened in the same way, and so on until the sample database includes all samples, yielding the set of exon regions; for genes none of whose intervals has been selected, all intervals of those genes are added to the chip regions.
  • a colorectal cancer early screening test was performed on a patient with intestinal polyps according to the steps of the above method; the results are as follows:
  • the positive/negative strand complementarity rate is the ratio of clusters with at least 3 reads that contain reads from both the positive and negative strands to all clusters with at least 3 reads, and evaluates positive/negative strand complementarity in the usable data; the effective data utilization is the ratio of the number of reads in clusters that meet the error-correction requirement of at least 2 positive-strand and 2 negative-strand reads (2+/2-) to the total number of sequencing reads; the average sequencing depth is the average coverage of bases in the target region after error correction, based on the effective data.
  • The analysis of read clusters sharing the same index is shown in Fig. 6, where the abscissa represents the duplication count (dup) of a cluster and the ordinate represents the total number of reads in clusters with a given dup count.
  • The results in Fig. 6 show that most clusters have a dup count of around 6, and most clusters satisfy the 2+/2- condition (at least two plus-strand and two minus-strand reads).
  • The final effective data utilization rate is 5.12%, and the average sequencing depth is 1033X.
  • The mutation spectrum analysis results are shown in Fig. 7, in which complementary mutation types are regarded as the same for a double-stranded (DNA) molecule; the abscissa represents the base substitution type, and the ordinate represents the number of mutations.
  • The results in Fig. 7 show that the mutated base types are broadly balanced in distribution, and the mutation frequency (mutations per nucleotide) is 2.2 × 10⁻⁶.
  • Detailed list of detected mutations (statistics based on exonic regions and non-synonymous mutations):
  • The Lungpan chip covers lung cancer-related driver genes, high-frequency mutated genes, and important genes in 12 cancer signaling pathways: 145 genes in total, spanning 250 kb.
  • The chip design process consists of 4 main steps:
  • The samples carrying mutations in the two selected intervals are used as the sample database, and a third interval is screened in the same way; this is repeated until the sample database includes all samples, giving the counted set of exon regions. For genes none of whose intervals was selected, all of their intervals are added to the chip region.
  • A patient with a pulmonary nodule underwent early lung cancer screening following the steps of the above method; the results are as follows:
  • Plus/minus strand pairing rate: the ratio of clusters with three or more reads from both the plus and minus strands to clusters with three or more reads in total, used to assess plus/minus strand pairing in the usable data. Effective data utilization: the ratio of the number of error-corrected reads obtained from clusters satisfying at least 2+/2- (at least two plus-strand and two minus-strand reads) to the total number of sequencing reads. Average sequencing depth: the average coverage of bases in the target region after error correction of the effective data.
  • The analysis of read clusters sharing the same index is shown in Fig. 8, where the abscissa represents the duplication count (dup) of a cluster and the ordinate represents the total number of reads in clusters with a given dup count.
  • The results in Fig. 8 show that most clusters have a dup count of around 10, and a large proportion of clusters satisfy the 2+/2- condition.
  • The final effective data utilization rate is 4.12%, and the average sequencing depth is 898X.
  • The mutation spectrum analysis results are shown in Fig. 9, in which complementary mutation types are regarded as the same for a double-stranded (DNA) molecule; the abscissa represents the base substitution type, and the ordinate represents the number of mutations.
  • The results in Fig. 9 show that the mutated base types are broadly balanced in distribution, and the mutation frequency (mutations per nucleotide) is 2.6 × 10⁻⁶.
  • Detailed list of detected mutations (statistics based on exonic regions and non-synonymous mutations):
  • The CANPer-JK chip covers driver genes related to 12 common cancers, high-frequency mutated genes, and important genes in 12 cancer signaling pathways: 547 genes in total, spanning 800 kb.
  • The chip design process consists of 4 main steps:
  • The samples carrying mutations in the two selected intervals are used as the sample database, and a third interval is screened in the same way; this is repeated until the sample database includes all samples, giving the counted set of exon regions. For genes none of whose intervals was selected, all of their intervals are added to the chip region.
  • A postoperative breast cancer patient (breast cancer being one of the 12 common tumor types) underwent postoperative breast cancer monitoring following the steps of the above method; the results are as follows:
  • Plus/minus strand pairing rate: the ratio of clusters with three or more reads from both the plus and minus strands to clusters with three or more reads in total, used to assess plus/minus strand pairing in the usable data. Effective data utilization: the ratio of the number of error-corrected reads obtained from clusters satisfying at least 2+/2- (at least two plus-strand and two minus-strand reads) to the total number of sequencing reads. Average sequencing depth: the average coverage of bases in the target region after error correction of the effective data.
  • The analysis of read clusters sharing the same index is shown in Fig. 10, where the abscissa represents the duplication count (dup) of a cluster and the ordinate represents the total number of reads in clusters with a given dup count.
  • The results in Fig. 10 show that most clusters have a dup count of around 6, and most clusters satisfy the 2+/2- condition (at least two plus-strand and two minus-strand reads).
  • The final effective data utilization rate is 4.74%, and the average sequencing depth is 1028.6X.
  • The mutation spectrum analysis results are shown in Fig. 11, in which complementary mutation types are regarded as the same for a double-stranded (DNA) molecule; the abscissa represents the base substitution type, and the ordinate represents the number of mutations.
  • The results in Fig. 11 show that the mutated base types are broadly balanced in distribution, and the mutation frequency (mutations per nucleotide) is 3.1 × 10⁻⁶.
  • Detailed list of detected mutations (statistics based on exonic regions and non-synonymous mutations):
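The chip design step described above for the Colorectalpan, Lungpan and CANPer-JK panels can be read as a greedy coverage procedure. The earlier design steps are missing from this text, so the following is only a minimal sketch under stated assumptions. It assumes that "screened in the same way" means picking the interval that adds the most not-yet-covered mutated samples, which is a greedy set-cover heuristic. It also assumes that "unfiltered" genes are those with no selected interval. The function name `design_panel`, the input dictionaries, and the interval and sample IDs are all hypothetical.

```python
def design_panel(interval_samples: dict[str, set[str]],
                 interval_gene: dict[str, str]) -> list[str]:
    """Hypothetical greedy sketch of the chip-interval selection step.

    interval_samples: interval ID -> set of sample IDs mutated in that interval.
    interval_gene:    interval ID -> gene the interval belongs to.
    """
    all_samples = set().union(*interval_samples.values())
    database: set[str] = set()       # samples already explained by chosen intervals
    selected: list[str] = []
    chosen: set[str] = set()
    candidates = sorted(interval_samples)  # sorted so ties break by interval ID

    # Greedy phase (assumed criterion): repeatedly add the interval that
    # contributes the most samples not yet in the database, until the
    # database contains every sample or no interval adds anything new.
    while database != all_samples:
        best = max((iv for iv in candidates if iv not in chosen),
                   key=lambda iv: len(interval_samples[iv] - database),
                   default=None)
        if best is None or not interval_samples[best] - database:
            break
        selected.append(best)
        chosen.add(best)
        database |= interval_samples[best]

    # Gene phase (assumed reading): a gene with no selected interval
    # gets all of its intervals added to the chip region.
    covered_genes = {interval_gene[iv] for iv in selected}
    for iv in candidates:
        if iv not in chosen and interval_gene[iv] not in covered_genes:
            selected.append(iv)
            chosen.add(iv)
    return selected
```

Here is a usage example with made-up inputs. If geneA has intervals A1 and A2, and A1 alone already covers its samples, A2 is left out. By contrast, geneB and geneD, which had no interval chosen greedily, contribute all of their intervals.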
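The cluster statistics reported for each example (the dup histograms in Figs. 6, 8 and 10, the strand pairing rate, effective data utilization, and average depth) can be illustrated with a small sketch. The wording of the definitions is ambiguous, so several assumptions are made, and all of them are hypothetical:

- The pairing rate is computed as clusters with at least 3 plus-strand and at least 3 minus-strand reads, divided by clusters with at least 3 reads.
- Each cluster passing 2+/2- yields one error-corrected consensus read.
- Every sequencing read belongs to some cluster.

The `ReadCluster` type and every function name are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCluster:
    """Reads sharing one molecular index (hypothetical structure)."""
    plus: int   # reads from the plus (+) strand
    minus: int  # reads from the minus (-) strand

    @property
    def dup(self) -> int:
        """Duplication count: total reads in the cluster."""
        return self.plus + self.minus


def passes(cluster: ReadCluster, min_each: int) -> bool:
    """True if the cluster has at least min_each reads on each strand."""
    return cluster.plus >= min_each and cluster.minus >= min_each


def dup_histogram(clusters) -> dict[int, int]:
    """dup count -> total reads in clusters with that dup count."""
    hist = Counter()
    for c in clusters:
        hist[c.dup] += c.dup
    return dict(sorted(hist.items()))


def strand_pairing_rate(clusters, k: int = 3) -> float:
    """Assumed reading: clusters with >=k plus and >=k minus reads,
    divided by clusters with >=k reads in total."""
    eligible = [c for c in clusters if c.dup >= k]
    if not eligible:
        return 0.0
    return sum(passes(c, k) for c in eligible) / len(eligible)


def effective_utilization(clusters, min_each: int = 2) -> float:
    """Assumes one consensus read per cluster passing >=2+/>=2-, divided by
    all reads (every read is assumed to belong to some cluster)."""
    total_reads = sum(c.dup for c in clusters)
    corrected = sum(passes(c, min_each) for c in clusters)
    return corrected / total_reads if total_reads else 0.0


def mean_depth(per_base_coverage) -> float:
    """Average coverage over target-region bases after error correction."""
    cov = list(per_base_coverage)
    return sum(cov) / len(cov) if cov else 0.0
```

Under the consensus-read assumption, effective utilization is bounded by roughly one over the typical dup count. This is at least consistent in scale with the low percentages reported above, though that consistency does not confirm the assumption.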
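The mutation spectra in Figs. 5, 7, 9 and 11 merge complementary substitution types, because on double-stranded DNA a G>A change on one strand is a C>T change on the other. A minimal sketch of that merging, and of the mutations-per-nucleotide ratio, follows. It names the six classes with a pyrimidine (C or T) reference base, which is a common convention but is not stated in the source. It also assumes the frequency denominator is the number of nucleotides examined. The exact denominator used in the examples is not given, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical_substitution(ref: str, alt: str) -> str:
    """Map a single-base substitution to one of six strand-merged classes.

    ref>alt on one strand equals comp(ref)>comp(alt) on the other, so the
    pair is reported once, using the strand whose reference base is C or T.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("G", "A"):  # purine reference: switch to the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(calls) -> Counter:
    """Count strand-merged substitution classes from (ref, alt) pairs."""
    return Counter(canonical_substitution(r, a) for r, a in calls)


def mutations_per_nucleotide(n_mutations: int, n_bases_examined: int) -> float:
    """Mutation frequency = mutations observed / nucleotides examined (assumed)."""
    if n_bases_examined <= 0:
        raise ValueError("n_bases_examined must be positive")
    return n_mutations / n_bases_examined
```

For example, with illustrative calls, `mutation_spectrum([("C", "T"), ("G", "A"), ("A", "G"), ("T", "C"), ("G", "T")])` counts two C>T, two T>C and one C>A. A roughly even count across the six classes is what "broadly balanced" would look like in such a histogram.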

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biochemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Sustainable Development (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for constructing a sequencing library, a sequencing method, a method for determining a nucleic acid sequence, a device for constructing a sequencing library, a sequencing device, and a system for determining a nucleic acid sequence.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/088059 WO2016049929A1 (fr) 2014-09-30 2014-09-30 Method for constructing a sequencing library and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/088059 WO2016049929A1 (fr) 2014-09-30 2014-09-30 Method for constructing a sequencing library and application thereof

Publications (1)

Publication Number Publication Date
WO2016049929A1 (fr) 2016-04-07

Family

ID=55629352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088059 Ceased WO2016049929A1 (fr) 2014-09-30 2014-09-30 Method for constructing a sequencing library and application thereof

Country Status (1)

Country Link
WO (1) WO2016049929A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
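CN102409045A (zh) * 2010-09-21 2012-04-11 深圳华大基因科技有限公司 Method for constructing a tag library based on DNA adapter ligation, and tags and tag adapters used therein
CN102534811A (zh) * 2010-12-16 2012-07-04 深圳华大基因科技有限公司 DNA library and preparation method thereof, and DNA sequencing method and device
CN103667442A (zh) * 2013-09-13 2014-03-26 西南民族大学 High-throughput transcriptome sequencing method for trace samples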

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106636063A (zh) * 2016-09-27 2017-05-10 广州精科医学检验所有限公司 Primer composition, use thereof, and methods for constructing a library and determining a nucleic acid sequence
CN110791813A (zh) * 2018-08-01 2020-02-14 广州华大基因医学检验所有限公司 Method for processing single-stranded DNA and application thereof
CN110791813B (zh) * 2018-08-01 2023-06-16 广州华大基因医学检验所有限公司 Method for processing single-stranded DNA and application thereof
CN113249454A (zh) * 2020-02-12 2021-08-13 赛纳生物科技(北京)有限公司 Method for obtaining a unit signal in multi-base gene sequencing
WO2023092601A1 (fr) * 2021-11-29 2023-06-01 京东方科技集团股份有限公司 UMI molecular marker and application thereof, adapter, adapter ligation reagent and kit thereof, and library construction method
WO2025180330A1 (fr) * 2024-03-01 2025-09-04 深圳市真迈生物科技有限公司 Method and apparatus for constructing a drug resistance database, method and apparatus for detecting drug resistance, and device

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14903291

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 14903291

Country of ref document: EP

Kind code of ref document: A1