WO2025007252A1 - Repeated sequencing method - Google Patents

Info

Publication number
WO2025007252A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
sequence
sample
primer
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2023/105577
Other languages
French (fr)
Chinese (zh)
Inventor
罗银玲
徐崇钧
龚梅花
李计广
欧阳凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MGI Tech Co Ltd
Original Assignee
MGI Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MGI Tech Co Ltd
Priority to PCT/CN2023/105577
Publication of WO2025007252A1
Anticipated expiration
Status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • the present application relates to the field of biological sequencing technology, and in particular to a method for repeated sequencing.
  • the technology of gene sequencing has advanced by leaps and bounds, from the first-generation Sanger sequencing method, the second-generation sequencing by synthesis (SBS) method to the third-generation single-molecule sequencing method.
  • These sequencing technologies will inevitably have sequencing errors during the sequencing process due to various reasons such as sequencing reagents, sequencing samples, sequencing instruments and external environment.
  • the second-generation sequencing technology is based on the PCR principle. During the process of sequencing by synthesis, sequencing errors are inevitable due to internal and external environments, resulting in reduced sequencing accuracy.
  • an embodiment of the present application provides a repeated sequencing method.
  • the first aspect of the present application proposes a repeated sequencing method, comprising: performing a first sequencing on a sample based on a first sequencing primer to obtain a first sequencing sequence based on a first sequencing synthetic chain; contacting an elution reagent with the first sequencing synthetic chain to remove the first sequencing synthetic chain; performing a second sequencing on the sample based on the first sequencing primer to obtain a second sequencing sequence based on a second sequencing synthetic chain; and comparing the first sequencing sequence and the second sequencing sequence to determine the sequence information of the sample.
  • the first sequencing or the second sequencing of the sample based on the first sequencing primer includes: annealing the first sequencing primer to the sample and performing multiple base sequencing cycles, wherein each cycle completes signal detection of one base, thereby obtaining an optical signal of the first sequencing synthesis chain or the second sequencing synthesis chain, and obtaining the first sequencing sequence or the second sequencing sequence based on the optical signal of the first sequencing synthesis chain or the second sequencing synthesis chain.
  • performing a first sequencing on the sample based on the first sequencing primer further comprises: fixing the sample on a solid phase carrier for sequencing.
  • the solid phase carrier is a bead or a chip.
  • the concentration of formamide is 30%-100%. In some embodiments, the concentration of formamide is 40%-100%. In some embodiments, the concentration of E. coli exonuclease III is 1-50 U/µL. In some embodiments, the concentration of E. coli exonuclease III is 1-10 U/µL.
  • the elution time of the elution reagent is 0-20 min. In some embodiments, the elution time of the elution reagent is 5-15 min. In some embodiments, the elution time of the elution reagent is 5-10 min.
  • the elution temperature of the elution reagent is 10-60° C. In some embodiments, the elution temperature of the elution reagent is 20-50° C. In some embodiments, the elution temperature of the elution reagent is 25-45° C.
  • the first sequencing sequence and the second sequencing sequence are compared to determine the sequence information of the sample, including: comparing the bases at corresponding positions of the first sequencing sequence and the second sequencing sequence; in response to the bases at corresponding positions of the first sequencing sequence and the second sequencing sequence being the same, determining the same base as the base type at the position; in response to the bases at corresponding positions of the first sequencing sequence and the second sequencing sequence being different, determining the base type at the position by statistical data.
  • the statistical data is a Q value.
  • determining the base type at the position by using statistical data specifically includes: comparing the Q values of bases at corresponding positions of the first sequencing sequence and the second sequencing sequence, and determining the base x as the base type at the position based on the fact that the Q value of the base x at the position of the first sequencing sequence is greater than the Q value of the base y at the position of the second sequencing sequence.
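
The two-read rule above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name `consensus_two_reads`, the representation of Q values as integer lists, and the choice to emit "N" when both Q values are equal are assumptions, since the text does not say how an exact tie is resolved.

```python
from typing import List


def consensus_two_reads(read1: str, qual1: List[int],
                        read2: str, qual2: List[int]) -> str:
    """Merge two reads of the same template, position by position.

    Identical bases are kept; for discordant bases the call with the
    higher Q value wins. Equal Q values yield "N" (an assumption).
    """
    if not (len(read1) == len(qual1) == len(read2) == len(qual2)):
        raise ValueError("reads and quality lists must have equal length")

    merged = []
    for b1, q1, b2, q2 in zip(read1, qual1, read2, qual2):
        if b1 == b2:
            merged.append(b1)      # both sequencings agree
        elif q1 > q2:
            merged.append(b1)      # first sequencing is more reliable here
        elif q2 > q1:
            merged.append(b2)      # second sequencing is more reliable here
        else:
            merged.append("N")     # assumption: unresolved tie
    return "".join(merged)


if __name__ == "__main__":
    print(consensus_two_reads("ACGT", [30, 30, 10, 30],
                              "ACTT", [30, 30, 35, 30]))
```

In the usage line, the reads disagree only at the third position, where the second read carries the higher Q value, so its base is kept.
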
  • the repeated sequencing method further includes: repeating the sequencing process based on the first sequencing primer, including: sequencing the sample m times again to obtain m sequencing sequences based on m sequencing synthesis chains; allowing the elution reagent to contact the m or m-1 sequencing synthesis chains respectively to remove the m or m-1 sequencing synthesis chains; and comparing the first sequencing sequence, the second sequencing sequence and the m sequencing sequences to determine the sequence information of the sample.
  • m is a positive integer. In some embodiments, 1 ≤ m ≤ 10.
  • the comparing the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences to determine the sequence information of the sample includes: comparing the bases at corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences; in response to the bases at corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being the same, determining the same bases as the base type at the position; in response to the bases at corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being different, determining the base with the highest occurrence frequency among m+2 bases as the base type at the position, or in response to the bases at corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being different, determining the base type at the position by statistical data.
  • the repeated sequencing method further comprises: in response to the first sequencing sequence, the second sequencing sequence and the m sequencing sequences having different bases at corresponding positions, and the m+2 bases having base types with the same frequency of occurrence, determining the base type at the position by using statistical data.
  • the statistical data is a Q value.
  • an embodiment of the present application also proposes a method for double-end repeated sequencing, comprising: based on a first sequencing primer, performing first-end repeated sequencing on a sample according to the repeated sequencing method as described in any of the above embodiments of the present application to obtain first-end sequence information of the sample; based on a second sequencing primer, performing second-end sequencing on the sample to obtain second-end sequence information of the sample; and analyzing the first-end sequence information and the second-end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is one of a forward sequencing primer or a reverse sequencing primer, and the second sequencing primer is the other of the forward sequencing primer or the reverse sequencing primer.
  • an embodiment of the present application also proposes a method for double-end repeated sequencing, comprising: based on a first sequencing primer, performing first-end repeated sequencing on a sample according to a repeated sequencing method as described in any of the above embodiments of the present application to obtain first-end sequence information of the sample; based on a second sequencing primer, performing second-end repeated sequencing on the sample according to a repeated sequencing method as described in any of the above embodiments of the present application to obtain second-end sequence information of the sample; and analyzing the first-end sequence information and the second-end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is a forward sequencing primer, and the first-end sequence information is forward-read sequence information, and the second sequencing primer is a reverse sequencing primer, and the second-end sequence information is reverse-read sequence information.
  • performing second end repeat sequencing on the sample based on the second sequencing primer includes: generating a complementary strand of the sample, and performing the second end repeat sequencing based on the complementary strand.
  • the method further comprises: removing the sample, and performing second-end sequencing or second-end repeated sequencing based on the complementary chain.
  • the present application also proposes a repeated sequencing system, including: a sequencing reagent and an elution reagent, wherein the elution reagent is a DNA denaturant and/or an exonuclease, wherein the DNA denaturant includes: methanol, ethanol, urea, formamide, and sodium hydroxide, and the exonuclease includes: snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, and Lactobacillus acidophilus nuclease.
  • the DNA denaturant is formamide and/or sodium hydroxide.
  • the exonuclease is Escherichia coli exonuclease III.
  • the sequencing reagent comprises a first sequencing primer, a reaction enzyme and a dNTP. In some embodiments, a fluorescent group is attached to the dNTP for base reporting. In some embodiments, the sequencing reagent further comprises a second sequencing primer. In some embodiments, the sequencing reagent further comprises a first barcode sequencing primer and optionally a second barcode sequencing primer.
  • the reaction enzymes include a DNA polymerase and optionally a DNA ligase.
  • Another aspect of the present application also provides a sequencing kit, comprising a repeated sequencing system as described in any of the above embodiments of the present application.
  • the repeated sequencing method and its application in the embodiment of the present application improve the sequencing accuracy from the sequencing logic, by sequencing the same nucleotide sequence template multiple times to obtain multiple sequencing sequences, and then using bioinformatics analysis to mutually correct the obtained multiple sequencing sequences, so as to achieve effective detection of sequence errors, effectively reduce the impact of sequencing errors caused by internal and external environments on sequencing results during sequencing, and significantly improve sequencing accuracy; at the same time, there is no need to perform multiple independent repeated sequencing processes, but only multiple sequencing operations for the same nucleotide sequence template in a single experiment, so it has the advantages of simple operation and low cost.
  • this method is applicable to multiple sequencing platforms and can be widely used to improve the sequencing accuracy of multiple platforms.
  • FIG. 1 is a schematic diagram of a repeated sequencing method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another repeated sequencing method according to an embodiment of the present application.
  • the second-generation sequencing platforms all use the SBS method, and their sequencing processes are similar, including injecting the reaction solution into the chip by pressure, generating a polymerization reaction, collecting light signals, and then eluting to sequence the next base.
  • Such a process based on the PCR principle is affected by multiple factors such as the external environment, the instrument itself, sequencing reagents, and samples. For example, base mismatches are prone to occur, so sequencing errors cannot be avoided, and as the read length accumulates, the probability of sequencing errors increases, resulting in a high error rate toward the end of the sequenced fragment. Reading bases more accurately is therefore an important direction for the development of sequencing technology.
  • the optimization of related platforms focuses on biochemical reactions and signal acquisition, such as improving fluorescence intensity, improving the fidelity of polymerases, improving the efficiency of elution, or improving the accuracy of optical systems.
  • improvements increase sequencing costs and increase dependence on dedicated sequencing reagents and sequencing platforms.
  • multiple independent repeated sequencing runs are often performed on the same sample to increase the amount of data and thus the sequencing depth, thereby improving the accuracy with which the sample sequence is reconstructed.
  • multiple independent repeated sequencing requires a large sample input, consumes large amounts of sequencing reagents, and has a high sequencing cost.
  • the inventors have developed a method of repeated sequencing after many experiments and tests. This method optimizes the sequencing scheme in terms of sequencing logic. By repeatedly sequencing the same nucleotide sequence template in a single sequencing, multiple sequencing sequences are obtained, and then these multiple sequencing sequences are mutually corrected, thereby effectively improving the sequencing accuracy.
  • the factors affecting the sequencing errors mentioned above can be effectively reduced.
  • bases called incorrectly in the first sequencing can be called correctly in the second or third sequencing, and bases called incorrectly in the second sequencing can be called correctly in the first or third sequencing, so erroneous bases can be effectively identified and corrected. That is, when the same DNA template is sequenced two or more times, the base obtained two or more times at the same position is determined to be the true base at that position, thereby effectively improving the sequencing accuracy.
  • a repeated detection method based on single sequencing is simple to operate and has low cost.
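
To make the benefit of agreement between repeated reads concrete, the following sketch computes the probability that at least two of three reads call a position correctly. It rests on assumptions that are not stated in the application: each read has the same per-base error probability `p_error`, and errors in different reads are independent. Systematic errors that recur in every read would violate this, so the numbers are illustrative only.

```python
def majority_correct_probability(p_error: float) -> float:
    """Probability that at least two of three independent reads are correct.

    Assumes (hypothetically) identical, independent per-base error
    probabilities for the three sequencings of the same template.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must lie in [0, 1]")
    p_ok = 1.0 - p_error
    # all three correct, or exactly two correct (three ways to pick the wrong read)
    return p_ok ** 3 + 3 * p_error * p_ok ** 2


if __name__ == "__main__":
    for p in (0.01, 0.001):
        print(p, 1.0 - p, majority_correct_probability(p))
```

Under these assumptions, a per-read accuracy of 0.99 corresponds to a two-of-three agreement probability of about 0.9997.
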
  • sample refers to nucleic acid samples to be tested.
  • the sample can be a library to be tested applied to various sequencing platforms, such as single-stranded DNBs, double-stranded DNA libraries, etc.
  • the sample contains a primer binding sequence that can bind to a sample sequencing primer.
  • the sample also contains a barcode primer binding sequence that can bind to a barcode sequencing primer, so as to determine the forward/reverse sequence based on the sequencing results of the barcode (barcode or index), or to distinguish between different samples.
  • the sample also contains a sequence that can bind to a solid phase carrier to fix the sample on the solid phase carrier.
  • solid phase carrier refers to a solid phase medium on which a nucleic acid sample to be tested can be fixed for subsequent sequencing.
  • a sequence that can identify and bind to the nucleic acid sample to be tested is fixed on the solid phase carrier to fix the nucleic acid sample to be tested thereon.
  • a groove of a prefabricated size is provided on the solid phase carrier to match the size of the nucleic acid sample to be tested to fix it thereon.
  • a chemical group is attached to the solid phase carrier to be connected to the nucleic acid sample to be tested by chemical bonds, van der Waals forces, etc. to fix it thereon.
  • the solid phase carrier can be a bead, a chip, etc.
  • the solid phase carrier can be a chip.
  • the chip is used to fix a conventional double-stranded DNA library or a single-stranded DNA library.
  • the chip is also used for DNA nanoballs (DNBs) prepared based on the double-stranded DNA library or the single-stranded DNA library.
  • the DNB can be fixed on the chip by means of, for example, probe technology, and sequencing is performed on the chip based on cPAL (combinatorial probe-anchor ligation) or cPAS (combinatorial probe-anchor synthesis), so as to obtain the nucleic acid information of the DNA nanoball.
  • the DNA nanoball is fixed on the chip through a mesh hole or hexamethyldisilazane.
  • “repeated sequencing” refers to performing, within a single independent sequencing run and on the same sample to be tested, multiple repeated sequencing processes by eluting the synthetic chain generated during sequencing and restarting sequencing by synthesis, for example performing a first sequencing, a second sequencing, or further repeated sequencings.
  • the same sample can also be sequenced m times on the basis of two sequencings to obtain m sequencing sequences based on m sequencing synthetic chains; using elution reagents to remove the m or m-1 sequencing synthetic chains respectively; and comparing the first sequencing sequence, the second sequencing sequence and the m sequencing sequences to determine the sequence information of the sample, wherein m can be any positive integer greater than 0, for example, it can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more times of repeated sequencing.
  • “sequencing primers” may include sample sequencing primers, such as a forward sample sequencing primer for forward sequencing (Forward strand) and a reverse sample sequencing primer for reverse sequencing (Reverse strand), i.e., a first sequencing primer and/or a second sequencing primer.
  • sample sequencing primers such as a forward sample sequencing primer for forward sequencing (Forward strand) and a reverse sample sequencing primer for reverse sequencing (Reverse strand), i.e., a first sequencing primer and/or a second sequencing primer.
  • “sequencing primers” may also include barcode sequencing primers, such as a forward barcode sequencing primer for forward sequencing (Forward strand) and a reverse barcode sequencing primer for reverse sequencing (Reverse strand), i.e., a barcode 1 sequencing primer and a barcode 2 sequencing primer.
  • “obtaining a sequencing sequence based on a sequencing synthetic chain” refers to the principle of sequencing by synthesis: DNA polymerase, primers and four dNTPs carrying base-specific fluorescent labels are added to the reaction system at the same time; the 3'-OH of these dNTPs is chemically protected, so that only one dNTP, and therefore only one base, can be incorporated per cycle during the sequencing process.
  • all unused free dNTPs and DNA polymerase will be washed away. The recording of the fluorescence signal is completed by optical equipment, and finally the optical signal is converted into a sequencing base by computer analysis.
  • obtaining a sequencing sequence based on a sequencing synthetic chain means that a sequencing synthetic chain is generated when the sample is sequenced, and the base reading can be completed by optical signal reading and conversion based on the specific optical signal (fluorescent signal) released when the synthetic chain is synthesized, thereby obtaining a sequencing sequence.
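
As a toy illustration of converting per-cycle optical signals into bases, the sketch below picks the brightest of four fluorescence channels in each cycle. The one-channel-per-base layout, the name `call_bases`, and the absence of any crosstalk or phasing correction are all simplifying assumptions; real base callers are considerably more involved, and this is not the method of any particular platform.

```python
from typing import Dict, List

BASES = ("A", "C", "G", "T")


def call_bases(cycles: List[Dict[str, float]]) -> str:
    """Call one base per cycle from hypothetical four-channel intensities.

    Each element of `cycles` maps every base in BASES to the intensity
    measured in its channel during that cycle.
    """
    calls = []
    for intensities in cycles:
        # the base whose channel is brightest is taken as the incorporated base
        calls.append(max(BASES, key=lambda base: intensities[base]))
    return "".join(calls)


if __name__ == "__main__":
    signal = [
        {"A": 0.90, "C": 0.10, "G": 0.05, "T": 0.02},
        {"A": 0.10, "C": 0.20, "G": 0.80, "T": 0.10},
    ]
    print(call_bases(signal))
```
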
  • elution of synthetic chains refers to the use of elution reagents to remove the synthetic chains generated during the sequencing process.
  • the elution reagent can be a reagent that breaks the hydrogen bonds between the base pairs of the DNA double strands, thereby converting the double strands into single strands.
  • the elution reagent can be a DNA denaturant and/or an exonuclease, wherein the DNA denaturant can be an organic or inorganic reagent that can destroy the double helix structure; the exonuclease hydrolyzes the DNA molecule chain into single nucleotides by sequentially hydrolyzing the phosphodiester bonds at the ends of the DNA molecule chain.
  • the DNA denaturant can be: methanol, ethanol, urea, formamide, sodium hydroxide, etc.
  • the exonuclease can be: snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase and Lactobacillus acidophilus nuclease, etc.
  • the exonuclease may include snake venom phosphodiesterase, Escherichia coli exonuclease II and/or Escherichia coli exonuclease III.
  • the concentration of DNA denaturing agents such as formamide or sodium hydroxide can be 30%-100%, preferably 40%-100%. In some embodiments, the concentration of DNA denaturing agents is 50% or 100%. In some embodiments, the concentration of exonuclease can be 1-50 U/µL, preferably 1-10 U/µL. In some embodiments, the concentration of exonuclease can be 4-8 U/µL and any concentration value within the range thereof, such as 4 U/µL, 5 U/µL, 6 U/µL, 7 U/µL, 8 U/µL, etc.
  • the elution time of the elution reagent can be 0-20min, preferably 5-15min, more preferably 5-10min, for example, 5min, 6min, 7min, 8min, 9min, 10min.
  • the elution temperature of the elution reagent can be 10-60°C, preferably 20-50°C, and more preferably any value between 25-45°C, for example, it can be 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37.5, 38, 38.5, 39, 39.5, 40, 40.5, 41, 42, 43, 44, 45 or any value therebetween.
  • taking sequencing with a library amount of 10 fmol per single channel as an example (4 channels on a slide, and a corresponding sequencing system volume of 40 µL for a single channel), the concentration of formamide used can be adjusted according to the temperature and time of the reaction (the concentration of formamide is 100%). In some embodiments, when the elution time is 2 min-20 min and the elution temperature is 25°C-40°C, the concentration of formamide can be any value between 25%-100%.
  • the reaction conditions can be a final concentration of Escherichia coli exonuclease of 1 U/µL-2 U/µL, a reaction temperature of 25-37°C, and a reaction time of 2-10 min.
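
The elution parameters above could be captured in a small configuration object that rejects out-of-range values. The sketch below is hypothetical: the class name `ElutionConditions` and its fields are invented for illustration, and the limits are the broadest denaturant ranges stated above (30%-100% formamide, 0-20 min, 10-60°C), not the narrower preferred ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElutionConditions:
    """Hypothetical container for one denaturant-based elution step."""

    formamide_percent: float  # formamide concentration, in percent
    time_min: float           # elution time, in minutes
    temperature_c: float      # elution temperature, in degrees Celsius

    def validate(self) -> None:
        """Raise ValueError if a value lies outside the broad stated ranges."""
        checks = [
            ("formamide_percent", self.formamide_percent, 30.0, 100.0),
            ("time_min", self.time_min, 0.0, 20.0),
            ("temperature_c", self.temperature_c, 10.0, 60.0),
        ]
        for name, value, low, high in checks:
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")


if __name__ == "__main__":
    ElutionConditions(formamide_percent=50.0, time_min=5.0,
                      temperature_c=37.0).validate()
    print("conditions accepted")
```
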
  • the comparison between sequences obtained by repeated sequencing refers to comparing multiple sequences measured by repeated sequencing for the same sample.
  • the comparison can determine the position of each base of each sequence obtained by repeated sequencing based on the coordinate information of the same reference sequence, and perform statistics on each base at the same position to analyze the true type of the base at the position.
  • the comparison can be performed directly between sequences obtained by repeated sequencing without the need for the reference sequence to provide coordinate information.
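
A reference-free comparison can be sketched as a column-wise scan over the repeated reads. The sketch assumes that all reads start from the same sequencing primer, so that index i in every read corresponds to the same sequencing cycle; the function name `discordant_positions` is hypothetical.

```python
from typing import List, Sequence


def discordant_positions(reads: Sequence[str]) -> List[int]:
    """Return 0-based positions at which the repeated reads disagree."""
    if not reads:
        return []
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # a position is discordant when more than one distinct base was called
    return [i for i in range(length)
            if len({read[i] for read in reads}) > 1]


if __name__ == "__main__":
    print(discordant_positions(["ACGT", "ACGA", "ACCT"]))
```

Only the positions returned here need the frequency or Q-value rules; every other position is already settled by agreement.
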
  • the sequence information of the sample is determined based on the comparison of the sequencing sequences obtained by repeated sequencing. Specifically, when the number of repeated sequencing is 2, the bases at the same position (also the corresponding position) of the two sequences obtained by repeated sequencing are counted, and in response to the same bases at the corresponding positions of the first sequencing sequence and the second sequencing sequence, the same base is determined as the base type at the position; in response to the different bases at the corresponding positions of the first sequencing sequence and the second sequencing sequence, the base type at the position is determined by statistical data.
  • the statistical data is a Q value (Q-score/Q phred, i.e., the base quality value, which reflects the quality score of the measured base).
  • the Q values of the bases at the corresponding positions of the first sequencing sequence and the second sequencing sequence are compared. According to the Q value weighting method, based on the fact that the Q value of the base x at the position of the first sequencing sequence is greater than the Q value of the base y at the position of the second sequencing sequence, the base x is determined as the base type at the position.
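
For orientation, the standard Phred convention relates a Q value to the estimated probability P that a base call is wrong by Q = -10·log10(P). The application does not spell this formula out, so the sketch below reflects the general convention rather than anything specific to this method; the function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality corresponding to error probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q))
```

A larger Q value therefore means a smaller estimated error probability, which is why the base with the greater Q value is preferred when two sequencings disagree.
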
  • the comparison between the repeated sequencing sequences includes: comparing the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences to determine the sequence information of the sample.
  • the process specifically includes: comparing the bases at the corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences; in response to the bases at the corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being the same, determining the same base as the base type at the position; in response to the bases at the corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being different, counting the frequencies of the m+2 bases and determining the base with the highest frequency among the m+2 bases as the base type at the position.
  • when base types among the m+2 bases occur with the same frequency, the base type at the position is determined by statistical data.
  • the statistical data is a Q value, that is, the base type with a larger Q value is determined as the base type at the position. It is understandable that by performing error detection and correction on multiple sequencing of the same position, sequencing errors caused by environmental factors inside and outside the sequencing can be effectively reduced.
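
The frequency rule with a Q-value fallback for m+2 reads can be sketched as below. Several details are assumptions: the name `consensus_multi`, the choice to compare the single highest Q value observed for each tied base (the text only says the base with the larger Q value is chosen), and the use of "N" when even the Q values tie.

```python
from collections import Counter
from typing import List, Sequence


def consensus_multi(reads: Sequence[str],
                    quals: Sequence[Sequence[int]]) -> str:
    """Majority vote per position; frequency ties are broken by Q value."""
    if not reads or len(reads) != len(quals):
        raise ValueError("need one quality list per read")
    length = len(reads[0])
    if any(len(r) != length for r in reads) or \
            any(len(q) != length for q in quals):
        raise ValueError("all reads and quality lists must share one length")

    consensus: List[str] = []
    for pos in range(length):
        counts = Counter(read[pos] for read in reads)
        top = max(counts.values())
        candidates = [base for base, n in counts.items() if n == top]
        if len(candidates) == 1:
            consensus.append(candidates[0])   # unique most frequent base
            continue
        # tie in frequency: compare the best Q value seen for each candidate
        best_q = {
            base: max(q[pos] for r, q in zip(reads, quals) if r[pos] == base)
            for base in candidates
        }
        top_q = max(best_q.values())
        winners = [base for base in candidates if best_q[base] == top_q]
        consensus.append(winners[0] if len(winners) == 1 else "N")
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACCT"]
    quals = [[30, 30, 30, 30]] * 3
    print(consensus_multi(reads, quals))
```
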
  • the above-mentioned repeated sequencing method can be applied to single-end sequencing (Single-end sequencing) and/or double-end sequencing (Paired-end sequencing).
  • the above-mentioned repeated sequencing method can be applied to single-end sequencing, and this single-end sequencing can be forward sequencing or reverse sequencing.
  • the repeated sequencing method proposed in the embodiments of the present application will not affect the sequencing of the other end when applied to single-end sequencing, that is, after the repeated sequencing of the single end is performed, the sequencing of the other end can be performed normally.
  • the sequencing of the other end can be a single sequencing, or the other end can be repeatedly sequenced based on the sequencing primer at the other end according to the repeated sequencing method described in any of the above embodiments of the present application.
  • the present application embodiment proposes a double-end sequencing method.
  • the double-end sequencing method when the other end is normal sequencing (i.e., single sequencing), includes: based on the second sequencing primer, the sample is sequenced at the second end to obtain the second end sequence information of the sample; and the first end sequence information and the second end sequence information obtained by single-end repeated sequencing are analyzed to determine the sequence information of the sample, wherein the first sequencing primer is one of the forward sequencing primer or the reverse sequencing primer, and the second sequencing primer is the other of the forward sequencing primer or the reverse sequencing primer. It is understandable that the accuracy of the sequencing data can be effectively improved by repeated sequencing of the single end and normal sequencing of the other end.
  • the embodiment of the present application also proposes another double-end sequencing method.
  • the double-end sequencing method includes: based on the first sequencing primer, performing repeated sequencing on the first end of the sample according to the repeated sequencing method described in any of the above embodiments of the present application to obtain the first end sequence information of the sample; based on the second sequencing primer, performing repeated sequencing on the second end of the sample according to the repeated sequencing method described in any of the above embodiments of the present application to obtain the second end sequence information of the sample; and analyzing the first end sequence information and the second end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is a forward sequencing primer, the first end sequence information is forward-read sequence information, the second sequencing primer is a reverse sequencing primer, and the second end sequence information is reverse-read sequence information.
  • the double-end sequencing method proposed in the embodiment of the present application includes two single-end repeated sequencings, and the base sequence is determined based on the multiple sequences obtained by repeated sequencing, thereby effectively improving the accuracy of the sequencing.
  • the other-end sequencing refers to the sequencing relative to the first single-end sequencing.
  • when the first single-end sequencing is forward sequencing, the other-end sequencing refers to reverse sequencing; when the first single-end sequencing is reverse sequencing, the other-end sequencing refers to forward sequencing.
  • reverse sequencing the sample may further include: generating a complementary strand (i.e., a second strand) of the sample, and performing reverse sequencing based on the complementary strand.
  • the sample may be removed to perform reverse sequencing based on the complementary strand.
  • the generation of the complementary chain is based on multiple displacement amplification (MDA) of DNA polymerase, so the complementary chain can also be called MDA chain.
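
Because reverse sequencing reads the complementary strand, a reverse read generally has to be reverse-complemented before it can be compared with the forward read in the same orientation. The helper below is a generic sketch of that operation; the function name is hypothetical, and where it would sit in an analysis pipeline is not specified by the application.

```python
# translation table mapping each base to its complement ("N" stays "N")
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACG"))
```
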
  • when performing single-end sequencing, the method further includes: sequencing the barcode sequence in the sample, wherein the barcode sequence has sample specificity to distinguish different samples.
  • the barcode sequence can also be repeatedly sequenced.
  • when performing double-end sequencing, the method further includes: sequencing barcode 1 and barcode 2 contained in the sample, respectively, to obtain the sequence information of barcode 1 and barcode 2; and screening out the sequence information of the sample according to the sequence information of barcode 1 and barcode 2.
  • barcode 1 and barcode 2 are direction-specific. It is understandable that by combining the sequence information of barcode 1 and barcode 2, the direction of the sequence information of the sample can be distinguished, that is, a certain sequence is determined to be a forward read sequence or a reverse read sequence. Therefore, in some embodiments, the sequence information of the sample may include the first end sequence information (forward read sequence information) and the second end sequence information (reverse read sequence information).
  • the later statistics and analysis of the sequencing data can be implemented by analysis software, scripts, command lines, etc., as long as the sequence alignment, Q value statistics, base determination and other steps in the embodiments of the present application can be completed.
  • the repeated sequencing method proposed in the embodiments of the present application is applicable to multiple sequencing platforms, and can be applied to the sequencing of DNB templates based on the principle of Rolling Circle Replication, and can also be applied to the sequencing of linear DNA templates based on the principle of Bridge PCR. It is based on the template and/or the reverse complementary chain of the template, and the sequencing synthetic chain generated during the sequencing process is eluted, which effectively improves the accuracy of the sequence output of multiple platforms.
  • the repeated sequencing method in the embodiment of the present application improves the sequencing accuracy from the sequencing logic, by sequencing the same nucleotide sequence template multiple times to obtain multiple sequencing sequences, and then using bioinformatics analysis to mutually correct the obtained multiple sequencing sequences, so as to achieve effective detection of sequence errors, effectively reduce the impact of sequencing errors caused by internal and external environments on sequencing results during sequencing, and significantly improve sequencing accuracy; at the same time, there is no need to perform multiple independent repeated sequencing processes, but only multiple sequencing operations for the same nucleotide sequence template in a single experiment, so it has the advantages of simple operation and low cost.
  • this method is applicable to multiple sequencing platforms and can be widely used to improve the sequencing accuracy of multiple platforms.
  • the present application embodiment also proposes a repeated sequencing system, including: sequencing reagents and elution reagents, wherein the elution reagent is one or more of the following: DNA denaturants and/or exonucleases, wherein the DNA denaturants include: methanol, ethanol, urea, formamide, sodium hydroxide; the exonucleases include: snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase and Lactobacillus acidophilus nuclease.
  • the DNA denaturant is formamide and/or sodium hydroxide
  • the exonuclease is Escherichia coli exonuclease III.
  • the sequencing reagent includes a sequencing primer, such as a forward sequencing primer and/or a reverse sequencing primer (corresponding to the first sequencing primer and/or the second sequencing primer), a reaction enzyme, and a dNTP.
  • a fluorescent group is attached to the dNTP for base reporting to read the base type based on different fluorescent colors to obtain the sequencing sequence of the sample.
  • the sequencing reagent also includes a barcode sequencing primer for reading the barcode information in single-end or double-end sequencing.
  • the reaction enzyme in the sequencing reagent is used for amplification of the synthetic chain, etc., and the reaction enzyme may include DNA polymerase and optionally DNA ligase.
  • the embodiments of the present application also provide a sequencing kit, comprising the repeated sequencing system provided in any of the above embodiments.
  • Figure 1 is a schematic diagram of a repeated sequencing method according to Example 1 of the present application.
  • the method may include i. a biochemical experiment scheme and ii. a bioinformatics analysis scheme, wherein the biochemical experiment scheme includes:
  • (a) Fix the nucleotide sequence template to be tested on the chip, introduce sequencing primers for annealing, and then perform the first sequencing. Detect and record the fluorescence information of multiple rounds of SBS reactions in the first sequencing, and obtain the first base sequence information of the nucleotide template to be tested through the fluorescence information.
  • (b) An organic solvent capable of denaturing the DNA double strand or an enzyme with 3' end exo-cleavage function is introduced to remove the first sequencing chain described in (a) above, and sequencing primers are introduced again for annealing, and a second sequencing is performed.
  • the fluorescence information of multiple rounds of SBS reactions in the second sequencing is detected and recorded, and the second base sequence information of the nucleotide template to be tested is obtained through the fluorescence information.
  • (c) An organic solvent capable of denaturing the DNA double strand or an enzyme with 3' end exo-cleavage function is introduced to remove the second sequencing chain described in (b) above, sequencing primers are introduced again for annealing, and a third sequencing is performed; the fluorescence information of multiple rounds of SBS reactions in the third sequencing is detected and recorded, and the third base sequence information of the nucleotide template to be tested is obtained through the fluorescence information.
  • the bioinformatics analysis scheme includes the following steps: the sequences obtained from the three sequencing runs of each nucleotide sequence template are analyzed site by site: if the same base is read in all three runs at the same alignment site, that base is identified as the correct base of the site; if the base is the same in two runs and different in one, the base read twice is identified as the correct base of the site; if the bases differ in all three runs, further correction is performed, specifically: the Q value weight method is used to determine the base with the largest Q value as the correct base of the site.
  • the correctly read bases and the corrected bases are spliced into a complete sequence to improve the accuracy of the sequencing sequence.
  • the repeated sequencing method in this embodiment effectively improves the sequence accuracy by sequencing the same template twice or more (sequencing-removing the sequencing chain-resequencing) and performing mutual comparison and correction analysis on the two or more sequencing results.
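The three-run correction rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's software: the function names `call_site` and `consensus` are hypothetical, the reads are assumed to be already aligned position by position, and the "Q value weight method" is interpreted here as simply taking the single call with the highest Q value.

```python
from collections import Counter

def call_site(calls):
    """Pick the consensus base for one aligned site.

    calls: three (base, q) tuples, one per sequencing run (assumed layout).
    """
    if len(calls) != 3:
        raise ValueError("expected exactly three calls per site")
    counts = Counter(base for base, _ in calls)
    base, n = counts.most_common(1)[0]
    if n >= 2:
        # Three identical reads, or two identical and one different:
        # the majority base is taken as correct.
        return base
    # All three reads differ: fall back to the call with the largest Q value
    # (assumed reading of the "Q value weight method").
    return max(calls, key=lambda c: c[1])[0]

def consensus(read1, read2, read3):
    """Splice per-site consensus calls into one corrected sequence."""
    return "".join(call_site(site) for site in zip(read1, read2, read3))

if __name__ == "__main__":
    r1 = [("A", 30), ("C", 30), ("G", 10)]
    r2 = [("A", 30), ("C", 20), ("T", 35)]
    r3 = [("A", 30), ("T", 40), ("C", 20)]
    print(consensus(r1, r2, r3))
```

In the small demonstration, the first site agrees in all three runs, the second is decided by majority, and the third is decided by the highest Q value.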
  • FIG. 2 is a schematic diagram of another repeated sequencing method according to Example 2 of the present application. As shown in Figure 2, the method can be applied to related sequencing technologies based on bridge amplification, including i. a biochemical experimental scheme and ii. a bioinformatics analysis scheme, wherein the biochemical experimental scheme includes:
  • the sequencing primers are introduced again for annealing, followed by multiple rounds of SBS reactions for the second sequencing (one base is measured in each round of SBS reactions); the fluorescence information of the multiple rounds of SBS reactions in the second sequencing is detected and recorded, and the base sequence information of the second sequencing is obtained through the fluorescence information.
  • the bioinformatics analysis scheme includes the following steps: bioinformatics analysis is performed on the sequence obtained by three sequencing of each nucleotide sequence template: first, the same base is read in three sequencings of the same comparison site, and the three identical bases are identified as the correct bases of the site; the bases of the same comparison site are the same twice and different once, and the two identical bases are identified as the correct bases of the site; if the bases of the same comparison site are different three times, further correction is performed, specifically: the Q value weight method is used to determine the base with the largest Q value as the correct base of the site.
  • if the measured bases are the same twice and different once, the majority rule (the minority obeys the majority) is adopted; if the measured bases are different in all three sequencings, the Q value weight method is adopted to take the base with the largest Q value. Finally, the correctly read bases and the corrected bases are spliced into a complete sequence to improve the accuracy of the sequencing sequence.
  • the repeated sequencing method based on bridge amplification in this embodiment may also optionally include reverse sequencing of the two strands to obtain double-end sequencing results, and the method may include: synthesizing a complementary strand (i.e., a second strand) of each nucleotide sequence template; removing each nucleotide sequence template; and reverse sequencing based on the complementary strand.
  • Its bioinformatics analysis scheme is the same as the analysis scheme of the above-mentioned single-end sequencing.
  • the repeated sequencing method based on bridge amplification in this embodiment effectively improves the sequence accuracy by sequencing the same template twice or more (sequencing-removal of sequencing chain-resequencing) and performing mutual comparison and correction analysis on the two or more sequencing results.
  • E. coli PCR free samples were sequenced using the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012552) on the MGISEQ-2000 platform. According to the instructions for use of the kit and sequencing platform, three SE100 sequencing runs (i.e., single-end 100 bp repeated sequencing) were performed using the nucleotide sequence of the sample as a template:
  • the nucleotide sequence template of the E. coli PCR free sample is fixed on the chip, and the CPAS AD153 sequencing primer working solution in the sequencing kit above is passed through for annealing, and then the first SE100 sequencing is performed, and the fluorescence information of multiple rounds of SBS reactions in the first sequencing is detected and recorded (one base is measured in each round of SBS reaction), and the first base sequence information of the nucleotide template is obtained through the fluorescence information.
  • Table 1 shows the results of the first SE100 sequencing (i.e., 1st) obtained in step 1); Table 2 shows the results of the second (2nd) and third SE100 sequencing (3rd) obtained in steps 2) and 3).
  • the quality index Q30 of the first sequencing is 91.03%.
  • the quality index Q30 of the second sequencing is 83.46%.
  • the quality index Q30 of the third sequencing is 81.07%. It can be seen that using formamide to remove the sequencing chain causes some damage to the DNB template, which reduces the quality of the second and third sequencing runs, but the data as a whole still maintained a high quality; that is, the three rounds of sequencing maintained a high data quality output.
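The text reports Q30 without defining it. Under the conventional Phred scale, a quality value Q corresponds to an estimated base-call error probability of 10^(-Q/10), and Q30 is the percentage of bases with Q of at least 30. The sketch below assumes that convention applies here; the function names are hypothetical and the quality list is made-up illustration data, not values from the tables.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality value to an estimated error probability."""
    return 10 ** (-q / 10)

def q30_percent(quals):
    """Percentage of base calls whose Phred quality is at least 30."""
    if not quals:
        raise ValueError("no quality values given")
    return 100.0 * sum(q >= 30 for q in quals) / len(quals)

if __name__ == "__main__":
    # Illustrative quality values only (assumption, not patent data).
    quals = [30, 35, 20, 10]
    print(phred_to_error_prob(30))
    print(q30_percent(quals))
```

Read this way, a Q30 of 91.03% means that about 91% of the base calls in that run carry an estimated error probability of 1 in 1000 or lower.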
  • the BWA MEM alignment algorithm (https://doi.org/10.48550/arXiv.1303.3997) was used to align the sequences obtained from the three SE100 sequencing runs, which were then mutually calibrated site by site: if the same base was read in all three runs at the same alignment site, that base was identified as the correct base for the site; if two runs read the same base and one run read a different base, the base read twice was identified as the correct base for the site; if all three runs read different bases, further correction was performed: the Q value weighting method was used to determine the base with the largest Q value as the correct base for the site.
  • Table 3 shows the bioinformatics analysis results of the first SE100 sequencing of this embodiment
  • Table 4 shows the results of mutual calibration of the three SE100 sequencings of this embodiment.
  • the error rate index Mismatch rate! N (%) showed a downward trend, and the Mismatch rate! N (%) during the first sequencing was 0.48%.
  • the Mismatch rate! N (%) was 0.12%, which was 75% lower than the error rate of the first sequencing; in addition, compared with the single sequencing, the data quality indexes Q20 and Q30 values after the integrated analysis of the three sequencings were significantly improved, and the mapping rate (%) also increased, indicating that the repeated sequencing method of the embodiment of the present application can ensure the quality of the sequencing results and effectively improve the accuracy of the sequencing results.
  • E. coli PCR free samples were sequenced using the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012551) on the MGISEQ-2000 platform. According to the instructions for use of the kit and sequencing platform, three SE50 sequencing runs (i.e., single-end 50 bp repeated sequencing) were performed using the nucleotide sequence of the sample as a template:
  • the nucleotide sequence template of the E. coli PCR free sample is fixed on the chip, and the CPAS AD153 sequencing primer working solution is passed through for annealing. Then, the first SE50 sequencing is performed, and the fluorescence information of multiple rounds of SBS reactions in the first sequencing is detected and recorded (one base is measured in each round of SBS reaction), and the first base sequence information of the nucleotide template is obtained through the fluorescence information.
  • Table 5 shows the results of the first SE50 sequencing (i.e., 1st) obtained in step 1); Table 6 shows the results of the second (2nd) and third SE50 sequencing (3rd) obtained in steps 2) and 3).
  • the quality index Q30 of the first sequencing is 95.34%.
  • the quality index Q30 of the second sequencing is 93.82%.
  • the quality index Q30 of the third sequencing is 93.71%. It can be seen that using formamide to remove the sequencing chain causes some damage to the DNB template, resulting in a slight decrease in the quality of the second and third rounds of sequencing, but the data as a whole still maintained a high quality.
  • because the SE50 read length in this example is shorter, the overall Q30 is higher and the decrease in quality in the second and third rounds of sequencing is not obvious; that is, the three rounds of sequencing maintain extremely high data quality output.
  • the BWA MEM alignment algorithm was used to align the sequences obtained from the three SE50 sequencing runs, which were then mutually calibrated site by site: if the same base was read in all three runs at the same alignment site, that base was identified as the correct base for the site; if two runs read the same base and one run read a different base, the base read twice was identified as the correct base for the site; if all three runs read different bases, further correction was performed: the Q value weighting method was used to determine the base with the largest Q value as the correct base for the site.
  • Table 7 is the bioinformatics analysis result of the first SE50 sequencing of this embodiment
  • Table 8 is the result of mutual calibration of the three SE50 sequencings of this embodiment.
  • the error rate index Mismatch rate! N (%) showed a downward trend, and the Mismatch rate! N (%) during the first sequencing was 0.2%.
  • the Mismatch rate! N (%) was 0.06%, which was 60% lower than the error rate of the first sequencing; in addition, compared with the single sequencing, the data quality indexes Q20 and Q30 values after the integrated analysis of the three sequencings were significantly improved, and the mapping rate (%) also increased, indicating that the repeated sequencing method of the embodiment of the present application can ensure the quality of the sequencing results and effectively improve the accuracy of the sequencing results.
  • the nucleotide sequence templates of the two samples, E. coli PCR free and E. coli PCR, were fixed on the chip, and the CPAS AD153 sequencing primer working solution was introduced for annealing. Then, the first SE50 sequencing was performed, and the fluorescence information of multiple rounds of SBS reactions in the first sequencing was detected and recorded (one base was measured in each round of SBS reaction), and the first base sequence information of the two nucleotide templates was obtained through the fluorescence information.
  • Table 9 shows the results of the first SE50 sequencing of the two samples obtained in step 1) (i.e., 1st); Table 10 shows the results of the second (2nd) and third SE50 sequencing (3rd) of the two samples obtained in steps 2) and 3).
  • the quality index Q30 of the first sequencing is 92.75% for the PCR free sample and 92.71% for the PCR sample.
  • the quality index Q30 of the second sequencing is 86.33% for the PCR free sample and 85.42% for the PCR sample.
  • the quality index Q30 of the third sequencing is 82.82% for the PCR free sample and 82.35% for the PCR sample. It can be seen that using Exonuclease III to remove the sequencing chain causes some damage to the DNB template, which reduces the quality of the second and third sequencing runs, but the data as a whole still maintained a high quality; that is, the three rounds of sequencing maintained a high data quality output.
  • the BWA MEM alignment algorithm was used to align the sequences obtained from the three SE50 sequencing runs of the above two samples, which were then mutually calibrated site by site: if the same base was read in all three runs at the same alignment site, that base was identified as the correct base for the site; if two runs read the same base and one run read a different base, the base read twice was identified as the correct base for the site; if all three runs read different bases, further correction was performed: the Q value weight method was used to determine the base with the largest Q value as the correct base for the site.
  • Table 11 shows the bioinformatics analysis results of the first SE50 sequencing of the two samples of this embodiment
  • Table 12 shows the results of mutual calibration of the three SE50 sequencing of the two samples of this embodiment.
  • the error rate indicator Mismatch rate! N (%) showed a downward trend, that is, for the Mismatch rate! N (%) during the first sequencing, the PCR free library was 0.04%; the PCR library was 0.06%.
  • the Mismatch rate! N (%) of the PCR free library was 0.03%, which was a 25% decrease compared with the error rate of the first sequencing; the PCR library was 0.05%, which was a 17% decrease compared with the error rate of the first sequencing.
  • the data quality indicators Q20 and Q30 values of the two samples after the integrated analysis of the three sequencings were significantly improved, indicating that the repeated sequencing method of the embodiment of the present application can ensure the quality of the sequencing results and effectively improve the accuracy of the sequencing results.
  • an E. coli PCR sample (MGI) was sequenced using the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012551) on the MGISEQ-2000 platform. According to the instructions for use of the kit and sequencing platform, SE50 sequencing (i.e., single-end 50 bp sequencing) was performed using the nucleotide sequence of the sample as a template, and after the sequencing synthesis chain was removed, PE50 sequencing (i.e., double-end 50 bp sequencing) was performed:
  • the nucleotide sequence template of the E. coli PCR sample was fixed on the chip, and the CPAS AD153 sequencing primer working solution was introduced for annealing.
  • SE50 sequencing was performed once, and the fluorescence information of multiple rounds of SBS reactions in this SE50 sequencing was detected and recorded (one base was measured in each round of SBS reaction).
  • the base sequence information of the nucleotide template in this SE50 sequencing was obtained through the fluorescence information.
  • Table 13 shows the results of a single SE50 sequencing (i.e., 1st) obtained in step 1);
  • Table 14 shows the results of PE50 sequencing obtained in step 3).
  • the quality index Q30 of the first single SE sequencing is 96.1%. After the first formamide reaction to remove the sequencing chain, the quality index Q30 of the subsequent PE50 sequencing is 96.37% for Read1, 96.0% for Read2, and 96.19% for Total. It can be seen that when formamide was used to remove the SE50 sequencing chain before PE50 sequencing, there was no evidence that formamide damaged the DNB template, and the data maintained a relatively high quality overall.
  • the BWA MEM alignment algorithm was used to perform bioinformatics analysis on the above SE50 and PE50 sequencing results.
  • the data statistics are shown in Tables 15 and 16.
  • the repeated sequencing method proposed in the embodiment of the present application can be flexibly combined with normal single-end/double-end sequencing, and can obtain high-quality and high-accuracy sequencing results.
  • the nucleotide sequence template of the E. coli PCR sample is fixed on the chip, and the high-fidelity polymerase and dNTP substrate with chain displacement function in the kit are introduced. Under the action of this polymerase, the second chain sequencing template (i.e., the complementary chain of the first chain) is synthesized.
  • the CPAS AD153 sequencing primer 2 working solution was introduced again for annealing, and a second 50-base sequencing was performed.
  • the fluorescence information of the multiple rounds of SBS reactions in the second sequencing was detected and recorded, and the second 50-base sequence information based on the second chain nucleotide template was obtained through the fluorescence information.
  • Table 17 shows the results of the first 50bp sequencing obtained in step 2) (i.e., 1st);
  • Table 18 shows the results of the second (2nd), third (3rd), and fourth (4th) 50bp sequencing obtained in steps 4) and 5).
  • the quality index Q30 of the first 50bp sequencing is 94.7%; after the first formamide reaction to remove the sequencing chain, the quality index Q30 of the second 50bp sequencing is 94.7%; after the second formamide reaction to remove the sequencing chain, the quality index Q30 of the third 50bp sequencing is 94.79%; and after the third formamide reaction to remove the sequencing chain, the quality index Q30 of the fourth 50bp sequencing is 93.18%.
  • the BWA MEM alignment algorithm was used to calibrate the sequences obtained from the above four 50 bp double-strand sequencing, including the mutual calibration between the three sequencing results (Triple) and the mutual calibration between the four sequencing results (Quadruple), as follows:
  • the mutual correction process of three sequencing runs (the first, third, and fourth sequencing data are analyzed): if the same base is read in all three runs at the same alignment site, that base is identified as the correct base for the site; if the same base is read in two runs and a different base in one run, the base read twice is identified as the correct base for the site; if the bases read in the three runs are all different, the Q value weighting method is used to select the base with the largest Q value as the correct base for the site.
  • the mutual correction process of four sequencing runs: if the same base is read in all four runs at the same alignment site, that base is identified as the correct base for the site; if the same base is read in three runs and a different base in one run, the base read three times is identified as the correct base for the site; if the same base is read in only two runs and the other two runs differ, or the bases read in the four runs are all different, further correction is performed: the Q value weight method is used to take the base with the largest Q value as the correct base for the site.
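The three-run and four-run rules differ only in how many identical reads are required before the Q value fallback is used. A hedged sketch that covers both cases with one threshold parameter follows; `call_site_n` and `min_agree` are hypothetical names, and the Q value weight method is again assumed to mean taking the single call with the highest Q value.

```python
from collections import Counter

def call_site_n(calls, min_agree):
    """Consensus base for one aligned site from N (base, q) calls.

    min_agree: number of identical reads required to accept a base outright
    (2 for the three-run rule, 3 for the four-run rule described above).
    """
    if not calls:
        raise ValueError("no calls for this site")
    counts = Counter(base for base, _ in calls)
    base, n = counts.most_common(1)[0]
    if n >= min_agree:
        return base
    # Not enough agreement: take the call with the largest Q value
    # (assumed reading of the "Q value weight method").
    return max(calls, key=lambda c: c[1])[0]

if __name__ == "__main__":
    # Four runs, three agree: the majority base is accepted.
    print(call_site_n([("A", 20), ("A", 20), ("A", 25), ("C", 40)], min_agree=3))
    # Four runs, only two agree: fall back to the highest Q value.
    print(call_site_n([("A", 20), ("A", 20), ("C", 40), ("G", 10)], min_agree=3))
```

With four runs, requiring three identical reads means that a site with only two matching reads is resolved by Q value rather than by plurality, which follows the rule as worded in the text.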
  • Table 20 Bioinformatics analysis results of four 50 bp sequencing runs - mutual calibration of three runs and mutual calibration of four runs
  • Table 19 is the bioinformatics analysis result of the first 50bp sequencing of this embodiment; Table 20 is the result of mutual correction of three 50bp sequencing runs and four 50bp sequencing runs of this embodiment. It can be seen from the results shown in Tables 19 and 20 that after four rounds of second-strand sequencing, the data volume indicator BaseNum shows a downward trend.
  • the BaseNum of the first 50bp sequencing of the second chain is about 26.87Gb. After the sequencing chain is removed by formamide, the second, third and fourth 50bp sequencing data volume is slightly reduced.
  • the BaseNum used in the mutual correction of three 50bp sequencing is about 25.48Gb
  • the BaseNum used in the mutual correction of four 50bp sequencing is about 25.36Gb. It can be seen that even after three repeated sequencing, the total data volume generated by the repeated sequencing method of the embodiment of the present application is still small in volume, which is easy to store and process later.
  • the error rate indicator Mismatch rate! N (%) showed a downward trend, and the Mismatch rate! N (%) of the first 50bp sequencing of the second strand was 0.18%.
  • the Mismatch rate! N (%) after three 50bp sequencings were mutually corrected was 0.08%, which was 55.6% lower than the error rate of the first 50bp sequencing;
  • the Mismatch rate! N (%) after four 50bp sequencings were mutually corrected was 0.07%, which was 61.1% lower than the error rate of the first 50bp sequencing, indicating that the repeated sequencing method of the embodiment of the present application can ensure the quality of the sequencing results and effectively improve the accuracy of the sequencing results.
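The percentage decreases quoted for the mismatch rate are relative reductions with respect to the first sequencing run. The arithmetic can be checked with a short sketch; `relative_reduction` is a hypothetical helper name, and the inputs are the mismatch rates stated in the text above.

```python
def relative_reduction(before, after):
    """Relative decrease, in percent, from `before` to `after`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return 100.0 * (before - after) / before

if __name__ == "__main__":
    first = 0.18      # mismatch rate (%) of the first 50bp sequencing
    triple = 0.08     # after mutual correction of three runs
    quadruple = 0.07  # after mutual correction of four runs
    print(round(relative_reduction(first, triple), 1))     # 55.6
    print(round(relative_reduction(first, quadruple), 1))  # 61.1
```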
  • the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the features. In the description of the present disclosure, "plurality" means at least two, such as two, three, etc., unless otherwise clearly and specifically defined.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided is a repeated sequencing method, comprising: on the basis of a first sequencing primer, performing first sequencing on a sample to obtain a first sequencing sequence based on a first sequencing synthetic strand; bringing an elution reagent into contact with the first sequencing synthetic strand so as to remove the first sequencing synthetic strand; on the basis of the first sequencing primer, performing second sequencing on the sample to obtain a second sequencing sequence based on a second sequencing synthetic strand; and comparing the first sequencing sequence with the second sequencing sequence to determine sequence information of the sample.

Description

Repeated sequencing method

Technical Field

The present application relates to the field of biological sequencing technology, and in particular to a repeated sequencing method.

Background Art

In recent years, with the development of the gene sequencing industry, gene sequencing technology has advanced by leaps and bounds, from first-generation Sanger sequencing and second-generation sequencing by synthesis (SBS) to third-generation single-molecule sequencing. With all of these technologies, sequencing errors inevitably occur during sequencing for various reasons, such as the sequencing reagents, sequencing samples, sequencing instruments, and external environment. In particular, second-generation sequencing technology is based on the PCR principle; during sequencing by synthesis, sequencing errors are hard to avoid because of internal and external environments, which reduces sequencing accuracy.

To improve sequencing accuracy, various sequencing platforms have proposed solutions. For example, Illumina recently announced "Chemistry X", a new high-accuracy sequencing chemistry that includes new dyes, linkers, blockers, and polymerases; Singular released its "Sequencing Engine" technology; and Sena Bio combined fluorogenic sequencing chemistry with ECC (Error-Correction Code) technology to improve sequencing accuracy. However, these platform optimizations mostly focus on biochemical reactions and signal acquisition, such as increasing fluorescence intensity, improving polymerase fidelity, improving elution efficiency, or improving the precision of the optical system, which increases sequencing cost and makes the improvement highly dependent on the sequencing platform.

Therefore, there is an urgent need for a simple, low-cost, high-accuracy sequencing method suitable for multiple platforms.

Summary of the Invention

To this end, an embodiment of the present application provides a repeated sequencing method.

An embodiment of the first aspect of the present application proposes a repeated sequencing method, comprising: performing a first sequencing on a sample based on a first sequencing primer to obtain a first sequencing sequence based on a first sequencing synthetic chain; contacting an elution reagent with the first sequencing synthetic chain to remove the first sequencing synthetic chain; performing a second sequencing on the sample based on the first sequencing primer to obtain a second sequencing sequence based on a second sequencing synthetic chain; and comparing the first sequencing sequence and the second sequencing sequence to determine the sequence information of the sample.

在一些实施例中,所述基于第一测序引物,对样本进行第一测序或第二测序包括:使所述第一测序引物与所述样本退火结合并进行多个碱基测序循环,其中每个循环完成一个碱基的信号检测,由此获得所述第一测序合成链或所述第二测序合成链的光学信号,并基于所述第一测序合成链或所述第二测序合成链的光学信号获得所述第一测序序列或所述第二测序序列。In some embodiments, the first sequencing or the second sequencing of the sample based on the first sequencing primer includes: annealing the first sequencing primer to the sample and performing multiple base sequencing cycles, wherein each cycle completes signal detection of one base, thereby obtaining an optical signal of the first sequencing synthesis chain or the second sequencing synthesis chain, and obtaining the first sequencing sequence or the second sequencing sequence based on the optical signal of the first sequencing synthesis chain or the second sequencing synthesis chain.

在一些实施例中,基于第一测序引物,对样本进行第一测序还包括:将所述样本固定于用于测序的固相载体。在一些实施例中,所述固相载体为珠粒或芯片。In some embodiments, performing a first sequencing on the sample based on the first sequencing primer further comprises: fixing the sample on a solid phase carrier for sequencing. In some embodiments, the solid phase carrier is a bead or a chip.

In some embodiments, the elution reagent is one or more of a DNA denaturant and an exonuclease, wherein the DNA denaturant includes methanol, ethanol, urea, formamide, and sodium hydroxide, and the exonuclease includes snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, and Lactobacillus acidophilus nuclease. In some embodiments, the DNA denaturant is formamide and/or sodium hydroxide, and the exonuclease is Escherichia coli exonuclease III.

In some embodiments, the concentration of formamide is 30%-100%. In some embodiments, the concentration of formamide is 40%-100%. In some embodiments, the concentration of Escherichia coli exonuclease III is 1-50 U/μL. In some embodiments, the concentration of Escherichia coli exonuclease III is 1-10 U/μL.

In some embodiments, the elution time of the elution reagent is 0-20 min. In some embodiments, the elution time of the elution reagent is 5-15 min. In some embodiments, the elution time of the elution reagent is 5-10 min.

In some embodiments, the elution temperature of the elution reagent is 10-60°C. In some embodiments, the elution temperature of the elution reagent is 20-50°C. In some embodiments, the elution temperature of the elution reagent is 25-45°C.

In some embodiments, comparing the first sequencing sequence with the second sequencing sequence to determine the sequence information of the sample comprises: comparing the bases at corresponding positions of the first sequencing sequence and the second sequencing sequence; in response to the bases at a corresponding position being the same, determining that base as the base type at the position; and in response to the bases at a corresponding position being different, determining the base type at the position from statistical data.

In some embodiments, the statistical data is a Q value.

In some embodiments, determining the base type at the position from statistical data specifically comprises: comparing the Q values of the bases at the corresponding position of the first sequencing sequence and the second sequencing sequence, and, when the Q value of base x at the position in the first sequencing sequence is greater than the Q value of base y at the position in the second sequencing sequence, determining base x as the base type at the position.
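The two-read comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's implementation: the function name `consensus_two_reads`, the representation of Q values as integer lists, and the choice to emit `N` when the two Q values are equal (a case the text does not address) are all assumptions.

```python
from typing import List


def consensus_two_reads(read1: str, qual1: List[int],
                        read2: str, qual2: List[int]) -> str:
    """Position-wise consensus of two reads of the same template.

    Identical bases are accepted directly; for a disagreement the base
    with the larger Q value is kept. Equal Q values on a disagreement
    yield 'N' (assumption: the source does not specify this case).
    """
    if not (len(read1) == len(qual1) == len(read2) == len(qual2)):
        raise ValueError("reads and quality lists must have equal length")

    consensus = []
    for b1, q1, b2, q2 in zip(read1, qual1, read2, qual2):
        if b1 == b2:
            consensus.append(b1)      # both rounds agree
        elif q1 > q2:
            consensus.append(b1)      # first round is more reliable here
        elif q2 > q1:
            consensus.append(b2)      # second round is more reliable here
        else:
            consensus.append("N")     # unresolved tie (hypothetical rule)
    return "".join(consensus)
```

For example, if the two rounds disagree only at the third position and the second round has the higher Q value there, the second round's base is kept at that position.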

In some embodiments, the repeated sequencing method further comprises repeating the sequencing process based on the first sequencing primer, including: sequencing the sample a further m times to obtain m sequencing sequences based on m synthesized strands; contacting the elution reagent with the m or m-1 synthesized strands, respectively, to remove them; and comparing the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences to determine the sequence information of the sample.

In some embodiments, m is a positive integer. In some embodiments, 1≤m≤10.

In some embodiments, comparing the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences to determine the sequence information of the sample comprises: comparing the bases at corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences; in response to the bases at a corresponding position all being the same, determining that base as the base type at the position; in response to the bases at a corresponding position differing, determining the most frequent of the m+2 bases as the base type at the position, or alternatively, in response to the bases at a corresponding position differing, determining the base type at the position from statistical data.

In some embodiments, the repeated sequencing method further comprises: in response to the bases at a corresponding position of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences differing, and two or more base types occurring with the same highest frequency among the m+2 bases, determining the base type at the position from statistical data. In some embodiments, the statistical data is a Q value.
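The majority rule with a Q-value tie-break can be sketched as follows. The text says only that ties are resolved "from statistical data" such as the Q value; the specific tie-break here (pick the tied base whose best-supporting read has the highest Q, and emit `N` if even that is tied) is an assumption, as is the function name `consensus_multi`.

```python
from collections import Counter
from typing import Sequence


def consensus_multi(reads: Sequence[str],
                    quals: Sequence[Sequence[int]]) -> str:
    """Consensus over m+2 reads of one template, position by position."""
    if len(reads) < 2 or len(reads) != len(quals):
        raise ValueError("need at least two reads, each with Q values")
    length = len(reads[0])
    if any(len(r) != length for r in reads) or any(len(q) != length for q in quals):
        raise ValueError("all reads and quality lists must have equal length")

    consensus = []
    for i in range(length):
        counts = Counter(read[i] for read in reads)
        top = max(counts.values())
        candidates = [base for base, n in counts.items() if n == top]
        if len(candidates) == 1:
            consensus.append(candidates[0])   # unanimous or clear majority
            continue
        # Tie in frequency: fall back on Q values (hypothetical rule).
        best_q = {
            base: max(quals[k][i] for k in range(len(reads)) if reads[k][i] == base)
            for base in candidates
        }
        ranked = sorted(best_q.items(), key=lambda item: item[1], reverse=True)
        if ranked[0][1] == ranked[1][1]:
            consensus.append("N")             # still unresolved
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)
```

With three reads (m = 1) a frequency tie can only occur when all three bases differ; with an even number of reads, two-against-two ties become possible, which is where the Q-value fallback matters most.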

In some embodiments, the first sequencing primer is a forward sequencing primer or a reverse sequencing primer.

Embodiments of another aspect of the present application provide a paired-end repeated sequencing method, comprising: performing first-end repeated sequencing on a sample based on a first sequencing primer, according to the repeated sequencing method of any of the above embodiments, to obtain first-end sequence information of the sample; performing second-end sequencing on the sample based on a second sequencing primer to obtain second-end sequence information of the sample; and analyzing the first-end sequence information and the second-end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is one of a forward sequencing primer and a reverse sequencing primer, and the second sequencing primer is the other.

Embodiments of another aspect of the present application provide a paired-end repeated sequencing method, comprising: performing first-end repeated sequencing on a sample based on a first sequencing primer, according to the repeated sequencing method of any of the above embodiments, to obtain first-end sequence information of the sample; performing second-end repeated sequencing on the sample based on a second sequencing primer, according to the repeated sequencing method of any of the above embodiments, to obtain second-end sequence information of the sample; and analyzing the first-end sequence information and the second-end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is a forward sequencing primer and the first-end sequence information is forward-read sequence information, and the second sequencing primer is a reverse sequencing primer and the second-end sequence information is reverse-read sequence information.

In some embodiments, performing second-end repeated sequencing on the sample based on the second sequencing primer comprises: generating a complementary strand of the sample, and performing the second-end repeated sequencing based on the complementary strand.

In some embodiments, after generating the complementary strand of the sample, the method further comprises: removing the sample, and performing the second-end sequencing or the second-end repeated sequencing based on the complementary strand.

In some embodiments, the sample is a DNA nanoball, and the complementary strand is an MDA strand.

In some embodiments, the sample contains barcode 1 and, optionally, barcode 2, and the method further comprises: sequencing barcode 1 and barcode 2 separately to obtain sequence information of barcode 1 and barcode 2; and selecting the sequence information of the sample according to the sequence information of barcode 1 and barcode 2.
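As a rough illustration of selecting a sample's reads by barcode, the sketch below filters records by exact barcode match. The record layout `(insert_sequence, barcode1_read, barcode2_read)`, the function name `select_by_barcode`, and the use of exact matching with no mismatch tolerance are assumptions made for the example; the application does not specify how the selection is performed.

```python
from typing import Iterable, List, Optional, Tuple

# (insert sequence, barcode 1 read, barcode 2 read or None) -- hypothetical layout
Record = Tuple[str, str, Optional[str]]


def select_by_barcode(records: Iterable[Record],
                      barcode1: str,
                      barcode2: Optional[str] = None) -> List[str]:
    """Return the insert sequences whose barcode reads match the sample.

    barcode2 is optional, mirroring the optional second barcode in the
    text; when it is None only barcode 1 is checked.
    """
    selected = []
    for insert_seq, bc1_read, bc2_read in records:
        if bc1_read != barcode1:
            continue
        if barcode2 is not None and bc2_read != barcode2:
            continue
        selected.append(insert_seq)
    return selected
```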

Embodiments of another aspect of the present application provide a repeated sequencing system, comprising a sequencing reagent and an elution reagent, wherein the elution reagent is one or more of a DNA denaturant and an exonuclease, wherein the DNA denaturant includes methanol, ethanol, urea, formamide, and sodium hydroxide, and the exonuclease includes snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, and Lactobacillus acidophilus nuclease. In some embodiments, the DNA denaturant is formamide and/or sodium hydroxide, and the exonuclease is Escherichia coli exonuclease III.

In some embodiments, the sequencing reagent comprises a first sequencing primer, a reaction enzyme, and dNTPs. In some embodiments, a fluorescent group is attached to the dNTPs for base reporting. In some embodiments, the sequencing reagent further comprises a second sequencing primer. In some embodiments, the sequencing reagent further comprises a first barcode sequencing primer and, optionally, a second barcode sequencing primer.

In some embodiments, the reaction enzyme comprises a DNA polymerase and, optionally, a DNA ligase.

Embodiments of another aspect of the present application provide a sequencing kit comprising the repeated sequencing system of any of the above embodiments.

The technical solution of the present application achieves the following technical effects:

The repeated sequencing method of the embodiments of the present application, and its applications, improve sequencing accuracy at the level of sequencing logic. The same nucleotide sequence template is sequenced multiple times to obtain multiple sequencing sequences, which are then corrected against one another by bioinformatic analysis. This allows sequence errors to be detected effectively, reduces the impact on the result of sequencing errors caused by internal and external conditions during sequencing, and significantly improves sequencing accuracy. At the same time, multiple independent sequencing runs are not required; the same nucleotide sequence template is simply sequenced multiple times within a single experiment, so the method is simple to operate and low in cost. In addition, the method is applicable to multiple sequencing platforms and can be widely used to improve sequencing accuracy across platforms.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below evidently show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of a repeated sequencing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of another repeated sequencing method according to an embodiment of the present application.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments. The examples are given only to illustrate the invention and do not limit its scope. The examples provided below may serve as a guide for further improvement by a person of ordinary skill in the art and do not limit the invention in any way.

The present application is based on the following findings of the inventors:

In the related art, second-generation sequencing platforms all use the sequencing-by-synthesis (SBS) method, and their sequencing processes are similar: the reaction solution is injected into the chip under pressure, a polymerization reaction takes place, the optical signal is collected, and elution is then performed before the next base is sequenced. This process, which is based on the PCR principle, is affected by many factors, including the external environment, the instrument itself, the sequencing reagents, and the sample. Base mismatches, for example, occur readily, so sequencing errors are unavoidable, and the probability of error increases as the read length accumulates, leading to a high error rate at the ends of sequenced fragments. Reading bases more accurately is an important direction for the development of sequencing. At present, platform optimization mostly focuses on the biochemical reaction and on signal acquisition, such as increasing fluorescence intensity, improving polymerase fidelity, improving elution efficiency, or improving the precision of the optical system. These improvements raise sequencing cost and increase dependence on dedicated sequencing reagents and platforms. In addition, in the related art, the same sample is often sequenced in multiple independent runs to increase the amount of data and hence the sequencing depth, thereby improving the accuracy with which the sample sequence is reconstructed. However, such multiple independent runs require a large sample input and consume large amounts of sequencing reagent, so the sequencing cost is high.

On this basis, after repeated experiments and tests, the inventors developed a repeated sequencing method that optimizes the sequencing scheme at the level of sequencing logic. The same nucleotide sequence template is sequenced repeatedly within a single sequencing run to obtain multiple sequencing sequences, which are then corrected against one another, effectively improving sequencing accuracy.

In the embodiments of the present application, sequencing the same DNA template a second or third time effectively reduces the influence of the error sources mentioned above. With multiple rounds, a base called incorrectly in the first round can be called correctly in the second or third round, and a base called incorrectly in the second round can be called correctly in the first or third round, so erroneous bases can be identified and corrected. That is, when the same DNA template is sequenced two or more times, a base that gives the same result two or more times at the same position is taken to be the true, correct base, which effectively improves sequencing accuracy. At the same time, compared with multiple independent sequencing runs, this repeated detection within a single run is simple to operate and low in cost.
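A back-of-the-envelope calculation shows why agreement across rounds helps. The sketch below assumes, purely for illustration, that each round miscalls a given position independently with the same probability p. The application makes no such claim, and systematic errors (for example, a damaged template) would violate the assumption. Under it, the chance that a majority of n rounds is wrong at a position is at most the binomial tail computed here; it is an upper bound because the wrong calls would also have to agree on the same incorrect base. The function name is hypothetical.

```python
from math import comb


def majority_wrong_upper_bound(p: float, n: int) -> float:
    """Upper bound on P(majority of n independent rounds miscall a base).

    p: assumed per-round, per-base error probability (illustrative only).
    n: number of sequencing rounds of the same template.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("p must be in [0, 1] and n must be >= 1")
    k_min = n // 2 + 1  # smallest number of wrong rounds that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
```

For three rounds and p = 0.01 the expression is 3 × 0.01² × 0.99 + 0.01³ = 0.000298, about thirty-fold lower than the single-round error rate under these simplified assumptions.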

In the embodiments of the present application, "sample", "sequence to be tested", "sample to be tested", and "nucleotide sequence template to be tested" all refer to the nucleic acid sample to be tested. In some embodiments, the sample may be a library to be sequenced on any of the various sequencing platforms, for example single-stranded DNBs or a double-stranded DNA library. In some embodiments, the sample contains a primer-binding sequence capable of binding a sample sequencing primer. In some embodiments, the sample further contains a barcode primer-binding sequence capable of binding a barcode sequencing primer, so that the forward/reverse sequence can be determined from the barcode (barcode or index) sequencing result, or different samples can be distinguished. In some embodiments, the sample further contains a sequence capable of binding a solid-phase carrier, so that the sample can be immobilized on the solid-phase carrier.

In the embodiments of the present application, "solid-phase carrier" refers to a solid-phase medium on which the nucleic acid sample to be tested can be immobilized for subsequent sequencing. In some embodiments, a sequence capable of recognizing and binding the nucleic acid sample is immobilized on the solid-phase carrier so as to capture the sample. In other embodiments, the solid-phase carrier has wells of a predefined size that match the size of the nucleic acid sample so as to hold it in place. In other embodiments, chemical groups are attached to the solid-phase carrier and link to the nucleic acid sample through chemical bonds, van der Waals forces, or the like. In some embodiments, the solid-phase carrier may be a bead, a chip, or the like.

In some embodiments, the solid-phase carrier may be a chip. In some embodiments, the chip is used to immobilize a conventional double-stranded or single-stranded DNA library. In some embodiments, the chip is also used to immobilize DNA nanoballs (DNBs) prepared from such a double-stranded or single-stranded DNA library. Where the nucleic acid sample is a DNB, the DNB may be immobilized on the chip by means of, for example, probe technology, and the sequencing chip may be sequenced by cPAL (combinatorial probe-anchor ligation) or cPAS (combinatorial probe-anchor synthesis) to obtain the nucleic acid information of the DNA nanoball. Specifically, for example, probes labeled with four different colors may be used to read the bases: DNA ligase joins the labeled probes to the corresponding bases of the template, and the base type is determined by imaging the fluorescent groups. In some embodiments, the DNA nanoballs are immobilized on the chip by a grid of small wells or by hexamethyldisilazane.

In the embodiments of the present application, "repeated sequencing" means that, within a single independent sequencing run and for the same sample, the synthesized strand generated during sequencing is eluted and sequencing-by-synthesis is restarted, so that the sequencing process is repeated multiple times, for example as a first sequencing, a second sequencing, and further repeats. In some embodiments, the same sample may be sequenced a further m times after the first two rounds, to obtain m sequencing sequences based on m synthesized strands; an elution reagent is used to remove the m or m-1 synthesized strands; and the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences are compared to determine the sequence information of the sample, where m may be any positive integer, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more. It will be understood that two or three rounds of repeated sequencing already improve the accuracy of the result effectively, that accuracy improves further to some extent as the number of rounds increases, and that the number of rounds can be chosen according to the user's actual needs. It will also be understood that, when sequencing is repeated two or more times, removal of the synthesized strand generated in the last round is optional.

In the embodiments of the present application, "sequencing primer" may include sample sequencing primers, such as a forward sample sequencing primer for forward-strand sequencing and a reverse sample sequencing primer for reverse-strand sequencing, that is, the first sequencing primer and/or the second sequencing primer. In some embodiments, "sequencing primer" may also include barcode sequencing primers, such as a forward barcode sequencing primer for forward-strand sequencing and a reverse barcode sequencing primer for reverse-strand sequencing, that is, the barcode 1 sequencing primer and the barcode 2 sequencing primer.

In the embodiments of the present application, "obtaining a sequencing sequence based on a synthesized strand" relies on the principle of sequencing by synthesis. DNA polymerase, primer, and four dNTPs carrying base-specific fluorescent labels are added to the reaction system together. The 3'-OH of each dNTP is chemically protected, so only one dNTP can be incorporated at a time, which ensures that only one base is added per cycle. After a dNTP has been incorporated into the synthesized strand, all unused free dNTPs and DNA polymerase are washed away. The fluorescence signal is recorded by optical equipment, and computer analysis then converts the optical signal into a base call. Once the fluorescence signal has been recorded, a chemical reagent is added to quench the fluorescence and remove the 3'-OH protecting group so that the next round of the sequencing reaction (that is, reading the next base) can proceed. It will therefore be understood that "obtaining a sequencing sequence based on a synthesized strand" means that a synthesized strand is generated when the sample is sequenced, and that bases are read by detecting and converting the specific optical (fluorescence) signal released as that strand is synthesized, thereby yielding the sequencing sequence.

In the embodiments of the present application, "elution of the synthesized strand" refers to using an elution reagent to remove the synthesized strand generated during sequencing. In some embodiments, the elution reagent may be a reagent that breaks the hydrogen bonds between the base pairs of double-stranded DNA, thereby converting the double strand into single strands. In some embodiments, the elution reagent may be a DNA denaturant and/or an exonuclease, wherein the DNA denaturant may be an organic or inorganic reagent capable of disrupting the double-helix structure, and the exonuclease hydrolyzes the phosphodiester bonds sequentially from the end of a DNA strand, degrading the strand into mononucleotides. In some embodiments, the DNA denaturant may be methanol, ethanol, urea, formamide, sodium hydroxide, or the like. In some embodiments, the exonuclease may be snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, Lactobacillus acidophilus nuclease, or the like. In some embodiments, the exonuclease may include snake venom phosphodiesterase, Escherichia coli exonuclease II, and/or Escherichia coli exonuclease III.

In the embodiments of the present application, the DNA denaturant, for example formamide or sodium hydroxide, may be used at a concentration of 30%-100%, preferably 40%-100%. In some embodiments, the DNA denaturant is used at a concentration of 50% or 100%. In some embodiments, the exonuclease may be used at a concentration of 1-50 U/μL, preferably 1-10 U/μL. In some embodiments, the exonuclease may be used at 4-8 U/μL or any value within that range, for example 4 U/μL, 5 U/μL, 6 U/μL, 7 U/μL, or 8 U/μL. In some embodiments, the elution time of the elution reagent may be 0-20 min, preferably 5-15 min, more preferably 5-10 min, for example 5 min, 6 min, 7 min, 8 min, 9 min, or 10 min. In some embodiments, the elution temperature of the elution reagent may be 10-60°C, preferably 20-50°C, more preferably any value from 25°C to 45°C, for example 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37.5, 38, 38.5, 39, 39.5, 40, 40.5, 41, 42, 43, 44, or 45°C, or any value in between.

In the embodiments of the present application, taking as an example sequencing with a library amount of 10 fmol per lane (four lanes per slide, with a corresponding sequencing reaction volume of 40 μL per lane), the working concentration of formamide can be adjusted according to the reaction temperature and time (the formamide stock solution is 100%). In some embodiments, with an elution time of 2-20 min and an elution temperature of 25-40°C, the formamide concentration may be any value from 25% to 100%. In some embodiments, again taking 10 fmol of library per lane as the example (four lanes per slide, 40 μL per lane), and with an Escherichia coli exonuclease III stock concentration of 100 U/μL, the reaction conditions may be a final Escherichia coli exonuclease III concentration of 1-2 U/μL, a reaction temperature of 25-37°C, and a reaction time of 2-10 min.
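The stock and working concentrations above imply a simple dilution calculation (C1·V1 = C2·V2). The sketch below assumes, for illustration only, that the elution mix is prepared at the same 40 μL per-lane volume quoted for the sequencing reaction; the text does not state the elution volume, and the helper name `stock_volume_ul` is hypothetical.

```python
def stock_volume_ul(stock_conc: float, final_conc: float,
                    final_volume_ul: float) -> float:
    """Volume of stock needed to reach final_conc in final_volume_ul.

    stock_conc and final_conc must share a unit (e.g. U/uL or percent).
    """
    if stock_conc <= 0 or final_volume_ul <= 0:
        raise ValueError("stock concentration and volume must be positive")
    if not 0 < final_conc <= stock_conc:
        raise ValueError("final concentration must be in (0, stock_conc]")
    return final_conc * final_volume_ul / stock_conc


# Exonuclease III: 100 U/uL stock, 1-2 U/uL final, 40 uL assumed per lane
exo_low = stock_volume_ul(100.0, 1.0, 40.0)    # 0.4 uL of stock
exo_high = stock_volume_ul(100.0, 2.0, 40.0)   # 0.8 uL of stock

# Formamide: 100% stock diluted to 50% in the same assumed 40 uL
formamide = stock_volume_ul(100.0, 50.0, 40.0)  # 20 uL of stock
```

Volumes below 1 μL are hard to pipette accurately, so in practice a master mix for several lanes would likely be prepared; that is a general laboratory consideration rather than something stated in the application.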

In the embodiments of the present application, comparison between the sequences obtained by repeated sequencing (for example, between the first sequencing sequence and the second sequencing sequence) means comparing the multiple sequences obtained by repeatedly sequencing the same sample. The comparison may use the coordinates of a common reference sequence to locate each base of each repeated-sequencing sequence, and then tally the bases observed at each position to infer the true base at that position. In other embodiments, the comparison may be performed directly between the sequences obtained by repeated sequencing, without a reference sequence to provide coordinates.
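A minimal sketch of the reference-coordinate variant described above. It assumes each read comes with the reference position of its first base (as an aligner would report) and aligns without gaps; the tuple layout and the function name are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict


def pile_by_coordinate(reads):
    """Group base calls from repeated sequencing runs by reference coordinate.

    reads: iterable of (ref_start, sequence, qualities), where qualities is a
    list of Q values with one entry per base. Assumes gap-free alignments.
    Returns a dict mapping reference position -> list of (base, Q) observations.
    """
    pile = defaultdict(list)
    for ref_start, sequence, qualities in reads:
        if len(sequence) != len(qualities):
            raise ValueError("sequence and qualities must have the same length")
        for offset, (base, q) in enumerate(zip(sequence, qualities)):
            pile[ref_start + offset].append((base, q))
    return dict(pile)


runs = [
    (100, "ACGT", [30, 30, 20, 30]),  # first sequencing run
    (100, "ACTT", [32, 31, 35, 30]),  # second sequencing run
]
pile = pile_by_coordinate(runs)
# pile[102] holds the two conflicting observations: [("G", 20), ("T", 35)]
```

Each list in the result is the set of observations that the later base-determination step works on.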

In the embodiments of the present application, the sequence information of the sample is determined by comparing the sequencing sequences obtained by repeated sequencing. Specifically, when the number of repeated sequencing runs is 2, the bases at the same (that is, corresponding) position of the two sequences are tallied. If the bases at the corresponding position of the first sequencing sequence and the second sequencing sequence are the same, that base is determined as the base type at the position; if they differ, the base type at the position is determined from statistical data. In some embodiments, the statistical data is the Q value (Q-score or Phred Q, that is, the base quality value, which scores the quality of a called base; in general, the larger the Q value, the lower the probability of a miscall and the higher the confidence). In some embodiments, the Q values of the bases at the corresponding position of the first and second sequencing sequences are compared and, according to the Q value weighting method, if the Q value of base x at that position in the first sequencing sequence is greater than the Q value of base y at that position in the second sequencing sequence, base x is determined as the base type at the position.
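The two-run rule above can be sketched as follows. The Phred relation P = 10^(-Q/10) is the standard definition of a base quality value; the function names are hypothetical, and keeping the first run's base when the two Q values are equal is an assumption, since the text does not cover that case.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def consensus_two_reads(read1, qual1, read2, qual2):
    """Merge two repeated reads of the same template, position by position.

    Identical bases are kept as-is; conflicting bases are resolved in favour
    of the call with the larger Q value. On equal Q values the base from
    read1 is kept (an assumption, not specified in the text).
    """
    if not (len(read1) == len(qual1) == len(read2) == len(qual2)):
        raise ValueError("reads and quality lists must all have the same length")
    merged = []
    for b1, q1, b2, q2 in zip(read1, qual1, read2, qual2):
        merged.append(b1 if (b1 == b2 or q1 >= q2) else b2)
    return "".join(merged)


# Position 2 conflicts: G (Q12) in the first run versus T (Q35) in the second.
result = consensus_two_reads("ACGT", [30, 30, 12, 30], "ACTT", [30, 30, 35, 30])
# result == "ACTT"
```

For scale, Q30 corresponds to an expected miscall probability of about 1 in 1000 and Q20 to about 1 in 100.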

In some embodiments, when the number of repeated sequencing runs is 2+m (where m is a positive integer), the comparison between the repeated-sequencing sequences includes comparing the first sequencing sequence, the second sequencing sequence, and the m further sequencing sequences to determine the sequence information of the sample. In some embodiments, this specifically includes: comparing the bases at the corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences; if the bases at a corresponding position are all the same, determining that base as the base type at the position; and if they are not all the same, counting the frequencies of the m+2 bases and determining the most frequent of the m+2 bases as the base type at the position. Further, if the bases differ and base types occur with equal frequency among the m+2 bases (so that no single base is most frequent), the base type at the position is determined from statistical data. In some embodiments, the statistical data is the Q value, that is, the base type with the larger Q value is determined as the base type at the position. It will be understood that detecting and correcting errors by sequencing the same position multiple times can effectively reduce sequencing errors caused by factors internal and external to the sequencing run.
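A minimal sketch of the m+2 rule described above: majority vote per position, with the Q value used only to break ties. How a base type's Q value is defined when it was observed more than once is not specified in the text; the sketch assumes the largest single Q value among the tied base types. All names are hypothetical.

```python
from collections import Counter


def call_base(observations):
    """Decide one position from a list of (base, Q) pairs, one per sequencing run."""
    if not observations:
        raise ValueError("at least one observation is required")
    counts = Counter(base for base, _ in observations)
    top = max(counts.values())
    tied = {base for base, n in counts.items() if n == top}
    if len(tied) == 1:
        # Unanimous or a clear majority.
        return next(iter(tied))
    # Tie on frequency: take the tied base type carrying the largest Q value
    # (assumption: "largest" means the largest single observation).
    best_base, _ = max(
        ((base, q) for base, q in observations if base in tied),
        key=lambda pair: pair[1],
    )
    return best_base


def consensus(reads, quals):
    """Build the corrected sequence from equally long repeated reads."""
    if len(reads) != len(quals) or not reads:
        raise ValueError("need one quality list per read")
    length = len(reads[0])
    if any(len(r) != length for r in reads) or any(len(q) != length for q in quals):
        raise ValueError("all reads and quality lists must have the same length")
    columns = zip(*(zip(r, q) for r, q in zip(reads, quals)))
    return "".join(call_base(list(column)) for column in columns)


reads = ["ACGT", "ACGA", "ACTA", "ACTT"]            # m + 2 = 4 runs
quals = [[30, 30, 30, 30], [30, 30, 30, 25],
         [30, 30, 20, 35], [30, 30, 38, 30]]
corrected = consensus(reads, quals)
# corrected == "ACTA": positions 2 and 3 are 2-2 ties settled by Q38 (T) and Q35 (A)
```

With three runs, the same function reduces to "two agree, one differs: take the pair" and "all three differ: take the largest Q value".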

In the embodiments of the present application, the repeated sequencing method described above can be applied to single-end sequencing and/or paired-end sequencing. In some embodiments, depending on whether the sequencing primer is a forward or a reverse sequencing primer, the method can be applied to single-end sequencing that is either forward or reverse sequencing. It will be understood that, when applied to single-end sequencing, the repeated sequencing method does not affect sequencing of the other end; that is, after repeated sequencing of one end, the other end can be sequenced normally. In some embodiments, the other end may be sequenced once, or it may be sequenced repeatedly, using the sequencing primer for that end, according to the repeated sequencing method of any of the above embodiments.

Accordingly, the embodiments of the present application provide a paired-end sequencing method. In some embodiments, when the other end is sequenced normally (that is, once), the paired-end sequencing method includes: sequencing the second end of the sample using a second sequencing primer to obtain second-end sequence information of the sample; and analyzing the first-end sequence information obtained by single-end repeated sequencing together with the second-end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is one of a forward sequencing primer and a reverse sequencing primer, and the second sequencing primer is the other. It will be understood that repeated sequencing of one end combined with normal sequencing of the other end can effectively improve the accuracy of the sequencing data.

The embodiments of the present application also provide another paired-end sequencing method. In other embodiments, when the other end is also sequenced repeatedly, the paired-end sequencing method includes: repeatedly sequencing the first end of the sample using a first sequencing primer, according to the repeated sequencing method of any of the above embodiments, to obtain first-end sequence information of the sample; repeatedly sequencing the second end of the sample using a second sequencing primer, likewise according to the repeated sequencing method of any of the above embodiments, to obtain second-end sequence information of the sample; and analyzing the first-end and second-end sequence information to determine the sequence information of the sample, wherein the first sequencing primer is a forward sequencing primer, the first-end sequence information is forward-read sequence information, the second sequencing primer is a reverse sequencing primer, and the second-end sequence information is reverse-read sequence information. It will be understood that this paired-end method comprises two single-end repeated sequencing procedures and determines the base sequence from the multiple sequences they produce, thereby effectively improving the accuracy of the sequencing result.

It will be understood that, in the embodiments of the present application, "the other end" is defined relative to the end that is sequenced first. In other words, if the single-end sequencing performed first is forward sequencing, sequencing of the other end is reverse sequencing; if the single-end sequencing performed first is reverse sequencing, sequencing of the other end is forward sequencing.

In some embodiments, reverse sequencing of the sample may further include generating a complementary strand (that is, a second strand) of the sample and performing reverse sequencing on that complementary strand. In some embodiments, after the complementary strand is generated, the method further includes removing the sample so that reverse sequencing is performed on the complementary strand.

It will be understood that, when the sample is a DNA nanoball (DNB), the complementary strand (second strand) is generated by multiple displacement amplification (MDA) with a DNA polymerase, so the complementary strand may also be called the MDA strand.

In the embodiments of the present application, when single-end sequencing is performed, the method further includes sequencing a barcode sequence in the sample, wherein the barcode sequence is sample-specific so that different samples can be distinguished. In some embodiments, the barcode sequence may also be sequenced repeatedly.

In the embodiments of the present application, when paired-end sequencing is performed, the method further includes: sequencing barcode 1 and barcode 2 contained in the sample, respectively, to obtain the sequence information of barcode 1 and barcode 2; and screening out the sequence information of the sample according to the sequence information of barcode 1 and barcode 2. In some embodiments, barcode 1 and barcode 2 are direction-specific. It will be understood that combining the sequence information of barcode 1 and barcode 2 makes it possible to distinguish the direction of the sample's sequence information, that is, to determine whether a given sequence is a forward read or a reverse read. Therefore, in some embodiments, the sequence information of the sample may include first-end sequence information (forward-read sequence information) and second-end sequence information (reverse-read sequence information).

In the embodiments of the present application, downstream statistics and analysis of the sequencing data may be implemented with analysis software, scripts, command-line tools, and the like, provided that they can carry out the sequence comparison, Q value statistics, base determination, and other steps described in the embodiments of the present application.

It will be understood that the repeated sequencing method of the embodiments of the present application is applicable to multiple sequencing platforms. It can be applied to sequencing of DNB templates produced by rolling circle amplification (Rolling Circle Replication) as well as to sequencing of linear DNA templates produced by bridge amplification (Bridge PCR). In either case it works on the template and/or the reverse-complementary strand of the template and elutes the synthesized sequencing strand generated during sequencing, which effectively improves the accuracy of the sequence output on multiple platforms.

It should be noted that, in the paired-end sequencing method of the embodiments of the present application, when the other end is also sequenced repeatedly, the operations and data processing for that end are the same as in the single-end repeated sequencing method of any of the above embodiments and are not repeated here.

The repeated sequencing method of the embodiments of the present application improves sequencing accuracy at the level of sequencing logic. The same nucleotide sequence template is sequenced multiple times to obtain multiple sequencing sequences, which are then mutually corrected by bioinformatic analysis. This allows sequence errors to be detected effectively, reduces the influence on the result of sequencing errors caused by internal and external environmental factors during sequencing, and significantly improves sequencing accuracy. At the same time, multiple independent repeat sequencing experiments are not needed; the same nucleotide sequence template is simply sequenced multiple times within a single experiment, so the method is simple to operate and relatively low in cost. In addition, the method is applicable to multiple sequencing platforms and can be widely used to improve sequencing accuracy across platforms.

The embodiments of the present application also provide a repeated sequencing system comprising a sequencing reagent and an elution reagent, wherein the elution reagent is one or more of a DNA denaturant and an exonuclease. The DNA denaturant includes methanol, ethanol, urea, formamide, and sodium hydroxide; the exonuclease includes snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, and Lactobacillus acidophilus nuclease. In some embodiments, the DNA denaturant is formamide and/or sodium hydroxide, and the exonuclease is Escherichia coli exonuclease III.

In some embodiments, the sequencing reagent includes a sequencing primer, for example a forward sequencing primer and/or a reverse sequencing primer (corresponding to the first sequencing primer and/or the second sequencing primer), a reaction enzyme, and dNTPs. In some embodiments, a fluorescent group is attached to each dNTP as a base reporter, so that the base type is read from the fluorescence color and the sequencing sequence of the sample is obtained. In some embodiments, the sequencing reagent further includes a barcode sequencing primer for reading barcode information in single-end or paired-end sequencing.

In some embodiments, the reaction enzyme in the sequencing reagent is used for, among other things, extension of the synthesized strand, and may include a DNA polymerase and, optionally, a DNA ligase.

The embodiments of the present application also provide a sequencing kit comprising the repeated sequencing system of any of the above embodiments.

It should be noted that the above explanation of the embodiments of the repeated sequencing method also applies to the repeated sequencing system and the sequencing kit of the above embodiments and is not repeated here.

Unless otherwise specified, the experimental methods in the following examples are conventional methods carried out according to the techniques or conditions described in the literature in the field or according to the product instructions. Unless otherwise specified, the materials, reagents, and the like used in the following examples are commercially available.

Unless otherwise specified, each quantitative test in the following examples was performed in triplicate and the results were averaged.

Example 1

Figure 1 is a schematic diagram of the repeated sequencing method according to Example 1 of the present application. As shown in Figure 1, the method may include i. a biochemical experimental scheme and ii. a bioinformatics analysis scheme, wherein the biochemical experimental scheme includes:

a. Immobilize the nucleotide sequence template to be tested on the chip, introduce the sequencing primer and anneal it, and then perform the first sequencing run. Detect and record the fluorescence information from the multiple rounds of SBS reactions in the first run, and obtain the first-run base sequence information of the template from the fluorescence information.

b. Introduce an organic solvent capable of denaturing double-stranded DNA, or an enzyme with 3' exonuclease activity, to remove the sequencing strand from the first run described in (a). Introduce the sequencing primer again and anneal it, then perform the second sequencing run. Detect and record the fluorescence information from the multiple rounds of SBS reactions in the second run, and obtain the second-run base sequence information of the template from the fluorescence information.

c. Introduce an organic solvent capable of denaturing double-stranded DNA, or an enzyme with 3' exonuclease activity, to remove the sequencing strand from the second run described in (b). Introduce the sequencing primer again and anneal it, then perform the third sequencing run. Detect and record the fluorescence information from the multiple rounds of SBS reactions in the third run, and obtain the third-run base sequence information of the template from the fluorescence information.

d. Repeat the cycle (remove the sequencing strand → anneal the sequencing primer → sequence) as many more times as required, thereby obtaining base sequence information from multiple sequencing runs of the same nucleotide sequence template.

The bioinformatics analysis scheme includes the following steps, applied to the sequences obtained from the three sequencing runs of each nucleotide sequence template. If all three runs read the same base at a given aligned position, that base is taken as the correct base at the position. If two runs read the same base and one run reads a different base, the base read twice is taken as the correct base (majority rule). If all three runs read different bases, further correction is applied: using the Q value weighting method, the base with the largest Q value is taken as the correct base at the position. Finally, the correctly read bases and the corrected bases are assembled into a complete sequence, thereby improving the accuracy of the sequencing sequence.

The repeated sequencing method of this example effectively improves sequence accuracy by sequencing the same template two or more times (sequence, remove the sequencing strand, sequence again) and mutually comparing and correcting the results of the two or more runs.

Example 2

Figure 2 is a schematic diagram of another repeated sequencing method according to Example 2 of the present application. As shown in Figure 2, the method can be applied to sequencing technologies based on bridge amplification and includes i. a biochemical experimental scheme and ii. a bioinformatics analysis scheme, wherein the biochemical experimental scheme includes:

1. Immobilize the DNA molecules of the sample to be tested on a flow cell (Flowcell) coated with adapters.

2. Perform multiple rounds of bridge amplification using the adapters on the flow cell as primers to form a Cluster, which is the nucleotide sequence template to be tested.

3. Introduce the sequencing primer and anneal it, then perform the multiple rounds of SBS reactions of the first sequencing run (each round of SBS reads one base). Detect and record the fluorescence information from these rounds, and obtain the first-run base sequence information from the fluorescence information.

4. Remove the strand synthesized in the first sequencing run using an NaOH solution.

5. Introduce the sequencing primer again and anneal it, then perform the multiple rounds of SBS reactions of the second sequencing run (each round of SBS reads one base). Detect and record the fluorescence information from these rounds, and obtain the second-run base sequence information from the fluorescence information.

6. Repeat steps 4 and 5, detect and record the fluorescence information from the multiple rounds of SBS reactions of the third sequencing run (each round of SBS reads one base), and obtain the third-run base sequence information from the fluorescence information.

The bioinformatics analysis scheme includes the following steps, applied to the sequences obtained from the three sequencing runs of each nucleotide sequence template. If all three runs read the same base at a given aligned position, that base is taken as the correct base at the position. If two runs read the same base and one run reads a different base, the base read twice is taken as the correct base (majority rule). If all three runs read different bases, further correction is applied: using the Q value weighting method, the base with the largest Q value is taken as the correct base at the position. Finally, the correctly read bases and the corrected bases are assembled into a complete sequence, thereby improving the accuracy of the sequencing sequence.

The bridge-amplification-based repeated sequencing method of this example may optionally also include reverse sequencing of the second strand to obtain paired-end results. This may include: synthesizing the complementary strand (that is, the second strand) of each nucleotide sequence template; removing each nucleotide sequence template; and performing reverse sequencing on the complementary strand. The bioinformatics analysis scheme is the same as for the single-end sequencing described above.

The bridge-amplification-based repeated sequencing method of this example effectively improves sequence accuracy by sequencing the same template two or more times (sequence, remove the sequencing strand, sequence again) and mutually comparing and correcting the results of the two or more runs.

Example 3

An E. coli PCR free sample (MGI) was sequenced on the MGISEQ-2000 platform using the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012552), following the instructions for the kit and the platform. Three SE100 sequencing runs (that is, repeated single-end 100 bp sequencing) were performed using the nucleotide sequence of the sample as the template:

1) The nucleotide sequence template of the E. coli PCR free sample was first immobilized on the chip, and the CPAS AD153 sequencing primer working solution from the above sequencing kit was introduced and annealed. The first SE100 sequencing run was then performed, the fluorescence information from the multiple rounds of SBS reactions in the first run was detected and recorded (each round of SBS reads one base), and the first-run base sequence information of the template was obtained from the fluorescence information.

2) 40 μL of 100% formamide (the amount for a single lane) was introduced and reacted at 40°C for 8 min to remove the sequencing strand (that is, the synthesized sequencing strand) from the first run described in 1). The CPAS AD153 sequencing primer working solution was introduced again and annealed, the second SE100 sequencing run was performed, the fluorescence information from the multiple rounds of SBS reactions in the second run was detected and recorded (each round of SBS reads one base), and the second-run base sequence information of the template was obtained from the fluorescence information.

3) 40 μL of 100% formamide was again introduced and reacted at 40°C for 8 min to remove the sequencing strand from the second run described in 2). The CPAS AD153 sequencing primer working solution was introduced again and annealed, the third SE100 sequencing run was performed, the fluorescence information from the multiple rounds of SBS reactions in the third run was detected and recorded (each round of SBS reads one base), and the third-run base sequence information of the template was obtained from the fluorescence information.

4) The base sequence information from the three runs was analyzed to obtain a high-accuracy base sequence.

Table 1 shows the results of the first SE100 sequencing run (1st) obtained in step 1); Table 2 shows the results of the second (2nd) and third (3rd) SE100 sequencing runs obtained in steps 2) and 3). As Tables 1 and 2 show, the quality metric Q30 was 91.03% for the first run, 83.46% for the second run (after the first formamide treatment to remove the sequencing strand), and 81.07% for the third run (after the second formamide treatment). Removing the sequencing strand with formamide therefore causes some damage to the DNB template, so the quality of the second and third runs decreases somewhat; nevertheless, the data as a whole remain of high quality, that is, all three sequencing runs produced high-quality data.

Table 1: Results of the first SE100 sequencing run

Table 2: Results of the second and third SE100 sequencing runs
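For reference, Q30 as reported for the three runs is conventionally the percentage of base calls with a Q value of at least 30. The sketch below shows that calculation under the assumption that qualities are stored as FASTQ strings with the common Phred+33 encoding; the text does not state the file format, and the function name is hypothetical.

```python
def percent_at_least(quality_strings, threshold=30, offset=33):
    """Percentage of base calls whose Q value is >= threshold.

    quality_strings: iterable of FASTQ quality lines, assumed Phred+33 encoded
    (Q = ord(character) - 33).
    """
    total = 0
    passing = 0
    for line in quality_strings:
        for ch in line:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no base calls supplied")
    return 100.0 * passing / total


# 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10.
q30 = percent_at_least(["IIII", "II5+"])                # 75.0
q20 = percent_at_least(["IIII", "II5+"], threshold=20)  # 87.5
```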

Using the BWA MEM alignment algorithm (https://doi.org/10.48550/arXiv.1303.3997), the sequences obtained from the three SE100 sequencing runs were mutually corrected as follows. If all three runs read the same base at a given aligned position, that base was taken as the correct base at the position. If two runs read the same base and one run read a different base, the base read twice was taken as the correct base. If all three runs read different bases, further correction was applied: using the Q value weighting method, the base with the largest Q value was taken as the correct base at the position.

Table 3 shows the bioinformatics analysis results of the first SE100 sequencing run of this example; Table 4 shows the results after mutual correction of the three SE100 sequencing runs of this example.

Table 3: Bioinformatics analysis results of the first SE100 sequencing (single run)

Table 4: Bioinformatics analysis results of three SE100 sequencing runs (mutual correction of the three runs)

Comparing the single-run analysis of the first SE100 run (Table 3) with the result after mutual correction of the three SE100 runs (Table 4) shows that the data-volume metric BaseNum decreased over the three runs. BaseNum was about 47.8 Gb in the first run; after the sequencing strand was removed with formamide, the data volume of the second and third runs decreased, and the BaseNum used in the mutual correction of the three runs was about 27.9 Gb. Thus, even after three repeated runs, the total data volume produced by the repeated sequencing method of the embodiments of the present application remains small and is easy to store and process.

Meanwhile, the error-rate metric Mismatch rate!N(%) also decreased over the three runs. It was 0.48% for the first run and 0.12% after mutual correction of the three runs, a 75% reduction relative to the error rate of the first run. In addition, compared with a single run, the data quality metrics Q20 and Q30 after the integrated analysis of the three runs were clearly higher, and the Mapping rate(%) also increased. This indicates that the repeated sequencing method of the embodiments of the present application maintains the quality of the sequencing results and effectively improves their accuracy.
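The 75% figure follows from the usual relative-reduction formula applied to the two mismatch rates quoted for this example. The symbols $e_1$ (first-run mismatch rate) and $e_c$ (mismatch rate after mutual correction) are introduced here for illustration only.

```latex
\[
\text{reduction}
  = \frac{e_1 - e_c}{e_1} \times 100\%
  = \frac{0.48\% - 0.12\%}{0.48\%} \times 100\%
  = 75\%
\]
```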

Example 4

Using an E. coli PCR free sample (MGI) and the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012551) on the MGISEQ-2000 platform, and following the instructions for the kit and the sequencing platform, three SE50 sequencing runs (i.e., repeated single-end 50 bp sequencing) were performed with the nucleotide sequence of the sample as the template:

1) The nucleotide sequence template of the E. coli PCR free sample is first immobilized on the chip, and CPAS AD153 sequencing primer working solution is introduced for annealing. The first SE50 run is then performed: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the first base sequence of the nucleotide template is obtained from the fluorescence signals.

2) 40 μL of 100% formamide is introduced and reacted at 40 °C for 8 min to remove the sequencing strand (i.e., the strand synthesized during sequencing) of the first run described in 1). CPAS AD153 sequencing primer working solution is introduced again for annealing, and the second SE50 run is performed: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the second base sequence of the nucleotide template is obtained from the fluorescence signals.

3) 40 μL of 100% formamide is introduced again and reacted at 40 °C for 8 min to remove the sequencing strand of the second run described in 2). CPAS AD153 sequencing primer working solution is introduced again for annealing, and the third SE50 run is performed: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the third base sequence of the nucleotide template is obtained from the fluorescence signals.

4) The base sequences obtained from the three runs are analyzed to obtain a highly accurate base sequence.
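Steps 1) to 4) follow a fixed pattern: anneal the primer, sequence, strip the synthesized strand, and repeat. The sketch below lays that pattern out as an ordered schedule using the reagent volume, temperature and time quoted above. The `StripStep` class and `build_schedule` function are hypothetical names used only for illustration; they do not correspond to any instrument software.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StripStep:
    """Conditions used to remove the synthesized sequencing strand."""
    reagent: str
    volume_ul: float
    temperature_c: float
    minutes: float


def build_schedule(rounds: int, read_length: int, strip: StripStep) -> List[str]:
    """List the operations for `rounds` repeated single-end runs."""
    steps = ["immobilize the nucleotide template on the chip"]
    for r in range(1, rounds + 1):
        if r > 1:
            # The strand synthesized in the previous round is removed first.
            steps.append(
                f"strip round-{r - 1} strand: {strip.volume_ul:g} uL "
                f"{strip.reagent}, {strip.temperature_c:g} C, {strip.minutes:g} min"
            )
        steps.append("anneal CPAS AD153 sequencing primer")
        steps.append(f"round {r}: {read_length} SBS cycles, one base per cycle")
    steps.append("mutually correct the base sequences from all rounds")
    return steps


formamide = StripStep("100% formamide", 40, 40, 8)
for line in build_schedule(rounds=3, read_length=50, strip=formamide):
    print(line)
```

Three rounds give two strip steps, matching steps 2) and 3); changing the `StripStep` values would describe other stripping conditions in the same way.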

Table 5 shows the results of the first SE50 sequencing run (1st) obtained in step 1); Table 6 shows the results of the second (2nd) and third (3rd) SE50 sequencing runs obtained in steps 2) and 3). As Tables 5 and 6 show, the quality index Q30 was 95.34% for the first run, 93.82% for the second run (after the first formamide treatment removed the sequencing strand), and 93.71% for the third run (after the second formamide treatment removed the sequencing strand). Removing the sequencing strand with formamide therefore causes some damage to the DNB template and slightly lowers the quality of the second and third runs, but the data as a whole remain of high quality. In addition, compared with the SE100 results of Example 3, the SE50 read length in this example is shorter, the overall Q30 is higher, and the quality drop in the second and third runs is not pronounced; that is, all three rounds of sequencing maintained a very high-quality data output.

Table 5: Results of the first SE50 sequencing

Table 6: Results of the second and third SE50 sequencing

The sequences obtained from the three SE50 runs were aligned with the BWA MEM algorithm and mutually corrected site by site. If all three runs read the same base at an aligned site, that base is taken as the correct base for the site. If two runs read the same base and one run reads a different base, the base read twice is taken as the correct base. If all three runs read different bases, a further correction is applied: the Q-value weighting method selects the base with the highest Q value as the correct base for the site.

Table 7 shows the bioinformatics analysis results of the first SE50 sequencing run of this example; Table 8 shows the results after mutual correction of the three SE50 sequencing runs of this example.

Table 7: Bioinformatics analysis results of the first SE50 sequencing (single run)

Table 8: Bioinformatics analysis results of three SE50 sequencing runs (mutual correction of the three runs)

Comparing the single-run analysis of the first SE50 run (Table 7) with the result after mutual correction of the three SE50 runs (Table 8) shows that the data-volume metric BaseNum decreased over the three runs. BaseNum was about 27.9 Gb in the first run; after the sequencing strand was removed with formamide, the data volume of the second and third runs decreased slightly, and the BaseNum used in the mutual correction of the three runs was about 26.1 Gb. Thus, even after three repeated runs, the total data volume produced by the repeated sequencing method of the embodiments of the present application remains small and is easy to store and process.

Meanwhile, the error-rate metric Mismatch rate!N(%) also decreased over the three runs. It was 0.2% for the first run and 0.06% after mutual correction of the three runs, which is 60% lower than the error rate of the first run. In addition, compared with a single run, the data quality metrics Q20 and Q30 after the integrated analysis of the three runs were clearly higher, and the Mapping rate(%) also increased. This indicates that the repeated sequencing method of the embodiments of the present application maintains the quality of the sequencing results and effectively improves their accuracy.
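The mismatch rates in this example are quoted to few significant digits, so a relative reduction computed from them is sensitive to rounding. The sketch below computes the reduction from the quoted values and the range that is compatible with them. It assumes, as an illustration only, that 0.2% is rounded to one decimal place and 0.06% to two; the function names are hypothetical.

```python
from typing import Tuple


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction of `after` with respect to `before`, in percent."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before * 100


def reduction_bounds(
    before: float, after: float, before_decimals: int, after_decimals: int
) -> Tuple[float, float]:
    """Lowest and highest reduction consistent with the rounded inputs."""
    half_before = 0.5 * 10 ** -before_decimals
    half_after = 0.5 * 10 ** -after_decimals
    # The reduction is smallest for the smallest `before` and largest `after`.
    low = reduction_pct(before - half_before, after + half_after)
    high = reduction_pct(before + half_before, after - half_after)
    return low, high


point = reduction_pct(0.2, 0.06)
low, high = reduction_bounds(0.2, 0.06, before_decimals=1, after_decimals=2)
print(f"{low:.1f} {point:.1f} {high:.1f}")  # prints "56.7 70.0 78.0"
```

Taken at face value the quoted rates give a 70% reduction, while the rounding interval spans roughly 57% to 78%. The stated 60% falls inside that interval, so the difference may simply reflect unrounded values in Tables 7 and 8.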

Example 5

Using an E. coli PCR free sample (MGI) and an E. coli PCR sample (MGI, catalog number 1000005033), and the BGISEQ-500RS high-throughput sequencing kit (MGI, catalog number 1000005485) on the BGISEQ-500 platform, and following the instructions for the kit and the sequencing platform, three SE50 sequencing runs (i.e., repeated single-end 50 bp sequencing) were performed on each sample with its nucleotide sequence as the template:

1) The nucleotide sequence templates of the two samples, E. coli PCR free and E. coli PCR, are first immobilized on the chip, and CPAS AD153 sequencing primer working solution is introduced for annealing. The first SE50 run is then performed: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the first base sequences of the two nucleotide templates are obtained from the fluorescence signals.

2) 60 μL of Exonuclease III at a concentration of 4 U/μL is introduced and reacted at 37 °C for 10 min to remove the sequencing strand (i.e., the strand synthesized during sequencing) of the first run described in 1). CPAS AD153 sequencing primer working solution is introduced again for annealing, and the second SE50 run is performed: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the second base sequences of the two nucleotide templates are obtained from the fluorescence signals.

3) 60 μL of Exonuclease III at a concentration of 4 U/μL is introduced again and reacted at 37 °C for 10 min to remove the sequencing strand of the second run described in 2). CPAS AD153 sequencing primer working solution is introduced again for annealing, and the third SE50 run is performed: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the third base sequences of the two nucleotide templates are obtained from the fluorescence signals.

4) The base sequences obtained from the three runs of each of the two samples are analyzed to obtain highly accurate base sequences.

Table 9 shows the results of the first SE50 sequencing run (1st) of the two samples obtained in step 1); Table 10 shows the results of the second (2nd) and third (3rd) SE50 sequencing runs of the two samples obtained in steps 2) and 3). As Table 9 shows, the quality index Q30 of the first run was 92.75% for the PCR free sample and 92.71% for the PCR sample. As Table 10 shows, after the first Exonuclease III treatment removed the sequencing strand, the Q30 of the second run was 86.33% for the PCR free sample and 85.42% for the PCR sample; after the second Exonuclease III treatment removed the sequencing strand, the Q30 of the third run was 82.82% for the PCR free sample and 82.35% for the PCR sample. Removing the sequencing strand with Exonuclease III therefore causes some damage to the DNB template and lowers the quality of the second and third runs, but the data as a whole remain of high quality; that is, all three rounds of sequencing maintained a high-quality data output.

Table 9: Results of the first SE50 sequencing

Table 10: Results of the second and third SE50 sequencing

The sequences obtained from the three SE50 runs of each of the two samples were aligned with the BWA MEM algorithm and mutually corrected site by site, separately for each sample. If all three runs read the same base at an aligned site, that base is taken as the correct base for the site. If two runs read the same base and one run reads a different base, the base read twice is taken as the correct base. If all three runs read different bases, a further correction is applied: the Q-value weighting method selects the base with the highest Q value as the correct base for the site.

Table 11 shows the bioinformatics analysis results of the first SE50 sequencing run of the two samples of this example; Table 12 shows the results after mutual correction of the three SE50 sequencing runs of the two samples of this example.

Table 11: Bioinformatics analysis results of the first SE50 sequencing (single run)

Table 12: Bioinformatics analysis results of three SE50 sequencing runs (mutual correction of the three runs)

Comparing the single-run analysis of the first SE50 run of the two samples (Table 11) with the results after mutual correction of the three SE50 runs (Table 12) shows that the data-volume metric BaseNum decreased for both samples over the three runs. In the first run, BaseNum was about 10.9 Gb for the PCR free library and about 11.1 Gb for the PCR library. After the sequencing strand was removed with Exonuclease III, the data volume of the second and third runs decreased. The BaseNum used in the mutual correction of the three runs was about 6.8 Gb for the PCR free library and about 6.6 Gb for the PCR library. Thus, even after three repeated runs, the total data volume produced by the repeated sequencing method of the embodiments of the present application remains small and is easy to store and process.

Meanwhile, the error-rate metric Mismatch rate!N(%) also decreased over the three runs. In the first run it was 0.04% for the PCR free library and 0.06% for the PCR library. After mutual correction of the three runs it was 0.03% for the PCR free library, a 25% reduction relative to the first run, and 0.05% for the PCR library, a 17% reduction relative to the first run. In addition, compared with a single run, the data quality metrics Q20 and Q30 of both samples after the integrated analysis of the three runs were clearly higher. This indicates that the repeated sequencing method of the embodiments of the present application maintains the quality of the sequencing results and effectively improves their accuracy.

Example 6

Using an E. coli PCR sample (MGI) and the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012551) on the MGISEQ-2000 platform, and following the instructions for the kit and the sequencing platform, one SE50 sequencing run (i.e., single-end 50 bp sequencing) was performed with the nucleotide sequence of the sample as the template; after the synthesized sequencing strand was removed, one PE50 sequencing run (i.e., paired-end sequencing of 50 bp per end) was performed:

1) The nucleotide sequence template of the E. coli PCR sample is first immobilized on the chip, and CPAS AD153 sequencing primer working solution is introduced for annealing. One SE50 run is performed according to the MGISEQ-2000 operating manual: the fluorescence signals of the successive SBS reaction cycles are detected and recorded (each SBS cycle reads one base), and the base sequence of the nucleotide template for this SE50 run is obtained from the fluorescence signals.

2) 40 μL of 50% formamide is introduced and reacted at 40 °C for 8 min to remove the SE50 sequencing strand (i.e., the strand synthesized during sequencing) described in 1).

3) PE50 sequencing is performed. The first-strand (Read1) sequencing primer from the sequencing kit is introduced for annealing, and 50 bp of Read1 is sequenced according to the operating manual of the sequencing platform; the fluorescence signals of the successive SBS reaction cycles in Read1 are detected and recorded, and the 50-base sequence of the first-strand nucleotide template is obtained from the fluorescence signals. The high-fidelity polymerase with strand-displacement activity and the dNTP substrates from the kit are then introduced, and the polymerase synthesizes the second-strand sequencing template (i.e., the complementary strand of the first strand). The second-strand (Read2) sequencing primer from the sequencing kit is then introduced for annealing, and 50 bp of Read2 is sequenced according to the operating manual of the sequencing platform; the fluorescence signals of the successive SBS reaction cycles in Read2 are detected and recorded, and the 50-base sequence of the second-strand nucleotide template is obtained from the fluorescence signals.

4) The measured base sequence information is analyzed to obtain the base sequence.

Table 13 shows the results of the single SE50 sequencing run (1st) obtained in step 1); Table 14 shows the results of the PE50 sequencing run obtained in step 3).

Table 13: Results of the single SE50 sequencing

Table 14: Results of the PE50 sequencing

As Tables 13 and 14 show, the quality index Q30 of the single SE50 run was 96.1%. After the formamide treatment removed the sequencing strand, the Q30 of the subsequent PE50 run was 96.37% for Read1, 96.0% for Read2, and 96.19% in total. Removing the SE50 sequencing strand with formamide and then performing PE50 sequencing therefore showed no evident formamide damage to the DNB template, and the data as a whole remained of high quality.

The SE50 and PE50 sequencing results were analyzed bioinformatically using the BWA MEM alignment algorithm; the statistics are shown in Tables 15 and 16.

Table 15: Bioinformatics analysis results of the SE50 sequencing

Table 16: Bioinformatics analysis results of the PE50 sequencing

The bioinformatics analysis results of the SE50 run (Table 15) and of the subsequent PE50 run (Table 16) show that the data-volume metric BaseNum decreased slightly: BaseNum was about 27.78 Gb in the first (SE50) run, and after the sequencing strand was removed with formamide, the data volume of PE50 Read1 in the second (PE50) run was 26.46 Gb, comparable to that of SE50.

Meanwhile, a comparison of the SE and PE results shows that the error-rate metric Mismatch rate!N(%), the data quality metrics Q20 and Q30, and the Mapping rate(%) were all comparable and all indicated high data quality. This shows that after SE50 sequencing and removal of the sequencing strand according to the repeated sequencing method of the embodiments of the present application, PE sequencing can subsequently be performed as normal, and one or more preceding single-end runs do not affect the subsequent paired-end run. The repeated sequencing method of the embodiments of the present application can therefore be combined flexibly with ordinary single-end or paired-end sequencing while yielding high-quality, high-accuracy sequencing results.

Example 7

Using an E. coli PCR sample (MGI, catalog number 1000005033) and the MGISEQ-2000RS high-throughput sequencing kit (MGI, catalog number 1000012551) on the MGISEQ-2000 platform, and following the instructions for the kit and the sequencing platform, four single-end 50 bp sequencing runs (i.e., repeated single-end 50 bp sequencing) were performed with the second strand (reverse strand) of the sample as the template:

1) The nucleotide sequence template of the E. coli PCR sample is first immobilized on the chip. The high-fidelity polymerase with strand-displacement activity and the dNTP substrates from the kit are introduced, and the polymerase synthesizes the second-strand sequencing template (i.e., the complementary strand of the first strand).

2) The second-strand sequencing primer from the sequencing kit (CPAS AD153 sequencing primer 2 working solution) is introduced for annealing, and the first 50 bp run on the second strand is performed according to the operating manual of the sequencing platform. The fluorescence signals of the successive SBS reaction cycles are detected and recorded, and the first 50-base sequence of the second-strand nucleotide template is obtained from the fluorescence signals.

3) 40 μL of 50% formamide is introduced and reacted at 40 °C for 8 min to remove the sequencing strand (i.e., the strand synthesized during sequencing) of the first 50-base run described in 2).

4) CPAS AD153 sequencing primer 2 working solution is introduced again for annealing, and the second 50-base run is performed. The fluorescence signals of the successive SBS reaction cycles in the second run are detected and recorded, and the second 50-base sequence of the second-strand nucleotide template is obtained from the fluorescence signals.

5) Steps 3) and 4) are repeated; the fluorescence signals of the successive SBS reaction cycles in the third 50 bp run are detected and recorded, and the third 50-base sequence of the second-strand nucleotide template is obtained from the fluorescence signals.

6) Steps 3) and 4) are repeated; the fluorescence signals of the successive SBS reaction cycles in the fourth 50 bp run are detected and recorded, and the fourth 50-base sequence of the second-strand nucleotide template is obtained from the fluorescence signals.

7) The base sequences obtained from the four runs are analyzed to obtain a highly accurate base sequence.

Table 17 shows the results of the first 50 bp sequencing run (1st) obtained in step 2); Table 18 shows the results of the second (2nd), third (3rd) and fourth (4th) 50 bp sequencing runs obtained in steps 4), 5) and 6).

Table 17: Results of the first 50 bp sequencing

Table 18: Results of the second, third and fourth 50 bp sequencing

As Tables 17 and 18 show, the quality index Q30 was 94.7% for the first 50 bp run, 94.7% for the second run (after the first formamide treatment removed the sequencing strand), 94.79% for the third run (after the second formamide treatment), and 93.18% for the fourth run (after the third formamide treatment). When formamide is used to remove the sequencing strand for repeated sequencing, the second and third runs thus show no appreciable drop in quality, indicating that the method of this example elutes the sequencing strand effectively while preserving sequencing quality. Formamide removal of the sequencing strand does cause some damage to the DNB template, which slightly lowers the quality of the fourth run, but the data as a whole remain of high quality; that is, all four rounds of sequencing maintained a very high-quality data output.

The sequences obtained from the four 50 bp second-strand runs were aligned with the BWA MEM algorithm and mutually corrected, both among three runs (Triple) and among all four runs (Quadruple), as follows:

Mutual correction of three runs (using the data of the first, third and fourth runs): if all three runs read the same base at an aligned site, that base is taken as the correct base for the site; if two runs read the same base and one run reads a different base, the base read twice is taken as the correct base; if all three runs read different bases, the Q-value weighting method selects the base with the highest Q value as the correct base for the site.

Mutual correction of four runs: if all four runs read the same base at an aligned site, that base is taken as the correct base for the site; if three runs read the same base and one run reads a different base, the base read three times is taken as the correct base; if two runs read the same base and the other two differ from it, or if all four runs read different bases, a further correction is applied: the Q-value weighting method selects the base with the highest Q value as the correct base for the site.
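Both the Triple and the Quadruple procedures can be read as one rule: accept a base that more than half of the runs agree on, and otherwise fall back to the call with the highest Q value. The sketch below expresses that reading. It is an interpretation rather than the applicant's code: the `(base, Q)` tuples and the name `correct_site` are hypothetical, and ties between equal Q values go to the earliest run.

```python
from collections import Counter
from typing import Sequence, Tuple

Call = Tuple[str, int]  # (base, Phred Q value) from one sequencing run


def correct_site(calls: Sequence[Call]) -> str:
    """Return the corrected base for one aligned site."""
    counts = Counter(base for base, _ in calls)
    base, votes = counts.most_common(1)[0]
    if 2 * votes > len(calls):
        # Strict majority: 2 or 3 of 3 runs, or 3 or 4 of 4 runs.
        return base
    # No strict majority (for example a 2-2 split): highest Q value wins.
    return max(calls, key=lambda call: call[1])[0]


# One site as read in runs 1 to 4.
site = [("A", 30), ("G", 37), ("A", 25), ("G", 20)]

quadruple = correct_site(site)                           # 2-2 split
triple = correct_site([site[i] for i in (0, 2, 3)])      # runs 1, 3 and 4
print(quadruple, triple)  # prints "G A"
```

The example shows that the two procedures can disagree at a site: the Quadruple call falls back to the Q37 read, while the Triple call over runs 1, 3 and 4 is decided two to one.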

Table 19: Bioinformatics analysis results of the first 50 bp sequencing run - single sequencing

Table 20: Bioinformatics analysis results of the four 50 bp sequencing runs - mutual correction of three runs and mutual correction of four runs

Table 19 shows the bioinformatics analysis results of the first 50 bp sequencing run of this embodiment; Table 20 shows the results of mutual correction across three and across four 50 bp sequencing runs of this embodiment. As Tables 19 and 20 show, the data-volume metric BaseNum trended downward over the four second-strand sequencing runs. BaseNum was about 26.87 Gb for the first 50 bp sequencing run of the second strand; after the sequenced strand was removed with formamide, the data volume of the second, third, and fourth 50 bp sequencing runs decreased slightly. The BaseNum used in the mutual correction of three 50 bp sequencing runs was about 25.48 Gb, and the BaseNum used in the mutual correction of four 50 bp sequencing runs was about 25.36 Gb. Thus, even after three rounds of repeated sequencing, the total data volume produced by the repeated sequencing method of the embodiment of the present application remains small and is easy to store and process afterward.

Meanwhile, the error-rate metric Mismatch rate!N(%) also trended downward over the four second-strand sequencing runs. The Mismatch rate!N(%) of the first 50 bp sequencing run of the second strand was 0.18%. After mutual correction of three 50 bp sequencing runs, the Mismatch rate!N(%) was 0.08%, a reduction of 55.6% relative to the error rate of the first 50 bp sequencing run; after mutual correction of four 50 bp sequencing runs, the Mismatch rate!N(%) was 0.07%, a reduction of 61.1% relative to the error rate of the first 50 bp sequencing run. This indicates that the repeated sequencing method of the embodiment of the present application can ensure the quality of the sequencing results and effectively improve their accuracy.
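The quoted reductions are relative changes against the single-run error rate, which can be checked with a few lines. The helper name `relative_reduction` is hypothetical; the input values are the Mismatch rate!N(%) figures stated above.

```python
def relative_reduction(before, after):
    """Percentage decrease of `after` relative to `before`."""
    return (before - after) / before * 100.0


single_run = 0.18   # first 50 bp run, Mismatch rate!N(%)
triple = 0.08       # after mutual correction of three runs
quadruple = 0.07    # after mutual correction of four runs

print(round(relative_reduction(single_run, triple), 1))     # 55.6
print(round(relative_reduction(single_run, quadruple), 1))  # 61.1
```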

The terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "plurality" means at least two, for example two, three, and so on, unless otherwise clearly and specifically defined.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention. A person of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (21)

1. A repeated sequencing method, comprising:
performing, based on a first sequencing primer, a first sequencing on a sample to obtain a first sequencing sequence based on a first sequencing synthesized strand;
contacting an elution reagent with the first sequencing synthesized strand to remove the first sequencing synthesized strand;
performing, based on the first sequencing primer, a second sequencing on the sample to obtain a second sequencing sequence based on a second sequencing synthesized strand; and
comparing the first sequencing sequence with the second sequencing sequence to determine sequence information of the sample.

2. The method according to claim 1, wherein performing the first sequencing or the second sequencing on the sample based on the first sequencing primer comprises:
annealing the first sequencing primer to the sample and performing a plurality of base sequencing cycles, wherein each cycle completes signal detection of one base, thereby obtaining an optical signal of the first sequencing synthesized strand or the second sequencing synthesized strand, and obtaining the first sequencing sequence or the second sequencing sequence based on the optical signal of the first sequencing synthesized strand or the second sequencing synthesized strand.

3. The method according to claim 1 or 2, wherein performing the first sequencing on the sample based on the first sequencing primer further comprises:
immobilizing the sample on a solid-phase carrier for sequencing; optionally, the solid-phase carrier is a bead or a chip.

4. The method according to claim 1, wherein the elution reagent is one or more of the following:
a DNA denaturant and/or an exonuclease, wherein the DNA denaturant includes: methanol, ethanol, urea, formamide, and sodium hydroxide, and the exonuclease includes: snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, and Lactobacillus acidophilus nuclease;
preferably, the DNA denaturant is formamide and/or sodium hydroxide, and the exonuclease is Escherichia coli exonuclease III.

5. The method according to claim 3, wherein the concentration of the formamide is 30%-100%, preferably 40%-100%; the concentration of the Escherichia coli exonuclease III is 1-50 U/μL, preferably 1-10 U/μL;
optionally, the elution time of the elution reagent is 0-20 min, preferably 5-15 min, more preferably 5-10 min;
optionally, the elution temperature of the elution reagent is 10-60°C, preferably 20-50°C, more preferably 25-45°C.

6. The method according to claim 1, wherein comparing the first sequencing sequence with the second sequencing sequence to determine the sequence information of the sample comprises:
comparing the bases at corresponding positions of the first sequencing sequence and the second sequencing sequence;
in response to the bases at a corresponding position of the first sequencing sequence and the second sequencing sequence being identical, determining the identical base as the base type at the position;
in response to the bases at a corresponding position of the first sequencing sequence and the second sequencing sequence being different, determining the base type at the position by statistical data;
optionally, the statistical data is a Q value.

7. The method according to claim 6, wherein determining the base type at the position by statistical data specifically comprises:
comparing the Q values of the bases at the corresponding position of the first sequencing sequence and the second sequencing sequence, and, based on the Q value of base x at the position in the first sequencing sequence being greater than the Q value of base y at the position in the second sequencing sequence, determining base x as the base type at the position.

8. The method according to any one of claims 1 to 7, further comprising repeating the sequencing process based on the first sequencing primer, comprising:
sequencing the sample a further m times to obtain m sequencing sequences based on m sequencing synthesized strands;
contacting the elution reagent with the m or m-1 sequencing synthesized strands, respectively, to remove the m or m-1 sequencing synthesized strands; and
comparing the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences to determine the sequence information of the sample.

9. The method according to claim 8, wherein m is a positive integer greater than 0, preferably 1≤m≤10.

10. The method according to claim 8 or 9, wherein comparing the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences to determine the sequence information of the sample comprises:
comparing the bases at corresponding positions of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences;
in response to the bases at a corresponding position of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being identical, determining the identical base as the base type at the position;
in response to the bases at a corresponding position of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being different, determining the base with the highest occurrence frequency among the m+2 bases as the base type at the position; or, in response to the bases at a corresponding position of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being different, determining the base type at the position by statistical data.

11. The method according to claim 10, further comprising:
in response to the bases at a corresponding position of the first sequencing sequence, the second sequencing sequence, and the m sequencing sequences being different, and base types with the same occurrence frequency existing among the m+2 bases, determining the base type at the position by statistical data;
optionally, the statistical data is a Q value.

12. The method according to any one of claims 1 to 11, wherein the first sequencing primer is a forward sequencing primer or a reverse sequencing primer.

13. A paired-end repeated sequencing method, comprising:
performing, based on a first sequencing primer, first-end repeated sequencing on a sample by the repeated sequencing method according to any one of claims 1 to 12 to obtain first-end sequence information of the sample;
performing, based on a second sequencing primer, second-end sequencing on the sample to obtain second-end sequence information of the sample; and
analyzing the first-end sequence information and the second-end sequence information to determine the sequence information of the sample,
wherein the first sequencing primer is one of a forward sequencing primer and a reverse sequencing primer, and the second sequencing primer is the other of the forward sequencing primer and the reverse sequencing primer.

14. A paired-end repeated sequencing method, comprising:
performing, based on a first sequencing primer, first-end repeated sequencing on a sample by the repeated sequencing method according to any one of claims 1 to 12 to obtain first-end sequence information of the sample;
performing, based on a second sequencing primer, second-end repeated sequencing on the sample by the repeated sequencing method according to any one of claims 1 to 12 to obtain second-end sequence information of the sample; and
analyzing the first-end sequence information and the second-end sequence information to determine the sequence information of the sample,
wherein the first sequencing primer is a forward sequencing primer and the first-end sequence information is forward-read sequence information, and
the second sequencing primer is a reverse sequencing primer and the second-end sequence information is reverse-read sequence information.

15. The method according to claim 14, wherein performing the second-end repeated sequencing on the sample based on the second sequencing primer comprises:
generating a complementary strand of the sample, and performing the second-end repeated sequencing based on the complementary strand;
optionally, the sample is a DNA nanoball and the complementary strand is an MDA strand.

16. The method according to claim 15, wherein after generating the complementary strand of the sample, the method further comprises:
removing the sample, and performing the second-end repeated sequencing based on the complementary strand.

17. The method according to claim 16, wherein the sample contains barcode 1 and optionally barcode 2, the method further comprising:
sequencing the barcode 1 and the barcode 2 separately to obtain sequence information of the barcode 1 and the barcode 2; and
screening out the sequence information of the sample according to the sequence information of the barcode 1 and the barcode 2.

18. A repeated sequencing system, comprising a sequencing reagent and an elution reagent, wherein the elution reagent is one or more of the following:
a DNA denaturant and/or an exonuclease, wherein the DNA denaturant includes: methanol, ethanol, urea, formamide, and sodium hydroxide, and the exonuclease includes: snake venom phosphodiesterase, Escherichia coli exonuclease I, Escherichia coli exonuclease II, Escherichia coli exonuclease III, spleen phosphodiesterase, and Lactobacillus acidophilus nuclease;
preferably, the DNA denaturant is formamide and/or sodium hydroxide, and the exonuclease is Escherichia coli exonuclease III.

19. The system according to claim 18, wherein the sequencing reagent comprises a first sequencing primer, a reaction enzyme, and dNTPs;
preferably, a fluorescent group is attached to the dNTPs for base reporting;
optionally, the sequencing reagent further comprises a second sequencing primer;
optionally, the sequencing reagent further comprises a first barcode sequencing primer and optionally a second barcode sequencing primer.

20. The system according to claim 19, wherein the reaction enzyme comprises a DNA polymerase and optionally a DNA ligase.

21. A sequencing kit, comprising the repeated sequencing system according to any one of claims 18 to 20.

PCT/CN2023/105577 2023-07-03 2023-07-03 Repeated sequencing method Pending WO2025007252A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/105577 WO2025007252A1 (en) 2023-07-03 2023-07-03 Repeated sequencing method

Publications (1)

Publication Number Publication Date
WO2025007252A1 (en) 2025-01-09

Family

ID=94171089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/105577 Pending WO2025007252A1 (en) 2023-07-03 2023-07-03 Repeated sequencing method

Country Status (1)

Country Link
WO (1) WO2025007252A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025212421A1 (en) * 2024-03-30 2025-10-09 Purdue Research Foundation Systems and methods for fabricating fungal mycelium-based consumer products

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020012930A1 (en) * 1999-09-16 2002-01-31 Rothberg Jonathan M. Method of sequencing a nucleic acid
CN101575639A (en) * 2009-06-19 2009-11-11 无锡艾吉因生物信息技术有限公司 DNA sequencing method capable of verifying base information for second time
WO2020010495A1 (en) * 2018-07-09 2020-01-16 深圳华大智造极创科技有限公司 Method for nucleic acid sequencing
WO2021142769A1 (en) * 2020-01-17 2021-07-22 深圳华大智造科技有限公司 Method for synchronously sequencing sense strand and antisense strand of dna
CN113293205A (en) * 2021-05-24 2021-08-24 深圳市真迈生物科技有限公司 Sequencing method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 23943986; Country of ref document: EP; Kind code of ref document: A1