
WO2019108014A1 - Method for measuring the integrity of a UID nucleic acid sequence in nucleic acid sequencing analysis - Google Patents

Info

Publication number
WO2019108014A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
acid sequence
region
uid
polynucleotide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2018/015086
Other languages
English (en)
Korean (ko)
Inventor
정종석
박동현
박웅양
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Life Public Welfare Foundation
Original Assignee
Samsung Life Public Welfare Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Life Public Welfare Foundation
Publication of WO2019108014A1
Legal status: Ceased

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2525/00: Reactions involving modified oligonucleotides, nucleic acids, or nucleotides
    • C12Q2525/10: Modifications characterised by
    • C12Q2525/191: Modifications characterised by incorporating an adaptor
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2535/00: Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
    • C12Q2535/122: Massive parallel sequencing

Definitions

  • The present disclosure relates to a polynucleotide for measuring the purity of a UID nucleic acid sequence, and to a method for measuring the purity of a UID nucleic acid sequence in nucleic acid sequence analysis using the polynucleotide.
  • A genome is the entirety of the genetic information that an organism possesses.
  • Several techniques have been developed for sequencing the genome of a single individual, such as DNA chips, Next Generation Sequencing (NGS), and Next Next Generation Sequencing (NNGS).
  • NGS is widely used for research and diagnostic purposes. Although the details differ by instrument, NGS can be broadly divided into three stages: sampling, library production, and nucleic acid sequence analysis. After the nucleic acid sequence analysis, the presence or absence of gene mutations is detected from the sequence analysis data produced.
  • Multiple samples may be mixed into one nucleic acid sequencing kit.
  • Each sample to be mixed should carry a label that distinguishes it from the other samples before mixing.
  • However, the label may introduce errors into the nucleic acid sequence analysis results owing to polymerase-induced errors in the polymerase chain reaction and/or detection errors during nucleic acid sequence analysis, which is problematic. Therefore, there is a need for a method that can identify whether a label capable of distinguishing a large number of samples has bound correctly to the sample to be analyzed and labels that sample correctly.
  • One aspect provides a polynucleotide for measuring the purity of a UID nucleic acid sequence.
  • Another aspect provides a method for determining the purity of the UID nucleic acid sequence in nucleic acid sequencing.
  • One aspect provides a polynucleotide comprising a first region comprising a unique identification (UID) nucleic acid sequence of two or more consecutive nucleotides, a second region comprising a nucleic acid sequence of two or more consecutive nucleotides that is not homologous to a reference genome, and a third region comprising a nucleic acid sequence of two or more consecutive nucleotides that is homologous to the reference genome.
  • The first region in the polynucleotide may comprise a unique identification (UID) nucleic acid sequence.
  • The UID refers to a nucleic acid fragment that serves to identify a sample during nucleic acid sequencing. That is, the UID is a marker for distinguishing different samples from each other in nucleic acid sequence analysis of a plurality of samples. Therefore, in order to distinguish a plurality of samples, the UID may have a different nucleic acid sequence for each sample.
  • The polynucleotide for measuring UID purity may have a UID nucleic acid sequence different from those of the one or more samples subjected to nucleic acid sequence analysis.
  • The polynucleotide may be synthesized or prepared so that a plurality of polynucleotides have the same UID nucleic acid sequence.
  • For example, the polynucleotides may share one common UID nucleic acid sequence, such as AGTC, or may share more than one UID nucleic acid sequence, for example AGTC and TGAC.
  • The term UID nucleic acid sequence may be used interchangeably with unique molecular identifier (UMI), index, or barcode.
  • The UID nucleic acid sequence may include, but is not limited to, the bases A, G, C, or T. The UID nucleic acid sequence may be from about 2 bp to about 40 bp, about 2 bp to about 35 bp, about 2 bp to about 30 bp, about 2 bp to about 25 bp, about 3 bp to about 20 bp, about 4 bp to about 20 bp, or about 4 bp to about 16 bp in length, but the length is not limited thereto.
  • The polynucleotide may be for use in multiplexing.
  • Multiplexing means mixing two or more samples so that two or more samples can be sequenced in one nucleic acid sequencing lane or chip.
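To make the role of the UID in multiplexing concrete, the following is a minimal sketch, not the patent's procedure. It assumes, purely for illustration, that every read starts with a 4 bp UID; the sample names, the `demultiplex` function, and the read strings are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, uid_to_sample, uid_len=4):
    """Group pooled reads by sample using the UID at the start of each read.

    Assumption (not from the source): the UID occupies the first
    `uid_len` bases of every read. Reads whose UID is not in the
    lookup table are collected under 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        uid, insert = read[:uid_len], read[uid_len:]
        bins[uid_to_sample.get(uid, "unassigned")].append(insert)
    return dict(bins)

# Hypothetical UID table and pooled reads from one lane.
uid_to_sample = {"AGTC": "sample_A", "TGAC": "sample_B"}
pooled = ["AGTCGGATCC", "TGACTTAGGC", "AGTCAACCGT", "AGTAGGATCC"]
by_sample = demultiplex(pooled, uid_to_sample)
```

The last read carries `AGTA`, one base away from `AGTC`, so it ends up unassigned. This is the kind of UID error whose frequency the purity measurement is meant to quantify.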
  • The purity (integrity) of the UID means the number or percentage, in the sequence analysis data, of the UIDs unique to the sample.
  • The purity of the UID may be affected by the library production process and/or the nucleic acid sequencing process.
  • The purity of the UID may be expressed as a relative level.
  • The second region in the polynucleotide may comprise a nucleic acid sequence that is not homologous to the reference genome.
  • The polynucleotide may be synthesized or prepared so that a plurality of polynucleotides have the same nucleic acid sequence that is not homologous to the reference genome.
  • The sequence that is not homologous to the reference genome is designed so that, after nucleic acid sequence analysis, the artificially added synthetic fragments (the polynucleotides for measuring UID purity) can be removed from the sequence analysis results of the original samples. To prevent the sequencing data of the synthetic fragment from being mapped to the reference genome, a sequence that is not homologous to the reference genome can be distinguished if a run of at least 4 consecutive bp differs from the reference genome.
  • The nucleic acid sequence that is not homologous to the reference genome may be from about 2 bp to about 250 bp, about 2 bp to about 40 bp, about 2 bp to about 35 bp, about 2 bp to about 30 bp, about 3 bp to about 20 bp, about 4 bp to about 20 bp, or about 4 bp to about 16 bp in length, but the length is not limited thereto.
  • The reference genome data may be obtained from a database already known in the art, such as the National Center for Biotechnology Information (NCBI), Gene Expression Omnibus (GEO), Food and Drug Administration (FDA), My Cancer Genome, or TCGA, or may be obtained from a control, i.e., a biological sample of a normal person.
  • The normal person may be a healthy person in whom a specific disease, for example a tumor, has not been found.
  • The reference genome may be a human reference genome, and may be hg18 or hg19.
  • Homology means the degree to which a sequence matches the nucleotide sequence of a given reference genome.
  • The third region in the polynucleotide may comprise a nucleic acid sequence that is homologous to the reference genome.
  • The nucleic acid sequence that is homologous to the reference genome may be homologous to two or more consecutive nucleotides of the nucleic acid sequence of the target region.
  • The whole genome can be sequenced using next-generation nucleic acid sequencing, or the nucleic acid sequence can be analyzed only in the exome region or in a specific region. This method of analysis is called target sequence analysis or targeted resequencing.
  • The polynucleotide may be for use in target sequence analysis.
  • The target region may be the whole or a partial region of a gene of interest, and the kind of gene is not limited.
  • The second region may be located at the 5' end of the third region, at the 3' end of the third region, or at both the 5' end and the 3' end of the third region, and the first region may be located at the 5' end of the second region, at the 3' end of the second region, or at both the 5' end and the 3' end of the second region.
  • The polynucleotide may comprise a first region, a second region, and a third region, or a third region, a second region, and a first region, in the direction from the 5' end to the 3' end.
  • The polynucleotide may include the UID nucleic acid sequence and/or the nucleic acid sequence not homologous to the reference genome at both ends of the polynucleotide; for example, it may comprise, in the direction from the 5' end to the 3' end, a first region, a second region, a third region, a second region, and a first region.
  • In this case, the two second regions may comprise the same or different nucleic acid sequences not homologous to the reference genome, and the two first regions may comprise the same or different UID nucleic acid sequences.
  • The first region, the second region, and the third region may be immediately adjacent to each other, or may be separated by a certain distance with any other nucleic acid sequence between them.
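The region arrangements described above can be sketched as a small data structure. This is an illustrative assumption rather than the inventors' design: the class name `SyntheticFragment`, the field names, and the example sequences are hypothetical, and the two-ended layout simply reuses the same second-region and UID strings, which is only one of the options the text allows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SyntheticFragment:
    uid: str      # first region: UID nucleic acid sequence
    spacer: str   # second region: not homologous to the reference genome
    target: str   # third region: homologous to the reference genome

    def sequence(self, both_ends: bool = False) -> str:
        """Return the 5'->3' sequence.

        Default layout: first, second, third region.
        With both_ends=True: first, second, third, second, first region,
        reusing the same spacer and UID strings (an assumption).
        """
        core = self.uid + self.spacer + self.target
        if both_ends:
            core += self.spacer + self.uid
        return core

# Hypothetical sequences; only the UID "AGTC" appears in the source text.
frag = SyntheticFragment(uid="AGTC", spacer="CGTA", target="GGATCCAAGT")
one_sided = frag.sequence()
two_sided = frag.sequence(both_ends=True)
```

The sketch places the regions immediately adjacent to each other; additional intervening sequence, which the text also permits, would require extra fields.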
  • Fig. 1 is an image showing the structure of a synthetic fragment for measuring the purity of UID (the above-described polynucleotide).
  • A plurality of synthetic fragments for measuring UID purity may include the same UID nucleic acid sequence, a nucleic acid sequence that is not homologous to the reference genome, and a nucleic acid sequence that is homologous to the reference genome.
  • The synthetic fragment for measuring UID purity may further comprise a primer and/or an adapter.
  • The synthetic fragment for measuring UID purity may be added during the library preparation step for one or more samples requiring nucleic acid sequence analysis and subjected to nucleic acid sequence analysis together with them. In this case, the one or more samples requiring nucleic acid sequence analysis and the sample for measuring UID purity may have the same primer and/or adapter nucleic acid sequences while having different UID nucleic acid sequences.
  • Another aspect provides a composition comprising the polynucleotides described above.
  • Another aspect provides a kit comprising the polynucleotides described above.
  • Another aspect provides a method for measuring the purity of a UID in nucleic acid sequence analysis, the method comprising: preparing a first library for measuring UID purity comprising the polynucleotide described above; fragmenting a nucleic acid isolated from a biological sample and ligating a polynucleotide comprising a unique identification (UID) nucleic acid sequence to one or more ends of the fragmented nucleic acid to prepare a second library for nucleic acid sequence analysis; performing nucleic acid sequence analysis on the first library and the second library to obtain sequence analysis data; extracting reads including the second region from the reads of the obtained sequence analysis data; calculating the ratio of reads including the first region among the extracted reads including the second region; and measuring the purity of the UID in the nucleic acid sequence analysis from the calculated ratio of reads including the first region.
  • The library means nucleic acid fragments prepared in a form suitable for nucleic acid sequence analysis.
  • The library may be a fragment in which a primer, an adapter, a unique identifier, or a combination thereof is ligated to a nucleic acid fragment for which nucleic acid sequence analysis is required, or an amplification product thereof.
  • The library may comprise a set of nucleic acid fragments prepared in a form suitable for nucleic acid sequence analysis before or after pre-capture polymerase chain reaction (PCR), target enrichment, or post-capture PCR.
  • The library may be, for example, a genomic library, a complementary DNA library, a randomized mutant library, or a combination thereof.
  • The method includes the step of constructing a first library for measuring UID purity comprising the polynucleotide.
  • The first library may be prepared by ligating primers and/or adapters to one or more ends of the polynucleotide.
  • The primers and/or adapters may be adapters suitable for nucleic acid sequencing, primers for polymerase chain reaction, primers suitable for nucleic acid sequencing, regions to which primers suitable for nucleic acid sequencing can anneal, or combinations thereof.
  • The primers and/or adapters can be selected according to the nucleic acid sequencing method by a person skilled in the art.
  • The method comprises fragmenting a nucleic acid isolated from a biological sample and ligating a polynucleotide comprising a UID nucleic acid sequence to one or more ends of the fragmented nucleic acid to prepare a second library for nucleic acid sequence analysis.
  • The biological sample may be one obtained from an individual suspected of having a disease, an individual suspected of having a tumor, a normal person, or a combination thereof, or a compound.
  • The individual may be a human, a cow, a horse, a pig, a sheep, a goat, a dog, a cat, or a rodent.
  • The biological sample may be obtained from blood, plasma, serum, urine, saliva, mucous secretion, sputum, feces, tears, or a combination thereof.
  • The nucleic acid may be a genome or a fragment thereof, and the term may be used interchangeably with a polynucleotide of arbitrary length.
  • The genome refers to the entirety of the chromosomes, chromatin, or genes.
  • The nucleic acid may be DNA (deoxyribonucleic acid), RNA (ribonucleic acid), or a combination thereof, and may be, for example, cell-free DNA (cfDNA).
  • The nucleic acid can be separated from the sample by a method known to a person skilled in the art.
  • The isolated nucleic acid may be fragmented by methods known to those of ordinary skill in the art, which may be physical, chemical, or enzymatic cleavage of the genome, such as digesting the genome with a restriction enzyme.
  • The method may comprise selecting the size of the fragmented nucleic acid.
  • The step of selecting the size may be performed by electrophoresis, centrifugation, chromatography, or a combination thereof.
  • The isolated nucleic acid fragment may be from about 10 bp to about 2000 bp, from about 15 bp to about 1500 bp, from about 20 bp to about 1000 bp, from about 20 bp to about 500 bp, or from about 20 bp to about 300 bp.
  • The second library may be prepared by ligating a polynucleotide comprising a UID nucleic acid sequence to one or more ends of the fragmented nucleic acid.
  • Each sample may have a UID nucleic acid sequence different from those of the other samples and of the sample for measuring UID purity.
  • The second library may be prepared by ligating primers and/or adapters to one or more ends of the fragmented nucleic acid.
  • The primers and/or adapters may be adapters suitable for nucleic acid sequencing, primers for polymerase chain reaction, primers suitable for nucleic acid sequencing, regions to which primers suitable for nucleic acid sequencing can anneal, or combinations thereof.
  • The primers and/or adapters can be selected according to the nucleic acid sequencing method by a person skilled in the art.
  • The method may comprise performing target enrichment.
  • Target enrichment means increasing the frequency of the gene or other region of interest to be subjected to the nucleic acid sequence analysis.
  • Target enrichment may be carried out in a manner known to those skilled in the art, for example, by in-solution capture, hybridization of a sample with a bait, polymerase chain reaction, or a combination thereof.
  • The method may include performing pre-capture polymerase chain reaction (PCR) prior to target enrichment, post-capture PCR after target enrichment, or a combination thereof.
  • The step of producing the first library and the step of producing the second library may be performed before the nucleic acid sequence analysis. Therefore, either the step of producing the first library or the step of producing the second library may be performed first, or the two steps may be performed simultaneously.
  • The method comprises performing nucleic acid sequence analysis on the first library and the second library to obtain sequence analysis data.
  • The nucleic acid sequence analysis may be next generation sequencing (NGS).
  • Nucleic acid sequence analysis may be used interchangeably with sequencing.
  • NGS may be used interchangeably with massively parallel sequencing or second-generation sequencing.
  • NGS is a technique for sequencing a large number of nucleic acid fragments simultaneously: the nucleic acid is fragmented in a chip-based, polymerase chain reaction (PCR)-based paired-end format, and the fragments are sequenced at very high speed based on hybridization.
  • The NGS may be performed using, for example, Illumina HiSeq 2500, Illumina Genome Analyzer, the Solexa platform, SOLiD System (Applied Biosystems), Ion Proton (Life Technologies), Complete Genomics, Helicos Biosciences Heliscope, Pacific Biosciences single molecule real time (SMRT(TM)) technology, or a combination thereof.
  • The nucleic acid sequence analysis may be a nucleic acid sequence analysis for analyzing only the region of interest.
  • The nucleic acid sequence analysis may include, for example, NGS-based targeted sequencing, targeted deep sequencing, or panel sequencing.
  • The sequence analysis data means the data obtained by the nucleic acid sequence analysis, and may include the sequence, frequency, and quality index of the individual reads for the subject of the nucleic acid sequence analysis.
  • The "read" means the nucleic acid sequence information of a nucleic acid fragment obtained by the nucleic acid sequence analysis, and may be data derived from nucleic acid sequence analysis or a fragment of the nucleic acid sequence.
  • The sequence analysis data may be obtained, for example, from binary version of SAM (BAM) format and/or Sequence Alignment/Map (SAM) format data.
  • The BAM format and/or the SAM format is typically used to describe data about short reads.
  • The data in the BAM format and/or the SAM format may include the start position of a read, the direction of a read, the mapping quality, a FLAG indicating the alignment status, a CIGAR (Compact Idiosyncratic Gapped Alignment Report) string, and the like.
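The fields listed above correspond to columns of a SAM alignment line. The sketch below shows one way to read them from a single tab-separated line without external libraries; the `SamRecord` type, the read name, and the example coordinates are hypothetical, and real BAM files would normally be read with a dedicated library rather than by hand.

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise FLAG describing the alignment
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    seq: str     # read sequence

def parse_sam_line(line: str) -> SamRecord:
    """Parse selected mandatory columns of one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    # Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])

def is_reverse(rec: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)

# Hypothetical alignment line for illustration only.
line = "read1\t16\tchr12\t25398284\t60\t10M\t*\t0\t0\tAGTCCGTAGG\tIIIIIIIIII"
rec = parse_sam_line(line)
```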
  • The method includes extracting reads comprising the second region from the reads of the obtained sequence analysis data. That is, only the reads including the second region, specifically the nucleic acid sequence of the polynucleotide that is not homologous to the reference genome, may be selected from the reads of the sequence analysis data.
  • The method includes calculating the ratio of reads comprising the first region among the extracted reads comprising the second region.
  • That is, among the extracted reads including the second region, the ratio of reads including the first region, specifically reads having the UID nucleic acid sequence of the polynucleotide, can be calculated.
  • The method includes measuring the purity of the UID from the calculated ratio of reads comprising the first region.
  • The extracted reads including the second region may all have the same UID nucleic acid sequence.
  • In this case, the purity of the UID in the nucleic acid sequence analysis may be 100%.
  • Alternatively, some of the extracted reads may have a UID nucleic acid sequence different from the UID nucleic acid sequence of the synthetic fragment for measuring purity. In this case, the purity of the UID in the nucleic acid sequence analysis may be from 0% to less than 100%.
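The ratio described above can be illustrated with a short sketch. It is written under stated assumptions and is not the patent's implementation: the UID is assumed to sit immediately 5' of the second region, the 8 bp marker `TTGACCGA` is a hypothetical second-region sequence, and plain substring matching stands in for whatever read selection a real pipeline would use.

```python
def uid_purity(reads, uid, marker):
    """Fraction of marker-containing reads that also carry the expected UID.

    Assumption (hypothetical layout): the UID lies directly 5' of the
    second-region marker, so an intact read contains uid + marker.
    """
    selected = [r for r in reads if marker in r]   # reads with the second region
    if not selected:
        raise ValueError("no read contains the second-region marker")
    expected = uid + marker
    hits = sum(1 for r in selected if expected in r)  # reads with the first region
    return hits / len(selected)

reads = [
    "AGTCTTGACCGAGGATCC",  # expected UID
    "AGTCTTGACCGAGGATCC",  # expected UID
    "AGTCTTGACCGAGGATCC",  # expected UID
    "TGACTTGACCGAGGATCC",  # marker present, different UID
    "AGTCAAAAAAAAGGATCC",  # no marker: not a synthetic-fragment read, ignored
]
purity = uid_purity(reads, uid="AGTC", marker="TTGACCGA")  # 3 of 4 selected reads
```

A purity of 1.0 corresponds to the 100% case in the text, and any value below 1.0 indicates that some reads from the synthetic fragment carry an altered UID. With a marker as short as 4 bp, substring matching would also hit unrelated reads by chance, so a positional check would be more appropriate in practice.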
  • With the synthetic fragment for measuring UID purity of the present invention, it is possible to determine whether UID-related errors occur in nucleic acid sequence analysis and at what ratio, and thus to predict the reliability of the nucleic acid sequence analysis.
  • Another aspect provides a method for measuring the purity of a UID in nucleic acid sequence analysis, the method comprising: preparing a first library for measuring UID purity comprising the polynucleotide described above; performing nucleic acid sequence analysis on the first library to obtain sequence analysis data; extracting reads including the second region from the reads of the obtained sequence analysis data; calculating the ratio of reads including the first region among the extracted reads including the second region; and measuring the purity of the UID in the nucleic acid sequence analysis from the calculated ratio of reads including the first region.
  • The above method is the same as the method of measuring UID purity described earlier, except that it omits nucleic acid sequence analysis of one or more samples requiring nucleic acid sequence analysis.
  • This method can be performed to test the probability of errors occurring over the whole process of library production and nucleic acid sequence analysis even when there is no sample requiring nucleic acid sequence analysis.
  • Fig. 1 is an image showing the structure of a synthetic fragment for measuring UID purity, containing a unique identifier (UID) nucleic acid sequence, a nucleic acid sequence not homologous to the reference genome, and a nucleic acid sequence of a target region homologous to the reference genome.
  • A and B of Fig. 2 show the results of performing nucleic acid sequence analysis using synthetic fragments for measuring UID purity, selecting reads containing the nucleic acid sequence not homologous to the reference genome, and then determining the ratio of reads having the UID of the synthetic fragment to reads having other UIDs.
  • ss denotes a specific sequence with no homology to the reference genome.
  • Fig. 3 is a flowchart of a procedure for performing nucleic acid sequence analysis using synthetic fragments for measuring the purity of UID.
  • The purity of a UID can be measured by performing nucleic acid fragmentation, end repair, 3'-adenosine tailing, and adapter ligation; adding synthetic fragments for measuring UID purity; and then performing pre-capture polymerase chain reaction, target enrichment, post-capture polymerase chain reaction, nucleic acid sequence analysis, fastq file extraction, and UID nucleic acid sequence selection.
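For the fastq file extraction step named in the workflow above, the following minimal sketch reads sequences from FASTQ text. It assumes unwrapped four-line records with no blank lines; the function name `fastq_sequences` and the two example records are hypothetical.

```python
import io

def fastq_sequences(handle):
    """Yield (read_id, sequence) pairs from a FASTQ stream.

    Assumes each record is exactly four lines:
    '@id', sequence, '+', quality string.
    """
    while True:
        header = handle.readline()
        if not header:             # end of file
            return
        seq = handle.readline().strip()
        handle.readline()          # '+' separator line
        handle.readline()          # quality string (not used here)
        yield header.strip()[1:], seq

# Hypothetical two-record FASTQ content.
fastq_text = "@r1\nAGTCCGTAGG\n+\nIIIIIIIIII\n@r2\nTGACCGTAGG\n+\nIIIIIIIIII\n"
records = list(fastq_sequences(io.StringIO(fastq_text)))
```

The sequences produced this way are what a later UID selection step would inspect.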
  • Example 1. Measurement of UID purity in nucleic acid sequence analysis of a target region
  • As the nucleic acid sequences of the target regions for next generation sequencing, the genes KRAS, IDH1, BRCA1, ALK, and ERBB2, which are known to be mutated in cancer, and regions of these genes were selected.
  • A reference sequence of about 100 to about 350 bp was selected based on the selected site.
  • A synthetic fragment was prepared that comprises a unique identifier (UID) nucleic acid sequence, a nucleic acid sequence not homologous to the reference genome, and a nucleic acid sequence homologous to the reference genome, and that has Illumina P5 adapter and P7 adapter nucleic acid sequences (hereinafter referred to as "synthetic fragments for measuring purity of UID").
  • The selected genes, the reference sequences, the nucleotide sequences of the synthetic fragments for measuring UID purity, and the nucleic acid sequences extracted from the sequence analysis are shown in Table 1 below.
  • In Table 1, the nucleic acid sequence that is not homologous to the reference genome is the 4 bp at the front of the extraction sequence and is shown in bold.
  • UID nucleic acid sequences are shown in bold and underlined text in the synthetic fragments for UID purity measurement.
  • The P5 and P7 sequences are shown in italic text.
  • The sequencing primer binds after P5 and before the GTCT sequence. The sequence from P7 to P7 binds the sequencing primer and the index sequencing primer.
  • The target region corresponds to all of the extracted sequence other than the 4 bp nucleic acid sequence that is not homologous to the reference genome.
  • A library for nucleic acid sequence analysis of the target region was constructed as follows.
  • 50 ng of genomic DNA (gDNA) of the NA12878 sample from the Coriell Institute was prepared.
  • The prepared gDNA sample was subjected to fragmentation, end repair, 3'-adenosine tailing (3' A-tailing), and adapter ligation using the KAPA Hyper library preparation kit for Illumina (Kapa Biosystems) according to the manufacturer's method, and was purified using AMPure beads (Beckman Coulter, Indiana, USA) to prepare fragments for nucleic acid sequence analysis.
  • The synthetic fragment for measuring UID purity prepared in Example 1.1 was quantified, and 5 amol of the synthetic fragment for measuring UID purity was added to the fragments for nucleic acid sequence analysis.
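As a worked check on the scale of the spike-in, 5 amol can be converted to a number of molecules with the Avogadro constant. The helper name `attomoles_to_molecules` is hypothetical; the arithmetic itself is standard.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def attomoles_to_molecules(amol: float) -> float:
    """Convert an amount in attomoles (1 amol = 1e-18 mol) to a molecule count."""
    return amol * 1e-18 * AVOGADRO

copies = attomoles_to_molecules(5)  # roughly 3.0 million molecules
```

So the 5 amol added in the example corresponds to about three million copies of the synthetic fragment entering the library before amplification.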
  • A pre-capture polymerase chain reaction (PCR) was performed on the fragments for nucleic acid sequence analysis to which the synthetic fragment for measuring UID purity had been added.
  • The completed library was purified with AMPure beads and quantified by PicoGreen fluorescence analysis using a dsDNA HS assay kit and a Qubit 2.0 fluorometer. Based on the DNA concentration and average fragment size, the library was normalized to a concentration of 2 nM.
  • The DNA was denatured using 0.2 N NaOH, and the denatured library was diluted in hybridization buffer (Illumina, San Diego, Calif., USA) to 20 pM.
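The normalization to 2 nM and the dilution to 20 pM rest on simple arithmetic, sketched below. The conversion uses the common approximation of about 660 g/mol per base pair of double-stranded DNA; the function names and the example concentration (0.396 ng/µL at a 300 bp mean fragment size) are hypothetical and were chosen only so that the result equals 2 nM.

```python
def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """dsDNA molarity in nM, assuming about 660 g/mol per base pair."""
    return conc_ng_per_ul / (660.0 * mean_fragment_bp) * 1e6

def dilution_factor(stock_pM: float, final_pM: float) -> float:
    """How many-fold the stock must be diluted to reach the final concentration."""
    return stock_pM / final_pM

# Hypothetical measurement: 0.396 ng/uL with a 300 bp mean fragment size.
molarity = library_molarity_nM(conc_ng_per_ul=0.396, mean_fragment_bp=300)
# 2 nM = 2000 pM stock diluted to 20 pM.
factor = dilution_factor(stock_pM=2000.0, final_pM=20.0)
```

Going from the 2 nM normalized library to the 20 pM loading concentration is therefore a 100-fold overall dilution.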
  • the denatured template was cluster amplified according to the manufacturer's instruction (Illumina).
  • Flow cells were sequenced in a 100 bp pair-terminal mode using a HiSeq 2500 v3 Sequencing-by-Synthesis kit (Illumina) and analyzed using RTA software (v.1.12.4.2 or higher).
  • the nucleotide sequence was extracted in BCL format and converted to a fastq format file through the bcl converter.
  • BWA-mem v0.7.5
  • a BAM file was generated by aligning all the raw data to the hg19 human reference genome.
  • SAMTOOLS v0.1.18
  • Picard v1.93
  • GATK v3.1.1
  • SAM / BAM files were categorized, local realignment was performed, and redundancy was indicated. Through this process, redundancy, mismatch pairs, and leads deviating from the target were removed.
  • Among the sorted reads, reads containing the extraction sequence set forth in Table 1, i.e., reads containing a nucleic acid sequence that has no homology with the reference genome, were selected. Then, the proportion of reads having the UID nucleic acid sequence, AGTC, among the selected reads was determined.
  • A and B of Fig. 2 show the results of performing nucleic acid sequence analysis using the synthetic fragment for measuring UID purity, selecting the reads containing the nucleic acid sequence that has no homology with the reference genome, and confirming the UIDs carried by the selected fragments and their proportions.
  • As shown in A and B of Fig. 2, a number of reads carry UID nucleic acid sequences other than AGTC, the intended UID nucleic acid sequence.
  • Accordingly, when the synthetic fragment for measuring UID purity of the present invention is used in preparing a nucleic acid sequencing library and performing nucleic acid sequence analysis, errors in which the UID is deleted or substituted, and the proportion of such errors, can be measured.
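
The normalization and dilution described above (normalizing the library to 2 nM from the DNA concentration and average fragment size, then diluting to 20 pM) can be sketched as follows. This is only an illustrative calculation, not the procedure of the example: it assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA, and the function names and the input values (12.5 ng/µL, 350 bp) are hypothetical.

```python
# Assumption: average molar mass of one dsDNA base pair (common approximation).
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    # 1 ng/uL = 1e-3 g/L; dividing by g/mol gives mol/L; multiplying by 1e9 gives nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def stock_volume_ul(stock_nM: float, target_nM: float, final_volume_ul: float) -> float:
    """Volume of stock needed to reach target_nM in final_volume_ul (C1*V1 = C2*V2)."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nM * final_volume_ul / stock_nM


# Hypothetical library: 12.5 ng/uL with a 350 bp average fragment size.
stock = library_molarity_nM(12.5, 350)

# Volume of that library needed for 20 uL at 2 nM.
vol_for_2nM = stock_volume_ul(stock, 2.0, 20.0)

# Going from 2 nM to 20 pM (0.020 nM) is a 100-fold dilution.
vol_for_20pM = stock_volume_ul(2.0, 0.020, 1000.0)

print(f"library: {stock:.2f} nM")
print(f"use {vol_for_2nM:.2f} uL of library in 20 uL for 2 nM")
print(f"use {vol_for_20pM:.1f} uL of 2 nM library in 1000 uL for 20 pM")
```

The sketch ignores the volume contributed by the NaOH denaturation step, which a real protocol would account for.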
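
The read selection and UID ratio measurement described above can be illustrated with a minimal sketch. Several points are assumptions rather than details from the text: the marker sequence below is a made-up placeholder (the actual extraction sequence is the one in Table 1), the UID is assumed to sit immediately upstream of that marker in the read, and the function names and toy reads are hypothetical. Only the expected UID, AGTC, comes from the description.

```python
from collections import Counter

EXPECTED_UID = "AGTC"    # UID nucleic acid sequence stated in the description
MARKER = "TTGACCGATACG"  # hypothetical placeholder for the Table 1 extraction sequence


def extract_uid(read, marker=MARKER, uid_len=len(EXPECTED_UID)):
    """Return the bases immediately upstream of the marker, or None if the marker is absent.

    Assumes the UID region directly precedes the non-homologous region in the read.
    """
    pos = read.find(marker)
    if pos < 0:
        return None
    return read[max(0, pos - uid_len):pos]


def uid_purity(reads, expected=EXPECTED_UID):
    """Select marker-containing reads and report the fraction carrying the expected UID."""
    uids = [u for u in (extract_uid(r) for r in reads) if u is not None]
    counts = Counter(uids)
    total = sum(counts.values())
    purity = counts[expected] / total if total else 0.0
    return purity, counts


# Toy reads: three intact UIDs, one substituted UID, one read without the marker.
reads = [
    "AGTC" + MARKER + "ACGT",
    "AGTC" + MARKER + "GGCA",
    "AGTT" + MARKER + "ACGT",
    "AGTC" + MARKER + "TTAA",
    "CCCCGGGGAAAATTTT",
]

purity, counts = uid_purity(reads)
print(f"UID purity: {purity:.2f}")
for uid, n in counts.most_common():
    print(uid, n)
```

Tallying every observed UID, rather than only counting matches, keeps the non-AGTC sequences visible, which is the kind of breakdown Fig. 2 is described as showing. Distinguishing deletions from substitutions would need alignment against the full synthetic fragment and is left out of this sketch.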

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Zoology (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a polynucleotide for measuring the integrity of a UID and to a method for measuring the integrity of a UID in nucleic acid sequencing using the polynucleotide, the polynucleotide comprising: a first region comprising a UID nucleic acid sequence; a second region comprising a nucleic acid sequence having no homology with a reference genome; and a third region comprising a nucleic acid sequence having homology with the reference genome.
PCT/KR2018/015086 2017-11-30 2018-11-30 Method for measuring integrity of UID nucleic acid sequence in nucleic acid sequencing analysis Ceased WO2019108014A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020170162809A KR101967879B1 (ko) 2017-11-30 2017-11-30 Method for measuring purity of UID nucleic acid sequence in nucleic acid sequencing
KR10-2017-0162809 2017-11-30

Publications (1)

Publication Number Publication Date
WO2019108014A1 true WO2019108014A1 (fr) 2019-06-06

Family

ID=66163983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/015086 Ceased WO2019108014A1 (fr) Method for measuring integrity of UID nucleic acid sequence in nucleic acid sequencing analysis 2017-11-30 2018-11-30

Country Status (2)

Country Link
KR (1) KR101967879B1 (fr)
WO (1) WO2019108014A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160319345A1 (en) * 2015-04-28 2016-11-03 Illumina, Inc. Error suppression in sequenced dna fragments using redundant reads with unique molecular indices (umis)
KR20160141680A (ko) * 2015-06-01 2016-12-09 Yonsei University Industry-Academic Cooperation Foundation Next-generation sequencing method using adapter comprising barcode sequence
US20170058340A1 (en) * 2009-04-30 2017-03-02 Prognosys Biosciences, Inc. Nucleic acid constructs and methods of use
JP2017511121A (ja) * 2014-02-11 2017-04-20 F. Hoffmann-La Roche Aktiengesellschaft Targeted sequencing and UID filtering

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018208133A1 (fr) * 2017-05-12 2018-11-15 Seoul National University R&DB Foundation Method and apparatus for acquiring high-purity nucleotides

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KOU, R. ET AL.: "Benefits and Challenges with Applying Unique Molecular Identifiers in Next Generation Sequencing to Detect Low Frequency Mutations", PLOS ONE, vol. 11, no. 1, 11 January 2016 (2016-01-11), pages e0146638, XP055469818 *

Also Published As

Publication number Publication date
KR101967879B1 (ko) 2019-04-10

Similar Documents

Publication Publication Date Title
Yin et al. Challenges in the application of NGS in the clinical laboratory
ES3013495T3 (en) Method for isolating and sequencing cell-free dna
EP3885445B1 (fr) Methods of attaching adapters to sample nucleic acids
CN118638898A (zh) Methods for targeted nucleic acid sequence enrichment and use in error-corrected nucleic acid sequencing
CN112639983B (zh) Microsatellite instability detection
EP2691544B1 (fr) Method for verifying bioassay samples
Profaizer et al. Human leukocyte antigen typing by next-generation sequencing
CN105331606A (zh) Method for quantifying nucleic acid molecules applied to high-throughput sequencing
WO2017193044A1 (fr) Noninvasive prenatal diagnostic
WO2017204572A1 (fr) Library preparation method for highly parallel sequencing using molecular barcoding, and use thereof
CN108359723B (zh) Method for reducing deep sequencing errors
CN113454218A (zh) Methods, compositions, and systems for improving recovery of nucleic acid molecules
KR102347463B1 (ko) Method and apparatus for detecting false-positive variants in nucleic acid sequence analysis
JP2025013900A (ja) Methods and systems for detecting allelic imbalance in cell-free nucleic acid samples
US20240141425A1 (en) Correcting for deamination-induced sequence errors
CN112970068A (zh) Methods and systems for detecting contamination between samples
WO2019031867A1 (fr) Method for increasing analysis accuracy by removing primer sequences in amplicon-based next-generation sequencing
WO2019108014A1 (fr) Method for measuring integrity of UID nucleic acid sequence in nucleic acid sequencing analysis
WO2018110940A1 (fr) Method for measuring library complexity for next-generation sequencing
CN119530378B (zh) Primer set and kit for detecting IDS gene mutations, and use thereof
Zhang et al. A precise and cost-efficient whole-genome haplotyping method without probands: preimplantation genetic testing analysis
WO2017179946A1 (fr) Error confirmation method and device for massively parallel sequencing
US20210123097A1 (en) Methods for 3' overhang repair
WO2022181858A1 (fr) Composition for improving molecular barcoding efficiency, and use thereof
Lin Developing A Nanopore Sequencing Data Processing Pipeline for Structural Variation Identification

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18883670; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18883670; Country of ref document: EP; Kind code of ref document: A1)