WO2023043097A1 - Method for displaying merged paired sequence fragments for next-generation sequencing - Google Patents
- Publication number
- WO2023043097A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- generation sequencing
- sequencing
- sequences
- paired
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Definitions
- The present invention relates to a method for merging paired sequence fragments for next-generation sequencing analysis.
- Genetic information is encoded in the DNA base sequence, and an individual's complete DNA sequence information is very important for understanding life phenomena and obtaining disease-related information.
- The key to decoding DNA sequence information is to identify individual differences and ethnic characteristics, to identify congenital causes, including chromosomal abnormalities, in diseases related to genetic abnormalities, and to find genetic defects in complex diseases such as diabetes and hypertension.
- Sequencing data is very important because information such as gene expression, gene diversity, and their interactions can be widely used in the field of molecular diagnosis and treatment.
- Next-generation sequencing (NGS) has been used as a genome sequencing method since 2007, and with its development, analysis has become much easier and cheaper than with traditional methods.
- Representative next-generation genome sequencers that implement next-generation sequencing methods include Roche/454, Illumina/Solexa, and SOLiD of Life Technologies (ABI). These next-generation sequencing devices can read more than 80 million sequences in 7 hours. With these technological advances, next-generation sequencing methods, which were previously used only for research because of their enormous test cost, can now be used in clinical testing.
- Target selection is divided into an amplicon method, which amplifies targets with PCR primers, and a capture method, which hybridizes targets with a probe.
- The PCR amplicon method is useful for testing small, well-designed gene panels because it requires a shorter test time and a relatively small amount of DNA, but it is difficult to use when the number of genes in a panel increases or when exome sequencing is required.
- In such cases, the probe method is advantageous.
- The present inventors have developed a method for merging and displaying paired sequence fragments for next-generation sequencing analysis, thereby completing the present invention.
- Patent Document 0001 Patent Registration No. 10-1969971
- The present invention relates to a method for merging paired sequence fragments for next-generation sequencing analysis.
- A first aspect of the present disclosure provides a method for merging paired sequence fragments for next-generation sequencing analysis.
- A second aspect of the present disclosure provides an analysis method for paired sequence fragment merge display for next-generation sequencing analysis.
- The sequencing data processing time can be reduced by half, and the data storage space can be reduced because the two sequences shown in a sequence alignment visualization are reduced to one.
- In addition, because the target region coverage and sequencing error of a sequence can be determined from the merged sequence information alone, without additional information, next-generation sequencing analysis can be performed more efficiently.
- FIG. 1 is a diagram showing the overlap of paired sequence fragments seen in target sequencing.
- FIG. 2 is a diagram showing the merge method when the overlap of the R1 and R2 read sequences with respect to the target reference sequence matches perfectly.
- FIG. 3 is a diagram illustrating the merge display method when information on the R1 and R2 read sequences for a target reference sequence is unknown.
- FIG. 4 is a diagram showing a merge display method when the overlapping bases of two paired sequence fragments are different.
- FIG. 5 is a diagram showing a merge display method when two paired sequence fragments do not overlap.
- FIG. 6 is a diagram showing an example of merge display (when there are many lowercase n's) according to the merge display method of the present application.
- FIG. 7 is a diagram showing an example of merge display (when there are many bases indicated by lowercase letters) according to the merge display method of the present application.
- FIG. 8 is a diagram showing a comparison between a conventional method and a case in which sequence alignment is visualized according to the present method.
- The term “combination(s) of these” in a Markush-form expression means a mixture or combination of one or more selected from the group consisting of the components recited in that expression; that is, it means including one or more selected from the group consisting of those components.
- A first aspect of the present disclosure provides a method for merging paired sequence fragments for next-generation sequencing analysis.
- The base of the reference sequence is displayed as it is, but in lowercase letters.
- The paired sequence fragment merge display method for next-generation sequencing analysis is characterized in that, when there is no base in the overlapping portion of the two sequences, the bases of the reference sequence are displayed as they are, in lowercase letters (see FIGS. 2 to 5).
- A paired-end read means a read (fragment) obtained by sequencing both ends of a cDNA fragment in the forward and reverse directions.
- A read sequence refers to a single nucleic acid fragment analyzed through next-generation sequencing (NGS). The length of a read sequence is generally 35 to 500 bp (base pairs), depending on the type of genome sequencer, and DNA bases are generally represented by the letters A, T, G, and C.
- The term “reference sequence” used throughout the present specification means a base sequence that serves as a reference for generating the entire base sequence from the read sequences.
- The entire base sequence is completed by mapping the large number of reads output by a genome sequencer against a reference sequence.
- The reference sequence may be a sequence set in advance during nucleotide sequence analysis (e.g., the entire human nucleotide sequence), or a nucleotide sequence generated by a genome sequencer may be used as a reference sequence.
- The term “base” used throughout the specification is the smallest unit constituting a reference sequence and a read sequence.
- In the case of DNA, a sequence can be composed of four alphabetic characters, A, T, G, and C, each of which is expressed as a base. That is, DNA is expressed by four bases, and the same is true of the read sequence.
- The sequencing data processing time after sequence alignment is reduced by half, and data previously stored in two lines is reduced to one. Because storage space is reduced, next-generation sequencing can be performed more efficiently.
- A second aspect of the present disclosure provides an analysis method for paired sequence fragment merge display for next-generation sequencing analysis. Content overlapping with the first aspect of the present application also applies to the method of the second aspect.
- The present application provides a method of interpreting a sequence fragment that has been merged and displayed according to the paired sequence fragment merge display method (see FIGS. 2 to 5).
- The present application can determine the target region coverage and sequencing error of a sequence from the merged sequence information alone, without additional information.
- The target reference sequence of the unknown part is taken as it is, but shown in lowercase letters, and combined (see FIG. 3).
- When sequence fragments are merged and displayed according to methods 1 to 4 above, the result can be interpreted as follows.
- Number of lowercase letters a, t, g, c: target region sequencing coverage (the higher the number of lowercase letters a, t, g, c, the narrower the sequencing coverage)
- Number of lowercase n: degree of sequencing error (the greater the number of lowercase n, the greater the sequencing error)
- The sequencing data processing time can be reduced by half; because the two sequences are reduced to one, the data storage space is reduced; and it was found that the target region coverage and sequencing errors of sequences could be determined from the merged sequence information alone, without additional information.
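The merge display rules and their interpretation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `merge_pair` and `summarize` are hypothetical, both reads are assumed to be already aligned to the target reference without insertions or deletions, R2 is assumed to be given in reference orientation, a position covered by only one read is assumed to show that read's base in uppercase, and a disagreement between the two reads in the overlap is assumed to be shown as lowercase n (inferred from the rule that the number of lowercase n reflects sequencing error).

```python
from collections import Counter


def merge_pair(reference, r1, r1_start, r2, r2_start):
    """Merge two aligned reads into one display string over the reference.

    Assumed rules (hypothetical reading of the description):
      - position covered by neither read -> reference base in lowercase
      - position covered by one read     -> that read's base in uppercase
      - both reads agree                 -> the shared base in uppercase
      - both reads disagree              -> lowercase 'n'
    Start positions are 0-based offsets into the reference.
    """
    def base_at(read, start, pos):
        # Base of `read` at reference position `pos`, or None if not covered.
        offset = pos - start
        return read[offset].upper() if 0 <= offset < len(read) else None

    merged = []
    for pos, ref_base in enumerate(reference):
        b1 = base_at(r1, r1_start, pos)
        b2 = base_at(r2, r2_start, pos)
        if b1 is None and b2 is None:
            merged.append(ref_base.lower())
        elif b1 is None or b2 is None:
            merged.append(b1 or b2)
        elif b1 == b2:
            merged.append(b1)
        else:
            merged.append("n")
    return "".join(merged)


def summarize(merged):
    """Count the lowercase markers used to interpret a merged sequence."""
    counts = Counter(merged)
    return {
        "uncovered": sum(counts[c] for c in "atgc"),  # more -> narrower coverage
        "disagreements": counts["n"],                 # more -> more sequencing error
        "length": len(merged),
    }


overlapping = merge_pair("ACGTACGTAC", "CGTA", 1, "TTCG", 3)
separated = merge_pair("ACGTACGT", "AC", 0, "GT", 6)
print(overlapping, summarize(overlapping))
print(separated, summarize(separated))
```

In the first call the two reads overlap at two positions and disagree at one of them, so the merged line contains a single n; in the second call the reads do not overlap, so the gap between them is filled with lowercase reference bases.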
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a method for displaying merged paired sequence fragments for next-generation sequencing. In the present invention, two sequences are merged for use in sequencing, so that the sequencing data processing time can be reduced by half; the two sequences used to express a sequence alignment visualization are reduced to one, so that data storage space can be reduced; and the target region coverage and sequencing error of a sequence can be determined from the merged sequence information alone, without additional information, so that next-generation sequencing can be performed more efficiently.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020210122244A KR102799506B1 (ko) | 2021-09-14 | 2021-09-14 | Method for merged display of paired sequence fragments for next-generation sequencing analysis |
| KR10-2021-0122244 | 2021-09-14 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023043097A1 (fr) | 2023-03-23 |
Family
ID=85603103
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2022/013100 Ceased WO2023043097A1 (fr) | 2021-09-14 | 2022-09-01 | Method for displaying merged paired sequence fragments for next-generation sequencing |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR102799506B1 (fr) |
| WO (1) | WO2023043097A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20150059101A (ko) * | 2013-11-18 | 2015-05-29 | 한국전자통신연구원 | Method for calculating the position of a chromosomal translocation |
| WO2017075706A1 (fr) * | 2015-11-04 | 2017-05-11 | Vineland Research and Innovations Centre Inc. | High-throughput method for screening a population for members comprising at least one mutation in a target sequence using alignment-free sequence analysis |
| WO2020047553A1 (fr) * | 2018-08-31 | 2020-03-05 | Guardant Health, Inc. | Detection of genetic variants based on merged and unmerged reads |
| KR102177386B1 (ko) * | 2019-11-05 | 2020-11-11 | 주식회사 마크로젠 | Method for extracting DNA using microwaves for next-generation sequencing, and use thereof |
- 2021-09-14: KR application KR1020210122244A, patent KR102799506B1 (ko), status Active
- 2022-09-01: WO application PCT/KR2022/013100, publication WO2023043097A1 (fr), status Ceased
Non-Patent Citations (1)
| Title |
|---|
| JOHN M. GASPAR: "NGmerge: merging paired-end reads via novel empirically-derived models of sequencing errors", BMC BIOINFORMATICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 19, no. 1, 20 December 2018 (2018-12-20), pages 1 - 9, XP021265639, DOI: 10.1186/s12859-018-2579-2 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR102799506B1 (ko) | 2025-04-28 |
| KR20230039218A (ko) | 2023-03-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Kumar et al. | Next-generation sequencing and emerging technologies | |
| US10370710B2 (en) | Analysis methods | |
| Seo et al. | De novo assembly and phasing of a Korean human genome | |
| US20140129201A1 (en) | Validation of genetic tests | |
| Wadapurkar et al. | Computational analysis of next generation sequencing data and its applications in clinical oncology | |
| Bocklandt et al. | Bionano genome mapping: high-throughput, ultra-long molecule genome analysis system for precision genome assembly and haploid-resolved structural variation discovery | |
| Korpelainen et al. | RNA-seq data analysis: a practical approach | |
| Duncan et al. | Next-generation sequencing in the clinical laboratory | |
| Liu et al. | Performance of a multiplexed amplicon-based next-generation sequencing assay for HLA typing | |
| Macken et al. | Enhanced mitochondrial genome analysis: bioinformatic and long-read sequencing advances and their diagnostic implications | |
| Leatham et al. | A rapid, multiplex digital PCR assay to detect gene variants and fusions in non‐small cell lung cancer | |
| Steyaert et al. | Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation | |
| Kamps-Hughes et al. | A systematic method for detecting abnormal mRNA splicing and assessing its clinical impact in individuals undergoing genetic testing for hereditary cancer syndromes | |
| WO2023043097A1 (fr) | Method for displaying merged paired sequence fragments for next-generation sequencing | |
| WO2023018024A1 (fr) | Method for diagnosing microsatellite instability using a rate of variation of sequence lengths at microsatellite loci | |
| WO2014119914A1 (fr) | Method for providing information on a personal marker based on a gene sequence, and apparatus using same | |
| WO2023018026A1 (fr) | Method for diagnosing microsatellite instability using the difference between a maximum value and a minimum value of sequence lengths of microsatellite loci | |
| WO2024106109A1 (fr) | Gene detection using a modified substrate that alters electrophoretic mobility | |
| Vaisvila et al. | Discovery of novel DNA cytosine deaminase activities enables a nondestructive single-enzyme methylation sequencing method for base resolution high-coverage methylome mapping of cell-free and ultra-low input DNA | |
| WO2022124575A1 (fr) | Method for diagnosing microsatellite instability using a coefficient of variation of sequence lengths at microsatellite loci | |
| WO2018110940A1 (fr) | Method for measuring library complexity for next-generation sequencing | |
| WO2016208827A1 (fr) | Method and device for gene analysis | |
| WO2023214754A1 (fr) | Method and apparatus for generating a seed sequence for ITD analysis in NGS analysis | |
| Frias-De-Diego et al. | Influence of Sequencing Technology on Pangenome-Level Analysis and Detection of Antimicrobial Resistance Genes in ESKAPE Pathogens | |
| Lim et al. | Comparison of Sequencing-by-Synthesis and Avidity Base Chemistry Next-Generation Sequencing Platforms in Identifying Somatic Variants of Hematological Malignancies |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22870182; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22870182; Country of ref document: EP; Kind code of ref document: A1 |