
WO2023043097A1 - Method for displaying a merge of paired sequence fragments for next-generation sequencing - Google Patents

Method for displaying a merge of paired sequence fragments for next-generation sequencing

Info

Publication number
WO2023043097A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
generation sequencing
sequencing
sequences
paired
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2022/013100
Other languages
English (en)
Korean (ko)
Inventor
정준혁
박승구
서성현
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dxome Co Ltd
Original Assignee
Dxome Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-09-14
Filing date
2022-09-01
Publication date
2023-03-23
Application filed by Dxome Co Ltd
Publication of WO2023043097A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
        • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
            • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
                • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
                    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
                        • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
                        • C12Q1/6869 Methods for sequencing
    • G PHYSICS
        • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
                • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
                • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
                    • G16B30/20 Sequence assembly
                • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Definitions

  • The present invention relates to a method for merging paired sequence fragments for next-generation sequencing analysis.
  • Genetic information is encoded in DNA sequences, and an individual's complete DNA sequence is very important for understanding biological phenomena and obtaining disease-related information.
  • Decoding DNA sequence information aims to identify individual differences and ethnic characteristics, to identify congenital causes, including chromosomal abnormalities, of diseases related to genetic abnormalities, and to find genetic defects in complex diseases such as diabetes and hypertension.
  • Sequencing data is very important because information such as gene expression, gene diversity, and their interactions can be widely used in molecular diagnosis and treatment.
  • As a method for genome sequencing, next-generation sequencing (NGS) has been used since 2007, and as NGS has developed, genome analysis has become much easier and cheaper than with traditional methods.
  • Representative next-generation genome sequencers that implement NGS include Roche/454, Illumina/Solexa, and SOLiD from Life Technologies (ABI). These devices can read more than 80 million sequences in 7 hours. With these technological advances, next-generation sequencing, which was previously limited to research because of its enormous test cost, can now be used in clinical testing.
  • Target selection methods are divided into the amplicon method, which amplifies targets with PCR primers, and the capture method, which hybridizes targets to probes.
  • The PCR amplicon method is useful for testing small, well-designed gene panels because it requires a shorter test time and a relatively small amount of DNA, but it is difficult to use when the number of genes in a panel increases or when exome sequencing is required.
  • In those cases, the probe-based capture method is advantageous.
  • The present inventors have developed a method for merging and displaying paired sequence fragments for next-generation sequencing analysis, thereby completing the present invention.
  • Patent Document 0001: Patent Registration No. 10-1969971
  • A first aspect of the present disclosure provides a method for merging paired sequence fragments for next-generation sequencing analysis.
  • A second aspect of the present disclosure provides an analysis method for paired sequence fragment merge display for next-generation sequencing analysis.
  • Sequencing data processing time can be halved, and data storage space is reduced because the two sequences shown in a sequence alignment visualization are combined into one.
  • Because the target region coverage and the sequencing errors of a sequence can be determined from the merged sequence information alone, without additional information, next-generation sequencing analysis can be performed more efficiently.
  • FIG. 1 is a diagram showing the overlap of paired sequence fragments observed in targeted sequencing.
  • FIG. 2 is a diagram showing the merge method when the overlapping portions of the R1 and R2 read sequences, aligned against the target reference sequence, match perfectly.
  • FIG. 3 is a diagram illustrating the merge display method when R1 and R2 read sequence information is unknown for part of the target reference sequence.
  • FIG. 4 is a diagram showing the merge display method when the overlapping bases of the two paired sequence fragments differ.
  • FIG. 5 is a diagram showing the merge display method when the two paired sequence fragments do not overlap.
  • FIG. 6 is a diagram showing an example of merge display according to the merge display method of the present application, in a case with many lowercase n's.
  • FIG. 7 is a diagram showing an example of merge display according to the merge display method of the present application, in a case with many bases shown in lowercase letters.
  • FIG. 8 is a diagram comparing sequence alignment visualization by a conventional method with visualization according to the present method.
  • The term “combination(s) thereof” in a Markush-type expression means a mixture or combination of one or more components selected from the group consisting of the components listed in that expression; that is, it includes one or more components selected from that group.
  • The paired sequence fragment merge display method for next-generation sequencing analysis is characterized in that, when the two sequences have no base in the overlapping portion, the base of the reference sequence is displayed as is, but in lowercase letters (see FIGS. 2 to 5).
  • The term “paired-end read” means a read (fragment) obtained by sequencing both ends of a cDNA fragment in the forward and reverse directions.
  • The term “read sequence” refers to the sequence of a single nucleic acid fragment analyzed by next-generation sequencing (NGS). A read sequence is generally 35 to 500 bp (base pairs) long, depending on the type of genome sequencer, and DNA bases are generally represented by the letters A, T, G, and C.
  • The term “reference sequence”, as used throughout this specification, means the base sequence used as a reference for generating the entire base sequence from the read sequences.
  • The entire base sequence is completed by mapping the large number of reads output by a genome sequencer against the reference sequence.
  • The reference sequence may be a sequence set in advance for nucleotide sequence analysis (e.g., the entire human nucleotide sequence), or a nucleotide sequence generated by a genome sequencer may be used as the reference sequence.
  • The term “base”, as used throughout this specification, refers to the smallest unit constituting a reference sequence or a read sequence.
  • DNA can be written with four letters, A, T, G, and C, each of which represents a base. That is, DNA is expressed with four bases, and the same is true of a read sequence.
  • Sequencing data processing time after sequence alignment is halved, and data previously stored on two lines is reduced to one. Because less storage space is needed, next-generation sequencing can be performed more efficiently.
  • A second aspect of the present disclosure provides an analysis method for paired sequence fragment merge display for next-generation sequencing analysis. Descriptions that overlap with the first aspect of the present application also apply to the method of the second aspect.
  • The present application provides a method of interpreting sequence fragments merged and displayed according to the paired sequence fragment merge display method (see FIGS. 2 to 5).
  • With the present application, the target region coverage and sequencing errors of a sequence can be determined from the merged sequence information alone, without additional information.
  • For the portion where read information is unknown, the bases of the target reference sequence are imported as is, but written in lowercase letters, and merged (see FIG. 3).
  • When sequence fragments are merged and displayed according to methods 1 to 4 above, the result can be interpreted as follows:
  • Number of lowercase a, t, g, and c: target region sequencing coverage (the more lowercase a, t, g, and c, the narrower the sequencing coverage)
  • Number of lowercase n: degree of sequencing error (the more lowercase n, the greater the sequencing error)
  • It was found that sequencing data processing time can be halved, that data storage space is reduced because the two sequences are combined into one, and that the target region coverage and sequencing errors of sequences can be determined from the merged sequence information alone, without additional information.
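
The merge display and the two interpretation counts above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The source does not spell out every rule, so two choices here are assumptions: a position covered by only one read shows that read's base in uppercase, and disagreeing overlap bases become a lowercase `n` (inferred from FIGS. 4 and 6 and from reading lowercase n as sequencing error). The `AlignedRead` structure and the `merge_display` and `interpret` functions are hypothetical names, and the reads are assumed to be already placed on the target reference without insertions or deletions, with R2 already reverse-complemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignedRead:
    """A read already placed on the target reference (hypothetical structure).

    start: 0-based offset of the read's first base in the target reference.
    seq:   read bases in reference orientation (R2 assumed reverse-complemented
           upstream), assumed to contain no insertions or deletions.
    """
    start: int
    seq: str

    def base_at(self, pos: int) -> Optional[str]:
        """Return this read's base at reference position pos, or None if uncovered."""
        i = pos - self.start
        if 0 <= i < len(self.seq):
            return self.seq[i].upper()
        return None


def merge_display(reference: str, r1: AlignedRead, r2: AlignedRead) -> str:
    """Merge R1 and R2 into a single display string spanning the target reference.

    Assumed rules (one reading of FIGS. 2 to 5, not the applicant's code):
      both reads cover the position and agree -> that base, uppercase
      only one read covers the position       -> that read's base, uppercase
      both reads cover it but disagree        -> lowercase 'n'
      neither read covers the position        -> reference base, lowercase
    """
    merged = []
    for pos, ref_base in enumerate(reference):
        b1, b2 = r1.base_at(pos), r2.base_at(pos)
        if b1 is not None and b2 is not None:
            merged.append(b1 if b1 == b2 else "n")
        elif b1 is not None:
            merged.append(b1)
        elif b2 is not None:
            merged.append(b2)
        else:
            merged.append(ref_base.lower())
    return "".join(merged)


def interpret(merged: str) -> dict:
    """Summarize a merged string with the two counts described in the text."""
    uncovered = sum(ch in "acgt" for ch in merged)  # lowercase reference bases
    mismatched = merged.count("n")                 # R1/R2 disagreements
    covered_fraction = (len(merged) - uncovered) / len(merged) if merged else 0.0
    return {
        "uncovered_bases": uncovered,
        "mismatched_bases": mismatched,
        "covered_fraction": covered_fraction,
    }


if __name__ == "__main__":
    reference = "ACGTACGTACGT"
    r1 = AlignedRead(start=0, seq="ACGTAC")
    r2 = AlignedRead(start=3, seq="TTCGTA")
    merged = merge_display(reference, r1, r2)
    print(merged)             # ACGTnCGTAcgt
    print(interpret(merged))  # {'uncovered_bases': 3, 'mismatched_bases': 1, 'covered_fraction': 0.75}
```

In this toy example R1 and R2 disagree at the fifth position (A versus T), which becomes `n`, and the last three reference positions are covered by neither read, so they appear as lowercase `cgt`; 9 of 12 target positions are covered. Keeping both signals in one string is what lets coverage and error be read off without the original read pair. One caveat of the sketch: an `N` in the reference at an uncovered position would also render as lowercase `n`, so a real tool would need to keep the two cases apart.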

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for displaying a merge of paired sequence fragments for next-generation sequencing. In the present invention, two sequences are merged for use in sequencing. As a result, sequencing data processing time can be halved; the two sequences used to express a sequence alignment visualization are reduced to one, so data storage space can be reduced; and the target region coverage of a sequence and sequencing errors can be determined from the merged sequence information alone, without additional information. Next-generation sequencing can therefore be performed more efficiently.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210122244A KR102799506B1 (ko) 2021-09-14 2021-09-14 Method for merging and displaying paired sequence fragments for next-generation sequencing analysis
KR10-2021-0122244 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023043097A1 (fr) 2023-03-23

Family

ID=85603103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/013100 Ceased WO2023043097A1 (fr) 2021-09-14 2022-09-01 Method for displaying a merge of paired sequence fragments for next-generation sequencing

Country Status (2)

Country Link
KR (1) KR102799506B1 (fr)
WO (1) WO2023043097A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150059101A (ko) * 2013-11-18 2015-05-29 한국전자통신연구원 Method for calculating the position of a chromosomal translocation
WO2017075706A1 (fr) * 2015-11-04 2017-05-11 Vineland Research and Innovations Centre Inc. High-throughput method for screening a population for members comprising at least one mutation in a target sequence using alignment-free sequence analysis
WO2020047553A1 (fr) * 2018-08-31 2020-03-05 Guardant Health, Inc. Genetic variant detection based on merged and unmerged reads
KR102177386B1 (ko) * 2019-11-05 2020-11-11 주식회사 마크로젠 DNA extraction method using microwaves for next-generation sequencing, and use thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JOHN M. GASPAR: "NGmerge: merging paired-end reads via novel empirically-derived models of sequencing errors", BMC BIOINFORMATICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 19, no. 1, 20 December 2018 (2018-12-20), London, UK, pages 1 - 9, XP021265639, DOI: 10.1186/s12859-018-2579-2 *

Also Published As

Publication number Publication date
KR102799506B1 (ko) 2025-04-28
KR20230039218A (ko) 2023-03-21

Similar Documents

Publication Publication Date Title
Kumar et al. Next-generation sequencing and emerging technologies
US10370710B2 (en) Analysis methods
Seo et al. De novo assembly and phasing of a Korean human genome
US20140129201A1 (en) Validation of genetic tests
Wadapurkar et al. Computational analysis of next generation sequencing data and its applications in clinical oncology
Bocklandt et al. Bionano genome mapping: high-throughput, ultra-long molecule genome analysis system for precision genome assembly and haploid-resolved structural variation discovery
Korpelainen et al. RNA-seq data analysis: a practical approach
Duncan et al. Next-generation sequencing in the clinical laboratory
Liu et al. Performance of a multiplexed amplicon-based next-generation sequencing assay for HLA typing
Macken et al. Enhanced mitochondrial genome analysis: bioinformatic and long-read sequencing advances and their diagnostic implications
Leatham et al. A rapid, multiplex digital PCR assay to detect gene variants and fusions in non‐small cell lung cancer
Steyaert et al. Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation
Kamps-Hughes et al. A systematic method for detecting abnormal mRNA splicing and assessing its clinical impact in individuals undergoing genetic testing for hereditary cancer syndromes
WO2023043097A1 (fr) Method for displaying a merge of paired sequence fragments for next-generation sequencing
WO2023018024A1 (fr) Method for diagnosing microsatellite instability using the rate of change of sequence lengths at microsatellite loci
WO2014119914A1 (fr) Method for providing information on a personal marker based on a gene sequence, and apparatus using same
WO2023018026A1 (fr) Method for diagnosing microsatellite instability using the difference between the maximum and minimum sequence lengths at microsatellite loci
WO2024106109A1 (fr) Gene detection using a modified substrate that alters electrophoretic mobility
Vaisvila et al. Discovery of novel DNA cytosine deaminase activities enables a nondestructive single-enzyme methylation sequencing method for base resolution high-coverage methylome mapping of cell-free and ultra-low input DNA
WO2022124575A1 (fr) Method for diagnosing microsatellite instability using the coefficient of variation of sequence lengths at microsatellite loci
WO2018110940A1 (fr) Method for measuring library complexity for next-generation sequencing
WO2016208827A1 (fr) Gene analysis method and device
WO2023214754A1 (fr) Method and apparatus for generating seed sequences for ITD analysis in NGS analysis
Frias-De-Diego et al. Influence of Sequencing Technology on Pangenome-Level Analysis and Detection of Antimicrobial Resistance Genes in ESKAPE Pathogens
Lim et al. Comparison of Sequencing-by-Synthesis and Avidity Base Chemistry Next-Generation Sequencing Platforms in Identifying Somatic Variants of Hematological Malignancies

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22870182

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22870182

Country of ref document: EP

Kind code of ref document: A1