
WO2014041380A1 - Method and computer program product for detecting a mutation in a nucleotide sequence - Google Patents

Method and computer program product for detecting a mutation in a nucleotide sequence

Info

Publication number
WO2014041380A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleotide sequence
sub
read
word
words
Legal status
Ceased
Application number
PCT/HU2013/000086
Other languages
English (en)
Inventor
Lőrinc PONGOR
Ferenc PINTÉR
István PETÁK
Current Assignee
KPS ZRT
Original Assignee
KPS ZRT
Priority date
2012-09-11
Filing date
2013-09-02
Publication date
2014-03-20
Application filed by KPS ZRT
Publication of WO2014041380A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • NGS next generation sequencing
  • a key strength of NGS technology for clinical practice is its ability to handle parallel sequencing of multiple genes. For instance, re-sequencing of signal transduction genes such as EGFR, KRAS and BRAF is an increasingly important approach for personalizing cancer therapies.
  • a typical application example is a panel of 40 PCR amplicons taken from 12 genes. Before sequencing, a barcode DNA sequence of 10-12 bp is added to each set of amplicons, which enables parallel sequencing of several different amplicons from different patients. Processing high-throughput NGS data for diagnostic purposes poses its own challenges.
  • EGFR Exon 19 deletions are known to confer sensitivity to EGFR tyrosine kinase inhibitor (TKI) therapy (see Pinter, F. et al. Journal of Molecular Diagnostics 10, 160-168, 2008; and Schwab, R. et al. J Clin Oncol 23, 7736-7738, 2005).
  • TKI tyrosine kinase inhibitor
  • a nucleotide sequence read corresponding to said reference nucleotide sequence is provided, said nucleotide sequence read having a predetermined similarity to the reference nucleotide sequence; a set of overlapping sub-words is generated from the reference sequence, and all of the overlapping sub-words of the reference sequence are matched with the read sequence by applying an exact string matching algorithm
  • identifying a complex mutation within the nucleotide sequence read by using a dynamic programming algorithm for comparing said polymorphic region of the reference nucleotide sequence with said mutated segment of the nucleotide sequence read.
  • the sub-words have a preset length
  • consecutive sub-words have a preset overlap, and the length of the overlapping sub-words and/or the extent of overlap may be adjustable.
  • the strstr function defined in the GNU C Library is preferably used, and a complex mutation within the nucleotide sequence read may be identified by using the Needleman-Wunsch algorithm or the Smith-Waterman algorithm.
  • a computer program product comprising a computer-usable medium having computer-readable program codes embodied in the medium, the program codes when executed causing a computer to carry out the above method.
  • Figure 2 is an example of a memory-resident index list built for the sub-words of the reference sequence
  • Figure 3 schematically illustrates the problem of finding a mutation in a reference sequence containing repeats
  • Figures 4.A-C are plots illustrating the run times of the computer program carrying out the method according to the invention as a function of various benchmarking parameters
  • Figure 5 is a flow diagram showing the major processing steps of the method according to the invention.
  • a reference nucleotide sequence 100 and a nucleotide sequence read 110 are provided for detecting genetic variations (mutations) in the nucleotide sequence read 110.
  • both the reference nucleotide sequence 100 and the read nucleotide sequence 110 have a length of 100-500 base pairs (bp).
  • the segment of the read sequence 110 that contains one or more mutations is called a mutated segment (denoted by 112 in Figures 1.b to 1.d).
  • the segment of the reference sequence 100 that is mapped to the mutated segment is called a polymorphic region (denoted by 102 in Figures 1.b to 1.d).
  • a plurality of overlapping sub-words 101 is generated so that the resulting series of overlapping sub-words 101 covers the entire reference sequence 100.
  • the read sequence 110 is sufficiently similar to the reference sequence 100.
  • Sufficient similarity may be defined in a user-tunable way.
  • a read sequence 110 and a reference sequence 100 may be regarded as sufficiently similar if they share at least five sub-words 101 with a length of at least 15 bp.
  • a memory-resident index list is built. This list contains the sequences of the sub-words 101 of length w starting at sequence positions 1, ⁇, 2⁇, 3⁇, ... respectively.
  • the length w of the sub-words is measured in base pairs (bp). If the last sub-word 101 is shorter than w, its sequence is preferably appended to the previous sub-word 101.
  • an exact string matching algorithm is used to locate the positions of the sub-words 101 of the reference sequence 100 within the read sequence 110.
  • Highly efficient assembly-language implementations of such matching algorithms are already available, such as the strstr function of the GNU C Library, a robust and efficient exact string matching function that was found to be the most preferred algorithm for the method according to the present invention.
  • the strstr function returns either a pointer to the first occurrence of a matching sub-word 101 within the read sequence 110, or a null pointer for a sub-word 101 that has not been found within the read sequence 110.
  • Other exact string matching algorithms, such as the Aho-Corasick algorithm, the Rabin-Karp string searching algorithm, the Commentz-Walter algorithm or the like, may also be suitable for use in the method of the invention.
  • the exact string matching algorithm is to be applied in a recursive fashion.
  • Very highly repetitive sequences, e.g. reference sequences with dominant homopolymeric runs, are not frequent targets in medical diagnostic applications.
  • the polymorphic region 102 of the reference sequence 100 is mapped out by the corresponding parts of the non-matching sub-words 101, starting at the end of the last matching sub-word within the reference sequence 100 before the mutated position(s), i.e. at the end of the sub-word 101a in Figures 1.b-d, and finishing at the beginning of the first matching sub-word following the mutated position(s), i.e. at the beginning of the sub-word 101b in Figures 1.b-d. More precisely, the polymorphic region 102 is defined as the section of the reference sequence 100 between the end of the last matching sub-word (e.g. sub-word 101a) prior to the first non-matching sub-word and the beginning of the first matching sub-word (e.g. sub-word 101b) following the last non-matching sub-word.
  • Figure 1.b shows a polymorphic region 102 of the reference sequence 100 for which a single substitution has been found in the read sequence 110
  • Figure 1.c shows the case where the polymorphic region 102 corresponds to a mutated segment 112 of the read sequence 110 that contains a deleted portion of one or more nucleotides
  • Figure 1.d shows the case where the polymorphic region 102 corresponds to a mutated segment 112 of the read sequence 110 that contains an inserted portion of one or more nucleotides.
  • Figure 1.b also shows that a section containing a substitution or a simple indel cannot be longer than ⁇, which sets an upper limit on the length of the polymorphic region found by the exact string matching algorithm.
  • the exact position of the mutation(s) is determined.
  • Figure 3 shows a part of a reference sequence 100 that includes two identical sections 310 and 320, indicated by boxes. The strstr function mentioned above detects only the first occurrence of a sub-word 101 that matches at multiple positions. Therefore, once the first occurrence of a repeated sub-word 101 has been found in the read sequence 110, i.e. within the section corresponding to the first identical section 310 of the reference sequence 100, a mutation M in the read sequence 110 located within a second (or further) internal repeat of that sub-word would not be detected. Mutations within a repeated section would therefore be noticed only if the string matching algorithm specifically looks for them.
  • a memory-resident index list 200 is built for the sub-words of the reference sequence, as shown in Figure 2.
  • the reference sequence is translated into an index list that contains the following data for each of the sub-words: a) the starting position of the sub-word within the reference sequence,
  • a repetitiveness indicator, which is a flag variable indicating whether or not the sub-word is found at multiple positions within the reference sequence. If a given sub-word appears only once within the reference sequence, the repetitiveness flag is 0; otherwise, the flag indicates the number of repetitions of the given sub-word within the reference sequence. For example, a repetition number of 2 means that a particular sub-word appears three times within the reference sequence: a first occurrence and two subsequent repeats.
  • repeats in the reference sequence may cause a mutation in a subsequent copy of a repeated sub-word to be missed, because the string matching algorithm normally finds only the first occurrence of the repeated sub-word in the read sequence, and this first occurrence does not necessarily contain the mutation.
  • the exon sequences, which are the predominant targets in diagnostic sequencing, are typically non-repetitive.
  • the index list shown in Figure 2
  • a (hypothetical) homopolymeric reference sequence e.g. a sequence containing 100 'A's
  • a mutation is classified as simple if the mutated segment contains a single mutation block, which may contain one or more substitutions or indels. Otherwise, i.e. when the mutation is formed by more than one separate mutation block, it is classified as complex.
  • a simple mutation within the mutated segment of the read sequence is identified by using a character-based comparison of said polymorphic region of the reference nucleotide sequence and said mutated segment of the read sequence, while a complex mutation within the mutated segment of the read nucleotide sequence is identified by using a dynamic programming algorithm for comparing said polymorphic region of the reference nucleotide sequence and said mutated segment of the read nucleotide sequence.
  • different search strategies are used for said two types of mutations.
  • in the method according to the present invention, an attempt is first made to locate simple mutations using a character-by-character comparison of the two sequence segments, i.e. the polymorphic region within the reference sequence and the mutated segment within the read sequence. If only one simple mutation is found, its location and type (substitution, insertion, deletion) are identified. However, if there is a further mismatch within a window of w on either side of a simple mutation, the mutation is considered "complex", and the two sequence segments are compared with a dynamic programming algorithm.
  • the Needleman-Wunsch algorithm is preferably used for mutations located within the interior of the read sequence
  • the Smith-Waterman algorithm is preferably used for mutations located at the 5' or 3' termini of the read sequence.
  • complex polymorphisms such as insertions, deletions and complex indels are recognized by dynamic programming, which identifies such mutations optimally; however, the alignment pattern depends on specific parameters, such as the gap insertion and gap extension penalties specified in Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705-708, 1982.
  • the first module is the Preprocessing module 500, which reads and checks the input files in steps 501 and 502, respectively.
  • the input files may contain the following data:
  • the user may select single-end (5') or paired-end (5' and 3') barcoding.
  • the Analysis module 510 uses the output of the Preprocessing module 500 to perform index file construction (step 511), read filtering (step 512), localization of the polymorphic regions (step 513), identification of simple mutations (step 514), analysis of complex mutations and truncated reads (step 515), and output file generation (step 516).
  • the program builds the index file (see Figure 2) based on the user-defined word length w and tiling shift ⁇ (see Figures 1.a-d).
  • Read filtering (step 512) is used to retain those reads that contain
  • each retained read is characterized by the position(s) of the matching sub-words found by the exact string matching algorithm.
  • the polymorphic regions, defined as the sequence segments between the end of the last matching sub-word preceding the mutated position(s) and the beginning of the first matching sub-word following the respective mutated position(s), are located.
  • the polymorphic regions are first compared with the corresponding segments of the read using a character-by-character comparison. This allows accurate identification of point mutations (substitutions, simple indels) as well as contiguous single deletions or insertions.
  • in step 515, polymorphic regions that contain more than one simple mutation are deemed "complex" and are further analyzed by the Needleman-Wunsch algorithm, whereas truncated reads are analyzed by the Smith-Waterman algorithm.
  • the output of the Analysis module is generated in step 516 for each read, for example, in the form of a tab-delimited list of the mutations.
  • the Postprocessing module 520 counts, in step 521, the frequency of the individual mutations listed in the output data file of the Analysis module 510, and then generates an output result file in step 522.
  • the output of the Postprocessing module 520 may be presented, for example, in a tabular format.
  • the program code of the Analysis module 510 is preferably written in the C programming language.
  • the Preprocessing and Postprocessing modules 500, 520 are wrapper programs, preferably written in the PERL programming language.
  • Raw datasets of approximately 400,000 reads were collected for each cell line. Two datasets containing mutations found in the COSMIC database (http://www.sanger.ac.uk/genetics/CGP/cosmic/) were used in further comparisons. In order to benchmark all algorithms on an equal footing, raw datasets were prefiltered so that only reads containing valid molecular identifiers and longer than 80 bp were used in the analysis.
  • Dataset 1 (8610 reads) contained a 225 bp segment corresponding to Exon 20 of the human EGFR receptor, with a 1280T mutation (COSMIC ID: 6240), while Dataset 2 (5867 reads) contained a 286 bp segment corresponding to Exon 19 of the human EGFR receptor, with a 137delGGAATTAAGAGAAGCA mutation (COSMIC ID: 6223).
  • the octapeptide repeat region of human major prion protein was chosen as a prototype example of a repetitive protein.
  • An approximately 200 bp region was chosen that contains five 24 bp near-perfect repeats with the structure R1-R2-R2-R3-R4, where R1 is a nonapeptide repeat and R2 is present in two identical copies.
  • a natural variant of this region contains a 24 bp deletion between R1 and R2 (no. 2 in Table S1-2), while disease-linked mutations include several single or multiple insertions of 24 bp repeats.
  • the sequence of the insertion often corresponds to variants of the R2 repeat, which gives rise to structures such as R1-R2-R2'-R2-R2'-R3-R4 (no. 3 in Table S1-2), where R2' differs from R2 by a single nucleotide substitution (sequences given in Appendix).
  • the run time of the computer program carrying out the method according to the present invention depends on various factors, such as
  • the processing speed of the method of the invention is influenced mainly by a combination of several factors.
  • a simple, memory-resident index structure is used that is fast to build and fits in computer memory, even for a large number of reference sequences.
  • This index structure makes it possible to use the simple string-matching function strstr, which has a highly efficient, assembly-based implementation in the GNU C Library (http://www.gnu.org/software/libc/).
  • the performance of the method according to the present invention will be compared with that of several other commercially available mutation mapping tools (programs or algorithms).
  • NW Needleman-Wunsch
  • the BWA-SW algorithm recommended for Roche 454 reads was used.
  • Minimum threshold for shared sub-words: 5
  • the time durations shown in the table were measured for 500,000 amplicons harboring different randomly placed substitutions.
  • the SNP(1) sequences were produced by placing one random substitution into the repeat region using the MSBAR program of the EMBOSS package. The time values were measured for 500,000 randomly generated read copies.
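The Definitions mention adding a 10-12 bp barcode to each set of amplicons so that amplicons from different patients can be sequenced in parallel, with single-end (5') or paired-end barcoding selectable by the user. The Python sketch below shows one possible way to assign reads to patients by an exact 5' barcode prefix. The function name demultiplex, the barcode table and the decision to set aside unmatched or ambiguous reads are illustrative assumptions, not the patented Preprocessing module, and paired-end (3') barcodes are not handled.

```python
def demultiplex(
    reads: list[str], barcodes: dict[str, str]
) -> tuple[dict[str, list[str]], list[str]]:
    """Hypothetical single-end (5') demultiplexer.

    barcodes maps a barcode sequence (e.g. 10-12 bp) to a sample name.
    A read is assigned only if exactly one barcode is a prefix of it;
    the barcode is trimmed off before the read is stored. All other
    reads (no match, or ambiguous matches) are returned as unassigned.
    """
    by_sample: dict[str, list[str]] = {}
    unassigned: list[str] = []
    for read in reads:
        hits = [bc for bc in barcodes if read.startswith(bc)]
        if len(hits) == 1:
            bc = hits[0]
            by_sample.setdefault(barcodes[bc], []).append(read[len(bc):])
        else:
            unassigned.append(read)
    return by_sample, unassigned
```

For example, with the barcodes ACGTACGTAC (patient1) and TGCATGCATGCA (patient2), the read ACGTACGTACGGATTACA is assigned to patient1 as GGATTACA.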
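The Definitions describe tiling the reference sequence into overlapping sub-words of length w, shifted by a user-defined tiling shift, so that the series covers the whole reference, with a last sub-word shorter than w added to the previous one. A minimal Python sketch of that tiling follows. It assumes 0-based positions (the Definitions count from 1), calls the tiling shift shift, and reads "added to the previous sub-word" as extending the final full-length sub-word to the end of the reference; all three are assumptions.

```python
def tile_subwords(reference: str, w: int, shift: int) -> list[tuple[int, str]]:
    """Return (start, sub-word) pairs that together cover the whole reference.

    Sub-words of length w start every `shift` positions (0-based). If the
    last full-length sub-word stops short of the end of the reference, the
    leftover tail (shorter than w) is merged into that final sub-word.
    """
    if not 0 < shift <= w:
        raise ValueError("require 0 < shift <= w so that no bases are skipped")
    if len(reference) <= w:
        return [(0, reference)]
    starts = list(range(0, len(reference) - w + 1, shift))
    words = [(s, reference[s:s + w]) for s in starts]
    last = starts[-1]
    if last + w < len(reference):
        # Tail shorter than w: extend the final sub-word to the reference end.
        words[-1] = (last, reference[last:])
    return words
```

For the 11-base reference ACGTACGTACG with w = 4 and shift = 2, the last entry is (6, 'GTACG'): the one-base tail is absorbed into the final sub-word.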
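Matching every reference sub-word against a read with an exact string search, and keeping the read only if enough sub-words are shared, might look as follows. Python's str.find stands in for C's strstr: it returns the index of the first occurrence or -1, where strstr returns a pointer or a null pointer. The defaults of five shared sub-words of at least 15 bp come from the similarity example in the Definitions; the function names and the (start, sequence) input layout are hypothetical.

```python
from typing import Optional

Subword = tuple[int, str]  # hypothetical layout: (start in reference, sequence)


def match_subwords(read: str, subwords: list[Subword]) -> list[Optional[int]]:
    """For each sub-word, the read index of its first exact occurrence, or None."""
    hits: list[Optional[int]] = []
    for _, word in subwords:
        pos = read.find(word)  # like strstr, reports only the FIRST occurrence
        hits.append(pos if pos >= 0 else None)
    return hits


def is_sufficiently_similar(
    hits: list[Optional[int]],
    subwords: list[Subword],
    min_shared: int = 5,
    min_len: int = 15,
) -> bool:
    """User-tunable similarity test: enough matched sub-words of >= min_len bp."""
    shared = sum(
        1
        for hit, (_, word) in zip(hits, subwords)
        if hit is not None and len(word) >= min_len
    )
    return shared >= min_shared
```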
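The Definitions define the polymorphic region as the part of the reference between the end of the last matching sub-word before the first non-matching sub-word and the beginning of the first matching sub-word after the last non-matching one. The sketch below derives those bounds, and the corresponding bounds of the mutated segment in the read, from a list of tiled sub-words and, for each sub-word, the read position of its first exact match (or None). Only interior runs of non-matching sub-words are reported, since runs at either end of a read (truncated reads) are left to local alignment in the Definitions; the input layout and the 0-based, end-exclusive coordinates are assumptions.

```python
from typing import Optional

Subword = tuple[int, str]  # hypothetical layout: (start in reference, sequence)


def polymorphic_regions(
    subwords: list[Subword], hits: list[Optional[int]]
) -> list[tuple[int, int, int, int]]:
    """Return (ref_start, ref_end, read_start, read_end) per interior run of misses.

    reference[ref_start:ref_end] is the polymorphic region and
    read[read_start:read_end] the mutated segment (0-based, end-exclusive).
    """
    regions = []
    i, n = 0, len(subwords)
    while i < n:
        if hits[i] is not None:
            i += 1
            continue
        j = i
        while j < n and hits[j] is None:
            j += 1  # extend the run of non-matching sub-words
        if i > 0 and j < n:  # bounded by matching sub-words on both sides
            prev_start, prev_word = subwords[i - 1]
            next_start, _ = subwords[j]
            regions.append((
                prev_start + len(prev_word),   # end of last match (reference)
                next_start,                    # start of next match (reference)
                hits[i - 1] + len(prev_word),  # same bounds in the read
                hits[j],
            ))
        i = j
    return regions
```

With the reference ACGTTGCAAGCT tiled with w = 4 and a tiling shift of 2, a read carrying a G>A substitution at position 5 (ACGTTACAAGCT) gives the hits [0, None, None, 6, 8] and the single region (4, 6, 4, 6), i.e. TG in the reference against TA in the read.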
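The repeat problem illustrated by Figure 3 can be reproduced with Python's str.find, which, like strstr, reports only the first occurrence of a sub-word. Re-applying the search from just past each hit lists every copy, which is one iterative way to read the Definitions' remark that the matcher is applied "in a recursive fashion". The helper repeat_copy_lost, which flags a repeated sub-word when the read holds fewer exact copies than the reference, is a hypothetical illustration rather than the patent's procedure.

```python
def find_all(text: str, word: str) -> list[int]:
    """All (possibly overlapping) start positions of word in text."""
    positions = []
    pos = text.find(word)
    while pos != -1:
        positions.append(pos)
        pos = text.find(word, pos + 1)  # resume just after the previous hit
    return positions


def repeat_copy_lost(reference: str, read: str, word: str) -> bool:
    """Hypothetical check: True if the read has fewer exact copies than the reference."""
    return len(find_all(read, word)) < len(find_all(reference, word))
```

For the reference GATTACACCGATTACA and the read GATTACACCGATTCCA, a single str.find call succeeds at position 0 on the intact first copy, while find_all shows that the read holds one copy of GATTACA instead of two.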
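The memory-resident index list of Figure 2 stores, for each sub-word, its starting position and a repetitiveness indicator that is 0 for a unique sub-word and otherwise gives the number of additional occurrences (2 meaning three occurrences in total). The sketch below builds such a list. The IndexEntry record is hypothetical, and counting occurrences anywhere in the reference, overlaps included, is an assumption about what "found at multiple positions" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:  # hypothetical record, one per sub-word
    start: int    # starting position of the sub-word in the reference (0-based)
    word: str     # sub-word sequence
    repeats: int  # 0 if unique, otherwise the number of further occurrences


def count_occurrences(text: str, word: str) -> int:
    count, pos = 0, text.find(word)
    while pos != -1:
        count += 1
        pos = text.find(word, pos + 1)  # overlapping occurrences are counted
    return count


def build_index(reference: str, subwords: list[tuple[int, str]]) -> list[IndexEntry]:
    return [
        IndexEntry(start, word, count_occurrences(reference, word) - 1)
        for start, word in subwords
    ]
```

For the hypothetical homopolymer of ten A's, the sub-word AAAA occurs at seven overlapping positions, so its repetitiveness indicator is 6.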
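Once a polymorphic region and its mutated segment are known, a character-by-character comparison can separate simple mutations from complex ones. The sketch trims the longest common prefix and suffix and inspects what remains. Treating any leftover that is neither a single contiguous substitution block nor a pure insertion or deletion as "complex" (and thus a candidate for dynamic programming) is a simplifying assumption, as is reporting the block offset after prefix trimming, which places indels in homopolymer runs at their rightmost possible position.

```python
def classify_block(ref_seg: str, read_seg: str) -> tuple[str, int, str, str]:
    """Compare a polymorphic region with its mutated segment character by character.

    Returns (kind, offset, ref_part, read_part); offset is the 0-based position
    inside the segments where the differing block starts. kind is one of
    "none", "substitution", "deletion", "insertion" or "complex".
    """
    # Longest common prefix.
    p = 0
    while p < len(ref_seg) and p < len(read_seg) and ref_seg[p] == read_seg[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while (
        s < len(ref_seg) - p
        and s < len(read_seg) - p
        and ref_seg[-1 - s] == read_seg[-1 - s]
    ):
        s += 1
    ref_core = ref_seg[p:len(ref_seg) - s]
    read_core = read_seg[p:len(read_seg) - s]

    if not ref_core and not read_core:
        return ("none", p, "", "")
    if not read_core:
        return ("deletion", p, ref_core, "")
    if not ref_core:
        return ("insertion", p, "", read_core)
    if len(ref_core) == len(read_core) and all(
        a != b for a, b in zip(ref_core, read_core)
    ):
        return ("substitution", p, ref_core, read_core)  # one contiguous block
    # Mismatches separated by matching bases, or a mixed indel: defer to
    # dynamic programming (an assumption of this sketch).
    return ("complex", p, ref_core, read_core)
```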
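For complex mutations in the interior of a read, the Definitions name the Needleman-Wunsch algorithm. A textbook global alignment with a linear gap penalty is sketched below. The scoring values (match +1, mismatch -1, gap -2) and the traceback preference (diagonal, then a gap in the read, then a gap in the reference) are arbitrary choices that influence where gaps are placed; the patent does not state them.

```python
def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> tuple[int, str, str]:
    """Global alignment of a (reference segment) and b (read segment).

    Uses a linear gap penalty. Returns (score, aligned_a, aligned_b),
    where '-' marks a gap.
    """
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback from the bottom-right cell: diagonal first, then up, then left.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])  # reference base missing from the read
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")       # base inserted in the read
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these defaults, aligning the reference segment ACGT against the read segment AGT gives a score of 1 and the alignment ACGT / A-GT, i.e. a one-base deletion.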
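For mutations at the 5' or 3' termini, for example in truncated reads, the Definitions name the Smith-Waterman algorithm. The local alignment below clamps cell scores at zero and traces back from the highest-scoring cell until a zero cell is reached. The scoring scheme (match +2, mismatch -1, gap -2) is an assumption, not a value from the patent.

```python
def smith_waterman(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> tuple[int, str, str]:
    """Local alignment of a and b with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b) for the best-scoring local
    segment pair; '-' marks a gap. Ties keep the first maximum found.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a zero score ends the local segment.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligning TTACGTTT with GGACGTGG returns only the shared core ACGT, with a score of 8.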
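The Definitions note that the alignment produced by dynamic programming depends on gap insertion and gap extension penalties, as in Gotoh (1982). A score-only sketch of global alignment with affine gap costs, in the three-matrix formulation associated with Gotoh, is given below. The penalty values and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend are assumptions; no traceback is included.

```python
NEG_INF = float("-inf")


def affine_global_score(
    a: str, b: str, match: int = 1, mismatch: int = -1,
    gap_open: int = -3, gap_extend: int = -1,
) -> float:
    """Best global alignment score of a vs b with affine gap costs (score only).

    M[i][j]: alignment ending with a[i-1] paired to b[j-1]
    X[i][j]: alignment ending with a[i-1] opposite a gap
    Y[i][j]: alignment ending with b[j-1] opposite a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Open a new gap from M or Y, or extend an existing gap in X.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, ACGTACGT against ACGT scores -2 with these defaults: four matches and a single gap of length four, which costs less than several shorter gaps.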
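The Postprocessing module counts how often each mutation occurs in the tab-delimited per-read output of the Analysis module and writes a result table. The sketch below assumes a hypothetical line layout (a read identifier followed by zero or more mutation columns, with made-up mutation labels in the test) and reports each mutation's read count and the fraction of reads carrying it; the actual file layout is not specified in the Definitions.

```python
from collections import Counter


def frequency_table(lines: list[str]) -> list[tuple[str, int, float]]:
    """Count mutations across reads from hypothetical tab-delimited lines.

    Assumed layout per line: read_id<TAB>mutation<TAB>mutation...
    Each mutation is counted at most once per read. The result is sorted by
    count (most frequent first) and includes the fraction of reads.
    """
    counts = Counter()
    n_reads = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        n_reads += 1
        fields = line.split("\t")
        counts.update({f for f in fields[1:] if f})
    return [(mut, c, c / n_reads) for mut, c in counts.most_common()]
```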

PCT/HU2013/000086 2012-09-11 2013-09-02 Method and computer program product for detecting a mutation in a nucleotide sequence Ceased WO2014041380A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261699320P 2012-09-11 2012-09-11
US61/699,320 2012-09-11

Publications (1)

Publication Number Publication Date
WO2014041380A1 (fr) 2014-03-20

Family

ID=49382543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/HU2013/000086 Ceased WO2014041380A1 (fr) 2012-09-11 2013-09-02 Method and computer program product for detecting a mutation in a nucleotide sequence

Country Status (1)

Country Link
WO (1) WO2014041380A1 (fr)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9618474B2 (en) 2014-12-18 2017-04-11 Edico Genome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9859394B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9857328B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US10006910B2 (en) 2014-12-18 2018-06-26 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10020300B2 (en) 2014-12-18 2018-07-10 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
EP3483286A1 (fr) * 2017-11-09 2019-05-15 National Cancer Center Sequence analysis method, sequence analysis apparatus, reference sequence generation method, and reference sequence generation apparatus
CN110168647A (zh) * 2016-11-16 2019-08-23 宜曼达股份有限公司 Method for realigning sequencing data reads
US10429342B2 (en) 2014-12-18 2019-10-01 Edico Genome Corporation Chemically-sensitive field effect transistor
US10811539B2 (en) 2016-05-16 2020-10-20 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
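CN114530199A (zh) * 2022-01-19 2022-05-24 Chongqing University of Posts and Telecommunications Method, device and storage medium for detecting low-frequency mutations based on duplex sequencing data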

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
GOTOH, O.: "An improved algorithm for matching biological sequences", J. MOL. BIOL., vol. 162, 1982, pages 705 - 708
H. LI ET AL: "A survey of sequence alignment algorithms for next-generation sequencing", BRIEFINGS IN BIOINFORMATICS, vol. 11, no. 5, 11 May 2010 (2010-05-11), pages 473 - 483, XP055085554, ISSN: 1467-5463, DOI: 10.1093/bib/bbq015 *
HACH, F. ET AL.: "mrsFAST: a cache-oblivious algorithm for short-read mapping", NAT METHODS, vol. 7, 2010, pages 576 - 577
LANGMEAD BEN ET AL: "Fast gapped-read alignment with Bowtie 2", NATURE METHODS, NATURE PUBLISHING GROUP, GB, vol. 9, no. 4, 1 April 2012 (2012-04-01), pages 357 - 359, XP002715401, ISSN: 1548-7091, [retrieved on 20120304], DOI: 10.1038/NMETH.1923 *
LORINC S. PONGOR ET AL: "HeurAA: Accurate and Fast Detection of Genetic Variations with a Novel Heuristic Amplicon Aligner Program for Next Generation Sequencing", PLOS ONE, vol. 8, no. 1, 18 January 2013 (2013-01-18), pages e54294, XP055101377, DOI: 10.1371/journal.pone.0054294 *
M. DAVID ET AL: "SHRiMP2: Sensitive yet Practical Short Read Mapping", BIOINFORMATICS, vol. 27, no. 7, 28 January 2011 (2011-01-28), pages 1011 - 1012, XP055102118, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btr046 *
PINTER, F. ET AL., JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 10, 2008, pages 160 - 168
RICE, P.; LONGDEN, I.; BLEASBY, A.: "EMBOSS: the European Molecular Biology Open Software Suite", TRENDS GENET, vol. 16, 2000, pages 276 - 277
SCHWAB, R. ET AL., J CLIN ONCOL, vol. 23, 2005, pages 7736 - 8
SOPHIE SCHBATH ET AL: "Mapping Reads on a Genomic Sequence: An Algorithmic Overview and a Practical Comparative Analysis", JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 19, no. 6, 1 June 2012 (2012-06-01), pages 796 - 813, XP055101462, ISSN: 1066-5277, DOI: 10.1089/cmb.2012.0022 *
STENSON, P.D. ET AL., HUM GENOMICS, vol. 4, 2009, pages 69 - 72

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10494670B2 (en) 2014-12-18 2019-12-03 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10607989B2 (en) 2014-12-18 2020-03-31 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9857328B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US10006910B2 (en) 2014-12-18 2018-06-26 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10020300B2 (en) 2014-12-18 2018-07-10 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10429381B2 (en) 2014-12-18 2019-10-01 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US9859394B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10429342B2 (en) 2014-12-18 2019-10-01 Edico Genome Corporation Chemically-sensitive field effect transistor
US9618474B2 (en) 2014-12-18 2017-04-11 Edico Genome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10811539B2 (en) 2016-05-16 2020-10-20 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
CN110168647A (zh) * 2016-11-16 2019-08-23 宜曼达股份有限公司 Method for realigning sequencing data reads
CN110168647B (zh) * 2016-11-16 2023-10-31 宜曼达股份有限公司 Method for realigning sequencing data reads
EP3483286A1 (fr) * 2017-11-09 2019-05-15 National Cancer Center Sequence analysis method, sequence analysis apparatus, reference sequence generation method, and reference sequence generation apparatus
US11901043B2 (en) 2017-11-09 2024-02-13 National Cancer Center Sequence analysis method, sequence analysis apparatus, reference sequence generation method, reference sequence generation apparatus, program, and storage medium
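CN114530199A (zh) * 2022-01-19 2022-05-24 Chongqing University of Posts and Telecommunications Method, device and storage medium for detecting low-frequency mutations based on duplex sequencing data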

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13777331

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13777331

Country of ref document: EP

Kind code of ref document: A1