WO2014085891A1 - Method and use for verifying assembly errors in genomes - Google Patents
Method and use for verifying assembly errors in genomes
- Publication number
- WO2014085891A1 (PCT/BR2013/000543)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- genome
- genomes
- frequency
- assembly
- errors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
Definitions
- The present invention is a method for verifying assembly errors in genomes of sequenced or synthetically produced organisms that uses frequency ratios between nucleotide sequence fragments of a genome.
- The method has applications in error checking of a genome assembled from fragments derived from different types of sequencing technologies (Illumina, 454, SOLiD, PacBio, among others) and can assist in the construction of synthetic genomes, for example in genetically hybrid or transgenic organisms.
- The proposed method can also be used for compression of genetic material, as the frequency ratios determined by the method reduce the complexity of nucleotide sequences so that their content can be represented in compressed form, thus reducing the space required for their storage.
- The first is the current inability of sequencing equipment to extract genetic material that corresponds to entire nucleotide sequences, which makes it necessary to introduce a step that fragments the molecule into different regions.
- The second is the inability of computational tools to accurately handle the high frequency of repetitive sequences present in genomes, which are often much larger than the average size of the reads generated by the sequencing process.
- The computational assembly steps also require a correct and thorough configuration of the software used, because the parameters vary according to the type of organism, the sequencing equipment used, and the available computer resources.
- Other factors, such as the preparation and/or contamination of genetic material and the lack of strict control in the purification of the samples to be sequenced, also influence the final assembly of a genome.
- Although most computational tools try to address these problems, they use a conservative approach, which reduces the number of assembly errors but also the level of reconstruction of the original molecule.
- The method can be used to check for assembly errors in a genome that has been reconstructed from nucleotide sequence fragments or has been synthetically constructed. Its advantage is that it makes it possible to obtain sequences that are closer to the original sequence as extracted from any naturally occurring organism.
- For synthetic molecules, such as synthetic genomes, the method can be used to verify whether their construction approximates a natural genome, and thus to suggest rearrangements that would make them more biologically efficient.
- Another advantage of this method is that it allows nucleotide sequences to be compressed, making their transfer faster and reducing the space required for their storage.
- Figure 1 shows an application of the method of the present invention, based on the frequency ratios of the words $F(w_k)$ and $F(R(w_k))$, to identify assembly errors in genomes for word sizes ranging from 2 to 8. Thirty-two genomes were considered and, according to the method, the unexpected deviation of at least 0.01 in the sum of the word frequencies in the genome of the bacterium Xylella fastidiosa 9a5c demonstrates the existence of assembly errors.
- Figure 2 shows an application of the method of the present invention, considering frequency ratios for fragment sizes ranging from 1 to 8. Regardless of the value of k, the frequency ratios become invalid for the genome of the bacterium Xylella fastidiosa 9a5c, assembled by Simpson et al. (2000).
- Figure 3 presents a comparison of genome assemblies of the species of X. fastidiosa ssp. using NCBI's Gmap software.
- Figure 4 shows a multiple alignment of genome assemblies of species of X. fastidiosa ssp.
- The X. fastidiosa 9a5c genome blocks (Xf_9a5c_DNA.fas) at the bottom represent regions that have undergone inversions or translocations relative to the other genomes, which are structurally almost identical to one another.
- the invention describes a set of oligonucleotide frequency parity rules that are observed in various genomes and can be applied to: check for assembly errors in reconstructed genomes from sequence fragments; evaluate the quality of synthetic genomes in the same way as is done in an assembled genome; compress nucleotide sequences to reduce the physical space they occupy in a computer system.
- The method considers the existence of two frequency ratios of words (Equations 1 and 2) that are invariant with each other. These relationships take into account a sequence $w$ of length $k$, written $w_k$, and the following operators on $w_k$: $R(w_k)$, the reverse sequence of $w_k$; $C(w_k)$, the complementary sequence of $w_k$, where the complement of A is T and the complement of C is G; and $R(C(w_k))$, the reverse and complementary sequence of $w_k$.
- the method for verifying assembly errors in genomes comprises the following steps:
- Figure 1 represents the application of the method, based on the frequency ratios of the words $F(w_k)$ and $F(R(w_k))$, to identify assembly errors in genomes for word sizes ranging from 2 to 8.
- $F(w_k)$ and $F(R(w_k))$ are considered and, according to the method, the unexpected deviation of at least 0.01 in the sum of the word frequencies in the genome of the bacterium Xylella fastidiosa (9a5c) demonstrates the existence of assembly errors.
- The exceptions were the genome of the HIV RNA virus and that of the bacterium Xylella fastidiosa 9a5c. In both, the frequency ratios of the method varied by more than 0.01 and deviated strongly from those of the other organisms (Figure 1).
- Sequencing and subsequent assembly of genomes using sequencing technologies has become increasingly common. Such technologies are based on fragmenting molecules, which are then reconstructed with computational tools that overlap the resulting sequences. However, several factors, ranging from the high frequency of repetitive sequences in genomes to the generation of artifacts (contamination or poor data quality), make the assembly process quite complex. Despite the importance of methods for verifying the final quality of a genome, whether assembled or synthetically constructed, there are still no methods based on frequency relationships between fragments of a genome. The present work describes a new method that can be used in the validation step of a genome. In the method, a genome is fragmented into fixed-length sequences.
- Yamagishi MEB, Hirai RH. Chargaff's "Grammar of Biology": New Fractal-like Rules. arXiv; 2011.
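
The operators and the word-frequency comparison described in the definitions above can be illustrated with a minimal Python sketch. It is not the patented procedure: Equations 1 and 2 and the individual method steps are not reproduced in this text, so the deviation measure used here (the largest absolute difference between $F(w_k)$ and the frequency of the transformed word) is an assumption, and all function names (`word_frequencies`, `max_deviation`, `flag_assembly`) are hypothetical. Only the operators R, C and R(C), the word sizes 2 to 8 and the 0.01 threshold are taken from the description.

```python
from collections import Counter
from itertools import product

# Complement table as stated in the description: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(w):
    """R(w_k): the reverse sequence of w_k."""
    return w[::-1]


def complement(w):
    """C(w_k): the complementary sequence of w_k."""
    return w.translate(COMPLEMENT)


def reverse_complement(w):
    """R(C(w_k)): the reverse and complementary sequence of w_k."""
    return reverse(complement(w))


def word_frequencies(genome, k):
    """Relative frequency F(w_k) of each overlapping word of length k."""
    genome = genome.upper()
    total = len(genome) - k + 1
    counts = Counter(genome[i:i + k] for i in range(total))
    return {w: n / total for w, n in counts.items()}


def max_deviation(genome, k, operator):
    """Largest |F(w) - F(operator(w))| over all 4**k words.

    Assumption: this measure is illustrative only; the patent's own
    Equations 1 and 2 are not available in this text.
    """
    freq = word_frequencies(genome, k)
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return max(abs(freq.get(w, 0.0) - freq.get(operator(w), 0.0)) for w in words)


def flag_assembly(genome, operator, k_values=range(2, 9), threshold=0.01):
    """Return the word sizes k whose deviation reaches the threshold."""
    return [k for k in k_values if max_deviation(genome, k, operator) >= threshold]
```

Under these assumptions, a call such as `flag_assembly(sequence, reverse)` would list the word sizes between 2 and 8 at which an assembled sequence deviates by at least 0.01, which could then prompt a closer look at that assembly. Words containing symbols other than A, C, G and T are counted in the total but not compared.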
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a method for verifying assembly errors in genomes of sequenced or synthetically produced organisms, using frequency ratios between nucleotide sequence fragments of a genome. The method has applications in error checking of a genome assembled from fragments derived from different types of sequencing technologies and can assist in the construction of synthetic genomes. It can also be used to compress genetic data, since the frequency ratios determined by the method reduce the complexity of nucleotide sequences so that their content can be represented in compressed form, which reduces the space required for their storage.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| BRBR1020120310961 | 2012-12-05 | | |
| BR102012031096A BR102012031096B1 (pt) | 2012-12-05 | 2012-12-05 | Method and use for verifying assembly errors in genomes |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014085891A1 (fr) | 2014-06-12 |
Family
ID=50882688
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/BR2013/000543 (Ceased) WO2014085891A1 | 2012-12-05 | 2013-12-03 | Method and use for verifying assembly errors in genomes |
Country Status (2)
| Country | Link |
|---|---|
| BR (1) | BR102012031096B1 (fr) |
| WO (1) | WO2014085891A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001063543A2 * | 2000-02-22 | 2001-08-30 | Pe Corporation (Ny) | Method and system for assembling a whole genome using a shotgun data set |
| WO2008098014A2 * | 2007-02-05 | 2008-08-14 | Applied Biosystems, Llc | System and method for indel identification using short-read sequencing |
- 2012-12-05: BR application BR102012031096A, patent BR102012031096B1 (pt), active, IP Right Grant
- 2013-12-03: WO application PCT/BR2013/000543, publication WO2014085891A1 (fr), not active, Ceased
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001063543A2 * | 2000-02-22 | 2001-08-30 | Pe Corporation (Ny) | Method and system for assembling a whole genome using a shotgun data set |
| WO2008098014A2 * | 2007-02-05 | 2008-08-14 | Applied Biosystems, Llc | System and method for indel identification using short-read sequencing |
Non-Patent Citations (2)
| Title |
|---|
| CHEN T ET AL.: "Trie-Based Data Structures for Sequence Assembly", THE EIGHTH SYMPOSIUM ON COMBINATORIAL PATTERN MATCHING, 1997, 11 June 1997 (1997-06-11), pages 1 - 17 * |
| ISTVANICK W ET AL.: "Dynamic methods for fragment assembly in large scale genome sequencing projects", SYSTEM SCIENCES, 1993, PROCEEDING OF THE TWENTY-SIXTH HAWAII INTERNATIONAL CONFERENCE ON, WAILEA, HI, USA, 5 January 1993 (1993-01-05) - 8 January 1993 (1993-01-08), LOS ALAMITOS, CA, USA, IEEE, pages 534 - 543 * |
Also Published As
| Publication number | Publication date |
|---|---|
| BR102012031096A2 (pt) | 2014-09-16 |
| BR102012031096B1 (pt) | 2019-10-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Jin et al. | GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes | |
| Liu et al. | Ancient and modern genomes unravel the evolutionary history of the rhinoceros family | |
| Nauheimer et al. | HybPhaser: A workflow for the detection and phasing of hybrids in target capture data sets | |
| Irisarri et al. | Phylotranscriptomic consolidation of the jawed vertebrate timetree | |
| Folk et al. | A protocol for targeted enrichment of intron‐containing sequence markers for recent radiations: A phylogenomic example from Heuchera (Saxifragaceae) | |
| Straub et al. | Navigating the tip of the genomic iceberg: Next‐generation sequencing for plant systematics | |
| Chorlton | Ten common issues with reference sequence databases and how to mitigate them | |
| Tang et al. | Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps | |
| Xin et al. | Accelerating read mapping with FastHASH | |
| Wolf | Principles of transcriptome analysis and gene expression quantification: an RNA‐seq tutorial | |
| Ripma et al. | Geneious! Simplified genome skimming methods for phylogenetic systematic studies: A case study in Oreocarya (Boraginaceae) | |
| Soorni et al. | Organelle_PBA, a pipeline for assembling chloroplast and mitochondrial genomes from PacBio DNA sequencing data | |
| Soto Gomez et al. | A customized nuclear target enrichment approach for developing a phylogenomic baseline for Dioscorea yams (Dioscoreaceae) | |
| Qu et al. | Multiple measures could alleviate long-branch attraction in phylogenomic reconstruction of Cupressoideae (Cupressaceae) | |
| Hirsch et al. | Genomic limitations to RNA sequencing expression profiling | |
| JP2016506733A5 (fr) | | |
| Morrison | A framework for phylogenetic sequence alignment | |
| Hearn et al. | Likelihood‐based inference of population history from low‐coverage de novo genome assemblies | |
| Bzikadze et al. | UniAligner: a parameter-free framework for fast sequence alignment | |
| Shi et al. | MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement | |
| Sutton et al. | Optimizing experimental design for genome sequencing and assembly with Oxford Nanopore Technologies | |
| Straub et al. | Enabling evolutionary studies at multiple scales in Apocynaceae through Hyb‐Seq | |
| Zhai et al. | Complete chloroplast genome sequencing and comparative analysis reveals changes to the chloroplast genome after allopolyploidization in Cucumis | |
| WO2014028771A1 | Iterative genome assembler | |
| WO2014085891A1 | Method and use for verifying assembly errors in genomes | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13860331; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 13860331; Country of ref document: EP; Kind code of ref document: A1 |