WO2010087740A1 - Method for improving the accuracy of determining the amino acid residue sequence of a biopolymer on the basis of mass spectrometric analysis data, and computer system - Google Patents
- Publication number
- WO2010087740A1 (application PCT/RU2010/000038)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- database
- algorithm
- acid residues
- sequence
- mass spectrometric
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- the present invention relates to methods of computer processing of mass spectrometric data, aimed at identifying the primary structure of biopolymers, including proteins and peptides.
- a biopolymer is considered to be a genome-encoded sequence of amino acid residues that contains at least one peptide bond and may carry chemical modifications of its residues, including non-protein components such as lipids, carbohydrates, and other organic and inorganic elements, for example metals.
- the sequence of amino acid residues is characterized by variability due to the following molecular biological processes: alternative splicing, insertions, deletions and substitutions of single amino acid residues.
- the last three categories of microvariability of the structure of protein biopolymers are denoted by the abbreviation SAP (Single Amino acid Polymorphism).
- the set of individual characteristics of the body's proteins forms its proteotype. To determine the proteotype (proteotyping), a way is needed to identify microheterogeneous differences in the primary structures of proteins.
- the identification of the primary structure of biopolymers is based on mass spectrometric data.
- mass spectrometric data means information about the mass or mass-to-charge characteristics of intact proteins, of the peptide fragments produced by their hydrolysis, or of the fragments produced by induced dissociation of biopolymer ions.
- the primary structure may undergo modifications that are specific to particular amino acid residues, or non-specific modifications, i.e., modifications that are independent of the type of residue in the primary structure of the biopolymer.
- Mass spectrometric data processing is performed using bioinformatics algorithms. Most of them, for example the MOWSE algorithm [1], are based on comparing experimentally obtained mass spectrometric data with values calculated from genomic databases (GDB).
- Genomic databases are a collection of information resources containing records of the amino acid residue sequences of proteins, obtained by decoding genomic information and/or the expressed parts of the genome. An entry in the GDB includes a unique protein identifier and the corresponding amino acid residue sequence in letter coding.
- the identification algorithm calculates an estimate of statistical reliability, which allows one to judge the probability of the correct identification of the protein taking into account the specified mass spectrometric data and a specific genomic database.
- a protein is considered identified if the assessment of statistical significance exceeds an arbitrarily set threshold value.
- the publication [3] describes a method for improving the accuracy of determining the amino acid sequence of peptides (the products of protein proteolysis) from mass spectrometric analysis data, based on the use of an extended GDB.
- the GDB is extended by including amino acid sequences of proteins that carry SAPs annotated in various sources and post-translational modifications (PTMs).
- the solution to this problem proposed by the present invention is to re-apply mass spectrometric identification algorithms after new records have been entered in the GDB, or to build a GDB from new records that reflect alternative splicing (AC) and SAP information, based on the results of protein identification from mass spectrometric data.
- the present invention relates to a method for improving the accuracy of determining the sequence of amino acid residues from mass spectrometric analysis data, which involves the use of at least one biopolymer identification algorithm based on comparing mass spectrometric data with a genomic database, the algorithm being applied sequentially at least twice.
- in one variant, the present invention provides for the primary identification of proteins by an identification algorithm (AI), the addition to the GDB of primary structure variants containing AC and SAP products for the identified proteins only, and then re-identification against the enriched database using either the same algorithm AI or another identification algorithm.
- in another variant, the present invention provides for the primary identification of proteins by an identification algorithm (AI), the creation of a GDB containing the primary structures of the AC and SAP products of the previously identified proteins only, and then re-identification against that database using either the same algorithm AI or another identification algorithm.
- a distinctive advantage of the present invention over similar methods that combine bioinformatics algorithms to increase the statistical reliability of identification is that the identification algorithms are applied sequentially, and the previous algorithm (AI) is coupled with the subsequent one (AI 1 ) by making changes to the GDB.
- Another advantage of the present invention over publication [3] is that, before each repeated application of the algorithm, the GDB is changed to take into account the results of the previous application of the algorithm (AI). This significantly increases both the efficiency of the search (each subsequent identification is more precise than the previous one) and its reliability (the probability of false positive results drops sharply).
- the present invention also relates to a computer system, the operation of which is based on the method disclosed above.
- Mass spectrometric data (MSD) are input to the system. These data are used to identify biopolymers against the genomic database (GDB) using the AI algorithm.
- Identification results (RI) are a list of identifiers of the proteins whose identification reliability estimate exceeds a threshold value set by the user. For the proteins in the RI, primary structure variants are generated from the information in external sources of information (SRI) about known or proposed AC products and SAP variants.
- FIG. 1 is a diagram of a computing system according to the present invention. The following notation is used in the diagram:
- MSD: initial mass spectrometric data received at the system input;
- GDB: source genomic database;
- AI and AI': mass spectrometric identification algorithms; AI may be identical to AI';
- RI: primary identification results, which are a list of protein identifiers;
- modified GDB: a genomic database that includes the protein variants contained in external sources of information (VII).
- Example 1. Identification of a polymorphic variant of the Trypsin-1 protein [Precursor] (UniProt P07477) by the method according to the present invention
- Mass spectrometric data from a study of a human stem cell sample were downloaded from the PRIDE system (http://www.ebi.ac.uk/pride/).
- the primary mass spectrometric identification of proteins from the downloaded mass spectra was performed with the Mascot program against the NCBI-nr database.
- 13 polymorphic variants were obtained from the UniProt database.
- a new database was formed by adding the list of polymorphic variants of the Trypsin-1 protein to the NCBI-nr database. Repeated mass spectrometric identification of proteins was performed with the Mascot program against the new database.
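The two-pass procedure described above (primary identification, enrichment of the database with variants of the identified proteins only, then re-identification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the scoring is a toy count of matched tryptic peptide masses rather than MOWSE or Mascot scoring, only single-residue substitutions are generated, and the function names, the `annotations` layout and the variant identifier format are all hypothetical.

```python
"""Toy two-pass identification: identify, enrich the database, identify again."""

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def tryptic_peptides(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        last = i == len(seq) - 1
        if last or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def identify(observed, db, tol=0.01, threshold=2):
    """Toy stand-in for an identification algorithm (AI).

    The score of an entry is the number of observed masses matched by one of
    its tryptic peptides within `tol`; entries scoring below `threshold` are
    dropped. Real algorithms use statistical scores instead (assumption).
    """
    results = {}
    for pid, seq in db.items():
        masses = [peptide_mass(p) for p in tryptic_peptides(seq)]
        score = sum(any(abs(m - o) <= tol for m in masses) for o in observed)
        if score >= threshold:
            results[pid] = score
    return results


def substitution_variants(pid, seq, saps):
    """Build variant entries from (position, reference, alternative) tuples.

    Positions are 1-based. Annotations that do not match the sequence are
    skipped. The "<id>_<ref><pos><alt>" naming is a hypothetical convention.
    """
    variants = {}
    for pos, ref, alt in saps:
        if seq[pos - 1] != ref:
            continue
        variants[f"{pid}_{ref}{pos}{alt}"] = seq[:pos - 1] + alt + seq[pos:]
    return variants


def two_pass(observed, db, annotations, **kwargs):
    """First pass, enrichment for identified proteins only, second pass."""
    first = identify(observed, db, **kwargs)
    enriched = dict(db)  # leave the source database untouched
    for pid in first:
        enriched.update(substitution_variants(pid, db[pid], annotations.get(pid, [])))
    second = identify(observed, enriched, **kwargs)
    return first, enriched, second
```

Calling `two_pass(observed, db, annotations)` with `db` mapping identifiers to sequences and `annotations` mapping identifiers to substitution tuples returns the first-pass results, the enriched database and the second-pass results. Restricting enrichment to first-pass hits keeps the second search space small, which is the coupling between passes that the description emphasises.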
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to bioinformatics methods for identifying proteins and peptides against genomic databases, and improves identification accuracy. The method comprises the repeated use of algorithms that compare mass spectra with a genomic database, after the database has been supplemented with new entries, after entries have been removed from the database, or after the database has been replaced by a database composed of new entries. The additional entries are generated by introducing changes corresponding to substitutions, deletions, insertions, or modifications of one or more amino acid residues into the sequences of the identified biopolymers. The invention also relates to a computer system whose operation is based on the above method.
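The abstract names three ways of changing the database between passes: supplementing it, removing entries from it, or replacing it. A minimal sketch of that step is given below. The function name, the mode strings and the dictionary layout (identifier mapped to sequence) are hypothetical, and the rule used for removal (keep only previously identified entries) is an assumption, since the abstract does not say which entries are removed.

```python
def update_database(db, new_entries, mode, identified=None):
    """Return a new database for the next identification pass.

    db          -- dict mapping protein identifier to sequence
    new_entries -- dict of generated variant entries
    mode        -- "supplement", "prune" or "replace" (hypothetical names)
    identified  -- identifiers found in the previous pass (used by "prune")
    """
    if mode == "supplement":
        # Keep every original entry and add the generated variants.
        return {**db, **new_entries}
    if mode == "prune":
        # Assumption: entries not identified in the previous pass are removed.
        keep = set(identified or ())
        return {pid: seq for pid, seq in db.items() if pid in keep}
    if mode == "replace":
        # The next pass searches only the newly generated entries.
        return dict(new_entries)
    raise ValueError(f"unknown mode: {mode!r}")
```

Each mode returns a fresh dictionary, so the source database stays available for comparison with the modified one.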
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| RU2009103057/15A RU2408011C2 (ru) | 2009-01-30 | 2009-01-30 | Method for improving the accuracy of determining the amino acid residue sequence of a biopolymer on the basis of mass spectrometric analysis data, and computing system |
| RU2009103057 | 2009-01-30 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2010087740A1 (fr) | 2010-08-05 |
Family
ID=42395817
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/RU2010/000038 Ceased WO2010087740A1 (fr) | 2009-01-30 | 2010-02-01 | Method for improving the accuracy of determining the amino acid residue sequence of a biopolymer on the basis of mass spectrometric analysis data, and computer system |
Country Status (2)
| Country | Link |
|---|---|
| RU (1) | RU2408011C2 (fr) |
| WO (1) | WO2010087740A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060188887A1 (en) * | 2003-05-23 | 2006-08-24 | Protagen Ag | Method and system for elucidating the primary structure of biopolymers |
| WO2007112289A2 (fr) * | 2006-03-23 | 2007-10-04 | The Regents Of The University Of California | Method for identifying and sequencing proteins |
| WO2008151140A2 (fr) * | 2007-05-31 | 2008-12-11 | The Regents Of The University Of California | Method for identifying peptides using tandem mass spectra by dynamically determining the number of peptide reconstructions required |
- 2009-01-30: RU application RU2009103057/15A, patent RU2408011C2 (ru), not active (IP right cessation)
- 2010-02-01: WO application PCT/RU2010/000038, patent WO2010087740A1 (fr), not active (ceased)
Non-Patent Citations (2)
| Title |
|---|
| ALVES GELOI ET AL.: "RAId-DdS: mass-spectrometry based peptide identification web server with knowledge integration", BMC GENOMICS, vol. 9, 2008, pages 505, Retrieved from the Internet <URL:http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2605478/pd1471-2164-9-505.pdf> * |
| EDWARDS NATHAN J ET AL.: "Novel peptide identification from tandem mass spectra using ESTs and sequence database compression", MOLECULAR SYSTEMS BIOLOGY., vol. 3, no. 102, 2007 * |
Also Published As
| Publication number | Publication date |
|---|---|
| RU2408011C2 (ru) | 2010-12-27 |
| RU2009103057A (ru) | 2010-08-10 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US9354236B2 (en) | Method for identifying peptides and proteins from mass spectrometry data | |
| Remmert et al. | HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment | |
| US6393367B1 (en) | Method for evaluating the quality of comparisons between experimental and theoretical mass data | |
| Howbert et al. | Computing exact p-values for a cross-correlation shotgun proteomics score function | |
| US20200243164A1 (en) | Systems and methods for patient-specific identification of neoantigens by de novo peptide sequencing for personalized immunotherapy | |
| WO2009143212A1 (fr) | Computer system and computer-assisted method for aligning and analyzing nucleic acid sequences | |
| Kulp et al. | Integrating database homology in a probabilistic gene structure model | |
| WO2010056131A1 (fr) | Method and system for analyzing data sequences | |
| JP7218019B2 (ja) | Method for identifying entities from mass spectra | |
| JP6489224B2 (ja) | Peptide assignment method and peptide assignment system | |
| US20130144585A1 (en) | Apparatus and method for idendificaton of protein modification | |
| RU2408011C2 (ru) | Method for improving the accuracy of determining the amino acid residue sequence of a biopolymer on the basis of mass spectrometric analysis data, and computing system | |
| JP5610347B2 (ja) | Ribonucleic acid identification device, ribonucleic acid identification method, program, and ribonucleic acid identification system | |
| KR20200102182A (ko) | Method and apparatus for classifying biological species using a base sequence clustering technique | |
| Martens | Bioinformatics challenges in mass spectrometry-driven proteomics | |
| EP1272657A2 (fr) | Method and system for identifying microorganisms by searching a proteome database based on mass spectrometry | |
| US20250054579A1 (en) | Analysis and determination of polypeptide sequences | |
| US20240153587A1 (en) | Workflow to assign putative source to de novo peptide sequence | |
| WO2001096861A1 (fr) | Molecule identification system | |
| Copeland | Computational Analysis of High-replicate RNA-seq Data in Saccharomyces Cerevisiae: Searching for New Genomic Features | |
| WO2003087805A2 (fr) | Method for efficiently calculating the mass of modified peptides for identification by database search and mass spectrometry | |
| CN119207548A (zh) | Method and device for evaluating and optimizing sequences identified by tandem mass spectrometry | |
| WO2025137775A1 (fr) | Method for generating and screening synthetic peptide aptamer libraries | |
| Goldenkova-Pavlova et al. | Experimental and Computational Methodology to the Design and Construction of Translatomic Maps of Plants | |
| JP2008305102A (ja) | Database search device and method | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10736091; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 10736091; Country of ref document: EP; Kind code of ref document: A1 |