WO2010087740A1 - Method for improving the accuracy of determining the sequence of amino acid residues of a biopolymer on the basis of mass spectrometric analysis data, and computer system - Google Patents

Info

Publication number
WO2010087740A1
Authority
WO
WIPO (PCT)
Prior art keywords
database
algorithm
acid residues
sequence
mass spectrometric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/RU2010/000038
Other languages
English (en)
Russian (ru)
Inventor
Александр Иванович АРЧАКОВ
Виктор Гаврилович ЗГОДА
Андрей Валерьевич ЛИСИЦА
Сергей Александрович МОШКОВСКИЙ
Алексей Леонидович ЧЕРНОБРОВКИН
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OBSHCHESTVO S OGRANICHENNOI OTVETSTVENNOSTIU "INTERLAB"
Original Assignee
OBSHCHESTVO S OGRANICHENNOI OTVETSTVENNOSTIU "INTERLAB"
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OBSHCHESTVO S OGRANICHENNOI OTVETSTVENNOSTIU "INTERLAB"

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • The present invention relates to methods of computer processing of mass spectrometric data aimed at identifying the primary structure of biopolymers, including proteins and peptides.
  • A biopolymer is considered to be a sequence of amino acid residues encoded in the genome that contains at least one peptide bond and may contain chemical modifications of the residues, including non-protein components such as lipids, hydrocarbons, and other organic and inorganic elements, for example metals.
  • The sequence of amino acid residues is variable owing to the following molecular biological processes: alternative splicing, and insertions, deletions, and substitutions of single amino acid residues.
  • The last three categories of microvariability in the structure of protein biopolymers are denoted by the abbreviation SAP (Single Amino acid Polymorphism).
  • SAP single amino acid polymorphism
  • The set of individual characteristics of an organism's proteins forms its proteotype. Determining the proteotype (proteotyping) requires a way to identify microheterogeneous differences in the primary structures of proteins.
  • The identification of the primary structure of biopolymers is based on mass spectrometric data.
  • Mass spectrometric data means information about the mass or mass-to-charge characteristics of intact proteins, of the peptide fragments produced by their hydrolysis, or of the fragments produced by induced dissociation of biopolymer ions.
  • The primary structure of biopolymers may undergo modifications specific to particular amino acid residues, or non-specific modifications, i.e., modifications that are independent of the type of residue in the primary structure of the biopolymer.
  • Mass spectrometric data processing is performed using bioinformatics algorithms. Most of them, for example the MOWSE algorithm [1], are based on a comparison of experimentally obtained mass spectrometric data with values calculated from genomic databases (GDB).
  • GDB genomic databases
  • Genomic databases are a collection of information resources containing records of the sequences of amino acid residues in proteins, obtained by decoding genomic information and/or the expressed parts of the genome. An entry in the GDB includes a unique identifier of the protein and the corresponding sequence of amino acid residues in letter coding.
  • The identification algorithm calculates an estimate of statistical reliability, which allows one to judge the probability that the protein has been identified correctly, given the specified mass spectrometric data and a specific genomic database.
  • A protein is considered identified if the estimate of statistical significance exceeds an arbitrarily set threshold value.
  • Publication [3] describes a method for improving the accuracy of determining the amino acid sequence of peptides (the products of protein proteolysis) from mass spectrometric analysis, based on the use of an extended GDB.
  • The GDB is expanded by including amino acid sequences of proteins that carry SAPs and post-translational modifications (PTMs) annotated in various sources.
  • PTMs post-translational modifications
  • The solution to this problem proposed in accordance with the present invention is to re-apply mass spectrometric identification algorithms after entering new records into the GDB, or to create a GDB from new records that reflect alternative splicing (AC) and SAP information, based on the results of protein identification from mass spectrometric data.
  • The present invention relates to a method for improving the accuracy of determining the sequence of amino acid residues from mass spectrometric analysis, which involves the use of at least one biopolymer identification algorithm based on a comparison of mass spectrometric data with a genomic database, the algorithm being applied sequentially at least twice.
  • The present invention provides for the primary identification of proteins by the algorithm AI, the addition to the GDB of primary structure variants containing AC and SAP products only for the identified proteins, and then re-identification against the enriched database by either the same algorithm AI or another identification algorithm.
  • Alternatively, the present invention provides for the primary identification of proteins by the algorithm AI, the creation of a GDB containing the primary structures of the AC and SAP products of only the previously identified proteins, and then re-identification against the new database by either the same algorithm AI or another identification algorithm.
  • A distinctive advantage of the present invention over similar methods that use a combination of bioinformatics algorithms to increase the statistical reliability of identification is that the identification algorithms are applied sequentially, and the previous algorithm (AI) is coupled with the subsequent one (AI 1) by making changes to the GDB.
  • AI previous algorithm
  • AI 1 subsequent one
  • Another distinctive advantage of the present invention over publication [3] is that, before each repeated application of the algorithm, changes are made to the GDB that take into account the results of the previous application of the algorithm (AI). This significantly increases the efficiency of the search (because each subsequent identification is more precise than the previous one) and its reliability (because of a sharp decrease in the probability of obtaining false positive results).
  • The present invention also relates to a computer system whose operation is based on the method disclosed above.
  • Mass spectrometric data (MSD) are input to the system. These data are used to identify biopolymers against the genomic database (GDB) using the algorithm AI.
  • Identification results (RI) are a list of identifiers of the proteins for which the estimate of identification reliability exceeds a threshold value set by the user. For the proteins in the RI, primary structure variants are generated on the basis of information about known or putative AC products and SAP variants contained in external sources of information (VII).
  • FIG. 1 is a diagram of a computing system according to the present invention. The following notation is used in the diagram. MSD: the initial mass spectrometric data received at the system input. GDB: the source genomic database.
  • AI and AI': mass spectrometric identification algorithms; AI may be identical to AI'.
  • RI: the primary identification results, which are a list of protein identifiers.
  • Modified GDB: a genomic database that includes variants of proteins contained in external sources of information (VII).
  • Example 1. Identification of a polymorphic variant of the Trypsin-1 protein [Precursor] (UniProt P07477) by the method according to the present invention.
  • Mass spectrometric data from a study of a human stem cell sample were downloaded from the PRIDE system (http://www.ebi.ac.uk/pride/).
  • The primary mass spectrometric identification of the proteins from the downloaded mass spectra was performed using the Mascot program with the NCBI-nr database.
  • 13 polymorphic variants were obtained from the UniProt database.
  • A new database was formed by adding the list of polymorphic variants of the Trypsin-1 protein to the NCBI-nr database. Repeated mass spectrometric identification of proteins was performed using the Mascot program with the new database.
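
The two-pass procedure described in the excerpts above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the toy score (a count of observed masses explained by tryptic peptides within a tolerance) merely stands in for a real identification algorithm such as Mascot, and the function names (`peptide_mass`, `tryptic_peptides`, `score`, `identify`, `enrich`), the sequences, the identifiers `P1` and `P2`, the tolerance, and the threshold are all hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(sequence):
    """Cleave after K or R (simplified: no missed cleavages, no proline rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def score(observed_masses, sequence, tol=0.01):
    """Toy score: number of observed masses explained by a tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed_masses)


def identify(observed_masses, database, threshold):
    """Return {identifier: score} for entries whose score exceeds the threshold."""
    results = {}
    for ident, seq in database.items():
        s = score(observed_masses, seq)
        if s > threshold:
            results[ident] = s
    return results


def enrich(database, identified, known_variants):
    """Copy the database and add variants only for proteins identified so far."""
    enriched = dict(database)
    for ident in identified:
        for name, seq in known_variants.get(ident, {}).items():
            enriched[f"{ident}|{name}"] = seq
    return enriched


# Hypothetical source database and external variant annotations.
gdb = {"P1": "AAGKLLDERGGSWK", "P2": "MMCCKPPHHR"}
known_variants = {"P1": {"D7N": "AAGKLLNERGGSWK"}, "P2": {"H8Q": "MMCCKPPQHR"}}

# Simulated input data: the sample contains the D7N variant of P1.
msd = [peptide_mass(p) for p in ("AAGK", "LLNER", "GGSWK")]

first_pass = identify(msd, gdb, threshold=1)             # primary identification
enriched_gdb = enrich(gdb, first_pass, known_variants)   # database modification
second_pass = identify(msd, enriched_gdb, threshold=1)   # repeated identification
print(first_pass)
print(second_pass)
```

In this toy run the first pass identifies only the canonical entry `P1`, variants are then added for `P1` alone, and the second pass scores the variant entry higher than the canonical one because the substituted peptide is now explained.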

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to bioinformatics methods for identifying proteins and peptides against genomic databases and makes it possible to improve identification accuracy. The method comprises the repeated use of algorithms that compare mass spectra with a genomic database after the database has been supplemented with new entries, after entries have been removed from the database, or after the database has been replaced with a database composed of new entries. The additional entries are generated by introducing changes corresponding to substitutions, deletions, insertions, or modifications of one or more amino acid residues into the sequences of the identified biopolymers. The invention also relates to a computer system whose operation is based on the aforementioned method.
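
As an illustration of how additional entries might be generated from an identified sequence, the following Python sketch applies single substitutions, deletions, and insertions. It is only a sketch under stated assumptions: the function names `apply_variant` and `additional_entries`, the annotation tuple format, the entry naming scheme, and the example sequence are hypothetical, and residue modifications such as PTMs are not covered.

```python
def apply_variant(sequence, kind, position, residues=""):
    """Return a variant of `sequence`; `position` is 1-based.

    'substitution' replaces the residue at `position` with `residues`,
    'deletion' removes the residue at `position`,
    'insertion' inserts `residues` immediately after `position`.
    """
    i = position - 1
    if not 0 <= i < len(sequence):
        raise ValueError("position outside sequence")
    if kind == "substitution":
        return sequence[:i] + residues + sequence[i + 1:]
    if kind == "deletion":
        return sequence[:i] + sequence[i + 1:]
    if kind == "insertion":
        return sequence[:position] + residues + sequence[position:]
    raise ValueError(f"unknown variant kind: {kind}")


def additional_entries(identifier, sequence, annotations):
    """Build {new_identifier: variant_sequence} for one identified protein."""
    entries = {}
    for kind, position, residues in annotations:
        key = f"{identifier}|{kind}_{position}{residues}"
        entries[key] = apply_variant(sequence, kind, position, residues)
    return entries


# Hypothetical identified protein and variant annotations.
annotations = [("substitution", 4, "V"), ("deletion", 2, ""), ("insertion", 3, "GG")]
new_entries = additional_entries("P1", "MKTAYIAK", annotations)
for key, seq in new_entries.items():
    print(key, seq)
```

Each generated entry keeps the identifier of its parent protein as a prefix, so a match against a variant can be traced back to the protein identified in the earlier pass.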
PCT/RU2010/000038 2009-01-30 2010-02-01 Method for improving the accuracy of determining the sequence of amino acid residues of a biopolymer on the basis of mass spectrometric analysis data, and computer system Ceased WO2010087740A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2009103057/15A RU2408011C2 (ru) 2009-01-30 2009-01-30 Method for improving the accuracy of determining the sequence of amino acid residues of a biopolymer on the basis of mass spectrometric analysis data, and computing system
RU2009103057 2009-01-30

Publications (1)

Publication Number Publication Date
WO2010087740A1 (fr) 2010-08-05

Family

ID=42395817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2010/000038 Ceased WO2010087740A1 (fr) Method for improving the accuracy of determining the sequence of amino acid residues of a biopolymer on the basis of mass spectrometric analysis data, and computer system 2009-01-30 2010-02-01

Country Status (2)

Country Link
RU (1) RU2408011C2 (fr)
WO (1) WO2010087740A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060188887A1 (en) * 2003-05-23 2006-08-24 Protagen Ag Method and system for elucidating the primary structure of biopolymers
WO2007112289A2 (fr) * 2006-03-23 2007-10-04 The Regents Of The University Of California Method for identifying and sequencing proteins
WO2008151140A2 (fr) * 2007-05-31 2008-12-11 The Regents Of The University Of California Method for identifying peptides using tandem mass spectra by dynamically determining the number of peptide reconstructions required

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALVES GELIO ET AL.: "RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration", BMC GENOMICS, vol. 9, 2008, pages 505, Retrieved from the Internet <URL:http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2605478/pd1471-2164-9-505.pdf> *
EDWARDS NATHAN J ET AL.: "Novel peptide identification from tandem mass spectra using ESTs and sequence database compression", MOLECULAR SYSTEMS BIOLOGY., vol. 3, no. 102, 2007 *

Also Published As

Publication number Publication date
RU2408011C2 (ru) 2010-12-27
RU2009103057A (ru) 2010-08-10

Similar Documents

Publication Publication Date Title
US9354236B2 (en) Method for identifying peptides and proteins from mass spectrometry data
Remmert et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment
US6393367B1 (en) Method for evaluating the quality of comparisons between experimental and theoretical mass data
Howbert et al. Computing exact p-values for a cross-correlation shotgun proteomics score function
US20200243164A1 (en) Systems and methods for patient-specific identification of neoantigens by de novo peptide sequencing for personalized immunotherapy
WO2009143212A1 (fr) Computer system and computer-assisted method for the alignment and analysis of nucleic acid sequences
Kulp et al. Integrating database homology in a probabilistic gene structure model
WO2010056131A1 (fr) Method and system for analysing data sequences
JP7218019B2 (ja) Method for identifying entities from mass spectra
JP6489224B2 (ja) Peptide assignment method and peptide assignment system
US20130144585A1 (en) Apparatus and method for identification of protein modification
RU2408011C2 (ru) Method for improving the accuracy of determining the sequence of amino acid residues of a biopolymer on the basis of mass spectrometric analysis data, and computing system
JP5610347B2 (ja) Ribonucleic acid identification device, ribonucleic acid identification method, program, and ribonucleic acid identification system
KR20200102182A (ko) Method and apparatus for classifying biological species using a base sequence clustering technique
Martens Bioinformatics challenges in mass spectrometry-driven proteomics
EP1272657A2 (fr) Method and system for identifying microorganisms by mass spectrometry-based proteome database searching
US20250054579A1 (en) Analysis and determination of polypeptide sequences
US20240153587A1 (en) Workflow to assign putative source to de novo peptide sequence
WO2001096861A1 (fr) Molecule identification system
Copeland Computational Analysis of High-replicate RNA-seq Data in Saccharomyces Cerevisiae: Searching for New Genomic Features
WO2003087805A2 (fr) Method for efficiently calculating the mass of modified peptides for identification by database searching and mass spectrometry
CN119207548A (zh) Method and device for evaluating and optimizing sequences identified by tandem mass spectrometry
WO2025137775A1 (fr) Method for generating and screening synthetic peptide aptamer libraries
Goldenkova-Pavlova et al. Experimental and Computational Methodology to the Design and Construction of Translatomic Maps of Plants
JP2008305102A (ja) Database search device and method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 10736091

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 10736091

Country of ref document: EP

Kind code of ref document: A1