WO2003019417A1 - System and method for proteome analysis and data management - Google Patents
- Publication number
- WO2003019417A1 (PCT/KR2002/001624)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- proteome
- experimental
- protein
- data
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Definitions
- the present invention relates to a proteome analysis system, and more particularly, to a system and method for storing, searching for, and analyzing proteome-related information collected through 2-dimensional electrophoresis, which is a common research technique applied in the biological field.
- Proteome is a compound word of "protein" and "ome", used to refer collectively to all kinds of proteins.
- single-cellular organisms have a consistent proteome pattern in a cell
- multi-cellular organisms have different proteome patterns in individual cells with an identical genome.
- the genome is expressed into different proteome patterns in particular cells or under particular conditions. Identifying the kinds of proteins in cells, degrees of protein expressions, any modification and its site in a cell, and the interaction of proteins is integrally referred to as "proteomics”.
- The identification of all kinds of proteins expressed in cells, and of the network of those proteins, through proteomic methods leads to a better understanding of life phenomena, which originate from genes and are expressed by proteins.
- 2D-PAGE 2-dimensional polyacrylamide gel electrophoresis
- 2D-PAGE includes processes of staining proteins of interest and an enzymatic cleavage using proteases, which are performed using an automated assay device and computer.
- MS mass spectrometry
- A huge amount of proteome information (proteome data) collected through such an assay process is stored in a medium for analysis by researchers.
- An effective analysis of proteome data requires a database storing the results of proteomic experiments and an integrated search using both the experimental database and a reference database.
- experimental proteome data, reference databases, and a variety of proteome analysis programs use different data formats. Thus, it was inconvenient for users to store, search for, and analyze data under different environments.
- the present invention provides a system and method for proteome analysis and data management, in which proteins are isolated and identified, and the results are organized on a project basis in a collaborative research environment over a network.
- the present invention provides an efficient integrated system and method for proteome analysis and data management, in which an experimental proteome database and a previously established reference database, which are physically separated from one another, are efficiently integrated in a client/server environment for data storage, search, and analysis.
- the present invention provides a system and method for proteome analysis and data management, in which desired proteome data can be searched for using keywords, 2-dimensional gel images, protein expression profiles, isoelectric point, molecular weight, peptide mass fingerprinting (PMF), and protein sequence information. Also, proteins can be characterized based on the searched data.
- the present invention provides a system and method for proteome analysis and data management, in which proteome data in an experimental database and a reference database can be exchanged and integrated with data from another system.
- a system for proteome analysis and data management comprising a first database, a second database, a proteome identification unit, a data management unit, an interface, a proteome search unit, and a proteome analysis unit.
- the first database stores a large amount of validated reference proteome data.
- the second database stores experimental proteome data obtained through experiments.
- the proteome identification unit identifies experimental proteome data using the reference proteome data stored in the first database.
- the data management unit controls an input and output of data to and from the first and second databases.
- the interface receives one of experimental proteome data and a search parameter input from a user.
- the proteome search unit searches for experimental proteome data throughout the second database corresponding to one of the experimental proteome data and the search parameter from a user, and extracts detailed information on the experimental proteome data identified through searching from the first proteome database.
- the proteome analysis unit analyzes the searched results from the proteome search unit to characterize the identified experimental proteome data.
- a method for establishing an experimental proteome database comprising: (a) inputting experimental proteome data; (b) searching throughout a first database storing a large amount of validated reference proteome data for proteome data similar to the experimental proteome data, using PMF (peptide mass fingerprinting) data, a ratio of isoelectric point and molecular weight, and protein sequence information among the input experimental proteome data; (c) performing proteome identification based on the searched result; and (d) storing the experimental proteome data and the identified result in a second database.
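Step (b) compares experimental PMF data against reference entries. The sketch below shows one simple way such a comparison could be scored: counting the experimental peptide masses that fall within a tolerance of a reference protein's theoretical peptide masses. The function names, the tolerance value, the entry names, and the toy mass lists are illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, List, Tuple


def pmf_match_count(observed: List[float], theoretical: List[float],
                    tol: float = 0.5) -> int:
    """Count observed peptide masses within `tol` of any theoretical mass."""
    return sum(
        1 for m in observed
        if any(abs(m - t) <= tol for t in theoretical)
    )


def rank_by_pmf(observed: List[float],
                reference: Dict[str, List[float]],
                tol: float = 0.5) -> List[Tuple[str, int]]:
    """Rank reference entries by number of matched masses, best first."""
    scores = [(entry, pmf_match_count(observed, masses, tol))
              for entry, masses in reference.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: hypothetical entry names and masses, for illustration only.
reference_db = {
    "REF_A": [842.5, 1045.6, 1479.8, 2211.1],
    "REF_B": [900.4, 1300.7, 1750.9],
}
observed_masses = [842.6, 1479.7, 2211.3, 1600.0]
ranking = rank_by_pmf(observed_masses, reference_db)
```

A real identification step would typically combine such a score with the isoelectric point, molecular weight, and sequence information named in step (b).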
- PMF peptide mass fingerprinting
- a method for storing and editing experimental proteome data comprising: (a) determining whether to create a new project file; (b) if it is determined in step (a) to create a new project file, inputting project management information required to create the new project file; (c) determining whether to retrieve data of a previous project file for the new project file; (d) if it is determined in step (c) to retrieve data from the previous project file, loading the data of the previous project file that are common to the new project; (e) if it is determined in step (a) not to create a new project file, it is determined whether to edit a previous project file stored in the experimental proteome database; and (f) if it is determined in step (e) to edit a previous project file, selecting a project file to be edited among a number of previous project files stored in experimental proteome database and editing the data of the selected project file.
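The decision flow in steps (a) through (f) can be sketched as a single function. This is a minimal illustration under stated assumptions: the in-memory dictionary standing in for the experimental proteome database, the function name `handle_project`, and the `shared_from` field (which records a reference to the previous project instead of copying its data) are all hypothetical.

```python
from typing import Dict, Optional


def handle_project(db: Dict[str, dict], create_new: bool,
                   info: Optional[dict] = None,
                   copy_from: Optional[str] = None,
                   edit_id: Optional[str] = None,
                   changes: Optional[dict] = None) -> Optional[dict]:
    """Create a new project or edit an existing one, following steps (a)-(f)."""
    if create_new:                                   # steps (a) and (b)
        project = {"info": dict(info or {}), "shared_from": None}
        if copy_from is not None:                    # steps (c) and (d)
            if copy_from not in db:
                raise KeyError(f"unknown previous project: {copy_from}")
            # Assumption: common data are referenced, not duplicated.
            project["shared_from"] = copy_from
        db[project["info"]["project_id"]] = project
        return project
    if edit_id is not None:                          # steps (e) and (f)
        db[edit_id]["info"].update(changes or {})
        return db[edit_id]
    return None                                      # nothing to do


projects: Dict[str, dict] = {}
first = handle_project(projects, True, info={"project_id": "P1", "name": "aging"})
second = handle_project(projects, True, info={"project_id": "P2"}, copy_from="P1")
handle_project(projects, False, edit_id="P1", changes={"status": "closed"})
```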
- a proteome analysis method comprising: (a) selecting a search method; (b) if a keyword search option is selected as the search method, inputting a keyword for searching; (c) searching a first database storing experimental proteome data and a second database storing a large amount of validated reference proteome data for proteome data corresponding to the input keyword; (d) if an image search option is selected as the search method, loading a 2-D gel image for searching; (e) designating a spot of protein on the 2-D gel image; (f) searching the first and second databases for proteome data corresponding to the position of the designated spot; (g) if an advanced search option is selected as the search method, searching the first and second databases for similar proteome data by using at least one of peptide mass fingerprinting (PMF) data, a ratio of isoelectric point and molecular weight, and protein sequence information among the proteome data as a search parameter; and (h) displaying the search result obtained in step (c), (f
- PMF peptide mass fingerprinting
- a similar protein search method comprising: (a) determining whether to use a 2-D gel image obtained through electrophoresis for a similar protein search; (b) if it is determined to use a 2-D gel image for a similar protein search, designating a spot of protein of interest on the 2-D gel image; (c) obtaining the experimental isoelectric point and molecular weight of the protein from the coordinate value of the spot; (d) if it is determined not to use a 2-D gel image for a similar protein search, directly inputting the experimental isoelectric point and molecular weight of a protein of interest, a search range, name of species, and a ratio of isoelectric point and molecular weight; (e) adjusting the ratio of the x-axis and y-axis of the 2-D gel image by the ratio of isoelectric point and molecular weight and calculating the Euclidian distance between the protein of interest and each identified protein stored in a reference proteome database using the experimental isoelectric point
- a protein expression ratio variation analysis method comprising: (a) defining at least two different experimental conditions and a protein expression variation range; (b) extracting quantitative protein information for the defined experimental conditions from a first database storing experimental proteome data obtained through experiments; (c) calculating protein expression ratio variations of the extracted quantitative protein information for the defined experimental conditions; and (d) screening proteins having a protein expression ratio variation within the defined protein expression variation range.
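Steps (a) through (d) amount to computing a ratio per protein between two conditions and keeping the proteins whose ratio falls inside the defined range. The sketch below assumes the quantitative information is available as a mapping from spot name to per-condition quantity; the function name, condition labels, and values are hypothetical.

```python
from typing import Dict, List


def screen_by_expression_ratio(quantities: Dict[str, Dict[str, float]],
                               cond_a: str, cond_b: str,
                               low: float, high: float) -> List[str]:
    """Return proteins whose cond_b / cond_a quantity ratio lies in [low, high]."""
    selected = []
    for protein, by_condition in quantities.items():
        a = by_condition.get(cond_a)
        b = by_condition.get(cond_b)
        if a is None or b is None or a == 0:
            continue  # ratio undefined; skipped (an assumption of this sketch)
        if low <= b / a <= high:
            selected.append(protein)
    return selected


# Toy quantitative data, for illustration only.
spot_quantities = {
    "spot_1": {"young": 100.0, "old": 250.0},
    "spot_2": {"young": 80.0, "old": 84.0},
    "spot_3": {"young": 50.0, "old": 10.0},
}
increased = screen_by_expression_ratio(spot_quantities, "young", "old", 2.0, 10.0)
```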
- a similarity expression pattern search method comprising: (a) selecting a protein of interest for a similar expression pattern search; (b) extracting quantitative protein profile information on a plurality of proteins from a first database storing experimental proteome data obtained through experiments; (c) calculating the Euclidian distance between the protein of interest and each of the proteins stored in the first database using the quantitative protein profile information of the protein of interest and the extracted quantitative protein profile information; and (d) sorting and outputting the proteins stored in the first database in order of increasing Euclidian distance.
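The search in steps (a) through (d) can be sketched as follows, assuming each protein's quantitative profile is a list of quantities measured under the same ordered set of conditions. The names and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def euclidean(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similar_expression(target: str,
                       profiles: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort all other proteins by increasing distance to the target profile."""
    ref = profiles[target]
    hits = [(name, euclidean(ref, prof))
            for name, prof in profiles.items() if name != target]
    return sorted(hits, key=lambda hit: hit[1])


# Toy profiles, for illustration only.
expression_profiles = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.0, 4.0],
    "C": [5.0, 5.0, 5.0],
    "D": [1.0, 2.0, 3.5],
}
neighbours = similar_expression("A", expression_profiles)
```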
- a method for hierarchically clustering proteins by similarity in protein expression profile comprising: (a) defining an experimental condition for clustering; (b) extracting the quantitative protein information of proteins for the experimental condition from a first database storing experimental proteome data obtained through experiments; (c) calculating the Euclidian distance between all pairs of proteins using the extracted quantitative protein information; (d) hierarchically clustering the proteins using the calculated Euclidian distances, wherein a smaller Euclidian distance indicates a higher similarity in expression profile; and (e) displaying the clustered result.
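Steps (c) and (d) can be illustrated with a small agglomerative clustering routine. The text does not state which linkage rule is used, so single linkage is chosen here purely as an assumption; the function names and toy profiles are likewise hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Cluster = Tuple[str, ...]


def euclidean(p: List[float], q: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hierarchical_clusters(
        profiles: Dict[str, List[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Repeatedly merge the two closest clusters (single linkage).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters: List[Cluster] = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Single linkage: distance between the closest pair of members.
            d = min(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, _ = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append(best)
    return merges


# Toy profiles, for illustration only.
toy_profiles = {
    "P1": [1.0, 1.0],
    "P2": [1.0, 2.0],
    "P3": [8.0, 8.0],
    "P4": [9.5, 8.0],
}
merge_history = hierarchical_clusters(toy_profiles)
```

The merge history can be rendered as a dendrogram for step (e); smaller merge distances correspond to more similar expression profiles.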
- FIG. 1 is a block diagram of a system for proteome analysis and data management according to an embodiment of the present invention
- FIG. 2 systematically illustrates the functions of the system for proteome analysis and data management shown in FIG. 1 ;
- FIG. 3 illustrates the kinds of information stored and the correlation thereof in a reference proteome database and an experimental proteome database shown in FIG. 1 ;
- FIG. 4 is a flowchart illustrating a data storage/edition/deletion function performed in a data management unit shown in FIG. 1 ;
- FIGS. 5 through 7 show program execution windows for performing a data storage/edition/deletion function on a project basis according to the method illustrated in FIG. 4;
- FIG. 8 is a flowchart illustrating a method for identifying proteins in a proteome identification unit shown in FIG. 1 ;
- FIG. 9 shows an example of an initial search window for proteome search according to an embodiment of the present invention.
- FIG. 10 is a flowchart illustrating a proteome search function performed in a proteome search unit shown in FIG. 1 ;
- FIG. 11 shows a keyword search window opened upon selection of a keyword search option 310 shown in FIG. 9, and FIG. 12 shows a window displaying the results of the keyword search performed according to the procedure illustrated in FIG. 10;
- FIGS. 13 and 14 show windows displaying the results of an image search performed according to the procedure illustrated in FIG. 10 by selecting an image search option 320 shown in FIG. 9;
- FIG. 15 shows a pI/MW similarity search window opened upon selection of a pI/MW similarity search 342 shown in FIG. 9, and FIG. 16 shows a window displaying the results of the pI/MW similarity search performed according to the procedure illustrated in FIG. 10;
- FIG. 17 is a flowchart illustrating a pI/MW similarity search method according to an embodiment of the present invention.
- FIG. 18 is a diagram illustrating a Euclidean distance calculating method used in a pI/MW similarity search according to the present invention.
- FIG. 19 shows a PMF similarity search window opened upon selection of a PMF similarity search shown in FIG. 9, and FIG. 20 shows a window displaying the results of the PMF similarity search;
- FIG. 21 shows a sequence similarity search window opened upon selection of a sequence similarity search 346 shown in FIG. 9, and FIG. 22 shows a window displaying the results of the sequence similarity search;
- FIG. 23 illustrates a protein expression profile analysis function performed in a proteome analysis unit shown in FIG. 1 ;
- FIG. 24 is a flowchart illustrating a protein expression ratio variation analysis function shown in FIGS. 2 and 23;
- FIG. 25 shows a window for inputting data for the protein expression ratio variation analysis illustrated in FIG. 24 and displaying the results of the protein expression ratio variation analysis;
- FIG. 26 shows an example of quantitative protein information used for the protein expression ratio variation analysis as illustrated in FIG. 23;
- FIG. 27 is a flowchart illustrating a similar expression pattern search function illustrated in FIGS. 2 and 23;
- FIG. 28 is a window displaying the results of the similar expression pattern search performed according to the method illustrated as an embodiment of the present invention in FIG. 27;
- FIG. 29 is a flowchart illustrating a hierarchical clustering method using protein expression profiles illustrated in FIGS. 2 and 23; and
- FIG. 30 is a window showing the results of the hierarchical clustering using protein expression profiles according to an embodiment of the present invention.
- FIG. 1 is a block diagram of a system for proteome analysis and data management according to an embodiment of the present invention.
- a proteome analysis and data management system according to the present invention includes at least one client 100a, 100b, ..., and 100z connected to a network 10 and a proteome analysis and data management server 200 providing the clients 100a, 100b, ..., and 100z with proteome analysis services.
- the proteome analysis and data management server 200 includes an interface 210, a proteome identification unit 220, a proteome search unit 230, a proteome analysis unit 240, a reference proteome database (DB) 250, an experimental proteome DB 260, a data management unit 270, a flat file conversion unit 280, and an extensible markup language (XML) formation unit 290.
- DB database
- XML extensible markup language
- FIG. 2 systematically illustrates the functions of the system for proteome analysis and data management shown in FIG. 1.
- a proteome analysis and data management function 2000 of the system according to the present invention is roughly classified into a proteome identification function 2200, a proteome search function 2300, a proteome analysis function 2400, and a data management function 2700.
- the proteome search function 2300 performed in the proteome search unit 230 is further classified into a keyword search function 3100, an image search function 3200, and an advanced search function 3400.
- the advanced search function 3400 is further classified into an isoelectric point (pI)/molecular weight (MW) similarity search function 3420, a peptide mass fingerprinting (PMF) similarity search function 3440, and a sequence similarity search function 3460.
- the pI/MW similarity search function 3420 refers to a function of searching for proteins similar to an unknown protein using its isoelectric point and molecular weight in order to identify the unknown protein, without the need for costly mass spectrometry equipment and skilled persons capable of operating it.
- the proteome analysis function 2400 performed in the proteome analysis unit 240 is further classified into a protein expression profile analysis function 2410, a comparative image analysis function 2420, an advanced search result analysis function 2430, a post-translational modification analysis function 2440, and a protein interaction analysis function 2450.
- the protein expression profile analysis function 2410 is further classified into a protein expression ratio variation analysis function 2411, a similar expression pattern search function 2412, and a clustering function 2413.
- the data management function 2700 performed in the data management unit 270 is classified into an image analysis result loading function 2710, a data storage/edition/deletion function 2720, a flat file conversion function 2800, and an XML construction function 2900.
- the data management function 2700 is performed on a project basis, as indicated by reference numerals 2701, 2702, and 270n in FIG. 2.
- the image analysis result loading function 2710 is a function of loading a gel image analysis file. The gel image analysis may be performed, for example, using an external program, such as Bio-Rad's PDQuest software, or in the proteome analysis and data management system itself.
- the gel image analysis result is loaded in a predetermined file format, for example, as a Microsoft Excel file. Common gel image data shared between separate projects are not stored in duplicate; the single stored copy is referred to by the other projects. As a result, storage space in the database is conserved.
- the interface 210 receives one of experimental proteome data and a search parameter from the clients 100a, 100b, ..., and 100z to transmit the received data to the proteome analysis and data management server 200 and receives a proteome analysis result and a proteome search result from the proteome analysis and data management server 200 to transmit the received results to the clients 100a, 100b, ..., and 100z.
- the data management unit 270 connected to the interface 210 provides the image analysis result loading function 2710 to load a gel image analysis result, the data storage/edition/deletion function 2720 to store, edit, or delete data received from the clients 100a, 100b, ..., and 100z on a project basis, the flat file conversion function 2800, and the XML construction function 2900.
- the flat file conversion unit 280 converts the flat file format of the reference proteome DB 250 under the control of the data management unit 270, so that reference proteome data can be input in a format compatible with the format of the experimental proteome DB 260.
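As an illustration of what such a conversion might involve, the sketch below parses a record in the line-code style used by SWISS-PROT flat files (two-letter codes such as ID, AC, DE, and SQ, with `//` ending a record) into a dictionary that could then be loaded into database tables. The patent does not describe the conversion logic, so the parser, its output field names, and the toy record are assumptions.

```python
from typing import Dict, List


def parse_flat_file(text: str) -> List[Dict[str, str]]:
    """Parse SWISS-PROT-style flat-file text into a list of record dicts."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):                 # end of record
            if current:
                records.append(current)
            current, in_sequence = {}, False
        elif in_sequence:                         # sequence data lines
            current["sequence"] = current.get("sequence", "") + line.replace(" ", "")
        else:
            code, value = line[:2], line[5:].strip()
            if code == "ID":
                current["entry_name"] = value.split()[0]
            elif code == "AC":
                current["accession"] = value.split(";")[0].strip()
            elif code == "DE":
                previous = current.get("description", "")
                current["description"] = (previous + " " + value).strip()
            elif code == "SQ":
                in_sequence = True
    return records


# Hypothetical record, for illustration only.
example_text = """\
ID   TEST_RAT       STANDARD;      PRT;    12 AA.
AC   X00001;
DE   Hypothetical example protein.
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QI
//
"""
parsed = parse_flat_file(example_text)
```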
- the XML formation unit 290 formats the proteome data stored in the reference proteome DB 250 and the experimental proteome DB 260 into an XML format under the control of the data management unit 270, so that the proteome data in the two DBs 250 and 260 can be exchanged with and integrated into data stored in another system.
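A minimal sketch of such an XML round trip, using Python's standard `xml.etree.ElementTree` module, is given below. The element names (`protein_record` and the field tags) are hypothetical; the patent does not specify an XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def record_to_xml(record: Dict[str, str]) -> str:
    """Serialize one proteome record as an XML string (hypothetical tags)."""
    root = ET.Element("protein_record")
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> Dict[str, str]:
    """Read the XML string back into a flat dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Toy record, for illustration only.
spot_record = {
    "project_id": "P1",
    "std_spot_id": "S17",
    "ref_entry": "X00001",
    "pI": "5.6",
}
xml_text = record_to_xml(spot_record)
restored = xml_to_record(xml_text)
```

Because both databases can be exported through the same element structure, another system only needs to understand that structure to import the data.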
- the data management unit 270 manages the reference proteome DB 250 and the experimental proteome DB 260.
- the reference proteome DB 250 is a protein sequence DB storing a huge amount of validated proteome related data.
- databases that can be used for the reference proteome DB 250 include the SWISS-PROT database, which stores protein related information, for example, on protein sequence, function, structure, domain information, and post-translational modification, as well as the PIR (Protein Information Resource), InterPro, and TrEMBL databases.
- the experimental proteome DB 260 is a database storing proteome data obtained through experiments.
- the experimental proteome DB 260 may be a rat liver aging proteome database.
- the experimental proteome DB 260 is set up as follows.
- the input experimental protein data are transmitted to the proteome identification unit 220 via the data management unit 270.
- the proteome identification unit 220 identifies an experimental protein of interest based on 2-D gel images, PMF, molecular weight, isoelectric point, and other protein related information input from the data management unit 270, through a comparison with the reference proteome data stored in the reference proteome DB 250.
- the data management unit 270 stores data on the experimental protein in the experimental proteome DB 260 with the attachment of the entry numbers of the corresponding reference data stored in the reference proteome DB 250.
- Once the experimental proteome DB 260 has been set up, the experimental proteome DB 260 and the reference proteome DB 250 are connected to each other. As a result, the proteome search unit 230 can perform a search using the data stored in the experimental proteome DB 260, and the proteome analysis unit 240 can analyze the data. As described later, the experimental proteome DB 260 is constructed and managed on a project basis.
- the proteome search unit 230 performs the keyword search function 3100, the image search function 3200, and the advanced search function 3400 using search parameters, such as 2-D gel images, PMF, isoelectric point, molecular weight, and protein sequence information, input from the user, and searches the experimental proteome DB 260 and the reference proteome DB 250 to retrieve the corresponding data therefrom. For example, when an arbitrary search parameter is input by a user, the proteome search unit 230 responds by searching the experimental proteome DB 260 for particular proteome data and extracts detailed information on the corresponding proteome data from the reference proteome DB 250 with reference to the entry numbers of reference proteome data.
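The two-stage lookup described in the example (search the experimental data first, then pull detailed information from the reference data by entry number) can be sketched as follows. The in-memory structures, field names, and search parameters are hypothetical stand-ins for the two databases.

```python
from typing import Dict, List, Tuple


def search_experimental(spots: List[dict],
                        pi_range: Tuple[float, float],
                        mw_range: Tuple[float, float]) -> List[dict]:
    """Stage 1: find experimental spots whose pI and MW fall in the ranges."""
    return [
        s for s in spots
        if pi_range[0] <= s["pI"] <= pi_range[1]
        and mw_range[0] <= s["mw"] <= mw_range[1]
    ]


def attach_reference_details(hits: List[dict],
                             reference: Dict[str, dict]) -> List[dict]:
    """Stage 2: follow each hit's entry number into the reference data."""
    return [{**hit, "reference": reference.get(hit["ref_entry"])} for hit in hits]


# Toy data, for illustration only.
experimental_spots = [
    {"project_id": "P1", "std_spot_id": "S1", "pI": 5.6, "mw": 45.5, "ref_entry": "REF_A"},
    {"project_id": "P1", "std_spot_id": "S2", "pI": 8.1, "mw": 20.0, "ref_entry": "REF_B"},
]
reference_entries = {
    "REF_A": {"description": "hypothetical protein A", "sequence": "MKT"},
    "REF_B": {"description": "hypothetical protein B", "sequence": "MAS"},
}
hits = attach_reference_details(
    search_experimental(experimental_spots, (5.0, 6.0), (40.0, 50.0)),
    reference_entries,
)
```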
- the proteome analysis unit 240 performs a proteome analysis function 2400, such as the protein expression profile analysis function 2410, the comparative image analysis function 2420, the advanced search result analysis function 2430, the post-translational modification analysis function 2440, and the protein interaction analysis function 2450.
- the protein expression profile analysis function 2410, a function of analyzing protein expression patterns obtained through 2-D electrophoresis under different experimental conditions and of finding similar expression patterns, is further classified into the protein expression ratio variation analysis function 2411, the similar expression pattern search function 2412, and the clustering function 2413.
- Through the protein expression profile analysis function 2410, protein expression variations between different experimental conditions, obtained through 2-D electrophoresis, can be analyzed, and similar expression profile patterns can be searched for and then hierarchically clustered.
- the protein expression profile analysis function 2410 will be described later with reference to FIGS. 23 through 30.
- the comparative image analysis function 2420 is a function of comparing at least two 2-D gel images and analyzing the difference between the 2-D gel images.
- the advanced search result analysis function 2430 is a function of analytically characterizing a protein of interest using the advanced search result.
- the post-translational modification analysis function 2440 is a function of analyzing the difference between the experimental data of proteins and the theoretical data of reference proteins, which are considered to be similar to the experimental protein, stored in the reference proteome DB 250, for example, a difference between the experimental and theoretical isoelectric points and a difference between the experimental and theoretical molecular weights in order to provide basic information on a post-translational modification.
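The comparison described above reduces to two differences per protein. The sketch below computes them and adds a simple flag when either difference exceeds a tolerance; the tolerance values, the flag, and the function name are assumptions added for illustration and are not part of the described function.

```python
from typing import Dict


def pi_mw_shift(exp_pi: float, exp_mw: float,
                theo_pi: float, theo_mw: float,
                pi_tol: float = 0.3, mw_tol: float = 2.0) -> Dict[str, object]:
    """Differences between experimental and theoretical pI and MW values.

    `pi_tol` and `mw_tol` are hypothetical thresholds (MW in kDa) used only
    to flag a shift large enough to suggest a possible modification.
    """
    delta_pi = exp_pi - theo_pi
    delta_mw = exp_mw - theo_mw
    return {
        "delta_pI": delta_pi,
        "delta_mw": delta_mw,
        "possible_modification": abs(delta_pi) > pi_tol or abs(delta_mw) > mw_tol,
    }


# Toy values, for illustration only.
shift = pi_mw_shift(exp_pi=5.2, exp_mw=46.0, theo_pi=5.8, theo_mw=45.0)
```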
- the protein interaction analysis function 2450 is a function of analyzing the interaction between at least two proteins to characterize a protein of interest. Through the above-described various kinds of analysis functions, all kinds of proteins expressed in cells can be integrally identified and characterized.
- the experimental proteome DB 260 is constructed by storing the experimental data together with the entry numbers of the corresponding reference data stored in the reference proteome DB 250, rather than the entire corresponding reference data.
- Data models of the experimental proteome DB 260 and the reference proteome DB 250 are as follows.
- FIG. 3 illustrates the kinds of information stored and the correlation thereof in the reference proteome DB 250 and the experimental proteome DB 260 shown in FIG. 1.
- In FIG. 3, arrows indicate the information tables that are being referred to.
- the reference proteome DB 250 includes a reference database information table (DB_REFERENCE) 2501, a reference literature information table (REFERENCE) 2502, a protein annotation table (PROTEIN_ANT) 2503, a comment information table (COMMENTS) 2504, a proteome sequence information table (SEQ_VALUE) 2505, and a proteome feature information table (FEATURE) 2506.
- DB_REFERENCE reference database information table
- REFERENCE reference literature information table
- PROTEIN_ANT protein annotation table
- COMMENTS comment information table
- SEQ_VALUE proteome sequence information table
- FEATURE proteome feature information table
- the theoretical isoelectric point and molecular weight of all reference proteins are calculated using the proteome sequence information and stored in the protein annotation information table 2503.
- the theoretical isoelectric points and molecular weights stored in the protein annotation information table 2503 are used in a pI/MW similarity search described later to search for proteins similar to a particular protein having a predetermined isoelectric point and molecular weight.
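A pI/MW similarity search of this kind can be sketched as a weighted Euclidean distance between the experimental values and each reference protein's theoretical values, keeping the entries within a search range. The weights stand in for the ratio of isoelectric point and molecular weight mentioned in the method; the specific weights, the search range, the units (MW in kDa), and all names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def pi_mw_distance(exp_pi: float, exp_mw: float,
                   ref_pi: float, ref_mw: float,
                   pi_weight: float = 1.0, mw_weight: float = 1.0) -> float:
    """Weighted Euclidean distance in the (pI, MW) plane."""
    return math.sqrt((pi_weight * (exp_pi - ref_pi)) ** 2
                     + (mw_weight * (exp_mw - ref_mw)) ** 2)


def pi_mw_similarity_search(exp_pi: float, exp_mw: float,
                            reference: Dict[str, Tuple[float, float]],
                            search_range: float,
                            pi_weight: float = 1.0,
                            mw_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Return reference entries within `search_range`, closest first."""
    hits = []
    for entry, (ref_pi, ref_mw) in reference.items():
        d = pi_mw_distance(exp_pi, exp_mw, ref_pi, ref_mw, pi_weight, mw_weight)
        if d <= search_range:
            hits.append((entry, d))
    return sorted(hits, key=lambda hit: hit[1])


# Toy theoretical (pI, MW) values, for illustration only.
theoretical = {
    "REF_A": (5.5, 45.0),
    "REF_B": (6.0, 44.0),
    "REF_C": (9.0, 80.0),
}
# With pi_weight=10, a difference of 0.1 pI unit counts like 1 kDa.
similar = pi_mw_similarity_search(5.6, 45.5, theoretical,
                                  search_range=5.0, pi_weight=10.0, mw_weight=1.0)
```

Scaling the two axes matters because pI and MW have very different numeric ranges; without it, the molecular weight term would dominate the distance.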
- the experimental proteome DB 260 includes a protein information table (PROTEIN_INFO) 2601, a project information table (PROJECT_INFO) 2602, a project user information table (PROJECT_USER_INFO) 2603, a user information table (USER_INFO) 2604, a normal gel information table (NORM_GEL_INFO) 2605, a normal gel image information table (NORM_GEL_IMAGE) 2606, a standard gel information table (STD_GEL_INFO) 2607, a standard gel image information table (STD_GEL_IMAGE) 2608, a normal spot information table (NORM_SPOT_INFO) 2609, and a standard spot information table (STD_SPOT_INFO) 2610.
- PROTEIN_INFO protein information table
- PROJECT_INFO project information table
- PROJECT_USER_INFO project user information table
- USER_INFO user information table
- NORM_GEL_INFO normal gel information table
- the protein information table (PROTEIN_INFO) 2601 links the experimental proteome DB 260 to the reference proteome DB 250 and is used to manage detailed information on spots in 2-D gel images and identified (reference) proteins.
- the protein information table (PROTEIN_INFO) 2601 stores the entry numbers of identified protein data stored in the reference proteome DB 250, rather than storing the entire corresponding protein related reference data.
- the protein information table (PROTEIN_INFO) 2601 includes a project identifier, a standard spot identifier, an identified protein annotation identifier, an annotation identifier for identified proteins, and other information.
- the project information table (PROJECT_INFO) 2602 is used to manage information on a plurality of projects on different research subjects, and particularly, to manage, search, and analyze the experimental data for each research subject based on the project.
- the project information table (PROJECT_INFO) 2602 includes a project identifier; name of project; project start date; project end date; names of researchers (members) involved; status of project; experimental parameters, such as time, diet therapies, etc.; comments; whether or not to open the project to the public; name of species used; name of genus used; experimental methods; and other information.
- the project user information table (PROJECT_USER_INFO) 2603 is used to manage information on users who are authorized or are directly involved in a project.
- the project user information table (PROJECT_USER_INFO) 2603 includes a project identifier, a user identifier, duties of user, and other information.
- the user information table (USER_INFO) 2604 is used to manage information on the users involved in a project.
- the user information table (USER_INFO) 2604 includes a user identifier; a password; name of user; position of user; degree of user's authority ranging, for example, from level 1 to level 5; descriptions on user; and other information.
- the normal gel information table (NORM_GEL_INFO) 2605 is used to manage detailed information on a normal gel image integrated from a plurality of 2-D gel images obtained through electrophoresis.
- the normal gel information table (NORM_GEL_INFO) 2605 includes a project identifier; a gel identifier; experimental parameters, such as time, diet therapies, etc.; description on gel used; data unload date; last gel image data process date; a flag indicating whether the image has been processed or not; and other information.
- the normal gel image information table (NORM_GEL_IMAGE) 2606 is used to manage the normal gel image.
- the normal gel image information table (NORM_GEL_IMAGE) 2606 includes a gel identifier, a project identifier, a gel image file, a gel image file format, such as TIFF, GIF, or JPG, and other information.
- the standard gel information table (STD_GEL_INFO) 2607 is used to manage detailed information on individual gel images constituting the normal gel image.
- the standard gel information table (STD_GEL_INFO) 2607 includes a standard gel identifier; a project identifier; the largest and smallest molecular weights in a gel of interest; the largest and smallest isoelectric points; the slope between isoelectric points; gel formation date; and other information.
- the standard gel image information table (STD_GEL_IMAGE) 2608 includes a standard gel identifier, a project identifier, a gel image file, a gel image file format, such as TIFF, GIF, or JPG, and other information.
- the normal spot information table (NORM_SPOT_INFO) 2609 is used to manage detailed information on a normal spot image of a plurality of spots that is integrated from individual gel images.
- the normal spot information table (NORM_SPOT_INFO) 2609 includes a spot identifier; a gel identifier; a project identifier; a standard spot identifier; x- and y-coordinates of spot; spot intensity information; molecular weight and isoelectric point of spot; spotting date; PMF information on spot; and other information.
- the standard spot information table (STD_SPOT_INFO) 2610 is used to integrally manage detailed information on individual spots constituting the normal spot image.
- the standard spot information table (STD_SPOT_INFO) 2610 includes a standard spot identifier; a standard gel identifier; a project identifier; average of the x-coordinates of individual spots; average of the y-coordinates of individual spots; quantitative information on individual spots; the molecular weights and isoelectric points of individual spots; and other information.
- the experimental proteome DB 260 having the above-described configuration manages the experimental data on a project basis. Therefore, a user can store and edit data to comply with a current project, and only a portion of the database that relates to the current project can be searched during an analysis of particular data. Common gel image data between different projects are not stored in duplicate in the experimental proteome DB 260, and the one stored common gel image is referred to for another project. As a result, the amount of used memory space of the database can be conserved. A method for building up the experimental proteome DB 260 having the configuration as described above will be described below.
- FIG. 4 is a flowchart illustrating a data storage/edition/deletion function 2720 (see FIG. 2) performed in the data management unit 270 shown in FIG. 1.
- FIGS. 5 through 7 show program execution windows for performing a data storage/edition/deletion function on a project basis according to the method illustrated in FIG. 4.
- it is determined whether to create a new project file in the experimental proteome DB 260 (step 2721). If it is determined in step 2721 to create a new project file, a window for a new project is opened, as shown in FIG.
- Information required to create a new project file includes a project identifier; name of project; project start date; project end date; names of researchers (members) involved; status of project; experimental parameters, such as time, diet therapies, etc.; comments; whether or not to open the project to the public; name of species used; name of genus used; experimental methods; and other information.
- Next, it is determined whether to retrieve the experimental data of a previous project file that are common to the new project (step 2723). If it is determined in step 2723 to retrieve the common experimental data of the previous project file, the common experimental data are loaded onto the new project file (step 2724). For example, when there is a common predetermined image file between two projects, instead of storing the common image file for each of the projects in the experimental proteome DB 260, the image file used in the previous project is loaded for the new project. As a result, the amount of used memory space of the experimental proteome DB 260 can be conserved.
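The reuse of a common gel image across projects (steps 2723 and 2724) can be sketched with a small relational layout. This is a minimal illustration only: the `GEL_IMAGE_FILE` table, the column names, and the helper functions are hypothetical and are not taken from the source, which lists the table fields only in prose.

```python
import sqlite3

# Hypothetical layout: image files are stored once, and each project's
# STD_GEL_IMAGE row only references the stored file.
SCHEMA = """
CREATE TABLE GEL_IMAGE_FILE (
    image_id    INTEGER PRIMARY KEY,
    file_name   TEXT NOT NULL,
    file_format TEXT CHECK (file_format IN ('TIFF', 'GIF', 'JPG'))
);
CREATE TABLE STD_GEL_IMAGE (
    std_gel_id TEXT NOT NULL,
    project_id TEXT NOT NULL,
    image_id   INTEGER NOT NULL REFERENCES GEL_IMAGE_FILE(image_id),
    PRIMARY KEY (std_gel_id, project_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding one image used by one project."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO GEL_IMAGE_FILE VALUES (1, 'gel_01.tif', 'TIFF')")
    conn.execute("INSERT INTO STD_GEL_IMAGE VALUES ('G001', 'P_OLD', 1)")
    return conn


def load_common_image(conn, std_gel_id, old_project, new_project):
    """Step 2724 sketch: reuse the stored image by reference, without copying it."""
    row = conn.execute(
        "SELECT image_id FROM STD_GEL_IMAGE WHERE std_gel_id = ? AND project_id = ?",
        (std_gel_id, old_project),
    ).fetchone()
    if row is None:
        raise LookupError("no such gel image in the previous project")
    conn.execute(
        "INSERT INTO STD_GEL_IMAGE VALUES (?, ?, ?)",
        (std_gel_id, new_project, row[0]),
    )


def images_for_project(conn, project_id):
    """List the image file names visible to one project."""
    rows = conn.execute(
        "SELECT f.file_name FROM STD_GEL_IMAGE g "
        "JOIN GEL_IMAGE_FILE f ON f.image_id = g.image_id "
        "WHERE g.project_id = ? ORDER BY f.file_name",
        (project_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

After `load_common_image(conn, "G001", "P_OLD", "P_NEW")`, both projects see the same file while `GEL_IMAGE_FILE` still holds a single row, which is the space saving the text describes.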
- If it is determined in step 2721 not to create a new project file, it is determined whether to edit a previous project file stored in the experimental proteome DB 260 (step 2725). If it is determined in step 2725 to edit a previous project file, a project file to be edited is selected among a plurality of previous project files displayed in a project list window, as shown in FIG. 6 (step 2726).
- a window showing the selected previous project file is displayed, as shown in FIG. 7, to allow a user to edit desired data, for example, in the information tables 2606 through 2610 of FIG. 3 (step 2727).
- Such editing of data can be performed regardless of the type of data, including numeric data, symbolic data, and image file data.
- the experimental protein data stored in the experimental proteome DB 260 on a project basis according to the method as described above are input to the proteome identification unit 220 via the data management unit 270.
- the proteome identification unit 220 identifies an experimental protein of interest on a project basis and stores the identified result in the experimental proteome DB 260, i.e., the protein information table 2601 shown in FIG. 3.
- FIG. 8 is a flowchart illustrating a method for identifying proteins in the proteome identification unit 220 shown in FIG. 1.
- the proteome identification unit 220 initially receives experimental data from the data management unit 270 (step 2210). When data required to identify the experimental proteins are retrieved through an advanced search (step 2220), the proteome identification unit 220 identifies proteins of interest based on the searched result (step 2230). Next, it is determined whether the identification of the proteins of interest has been completed (step 2240).
- the data management unit 270 stores the experimental protein data and the identified result in the experimental proteome DB 260, i.e., the protein information table 2601 of FIG. 3 (step 2250).
- the experimental proteome DB 260 built up as described above can be applied for a keyword search, an image search, a protein expression pattern search, and an advanced search, which is a kind of combination search of the foregoing search techniques.
- the experimental proteome DB 260 can be used for a proteome (reference) data search in connection with the reference proteome DB 250.
- a proteome search is performed as follows.
- FIG. 9 shows an example of an initial search window 300 for proteome analysis according to an embodiment of the present invention.
- the initial search window 300 providing a graphic user interface for data search includes a search menu for a keyword search option 310, an image search option 320, and an advanced search option 340.
- the advanced search option 340 provides a list of choices for pI/MW similarity search 342, PMF similarity search 344, and sequence similarity search 346.
- FIG. 10 is a flowchart illustrating a proteome search function 2300 performed in the proteome search unit 230 shown in FIG. 1.
- a search method is selected among a keyword search option, an image search option, and an advanced search option (step 3010).
- If a keyword search option is selected in step 3010, a keyword for searching is received from a user (step 3110).
- protein information relating to the received keyword is searched for (step 3120), and the searched result is displayed (step 3500).
- If an image search option is selected in step 3010, a desired 2-D gel image is selected among a plurality of reference images stored in the reference proteome DB and is loaded for search (step 3210).
- a spot of protein of interest is designated on the selected 2-D gel image by the user (step 3220).
- the present invention provides a zoom-in function of magnifying and displaying a region around the protein spot. This function allows a user to more accurately designate a spot of protein which is to be searched for.
- information on the designated protein spot is searched for (step 3230), and the searched result is displayed (step 3500).
- If an advanced search option is selected in step 3010, the process goes to step 3410 for selecting a detailed advanced search parameter. If a pI/MW similarity search is selected, the pI/MW data of an experimental protein of interest are received from the user (step 3420), proteins having a similar pI/MW to the experimental protein are searched for (step 3422), and the searched result is displayed (step 3500). If a PMF similarity search is selected in step 3410, the PMF data of an experimental protein of interest are received from the user (step 3441), proteins having a similar PMF to the experimental protein are searched for (step 3422), and the searched result is displayed (step 3500).
- If a sequence similarity search is selected in step 3410, the protein sequence information of an experimental protein of interest is received from the user (step 3461), proteins having a similar sequence to the experimental protein are searched for (step 3462), and the searched result is displayed (step 3500).
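The branching of FIG. 10 (select a search option, run the matching search, display the result) can be sketched as a small dispatcher. The function name, the option keys, and the handler signature below are hypothetical; the source describes only the flow, not an interface.

```python
from typing import Callable, Dict, List


def make_search_dispatcher(handlers: Dict[str, Callable[[str], List[str]]]):
    """Return a function that routes a request to the selected search handler.

    Keys are assumed to be "keyword", "image", or "advanced:<kind>", where
    <kind> names the detailed advanced search chosen in step 3410.
    """

    def run(option: str, payload: str, advanced_kind: str = "") -> List[str]:
        # Step 3010: pick the search method; step 3410 refines the advanced case.
        key = f"advanced:{advanced_kind}" if option == "advanced" else option
        if key not in handlers:
            raise ValueError(f"unsupported search option: {key}")
        # The caller displays the returned list (step 3500).
        return handlers[key](payload)

    return run
```

A caller would register one handler per search type, for example `make_search_dispatcher({"keyword": keyword_search, "advanced:pi_mw": pi_mw_search})`, where both handler names are placeholders.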
- the proteome analysis and data management system provides detailed data on experimental proteins through keyword and image searches and identifies experimental proteins by analyzing the isoelectric point, molecular weight, PMF, or protein sequence information thereof through an advanced search.
- FIG. 11 shows a keyword search window opened upon selection of the keyword search option 310 shown in FIG. 9, and FIG. 12 shows a window displaying the results of the keyword search performed according to the procedure illustrated in FIG. 10.
- a user designates the isoelectric point and molecular weight of a protein of interest, and name of the protein, and a search range
- the searched results as shown in FIG. 12 are displayed as a tree view.
- proteome information on the selected protein including a 2-D gel image, the detailed features and sequence of the protein, comments, reference literatures, and species information, is searched for throughout the reference proteome DB 250 and displayed.
- FIGS. 13 and 14 show windows displaying the results of an image search performed according to the procedure illustrated in FIG. 10 by selecting the image search option 320 shown in FIG. 9.
- a proteome search is performed on the spot designated by the user, and detailed information on the spot, including its x- and y-coordinates, is displayed, as shown in FIG. 14.
- FIG. 14 shows that if the user clicks on an arbitrary one of the searched proteins displayed on the screen, detailed proteome information on the selected protein, including a 2-D gel image, is searched for throughout the reference proteome DB 250 and displayed.
- FIGS. 15 through 22 are for illustrating advanced searches performed according to the procedure illustrated in FIG. 10 by selecting the advanced search option 340 shown in FIG. 9.
- advanced searches are classified into the pI/MW similarity search 342, the PMF similarity search 344, and the sequence similarity search 346 according to the search parameter input.
- FIG. 15 shows a pI/MW similarity search window opened upon selection of the pI/MW similarity search 342 shown in FIG.
- FIG. 16 shows a window displaying the results of the pI/MW similarity search performed according to the procedure illustrated in FIG.
- the pI/MW similarity search for looking for a protein similar to an unknown experimental protein can be achieved by directly inputting the measured molecular weight and isoelectric point of the unknown experimental protein and, for example, an isoelectric point range, a molecular weight range, a ratio of molecular weight and isoelectric point, and name of species to be searched for.
- the pI/MW similarity search may be performed using a 2-D gel image.
- two types of user input interfaces are provided for the pI/MW similarity search according to the present invention.
- One allows a user to directly input the isoelectric point and its range of reference proteins to be searched for, the molecular weight and its range of reference proteins to be searched for, name of species to be searched for, and a ratio of isoelectric point and the logarithm of molecular weight (pI/log(MW)).
- the other one allows a user to directly click on a spot of interest on a 2-D gel image. For example, when a user clicks on an arbitrary spot in the image search window as shown in FIG. 13, the isoelectric point and the molecular weight of the selected spot corresponding to its x- and y-coordinates are obtained. Next, proteins having a similar isoelectric point and molecular weight to the spot are searched for based on the isoelectric point and molecular weight of the spot, and displayed in order, as shown in FIG. 16.
- the coordinate values of the designated protein spot are transformed into experimental isoelectric point and molecular weight values.
- an isoelectric point range, a molecular weight range, name of species, and a ratio of isoelectric point and molecular weight are not input by the user.
- the ratio of isoelectric point and molecular weight can be calculated from the experimental isoelectric point and molecular weight of the protein spot.
- a pI/MW similarity search according to the present invention can be performed using an arbitrary default value and name of species by designating a particular database established through experiments, for example, a rat liver aging database. Through such pI/MW similarity searches, the user can identify unknown proteins to a certain extent.
- FIG. 17 is a flowchart illustrating a pI/MW similarity search method according to an embodiment of the present invention.
- FIG. 18 is a diagram illustrating a Euclidean distance calculating method used in the pI/MW similarity search according to the present invention.
- a pI/MW similarity search method according to an embodiment of the present invention involves determining whether to use a 2-D gel image for the pI/MW similarity search (step 3421). If it is determined to use a 2-D gel image for the pI/MW similarity search, a spot of protein to be searched for is designated on the 2-D gel image (step 3422). Based on the position (x- and y-coordinates) of the selected protein spot, the experimental isoelectric point and molecular weight of the protein spot are obtained (step 3423).
- the x-axis of the 2-D gel image obtained through electrophoresis represents isoelectric point (pI), whereas the y-axis thereof represents the logarithm of molecular weight.
- the ratio of x-axis and y-axis is controlled by the ratio of isoelectric point and the logarithm of molecular weight. This will be described in detail later.
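The conversion of a clicked spot position into experimental isoelectric point and molecular weight values (step 3423) can be sketched as follows. The sketch assumes a gel whose x-axis is linear in pI and whose y-axis is linear in log10(MW), with the largest molecular weight at the top of the image; the `GelCalibration` type, its field names, and that orientation are assumptions for illustration, and the source does not give the actual transform.

```python
import math
from dataclasses import dataclass


@dataclass
class GelCalibration:
    """Hypothetical per-gel calibration, e.g. built from the stored pI and MW extremes."""
    width_px: int
    height_px: int
    pi_min: float
    pi_max: float
    mw_min: float
    mw_max: float


def spot_to_pi_mw(x: float, y: float, cal: GelCalibration):
    """Map pixel coordinates to (pI, MW) under the linear/log-linear assumption."""
    # x-axis: linear interpolation between the smallest and largest pI.
    pi = cal.pi_min + (x / cal.width_px) * (cal.pi_max - cal.pi_min)
    # y-axis: linear in log10(MW); y = 0 is assumed to be the largest MW.
    log_top = math.log10(cal.mw_max)
    log_bottom = math.log10(cal.mw_min)
    log_mw = log_top - (y / cal.height_px) * (log_top - log_bottom)
    return pi, 10 ** log_mw
```

With a 1000 x 1000 pixel image calibrated from pI 3 to 10 and from 10 kDa to 100 kDa, the image centre maps to pI 6.5 and a molecular weight of 10^4.5 Da.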
- the proteome search unit 230 extracts the theoretical isoelectric points and molecular weights of identified proteins stored in the protein annotation information table 2503 of the reference proteome DB 250 and calculates, on a plane of the logarithm of molecular weight vs. isoelectric point, the Euclidean distance between the protein spot designated by the user and the position of each of the identified proteins, wherein the isoelectric point and the molecular weight of the protein spot are experimental values, whereas those of the identified proteins are theoretical values (step 3424).
- the calculated Euclidean distances are sorted in increasing order (step 3425), and the searched proteins are displayed in order of increasing Euclidean distance, i.e., in order of decreasing similarity, as a result of the pI/MW similarity search (step 3426). Sorting the searched results in step 3425 is performed using a sort function provided for a relational database.
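Steps 3424 through 3426 can be sketched with the sorting delegated to the relational database, as the text states. The table name `protein_annotation`, its columns, and the precomputed `log_mw` column are hypothetical stand-ins for the protein annotation information table 2503. The sketch orders by squared distance inside SQL (which gives the same order as the distance itself) and assumes an unweighted distance.

```python
import math
import sqlite3


def demo_reference_db():
    """Build a tiny in-memory stand-in for the reference annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein_annotation (name TEXT, pi REAL, log_mw REAL)")
    conn.executemany(
        "INSERT INTO protein_annotation VALUES (?, ?, ?)",
        [("PROT_A", 5.0, 4.0), ("PROT_B", 6.0, 4.5), ("PROT_C", 8.0, 5.0)],
    )
    return conn


def rank_by_distance(conn, exp_pi, exp_mw, limit=10):
    """Return (name, distance) pairs in order of increasing Euclidean distance."""
    sql = """
        SELECT name,
               (pi - :pi) * (pi - :pi)
             + (log_mw - :lm) * (log_mw - :lm) AS d2
        FROM protein_annotation
        ORDER BY d2 ASC
        LIMIT :limit
    """
    params = {"pi": exp_pi, "lm": math.log10(exp_mw), "limit": limit}
    rows = conn.execute(sql, params).fetchall()
    # Take the square root in Python so the query needs no SQL math functions.
    return [(name, math.sqrt(d2)) for name, d2 in rows]
```

Letting the database do the `ORDER BY` keeps the ranking next to the data, which matters once the reference table is large.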
- If it is determined in step 3421 not to use a 2-D gel image for the pI/MW similarity search, information for the pI/MW similarity search, such as the isoelectric point and molecular weight of a protein of interest, a search range, name of species, and a ratio of isoelectric point and molecular weight, is directly input by the user (step 3427).
- the proteome search unit 230 extracts the theoretical isoelectric points and molecular weights of identified proteins stored in the protein annotation information table 2503 of the reference proteome DB 250 and calculates the Euclidean distance between the protein designated by the user and each of the identified proteins using their isoelectric points and molecular weights.
- the calculated Euclidean distances are sorted in increasing order (step 3425), and the searched proteins are displayed in order of increasing Euclidean distance, i.e., in order of decreasing similarity, as a result of the pI/MW similarity search (step 3426).
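The direct-input variant (step 3427 onward) can be sketched in plain Python. The `RefProtein` type, the parameter names, and the way the `ratio` parameter weights the log(MW) axis are assumptions made for illustration; the source says only that such a ratio is supplied and that it controls the relation between the two axes.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class RefProtein(NamedTuple):
    """Hypothetical reference record with theoretical pI and MW."""
    name: str
    species: str
    pi: float
    mw: float


def pi_mw_search(
    refs: List[RefProtein],
    pi: float,
    mw: float,
    pi_range: float,
    mw_range: float,
    species: Optional[str] = None,
    ratio: float = 1.0,
) -> List[Tuple[str, float]]:
    """Filter by species and search range, then rank by distance in (pI, log MW)."""
    hits = []
    for ref in refs:
        if species is not None and ref.species != species:
            continue
        if abs(ref.pi - pi) > pi_range or abs(ref.mw - mw) > mw_range:
            continue
        # Assumed weighting: `ratio` scales the log(MW) axis relative to pI.
        d_log_mw = ratio * (math.log10(ref.mw) - math.log10(mw))
        hits.append((math.hypot(ref.pi - pi, d_log_mw), ref.name))
    hits.sort()  # increasing distance, i.e. decreasing similarity
    return [(name, dist) for dist, name in hits]
```

Applying the range filter before computing distances keeps the candidate list small, which mirrors the user-supplied search range described above.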
- a method for calculating the Euclidean distance applied for the pI/MW similarity search according to the present invention will be described with reference to FIG. 18.
- the Euclidean distance means the shortest distance between two points in N-dimensional space. Therefore, the Euclidean distance between two points, (P1, M1) and (P2, M2), on a 2-D gel image can be expressed as equation (1) below, and the Euclidean distance between two points, (P1, M1) and (P3, M3), on the 2-D gel image can be expressed as equation (2) below.
- the Euclidean distance dist1 between two points (P1, M1) and (P2, M2) is smaller than the Euclidean distance dist2 between two points (P1, M1) and (P3, M3). Therefore, point (P2, M2) is determined to have a position of higher similarity to that of point (P1, M1).
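Equations (1) and (2) are not reproduced in this text. Assuming the plain, unweighted Euclidean distance in the plane, with P denoting the isoelectric point and M the logarithm of molecular weight, they would take the following form; any axis weighting by the pI/log(MW) ratio mentioned earlier is not shown here because its exact form is not given.

```latex
\begin{align}
\mathrm{dist}_1 &= \sqrt{(P_1 - P_2)^2 + (M_1 - M_2)^2} \tag{1} \\
\mathrm{dist}_2 &= \sqrt{(P_1 - P_3)^2 + (M_1 - M_3)^2} \tag{2}
\end{align}
```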
- the proteome analysis and data management system can display proteins similar to a particular unknown protein of interest in order of decreasing similarity, so that the unknown protein can be identified without the need for costly mass spectroscopy equipment and skilled personnel capable of operating the same.
- the proteome analysis and data management system can provide a user with basic information on a post-translational modification by comparing the theoretical isoelectric point and theoretical molecular weight of an identified protein with the experimental isoelectric point and molecular weight of a protein which are input by the user. For example, with the assumption that no post-translational modification occurs, the experimental isoelectric point and molecular weight almost match the theoretical isoelectric point and molecular weight, respectively. However, when the experimental isoelectric point and molecular weight are greatly different from the theoretical isoelectric point and molecular weight, respectively, a higher likelihood of post-translational modification is expected. Based on the likelihood of post-translational modification revealed through the pI/MW similarity search according to the present invention, a user can continue to research in depth the problem of post-translational modification that is crucial in the proteome research field.
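The comparison of experimental and theoretical values can be sketched as a simple flagging helper. The function name and both tolerance values are hypothetical placeholders chosen for illustration; the source does not state how large a difference counts as "greatly different".

```python
def ptm_hint(exp_pi, exp_mw, theo_pi, theo_mw, pi_tol=0.5, mw_rel_tol=0.10):
    """Flag a possible post-translational modification from pI/MW shifts.

    pi_tol (absolute pI units) and mw_rel_tol (fraction of theoretical MW)
    are illustrative thresholds, not values from the source.
    """
    pi_shift = exp_pi - theo_pi
    mw_rel_shift = (exp_mw - theo_mw) / theo_mw
    return {
        "pi_shift": pi_shift,
        "mw_rel_shift": mw_rel_shift,
        "possible_ptm": abs(pi_shift) > pi_tol or abs(mw_rel_shift) > mw_rel_tol,
    }
```

A flagged result is only a prompt for closer study of the protein, in line with the "basic information" the text describes, and not a determination that a modification occurred.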
- FIG. 19 shows a PMF similarity search window opened upon selection of the PMF similarity search 344 shown in FIG. 9, and FIG. 20 shows a window displaying the results of the PMF similarity search.
- As shown in FIG. 19, when the PMF data of a protein which the user wishes to search for are input, proteins having similar PMF characteristics to the input protein are searched for and displayed in order of decreasing similarity.
- FIG. 20 shows a window displaying the results of the PMF similarity search.
- FIG. 21 shows a sequence similarity search window opened upon selection of the sequence similarity search 346 shown in FIG. 9, and FIG. 22 shows a window displaying the results of the sequence similarity search.
- As shown in FIG. 21, when a user designates a database used to search for proteins similar to a protein of interest having an arbitrary sequence, together with a sequence similarity search program, proteins having a sequence similar to the input protein are displayed in order, as shown in FIG. 22.
- FIG. 22 shows that if the user clicks on an arbitrary one of the searched proteins displayed on the screen, detailed proteome information on the selected protein, including a 2-D gel image, is searched for throughout the reference proteome DB 250 and displayed.
- FIG. 23 illustrates a protein expression profile analysis function 2410 performed in the proteome analysis unit 240 shown in FIG. 1.
- a protein expression profile analysis function 2410 is further classified into the protein expression ratio variation analysis function 2411, the similar expression pattern search function 2412, and the clustering function 2413.
- the protein expression ratio variation analysis function 2411 is a function of comparing quantitative information extracted from the experimental proteome DB 260 on different experimental conditions designated by the user to calculate protein expression ratio variations between the experimental conditions and searching for and outputting proteins having a protein expression ratio variation within an expression variation range designated by the user.
- the similar expression pattern search function 2412 is a function of searching for proteins having a similar protein expression pattern to a protein selected by the user by calculating the Euclidean distance between the protein selected by the user and proteins stored in the experimental proteome DB 260 using their quantitative protein expression profile information.
- the clustering function 2413 is a function of hierarchically clustering proteins of a particular experimental condition designated by the user by similarity in expression pattern using their quantitative protein profile information extracted from the experimental proteome DB 260.
- quantitative protein information 241 is extracted from the experimental proteome DB 260.
- the extracted quantitative protein information 241 is used to analyze protein expression patterns between different experimental conditions (function 2411), to search for proteins having a similar protein expression pattern to a protein of interest (function 2412), or to hierarchically cluster proteins using expression profiles (function 2413).
- FIG. 24 is a flowchart illustrating a protein expression ratio variation analysis function 2411 shown in FIGS. 2 and 23.
- FIG. 25 shows a window for inputting data for the protein expression ratio variation analysis illustrated in FIG. 24 and displaying the results of the protein expression ratio variation analysis.
- FIG. 26 shows an example of quantitative protein information 241 used for the protein expression ratio variation analysis as illustrated in FIG. 23. A method for analyzing protein expression ratio variations for different experimental conditions according to the present invention will be described with reference to FIGS. 24 through 26.
- experimental conditions for a protein expression variation comparison and a protein expression variation range to be searched for are input (step 24111), via the window as illustrated in FIG. 25.
- the experimental conditions may be input using a drop down menu so that a user can input data easily.
- the protein expression variation range may be manually input by the user.
- the quantitative protein information 241 on the input experimental conditions is extracted from the experimental proteome DB 260 (step 24112).
- the quantitative protein information 241 refers to the intensity of spots on a gel image. Although an example of expressing the quantitative protein information 241 is illustrated in FIG. 26, the quantitative protein information 241 may be expressed in various forms.
- an experimental quantitative protein information variation ratio is calculated (step 24113).
- the experimental quantitative protein information variation ratio is calculated using equation (3) below.
- the experimental quantitative protein information variation ratio is a ratio of variations in protein expression between different experimental conditions.
- After the experimental quantitative protein information variation ratios are calculated using equation (3), those that are within the protein expression variation range defined by the user are extracted and displayed in a lower portion of the window shown in FIG. 25 (step 24114).
- the results of the protein expression ratio variation analysis displayed in step 24114 as shown in FIG. 25 include protein spot information, such as protein ID No. and name, experimental quantitative protein information variation ratios between different groups, and whether the experimental quantitative protein information variation ratio changes or not.
- Such analyzed results may be tabled and color-coded so that an increase and a decrease in protein expression ratio are made more distinct. Accordingly, the user can easily distinguish between proteins having a similar tendency of expression ratio increasing or decreasing.
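Steps 24111 through 24114 can be sketched as follows. Equation (3) is not reproduced in this text, so the sketch assumes the simplest plausible definition, the spot intensity under the second condition divided by the intensity under the first; that definition, the function name, and the data layout are all assumptions.

```python
def expression_ratio_changes(intensities, cond_a, cond_b, low, high):
    """Return (spot_id, ratio, trend) for spots whose ratio lies in [low, high].

    intensities: {spot_id: {condition_name: spot_intensity}}
    ratio: intensity under cond_b divided by intensity under cond_a
           (an assumed stand-in for the source's equation (3)).
    """
    results = []
    for spot_id, by_condition in intensities.items():
        a = by_condition.get(cond_a)
        b = by_condition.get(cond_b)
        if a is None or b is None or a <= 0:
            continue  # skip spots missing in either condition
        ratio = b / a
        if low <= ratio <= high:
            trend = "up" if ratio > 1 else "down" if ratio < 1 else "unchanged"
            results.append((spot_id, ratio, trend))
    # Largest change first, so increases and decreases group together.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

The `trend` field corresponds to the increase or decrease that the text suggests color-coding in the result table.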
- the analyzed results may be linked to the reference proteome DB 250 (step 24115). Then, it is determined whether to search the reference proteome DB 250 for detailed information on the proteins extracted through the analysis (step 24116). If so, the reference proteome DB 250 is searched for detailed information on the proteins extracted through the analysis (step 24117).
- the user can analyze the protein expression patterns between different experimental conditions by calculating quantitative protein information variation ratios.
- the user can search for detailed information on the proteins screened through the protein expression pattern analysis, if necessary, by clicking on a desired protein of interest in the list of the screened proteins.
- Detailed information on reference proteins is provided by an owner of the reference proteome DB 250.
- the reference proteome DB 250 provides users with a huge amount of validated protein related information useful for protein analysis.
- FIG. 27 is a flowchart illustrating a similar expression pattern search function 2412 illustrated in FIGS. 2 and 23.
- FIG. 28 is a window displaying the results of the similar expression pattern search performed according to the method illustrated as an embodiment of the present invention in FIG. 27.
- a protein of interest is selected (step 24121).
- it is determined whether an expression profile of the protein has been input (step 24122). If it is determined that the expression profile of the protein has been input, quantitative profile information on other proteins is extracted from the experimental proteome DB 260 (step 24123). The Euclidean distance between the position of the protein which is of interest and the position of each of the other experimental proteins is calculated using the quantitative profile information (step 24124).
- the experimental proteins are sorted (step 24125) and displayed (step 24126) in order of increasing Euclidean distance, i.e., in order of decreasing similarity.
- the Euclidean distance indicates the shortest distance between two points in N-dimensional space. A smaller Euclidean distance means a higher similarity in expression pattern between two proteins.
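The similar expression pattern search (steps 24121 through 24126) can be sketched by treating each expression profile as a point in N-dimensional space, one dimension per experimental condition. The function name and the dictionary layout are hypothetical.

```python
import math


def similar_expression(profiles, target_id, top_n=5):
    """Rank other proteins by Euclidean distance to the target's profile.

    profiles: {protein_id: [intensity under condition 1, condition 2, ...]}
    Returns up to top_n (protein_id, distance) pairs, most similar first.
    """
    target = profiles[target_id]
    ranked = []
    for protein_id, profile in profiles.items():
        if protein_id == target_id:
            continue
        if len(profile) != len(target):
            raise ValueError("profiles must cover the same conditions")
        ranked.append((math.dist(target, profile), protein_id))
    ranked.sort()  # increasing distance, i.e. decreasing similarity
    return [(protein_id, dist) for dist, protein_id in ranked[:top_n]]
```

In practice the profiles may need normalization before distances are compared, since raw spot intensities of abundant proteins would otherwise dominate; the source does not say whether this is done.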
- the results of the similar expression pattern search may be linked to the reference proteome DB 250 (step 24127). Then, it is determined whether to search the reference proteome DB 250 for detailed information on the proteins determined to have a similar expression pattern through the search (step 24128). If it is determined to search the reference proteome DB 250 for detailed information, the reference proteome DB 250 is searched for detailed information on the similar proteins, and the searched detailed information is output (step 24129). Therefore, the user can search for proteins having a similar protein expression profile using quantitative profile information and can refer to detailed information on the proteins by searching the reference proteome DB 250 if necessary.
- a hierarchical clustering method using protein expression profiles involves a user inputting an experimental condition for hierarchical clustering (step 24131), extracting quantitative protein profile information on the experimental condition (step 24132), calculating the Euclidean distance between all pairs of proteins using the extracted quantitative protein profile information (step 24133), hierarchically clustering the proteins by similarity in expression profile using the calculated Euclidean distances (step 24134), and displaying the clustered result (step 24135).
- the clustered result is linked to the reference proteome database 250 (step 24136), and it is determined whether to search the reference proteome database 250 for individual proteins in clusters (step 24137). If it is determined to search the reference proteome database 250 for individual proteins in clusters, the reference proteome database 250 is searched for detailed information on each of the clustered proteins (step 24138).
- the user can hierarchically cluster proteins by similarity in protein expression profile using the quantitative information stored in the experimental proteome database 260.
- the user can refer to detailed information on each of the proteins in clusters by searching the reference proteome database 250 if necessary.
- FIG. 30 shows a window displaying the results of the hierarchical clustering using protein expression profiles according to an embodiment of the present invention.
- "Hierarchical clustering” is a statistical analysis technique for grouping multivariate data by similarity in each parameter, wherein similar objects or variables are clustered in groups according to predetermined rules.
- Various cluster linkage rules are useful. Single linkage methods take the distance between two clusters to be the distance between their two closest objects, so that after the two closest objects are clustered in a group, the next closest object to either of them is added to that group. Complete linkage methods take the distance between two clusters to be the greatest distance between any two objects in the different clusters, and merge the clusters for which this distance is smallest. Average linkage methods take the average distance between all pairs of objects in two different clusters.
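The three linkage rules can be compared with a compact agglomerative clustering sketch in plain Python. It is a naive illustration for small inputs, not the implementation used by the described system; the function names and the merge-list output format are hypothetical.

```python
import math
from itertools import combinations


def cluster_distance(a, b, profiles, linkage):
    """Distance between clusters a and b (tuples of ids) under a linkage rule."""
    dists = [math.dist(profiles[i], profiles[j]) for i in a for j in b]
    if linkage == "single":      # closest pair of objects
        return min(dists)
    if linkage == "complete":    # farthest pair of objects
        return max(dists)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def hierarchical_cluster(profiles, linkage="average"):
    """Merge the two closest clusters repeatedly; return the merge history.

    Each history entry is (cluster_a, cluster_b, distance), which is the
    information a tree view needs to draw one branch.
    """
    clusters = [(protein_id,) for protein_id in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], profiles, linkage),
        )
        distance = cluster_distance(a, b, profiles, linkage)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, distance))
    return merges
```

The recorded merge distance plays the role of the branch length in a tree view: the pair merged first, at the smallest distance, has the most similar expression pattern.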
- the result of the hierarchical clustering is visualized as an image to allow the user to easily understand the correlation of proteins in their expression profile.
- the distance between adjacent proteins that is calculated from their expression pattern is expressed as a branch of a tree, as shown in the tree view of FIG. 30.
- the degree of similarity between two proteins in expression pattern is expressed by the x-axial length of a branch in the tree view. Therefore, the similarity between two proteins in expression pattern can be verified by comparing the lengths of branches on the x-axis. For example, because the x-axial length between proteins Nos. 008 and 009 is shortest, the two proteins are considered to have the most similar expression pattern.
- Such a hierarchical clustering allows a user to perceive the similarity of all the proteins stored in a database in expression pattern at a glance and to easily compare degrees of similarity in protein expression pattern from the x-axial lengths of branches of the tree.
- When protein information in a proteome database is changed or added, such a hierarchical clustering can be performed on all the proteins in the modified database in real time. Therefore, the user can acquire a result of clustering in real time that varies according to changes in the information of the database.
- protein expression profile can be analyzed in various aspects through an expression pattern ratio variation analysis for different experimental conditions, a similar expression pattern search, and a hierarchical clustering.
- detailed information on the proteins that have been analyzed to have a similar protein expression profile can be searched for throughout the reference proteome database 250.
- a proteome analysis and data management system capable of integrally searching for and analyzing data using two databases, a reference proteome database and an experimental proteome database, in a client/server environment, wherein the experimental proteome database is built up with reference to the format of the reference proteome database, is described.
- the present invention can be applied in a local environment or in a web environment.
- the invention may be embodied in a general purpose digital computer by running a program from a computer readable medium, including but not limited to storage media such as magnetic storage media (e.g., ROM's, floppy disks, hard disks, etc.), optically readable media (e.g., CD-ROMs, DVDs, etc.) and carrier waves (e.g., transmissions over the Internet).
- the present invention may be embodied as a computer readable medium having a computer readable program code unit embodied therein for causing a number of computer systems connected via a network to effect distributed processing.
- an experimental proteome database and an established reference database which are physically separated from one another, can be efficiently integrated on a project basis for data storage, search, and analysis.
- the establishment of a database of experimental data and a protein search and analysis can be easily implemented on a project basis in a client/server environment.
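The three expression-profile analyses named above (ratio variation across experimental conditions, similar expression pattern search, and hierarchical clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the spot identifiers, the intensity values, the function names, the use of Pearson correlation as the similarity measure, and average linkage for the clustering are not taken from the patent text shown here. Re-running `hierarchical_clusters` after the profile table changes corresponds to the "real time" re-clustering described above.

```python
from math import sqrt

# Hypothetical spot intensities under three experimental conditions.
profiles = {
    "spot_A": [100.0, 200.0, 400.0],
    "spot_B": [50.0, 100.0, 200.0],
    "spot_C": [300.0, 150.0, 75.0],
    "spot_D": [80.0, 42.0, 20.0],
}


def ratios(profile, baseline=0):
    """Expression ratio of every condition relative to a baseline condition."""
    base = profile[baseline]
    return [value / base for value in profile]


def pearson(a, b):
    """Pearson correlation between two equally long intensity profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def similar_spots(table, query, threshold=0.9):
    """Spots whose pattern correlates with the query spot at or above threshold."""
    hits = [(sid, pearson(table[query], p)) for sid, p in table.items() if sid != query]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


def hierarchical_clusters(table, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r."""
    clusters = [[sid] for sid in table]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(table[a], table[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


print(ratios(profiles["spot_A"]))
print(similar_spots(profiles, "spot_A"))
print(hierarchical_clusters(profiles, 2))
```

In this toy data set the two spots that rise across conditions end up in one cluster and the two that fall end up in the other. A production system would more likely rely on an established clustering library than on this quadratic-time loop.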
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR2001/52554 | 2001-08-29 | ||
| KR1020010052555A KR20030019681A (en) | 2001-08-29 | 2001-08-29 | Web-based workbench system and method for proteome analysis and management |
| KR2001/52555 | 2001-08-29 | ||
| KR10-2001-0052554A KR100478792B1 (en) | 2001-08-29 | 2001-08-29 | Apparatus and method for searching similar position protein based on 2 dimensional gel image |
| KR2001/52556 | 2001-08-29 | ||
| KR1020010052556A KR20030019682A (en) | 2001-08-29 | 2001-08-29 | Apparatus and method for analysing protein expression profile based on spot intensity information |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2003019417A1 (en) | 2003-03-06 |
Family
ID=27350514
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2002/001624 (Ceased), WO2003019417A1 (en) | 2001-08-29 | 2002-08-29 | System and method for proteome analysis and data management |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2003019417A1 (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004086281A1 (en) * | 2003-03-25 | 2004-10-07 | Institut Suisse De Bioinformatique | Method for comparing proteomes |
| WO2008007821A1 (en) * | 2006-07-12 | 2008-01-17 | Korea Basic Science Institute | A method for reconstructing protein database and a method for identifying proteins by using the same method |
| CN114242163A (en) * | 2020-09-09 | 2022-03-25 | 复旦大学 | Proteomics mass spectrometry data processing system |
| CN114910550A (en) * | 2017-12-14 | 2022-08-16 | 布鲁克·道尔顿有限及两合公司 | Mass spectrometric determination of specific tissue states |
| CN116913387A (en) * | 2023-05-19 | 2023-10-20 | 北京火山引擎科技有限公司 | A biological information data processing method, device, equipment and related media |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1998019271A1 (en) * | 1996-10-25 | 1998-05-07 | Peter Mose Larsen | Proteome analysis for characterization of up- and down-regulated proteins in biological samples |
| WO2001002848A1 (en) * | 1999-07-05 | 2001-01-11 | Thomas Moore | Method for the multi-dimensional analysis of a proteome |
| WO2001018627A2 (en) * | 1999-09-06 | 2001-03-15 | National University Of Singapore | Method and apparatus for computer automated detection of protein and nucleic acid targets of a chemical compound |
| WO2001030830A2 (en) * | 1999-10-26 | 2001-05-03 | Mitokor | Gene sequences identified by protein motif database searching |
| US6256647B1 (en) * | 1998-02-16 | 2001-07-03 | Biomolecular Engineering Research Institute | Method of searching database of three-dimensional protein structures |
| US6277259B1 (en) * | 1998-04-24 | 2001-08-21 | Enterprise Partners Ii | High performance multidimensional proteome analyzer |
| KR20020080626A (en) * | 2001-04-16 | 2002-10-26 | 학교법인연세대학교 | Providing Apparatus and Method for Proteome Data |
- 2002-08-29: WO application PCT/KR2002/001624, published as WO2003019417A1 (en); status: not active (Ceased)
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1998019271A1 (en) * | 1996-10-25 | 1998-05-07 | Peter Mose Larsen | Proteome analysis for characterization of up- and down-regulated proteins in biological samples |
| US6256647B1 (en) * | 1998-02-16 | 2001-07-03 | Biomolecular Engineering Research Institute | Method of searching database of three-dimensional protein structures |
| US6277259B1 (en) * | 1998-04-24 | 2001-08-21 | Enterprise Partners Ii | High performance multidimensional proteome analyzer |
| WO2001002848A1 (en) * | 1999-07-05 | 2001-01-11 | Thomas Moore | Method for the multi-dimensional analysis of a proteome |
| WO2001018627A2 (en) * | 1999-09-06 | 2001-03-15 | National University Of Singapore | Method and apparatus for computer automated detection of protein and nucleic acid targets of a chemical compound |
| WO2001030830A2 (en) * | 1999-10-26 | 2001-05-03 | Mitokor | Gene sequences identified by protein motif database searching |
| KR20020080626A (en) * | 2001-04-16 | 2002-10-26 | 학교법인연세대학교 | Providing Apparatus and Method for Proteome Data |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004086281A1 (en) * | 2003-03-25 | 2004-10-07 | Institut Suisse De Bioinformatique | Method for comparing proteomes |
| WO2008007821A1 (en) * | 2006-07-12 | 2008-01-17 | Korea Basic Science Institute | A method for reconstructing protein database and a method for identifying proteins by using the same method |
| US8296300B2 (en) | 2006-07-12 | 2012-10-23 | Korea Basic Science Institute | Method for reconstructing protein database and a method for screening proteins by using the same method |
| CN114910550A (en) * | 2017-12-14 | 2022-08-16 | 布鲁克·道尔顿有限及两合公司 | Mass spectrometric determination of specific tissue states |
| CN114910550B (en) * | 2017-12-14 | 2025-09-16 | 布鲁克·道尔顿有限及两合公司 | Mass spectrometry of specific tissue states |
| CN114242163A (en) * | 2020-09-09 | 2022-03-25 | 复旦大学 | Proteomics mass spectrometry data processing system |
| CN114242163B (en) * | 2020-09-09 | 2024-01-30 | 复旦大学 | Proteomics mass spectrometry data processing system |
| CN116913387A (en) * | 2023-05-19 | 2023-10-20 | 北京火山引擎科技有限公司 | A biological information data processing method, device, equipment and related media |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Bouyssié et al. | Proline: an efficient and user-friendly software suite for large-scale proteomics | |
| US20020067358A1 (en) | Data analysis software | |
| US6466923B1 (en) | Method and apparatus for biomathematical pattern recognition | |
| US7707143B2 (en) | Systems, methods, and computer program products that automatically discover metadata objects and generate multidimensional models | |
| JP3362125B2 (en) | Information processing method | |
| US20030218634A1 (en) | System and methods for visualizing diverse biological relationships | |
| US20060179051A1 (en) | Methods and apparatus for steering the analyses of collections of documents | |
| CN105956416B (en) | A kind of method of fast automatic analyzing prokaryote protein gene group data | |
| US20030061243A1 (en) | Information auto classification method and information search and analysis method | |
| KR100463667B1 (en) | System for processing patent materials, its method | |
| CN109460386B (en) | Malicious file homology analysis method and device based on multi-dimensional fuzzy hash matching | |
| EP1192567B1 (en) | Content-based retrieval of series data | |
| Appel et al. | Computer analysis of 2-D images | |
| KR100650203B1 (en) | Genome sequence analysis and data management system and method | |
| D. LeDuc et al. | Using ProSight PTM and related tools for targeted protein identification and characterization with high mass accuracy tandem MS data | |
| JP2005025731A (en) | Drill-through query from data mining model content | |
| WO2003019417A1 (en) | System and method for proteome analysis and data management | |
| US20030036207A1 (en) | System and method for storing mass spectrometry data | |
| US6927779B2 (en) | Web-based well plate information retrieval and display system | |
| CN100458788C (en) | Clustering method, searching method and system for interconnection network audio file | |
| Kaushal et al. | Analyzing and visualizing expression data with Spotfire | |
| KR20030019681A (en) | Web-based workbench system and method for proteome analysis and management | |
| US20060080296A1 (en) | Text mining server and text mining system | |
| JP2003242154A (en) | Method and apparatus for managing gene manifestation information, program, and recording medium | |
| KR100478792B1 (en) | Apparatus and method for searching similar position protein based on 2 dimensional gel image |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KZ LK LR LS LT LU LV MA MD MG MK MW MX MZ NO NZ OM PH PL PT RO SD SE SG SI SK SL TJ TM TN TR TT UA UG US UZ VC VN YU ZA ZM. Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG. Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FROM 1205A OF 070504) |
| | 122 | Ep: pct application non-entry in european phase | |
| | NENP | Non-entry into the national phase | Ref country code: JP |
| | WWW | Wipo information: withdrawn in national office | Country of ref document: JP |