WO2002103954A2 - Data mining platform for bioinformatics and other knowledge discovery - Google Patents
Data mining platform for bioinformatics and other knowledge discovery
- Publication number
- WO2002103954A2 (application PCT/US2002/019202)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- features
- feature
- gene
- genes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/30—Unsupervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- FIG. 9 is an exemplary screen shot of an interface for the Gene Search Assistant application for bioinformatics for use in searching published information.
- the following sequence illustrates application of recursive feature elimination (RFE) to a SVM using the weight magnitude as the ranking criterion.
- SRS6: SRS is a program developed at the European Bioinformatics Institute for the indexing and cross-referencing of databases of textual information. It provides unified access to molecular biology databases, integration of analysis tools and advanced parsing tools for disseminating and reformatting information stored in ASCII text.
- Ranked lists of features can also be visualized as a matrix of colored coefficients.
- the columns of the matrix represent all of the values a given feature takes across all patterns.
- the columns are ordered according to the feature ranking.
- the rows of the matrix may be ordered, for example, to group the examples of a same class together.
- a matrix can be transposed.
- One can also represent ranked lists of feature subsets, particularly equivalent features, in this way. Nested subsets of features with cardinality increments of one can be visualized by printing the feature identifiers in the order that they are added to increase the cardinality of the feature subsets.
- the identifiers, or their background can then be optionally colored according to the score of the subset containing all the features from the beginning of the list to that feature.
- feature f a singleton
- color 1 illustrated as low density dots
- feature f 5 is shown as a box filled with color 8 (illustrated as grid lines) to indicate the highest score.
- FIG. 14 illustrates the gene tree (observation graph) corresponding to the screen information in FIG. 11.
- This tree was generated from DNA microarray data of colon cancer and normal patients.
- Several runs using the RFE-SVM algorithm were used to generate alternative nested subsets of genes.
- the nodes are labeled with GANs.
- the quality of every subset of genes can be assessed, for example, by the success rate of a classifier trained with these genes.
- the shading (color) of the last node of a given path indicates the quality of the subset. In the present example, a scale of 64 shades, or colors, was used to map the leave-one-out success rate.
- a binary tree of depth 4 is construed. This means that for every gene selection, only two alternatives are presented, and that up to four genes can be selected. Wider trees (with more children at every node) permit selection from a wider variety of genes. Deeper trees provide for selection of a larger number of genes.
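The recursive feature elimination (RFE) step described in the definitions above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the toy data, the function names `train_linear_svm` and `rfe_ranking`, the batch subgradient trainer, and the hyperparameters `lam`, `eta` and `epochs` are all hypothetical. Each round trains a linear SVM on the surviving features and removes the one with the smallest squared weight, which gives the same ordering as ranking by weight magnitude.

```python
def train_linear_svm(X, y, features, lam=0.01, eta=0.1, epochs=300):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    Only the columns listed in `features` are used. Returns a dict that maps
    each feature index to its learned weight. Hyperparameters are assumptions.
    """
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in features}  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(w[j] * xi[j] for j in features) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for j in features:
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        for j in features:
            w[j] -= eta * grad_w[j]
        b -= eta * grad_b
    return w


def rfe_ranking(X, y):
    """Return feature indices ordered from most to least important."""
    surviving = list(range(len(X[0])))
    ranking = []
    while surviving:
        w = train_linear_svm(X, y, surviving)
        # Ranking criterion: squared weight of each surviving feature.
        worst = min(surviving, key=lambda j: w[j] ** 2)
        surviving.remove(worst)
        ranking.insert(0, worst)  # features eliminated last end up first
    return ranking


# Toy data (hypothetical): column 0 separates the classes, columns 1-2 are noise.
X = [
    [2.0, 0.3, -0.1], [1.8, -0.2, 0.2], [2.2, 0.1, 0.1], [1.9, -0.3, -0.2],
    [-2.1, 0.2, -0.1], [-1.7, -0.1, 0.2], [-2.0, 0.3, 0.1], [-2.2, -0.2, -0.2],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

ranking = rfe_ranking(X, y)
print(ranking)
```

Removing one feature per round yields nested subsets with cardinality increments of one; on real microarray data one would more likely remove features in chunks and use an established SVM solver.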
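The gene tree described in the definitions above (two alternatives per selection, depth 4, last node shaded on a 64-step scale) can be illustrated with a short sketch. The names `shade_index`, `grow_paths` and `two_alternatives` are hypothetical, the gene labels are placeholders rather than real identifiers, and the linear mapping from success rate to shade is an assumption; the source does not state how the alternatives at each node are chosen beyond saying that several RFE-SVM runs were used.

```python
def shade_index(success_rate, n_shades=64):
    """Map a success rate in [0, 1] to a shade index in [0, n_shades - 1]."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    return min(n_shades - 1, int(success_rate * n_shades))


def grow_paths(children_of, depth, path=()):
    """Enumerate every root-to-leaf path of a tree with the given depth.

    `children_of(path)` returns the alternatives offered after `path`.
    """
    if len(path) == depth:
        return [path]
    paths = []
    for gene in children_of(path):
        paths.extend(grow_paths(children_of, depth, path + (gene,)))
    return paths


def two_alternatives(path):
    """Placeholder: offer two made-up gene labels at each level of the tree."""
    level = len(path) + 1
    return (f"gene{level}a", f"gene{level}b")


# Binary tree of depth 4: each path selects up to four genes.
paths = grow_paths(two_alternatives, depth=4)
print(len(paths))
print(paths[0], shade_index(0.9))
```

A wider tree corresponds to `children_of` returning more alternatives per node, and a deeper tree to a larger `depth`; the number of paths grows as the branching factor raised to the depth.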
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10/481,068 US7444308B2 (en) | 2001-06-15 | 2002-06-17 | Data mining platform for bioinformatics and other knowledge discovery |
| AU2002304006A AU2002304006A1 (en) | 2001-06-15 | 2002-06-17 | Data mining platform for bioinformatics and other knowledge discovery |
| US11/928,641 US7542947B2 (en) | 1998-05-01 | 2007-10-30 | Data mining platform for bioinformatics and other knowledge discovery |
| US13/079,198 US8126825B2 (en) | 1998-05-01 | 2011-04-04 | Method for visualizing feature ranking of a subset of features for classifying data using a learning machine |
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US29875701P | 2001-06-15 | 2001-06-15 | |
| US29886701P | 2001-06-15 | 2001-06-15 | |
| US29884201P | 2001-06-15 | 2001-06-15 | |
| US60/298,867 | 2001-06-15 | ||
| US60/298,757 | 2001-06-15 | ||
| US60/298,842 | 2001-06-15 |
Related Parent Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2002/016012 Continuation-In-Part WO2002095534A2 (fr) | 1998-05-01 | 2002-05-20 | Methods for feature selection in a learning machine |
Related Child Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10481068 A-371-Of-International | 2002-06-17 | ||
| US11/928,606 Continuation US7921068B2 (en) | 1998-05-01 | 2007-10-30 | Data mining platform for knowledge discovery from heterogeneous data types and/or heterogeneous data sources |
| US11/928,641 Continuation US7542947B2 (en) | 1998-05-01 | 2007-10-30 | Data mining platform for bioinformatics and other knowledge discovery |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2002103954A2 (fr) | 2002-12-27 |
| WO2002103954A3 (fr) | 2003-04-03 |
Family
ID=27404588
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2002/019202 Ceased WO2002103954A2 (fr) | 1998-05-01 | 2002-06-17 | Data mining platform for bioinformatics and other knowledge discovery |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU2002304006A1 (fr) |
| WO (1) | WO2002103954A2 (fr) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7243100B2 (en) * | 2003-07-30 | 2007-07-10 | International Business Machines Corporation | Methods and apparatus for mining attribute associations |
| WO2008078293A1 (fr) * | 2006-12-22 | 2008-07-03 | International Business Machines Corporation | Computer-implemented method, computer program and system for analyzing data records |
| WO2010072382A1 (fr) * | 2008-12-22 | 2010-07-01 | Roche Diagnostics Gmbh | System and method for analyzing genomic data |
| US10515715B1 (en) | 2019-06-25 | 2019-12-24 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
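| CN112116952A (zh) * | 2020-08-06 | 2020-12-22 | 温州大学 (Wenzhou University) | Gene selection method based on a grey wolf optimization algorithm with diffusion and chaotic local search |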
| US11521751B2 (en) * | 2020-11-13 | 2022-12-06 | Zhejiang Lab | Patient data visualization method and system for assisting decision making in chronic diseases |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6470333B1 (en) * | 1998-07-24 | 2002-10-22 | Jarg Corporation | Knowledge extraction system and method |
| US6266668B1 (en) * | 1998-08-04 | 2001-07-24 | Dryken Technologies, Inc. | System and method for dynamic data-mining and on-line communication of customized information |
| US20020052882A1 (en) * | 2000-07-07 | 2002-05-02 | Seth Taylor | Method and apparatus for visualizing complex data sets |
| CA2414421A1 (fr) * | 2000-07-31 | 2002-02-07 | Gene Logic, Inc. | Molecular toxicology modeling |
| US7062384B2 (en) * | 2000-09-19 | 2006-06-13 | The Regents Of The University Of California | Methods for classifying high-dimensional biological data |
| EP1381971A2 (fr) * | 2000-09-27 | 2004-01-21 | Integrative Proteomics, Inc. | Protein data analysis |
| CA2424487C (fr) * | 2000-09-28 | 2012-11-27 | Oracle Corporation | Enterprise web mining system and method |
| US20020133504A1 (en) * | 2000-10-27 | 2002-09-19 | Harry Vlahos | Integrating heterogeneous data and tools |
| CA2429824A1 (fr) * | 2000-11-28 | 2002-06-06 | Surromed, Inc. | Methods for analyzing large data sets to search for biological markers |
| US6789091B2 (en) * | 2001-05-02 | 2004-09-07 | Victor Gogolak | Method and system for web-based analysis of drug adverse effects |
-
2002
- 2002-06-17 AU AU2002304006A patent/AU2002304006A1/en not_active Abandoned
- 2002-06-17 WO PCT/US2002/019202 patent/WO2002103954A2/fr not_active Ceased
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7243100B2 (en) * | 2003-07-30 | 2007-07-10 | International Business Machines Corporation | Methods and apparatus for mining attribute associations |
| WO2008078293A1 (fr) * | 2006-12-22 | 2008-07-03 | International Business Machines Corporation | Computer-implemented method, computer program and system for analyzing data records |
| US7953677B2 (en) | 2006-12-22 | 2011-05-31 | International Business Machines Corporation | Computer-implemented method, computer program and system for analyzing data records by generalizations on redundant attributes |
| WO2010072382A1 (fr) * | 2008-12-22 | 2010-07-01 | Roche Diagnostics Gmbh | System and method for analyzing genomic data |
| US10839941B1 (en) | 2019-06-25 | 2020-11-17 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
| US10839942B1 (en) | 2019-06-25 | 2020-11-17 | Colgate-Palmolive Company | Systems and methods for preparing a product |
| US10515715B1 (en) | 2019-06-25 | 2019-12-24 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
| US10861588B1 (en) | 2019-06-25 | 2020-12-08 | Colgate-Palmolive Company | Systems and methods for preparing compositions |
| US11315663B2 (en) | 2019-06-25 | 2022-04-26 | Colgate-Palmolive Company | Systems and methods for producing personal care products |
| US11342049B2 (en) | 2019-06-25 | 2022-05-24 | Colgate-Palmolive Company | Systems and methods for preparing a product |
| US11728012B2 (en) | 2019-06-25 | 2023-08-15 | Colgate-Palmolive Company | Systems and methods for preparing a product |
| US12165749B2 (en) | 2019-06-25 | 2024-12-10 | Colgate-Palmolive Company | Systems and methods for preparing compositions |
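| CN112116952A (zh) * | 2020-08-06 | 2020-12-22 | 温州大学 (Wenzhou University) | Gene selection method based on a grey wolf optimization algorithm with diffusion and chaotic local search |
| CN112116952B (zh) * | 2020-08-06 | 2024-02-09 | 温州大学 (Wenzhou University) | Gene selection method based on a grey wolf optimization algorithm with diffusion and chaotic local search |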
| US11521751B2 (en) * | 2020-11-13 | 2022-12-06 | Zhejiang Lab | Patient data visualization method and system for assisting decision making in chronic diseases |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2002103954A3 (fr) | 2003-04-03 |
| AU2002304006A1 (en) | 2003-01-02 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US7444308B2 (en) | Data mining platform for bioinformatics and other knowledge discovery | |
| US8126825B2 (en) | Method for visualizing feature ranking of a subset of features for classifying data using a learning machine | |
| Srinivasu et al. | Using recurrent neural networks for predicting type-2 diabetes from genomic and tabular data | |
| JP7305656B2 (ja) | Systems and methods for modeling probability distributions | |
| Guyon et al. | An introduction to variable and feature selection | |
| Soman et al. | Machine learning with SVM and other kernel methods | |
| Malley et al. | Statistical learning for biomedical data | |
| Awotunde et al. | Natural computing and unsupervised learning methods in smart healthcare data-centric operations | |
| Ni et al. | Heal: Brain-inspired hyperdimensional efficient active learning | |
| US20240303544A1 (en) | Graph database techniques for machine learning | |
| WO2002103954A2 (fr) | Data mining platform for bioinformatics and other knowledge discovery | |
| Sasirekha et al. | Identification and Classification of Leukemia Using Machine Learning Approaches | |
| AU2020101987A4 (en) | DIMA-Dataset Discovery: DATASET DISCOVERY IN DATA INVESTIGATIVE USING MACHINE LEARNING AND AI-BASED PROGRAMMING | |
| Tan et al. | Machine learning and its application to bioinformatics: an overview | |
| Bian | A Novel Regularized Orthonormalized Partial Least Squares Model for Multi-view Learning | |
| Nilsson | Nonlinear dimensionality reduction of gene expression data | |
| Pickard et al. | A Hands-On Introduction to Data Analytics for Biomedical Research | |
| Kiani | Optimization of Prediction Models for Motor Scores in Parkinson’s Disease | |
| Nasser | Classification of Iraqi Breast Cancer Dataset Based on Data Mining | |
| Young II | Disease endotypes of type 1 diabetes: Exploration through machine learning and topological data analysis | |
| Sevilla-Villanueva | A methodology for pre-post intervention studies: An application for a nutritional case study | |
| Abd Mohammed et al. | OML-GANs: An optimized multi-level generative adversarial networks model for multi-omics cancer subtype classification | |
| Firouzabad | From DNA to Gravitational Waves: Applications of Statistics and Machine Learning | |
| Sun | Improving classification performance of microarray analysis by feature selection and feature extraction methods | |
| Umar | Applications of bioinformatics in cancer detection: a lexicon of bioinformatics terms |
Legal Events
| Code | Title | Description |
|---|---|---|
| AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
| AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC |
| 122 | Ep: pct application non-entry in european phase | |
| ENP | Entry into the national phase | Ref document number: 2006064415 Country of ref document: US Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 10481068 Country of ref document: US |
| WWP | Wipo information: published in national office | Ref document number: 10481068 Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: JP |
| WWW | Wipo information: withdrawn in national office | Country of ref document: JP |