WO2002056197A1 - System and method for electronic document handling - Google Patents

System and method for electronic document handling

Info

Publication number
WO2002056197A1
WO2002056197A1 PCT/NL2001/000013 NL0100013W WO02056197A1 WO 2002056197 A1 WO2002056197 A1 WO 2002056197A1 NL 0100013 W NL0100013 W NL 0100013W WO 02056197 A1 WO02056197 A1 WO 02056197A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
document
predetermined
variables
sep
Prior art date
Application number
PCT/NL2001/000013
Other languages
English (en)
Inventor
Michaël Leonard Maria BECKERS
Barend Jan De Jong
Original Assignee
Kluwer Academic Publishers B.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kluwer Academic Publishers B.V.
Priority to PCT/NL2001/000013
Publication of WO2002056197A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a computer system and method for handling electronic documents using a processor (1) and a memory (5, 7, 9, 11) connected to the processor, the computer system being arranged to perform the following steps: (a) receiving a first document and storing it in said memory (5, 7, 9, 11); (b) dividing the first document into one or more text parts; (c) for each text part, calculating how many times each variable of a plurality of variables is present; (d) applying a predetermined classification algorithm, using at least said number of times each variable of the plurality of variables is present as parameters, to classify each text part as belonging to one predetermined text class of a plurality of predetermined text classes.
PCT/NL2001/000013 2001-01-10 2001-01-10 System and method for electronic document handling WO2002056197A1 (fr)
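
Steps (b) to (d) of the abstract can be illustrated with a minimal Python sketch. The abstract names neither the variables that are counted, nor the text classes, nor the classification algorithm, so everything specific here is an assumption made for illustration: the three variables in VARIABLES, the two classes "body" and "reference", the prototype counts in CENTROIDS, the blank-line splitting rule, and the nearest-centroid classifier that stands in for the "predetermined classification algorithm".

```python
import re

# Hypothetical variables: the abstract does not say which variables are counted.
VARIABLES = {
    "digits": re.compile(r"\d"),
    "commas": re.compile(r","),
    "capitalized_words": re.compile(r"\b[A-Z][a-z]+"),
}

# Hypothetical text classes with made-up prototype counts per variable.
# A real system would presumably derive these from labeled text parts.
CENTROIDS = {
    "body": {"digits": 0.0, "commas": 2.0, "capitalized_words": 2.0},
    "reference": {"digits": 8.0, "commas": 4.0, "capitalized_words": 5.0},
}


def split_into_parts(document):
    """Step (b): divide the document into text parts (assumed: split on blank lines)."""
    return [part.strip() for part in re.split(r"\n\s*\n", document) if part.strip()]


def count_variables(part):
    """Step (c): count how many times each variable occurs in one text part."""
    return {name: len(pattern.findall(part)) for name, pattern in VARIABLES.items()}


def classify(counts):
    """Step (d): assign the class whose prototype is nearest (squared Euclidean distance).

    Nearest-centroid is a stand-in; the abstract only says the algorithm is predetermined.
    """
    def distance(centroid):
        return sum((counts[name] - centroid[name]) ** 2 for name in VARIABLES)

    return min(CENTROIDS, key=lambda label: distance(CENTROIDS[label]))


def classify_document(document):
    """Run steps (b) to (d) and return a (text part, class label) pair per part."""
    return [(part, classify(count_variables(part))) for part in split_into_parts(document)]
```

Calling classify_document on a string that holds a prose paragraph and a bibliography entry separated by a blank line returns one (text part, label) pair per part. Swapping in another classifier only requires replacing classify, since it receives nothing but the per-variable counts.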

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/NL2001/000013 WO2002056197A1 (fr) 2001-01-10 2001-01-10 System and method for electronic document handling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/NL2001/000013 WO2002056197A1 (fr) 2001-01-10 2001-01-10 System and method for electronic document handling

Publications (1)

Publication Number Publication Date
WO2002056197A1 (fr) 2002-07-18

Family

ID=19760733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2001/000013 WO2002056197A1 (fr) 2001-01-10 2001-01-10 System and method for electronic document handling

Country Status (1)

Country Link
WO (1) WO2002056197A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008028018A1 (fr) 2006-08-30 2008-03-06 Amazon Technologies, Inc. Automated classification of document pages

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2336698A (en) * 1998-04-24 1999-10-27 Dialog Corp Plc The Automatic content categorisation of text data files using subdivision to reduce false classification
US5983170A (en) * 1996-06-25 1999-11-09 Continuum Software, Inc System and method for generating semantic analysis of textual information
WO2000026795A1 (fr) * 1998-10-30 2000-05-11 Justsystem Pittsburgh Research Center, Inc. Method for content-based filtering of messages by analyzing term characteristics within a message

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008028018A1 (fr) 2006-08-30 2008-03-06 Amazon Technologies, Inc. Automated classification of document pages
JP2010503075A (ja) * 2006-08-30 2010-01-28 Amazon Technologies, Inc. Automatic classification of document pages
US8306326B2 (en) 2006-08-30 2012-11-06 Amazon Technologies, Inc. Method and system for automatically classifying page images
US9594833B2 (en) 2006-08-30 2017-03-14 Amazon Technologies, Inc. Automatically classifying page images

Similar Documents

Publication Publication Date Title
Weiss et al. Text mining: predictive methods for analyzing unstructured information
Weiss et al. Fundamentals of predictive text mining
Chy et al. Bangla news classification using naive Bayes classifier
Witten Text Mining.
Wang et al. A machine learning based approach for table detection on the web
Duwairi Machine learning for Arabic text categorization
Khusro et al. On methods and tools of table detection, extraction and annotation in PDF documents
US7469251B2 (en) Extraction of information from documents
EP1736901B1 (fr) Method for classifying sub-trees in semi-structured documents
CN114254653A (zh) Method for semantic extraction, representation, and analysis of science and technology project text
US20120109949A1 (en) Two stage search
Hadni et al. Word sense disambiguation for Arabic text categorization.
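JP4911599B2 (ja) Reputation information extraction device and reputation information extraction method
JP2008165598A (ja) Reputation information extraction device and reputation information extraction method
WO2009154570A1 (fr) System and method for aligning and indexing multilingual documents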
Kanaris et al. Learning to recognize webpage genres
US7877383B2 (en) Ranking and accessing definitions of terms
Scharkow Content analysis, automatic
EP1745396B1 (fr) Tool for extracting information from documents
Hull Information retrieval using statistical classification
Yurtsever et al. Figure search by text in large scale digital document collections
CN112199960B (zh) Standard knowledge element granularity parsing system
Bia et al. The Miguel de Cervantes digital library: the Hispanic voice on the web
Pembe et al. A tree-based learning approach for document structure analysis and its application to web search
Lama Clustering system based on text mining using the K-means algorithm: news headlines clustering

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP