
SE0002368D0 - Method and system for information extraction - Google Patents

Method and system for information extraction

Info

Publication number
SE0002368D0
Authority
SE
Sweden
Prior art keywords
natural language
analyzed
text corpus
variants
word tokens
Prior art date
Application number
SE0002368A
Other languages
English (en)
Other versions
SE517496C2 (sv)
SE0002368L (sv)
Inventor
Eva Ingegerd Ejerhed
Peter A Braroe
Original Assignee
Hapax Information Systems Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-06-22
Filing date
2000-06-22
Publication date
2000-06-22
Application filed by Hapax Information Systems Ab
Priority to SE0002368A, published as SE517496C2 (sv)
Publication of SE0002368D0 (sv)
Priority to US09/599,563, published as US6842730B1 (en)
Priority to AU2001266481A, published as AU2001266481A1 (en)
Priority to PCT/SE2001/001409, published as WO2001098946A1 (en)
Priority to EP01944033A, published as EP1311983A1 (en)
Publication of SE0002368L (sv)
Publication of SE517496C2 (sv)
Priority to US11/032,075, published as US7194406B2 (en)
Priority to US11/723,079, published as US7657425B2 (en)


Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/3332 Query translation
                    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
                    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
              • G06F40/253 Grammatical analysis; Style critique
              • G06F40/268 Morphological analysis
            • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
        • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
          • Y10S707/00 Data processing: database and file management or data structures
            • Y10S707/99931 Database or file accessing
              • Y10S707/99933 Query processing, i.e. searching
                • Y10S707/99934 Query formulation, input preparation, or translation
                • Y10S707/99935 Query augmenting and refining, e.g. inexact access

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Priority Applications (7)

Application Number | Publication | Priority Date | Filing Date | Title
SE0002368A | SE517496C2 (sv) | 2000-06-22 | 2000-06-22 | Method and system for information extraction
US09/599,563 | US6842730B1 (en) | 2000-06-22 | 2000-06-23 | Method and system for information extraction
AU2001266481A | AU2001266481A1 (en) | 2000-06-22 | 2001-06-20 | Method and system for information extraction
PCT/SE2001/001409 | WO2001098946A1 (en) | 2000-06-22 | 2001-06-20 | Method and system for information extraction
EP01944033A | EP1311983A1 (en) | 2000-06-22 | 2001-06-20 | Method and system for information extraction
US11/032,075 | US7194406B2 (en) | 2000-06-22 | 2005-01-11 | Method and system for information extraction
US11/723,079 | US7657425B2 (en) | 2000-06-22 | 2007-03-16 | Method and system for information extraction

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
SE0002368A | SE517496C2 (sv) | 2000-06-22 | 2000-06-22 | Method and system for information extraction

Publications (3)

Publication Number | Publication Date
SE0002368D0 (sv) | 2000-06-22
SE0002368L (sv) | 2001-12-23
SE517496C2 (sv) | 2002-06-11

Family

ID=20280222

Family Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
SE0002368A | SE517496C2 (sv) | 2000-06-22 | 2000-06-22 | Method and system for information extraction

Country Status (5)

Country Link
US (3) US6842730B1 (sv)
EP (1) EP1311983A1 (sv)
AU (1) AU2001266481A1 (sv)
SE (1) SE517496C2 (sv)
WO (1) WO2001098946A1 (sv)

Families Citing this family (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7254773B2 (en) 2000-12-29 2007-08-07 International Business Machines Corporation Automated spell analysis
US7831442B1 (en) * 2001-05-16 2010-11-09 Perot Systems Corporation System and method for minimizing edits for medical insurance claims processing
US7822621B1 (en) 2001-05-16 2010-10-26 Perot Systems Corporation Method of and system for populating knowledge bases using rule based systems and object-oriented software
US8380491B2 (en) * 2002-04-19 2013-02-19 Educational Testing Service System for rating constructed responses based on concepts and a model answer
US7266553B1 (en) * 2002-07-01 2007-09-04 Microsoft Corporation Content data indexing
US20040019478A1 (en) * 2002-07-29 2004-01-29 Electronic Data Systems Corporation Interactive natural language query processing system and method
US7293005B2 (en) 2004-01-26 2007-11-06 International Business Machines Corporation Pipelined architecture for global analysis and index building
US8296304B2 (en) 2004-01-26 2012-10-23 International Business Machines Corporation Method, system, and program for handling redirects in a search engine
US7499913B2 (en) 2004-01-26 2009-03-03 International Business Machines Corporation Method for handling anchor text
US7424467B2 (en) 2004-01-26 2008-09-09 International Business Machines Corporation Architecture for an indexer with fixed width sort and variable width sort
US7461064B2 (en) 2004-09-24 2008-12-02 International Business Machines Corporation Method for searching documents for ranges of numeric values
US7877383B2 (en) * 2005-04-27 2011-01-25 Microsoft Corporation Ranking and accessing definitions of terms
US8417693B2 (en) 2005-07-14 2013-04-09 International Business Machines Corporation Enforcing native access control to indexed documents
US8209335B2 (en) * 2005-09-20 2012-06-26 International Business Machines Corporation Extracting informative phrases from unstructured text
US7895193B2 (en) * 2005-09-30 2011-02-22 Microsoft Corporation Arbitration of specialized content using search results
JP2007122509A (ja) * 2005-10-28 2007-05-17 Rozetta Corp Device, method and program for determining the naturalness of a word sequence
US7533089B2 (en) * 2006-06-27 2009-05-12 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
US10796093B2 (en) 2006-08-08 2020-10-06 Elastic Minds, Llc Automatic generation of statement-response sets from conversational text using natural language processing
US20080114737A1 (en) * 2006-11-14 2008-05-15 Daniel Neely Method and system for automatically identifying users to participate in an electronic conversation
US20080154853A1 (en) * 2006-12-22 2008-06-26 International Business Machines Corporation English-language translation of exact interpretations of keyword queries
US20080168049A1 (en) * 2007-01-08 2008-07-10 Microsoft Corporation Automatic acquisition of a parallel corpus from a network
US8112402B2 (en) * 2007-02-26 2012-02-07 Microsoft Corporation Automatic disambiguation based on a reference resource
US8001138B2 (en) * 2007-04-11 2011-08-16 Microsoft Corporation Word relationship driven search
US8374844B2 (en) * 2007-06-22 2013-02-12 Xerox Corporation Hybrid system for named entity resolution
US20090019032A1 (en) * 2007-07-13 2009-01-15 Siemens Aktiengesellschaft Method and a system for semantic relation extraction
US8346756B2 (en) * 2007-08-31 2013-01-01 Microsoft Corporation Calculating valence of expressions within documents for searching a document index
US8229730B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Indexing role hierarchies for words in a search index
US8209321B2 (en) * 2007-08-31 2012-06-26 Microsoft Corporation Emphasizing search results according to conceptual meaning
US8229970B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Efficient storage and retrieval of posting lists
US8280721B2 (en) * 2007-08-31 2012-10-02 Microsoft Corporation Efficiently representing word sense probabilities
US8316036B2 (en) 2007-08-31 2012-11-20 Microsoft Corporation Checkpointing iterators during search
US8868562B2 (en) * 2007-08-31 2014-10-21 Microsoft Corporation Identification of semantic relationships within reported speech
US20090070322A1 (en) * 2007-08-31 2009-03-12 Powerset, Inc. Browsing knowledge on the basis of semantic relations
US8712758B2 (en) * 2007-08-31 2014-04-29 Microsoft Corporation Coreference resolution in an ambiguity-sensitive natural language processing system
US8463593B2 (en) * 2007-08-31 2013-06-11 Microsoft Corporation Natural language hypernym weighting for word sense disambiguation
US20090198488A1 (en) * 2008-02-05 2009-08-06 Eric Arno Vigen System and method for analyzing communications using multi-placement hierarchical structures
US7925743B2 (en) * 2008-02-29 2011-04-12 Networked Insights, Llc Method and system for qualifying user engagement with a website
US8224843B2 (en) 2008-08-12 2012-07-17 Morphism Llc Collaborative, incremental specification of identities
US8135580B1 (en) * 2008-08-20 2012-03-13 Amazon Technologies, Inc. Multi-language relevance-based indexing and search
US8370128B2 (en) * 2008-09-30 2013-02-05 Xerox Corporation Semantically-driven extraction of relations between named entities
US8949265B2 (en) 2009-03-05 2015-02-03 Ebay Inc. System and method to provide query linguistic service
US8843476B1 (en) * 2009-03-16 2014-09-23 Guangsheng Zhang System and methods for automated document topic discovery, browsable search and document categorization
US8447632B2 (en) * 2009-05-29 2013-05-21 Hyperquest, Inc. Automation of auditing claims
US8255205B2 (en) 2009-05-29 2012-08-28 Hyperquest, Inc. Automation of auditing claims
US8346577B2 (en) * 2009-05-29 2013-01-01 Hyperquest, Inc. Automation of auditing claims
US8073718B2 (en) 2009-05-29 2011-12-06 Hyperquest, Inc. Automation of auditing claims
US9836460B2 (en) * 2010-06-11 2017-12-05 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for analyzing patent-related documents
WO2012045492A1 (en) * 2010-10-07 2012-04-12 Dublin Institute Of Technology Content retrieval system
CN101950309A (zh) * 2010-10-08 2011-01-19 华中师范大学 Subject-domain-oriented method for recognizing new specialized vocabulary
US8498972B2 (en) * 2010-12-16 2013-07-30 Sap Ag String and sub-string searching using inverted indexes
US9244902B2 (en) 2011-10-20 2016-01-26 Zynga, Inc. Localization framework for dynamic text
US10068024B2 (en) * 2012-02-01 2018-09-04 Sri International Method and apparatus for correlating and viewing disparate data
EP2856344A1 (de) * 2012-05-24 2015-04-08 IQser IP AG Generation of queries to a data-processing system
US9298754B2 (en) * 2012-11-15 2016-03-29 Ecole Polytechnique Federale de Lausanne (EPFL) (027559) Query management system and engine allowing for efficient query execution on raw details
JP5882241B2 (ja) * 2013-01-08 2016-03-09 日本電信電話株式会社 Method, device, and program for generating search keywords for question answering
US10073835B2 (en) * 2013-12-03 2018-09-11 International Business Machines Corporation Detecting literary elements in literature and their importance through semantic analysis and literary correlation
US9721004B2 (en) 2014-11-12 2017-08-01 International Business Machines Corporation Answering questions via a persona-based natural language processing (NLP) system
US10146751B1 (en) * 2014-12-31 2018-12-04 Guangsheng Zhang Methods for information extraction, search, and structured representation of text data
JP6447161B2 (ja) 2015-01-20 2019-01-09 富士通株式会社 Semantic structure search program, semantic structure search device, and semantic structure search method
US10289680B2 (en) * 2016-05-31 2019-05-14 Oath Inc. Real time parsing and suggestions from pre-generated corpus with hypernyms
US12210824B1 (en) * 2021-04-30 2025-01-28 Now Insurance Services, Inc. Automated information extraction from electronic documents using machine learning
CN114510933B (zh) * 2022-01-13 2025-07-22 北京华通人商用信息有限公司 Text content matching method and device
WO2024075086A1 (en) * 2022-10-07 2024-04-11 Open Text Corporation System and method for hybrid multilingual search indexing
US12254032B2 (en) 2022-10-07 2025-03-18 Open Text Corporation System and method for hybrid multilingual search indexing

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US5309359A (en) 1990-08-16 1994-05-03 Boris Katz Method and apparatus for generating and utilizing annotations to facilitate computer text retrieval
JPH0756933A (ja) 1993-06-24 1995-03-03 Xerox Corp Document search method
US5519608A (en) 1993-06-24 1996-05-21 Xerox Corporation Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
US5331556A (en) * 1993-06-28 1994-07-19 General Electric Company Method for natural language data processing using morphological and part-of-speech information
US5794050A (en) 1995-01-04 1998-08-11 Intelligent Text Processing, Inc. Natural language understanding system
US5963940A (en) 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
JP2000507008A (ja) 1996-04-04 2000-06-06 フレア・テクノロジーズ・リミテッド System, software and method for locating information in a collection of text-based information sources
GB9713019D0 (en) 1997-06-20 1997-08-27 Xerox Corp Linguistic search system
EP0998714A1 (en) 1997-07-22 2000-05-10 Microsoft Corporation System for processing textual inputs using natural language processing techniques
US5933822A (en) 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
US6965857B1 (en) * 2000-06-02 2005-11-15 Cogilex Recherches & Developpement Inc. Method and apparatus for deriving information from written text

Also Published As

Publication number Publication date
US7194406B2 (en) 2007-03-20
WO2001098946A1 (en) 2001-12-27
US20050131886A1 (en) 2005-06-16
SE517496C2 (sv) 2002-06-11
EP1311983A1 (en) 2003-05-21
SE0002368L (sv) 2001-12-23
US6842730B1 (en) 2005-01-11
US20070168181A1 (en) 2007-07-19
AU2001266481A1 (en) 2002-01-02
US7657425B2 (en) 2010-02-02

Similar Documents

Publication Title
SE0002368D0 (sv) Method and system for information extraction
SE0101127D0 (sv) Method of finding answers to questions
Davies Making Google Books n-grams useful for a wide range of research on language change
BR0312120A (pt) Method for entering text into an electronic device, and electronic device
WO2005052727A3 (en) Extraction of facts from text
Pettersson et al. A multilingual evaluation of three spelling normalisation methods for historical text
WO2003098370A3 (en) Document structure identifier
WO2002056196A3 (en) Creation of structured data from plain text
WO2007035186A3 (en) A method and system for the automatic recognition of deceptive language
WO2001042981A3 (en) Natural english language search and retrieval system and method
Przepiórkowski et al. Recent developments in the National Corpus of Polish
Aït-Mokhtar et al. Subject and object dependency extraction using finite-state transducers
Sinha Stepwise mining of multi-word expressions in Hindi
Al-Shalabi et al. Proper noun extracting algorithm for arabic language
Bal et al. A morphological analyzer and a stemmer for Nepali
Van Peursen A Computational Approach to Syntactic Diversity in the Hebrew Bible
Isacson To each their own letter: structure, themes, and rhetorical strategies in the letters of Ignatius of Antioch
Tedla et al. The effect of shallow segmentation on English-Tigrinya statistical machine translation
Yusof et al. Qur'anic words stemming
Tripathi Problems and prospects of Hindi language search and text processing
Pettersson et al. Rule-based normalisation of historical text–a diachronic study
Rydholm In search of the generic identity of ci poetry
Uddin et al. A step towards Torwali machine translation: an analysis of morphosyntactic challenges in a low-resource language
Thao et al. Vietnamese noun phrase chunking based on conditional random fields
Authier The Origin of Differential Object Marking and Tripartite Alignment in Udi (East Caucasian)

Legal Events

Date Code Title Description
NUG Patent has lapsed