SE0002368D0 - Method and system for information extraction - Google Patents
- Publication number
- SE0002368D0
- Authority
- SE
- Sweden
- Prior art keywords
- natural language
- analyzed
- text corpus
- variants
- word tokens
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/268—Morphological analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE0002368A SE517496C2 (sv) | 2000-06-22 | 2000-06-22 | Method and system for information extraction |
| US09/599,563 US6842730B1 (en) | 2000-06-22 | 2000-06-23 | Method and system for information extraction |
| AU2001266481A AU2001266481A1 (en) | 2000-06-22 | 2001-06-20 | Method and system for information extraction |
| PCT/SE2001/001409 WO2001098946A1 (en) | 2000-06-22 | 2001-06-20 | Method and system for information extraction |
| EP01944033A EP1311983A1 (en) | 2000-06-22 | 2001-06-20 | Method and system for information extraction |
| US11/032,075 US7194406B2 (en) | 2000-06-22 | 2005-01-11 | Method and system for information extraction |
| US11/723,079 US7657425B2 (en) | 2000-06-22 | 2007-03-16 | Method and system for information extraction |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE0002368A SE517496C2 (sv) | 2000-06-22 | 2000-06-22 | Method and system for information extraction |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| SE0002368D0 (sv) | 2000-06-22 |
| SE0002368L (sv) | 2001-12-23 |
| SE517496C2 (sv) | 2002-06-11 |
Family
ID=20280222
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE0002368A SE517496C2 (sv) | 2000-06-22 | 2000-06-22 | Method and system for information extraction |
Country Status (5)
| Country | Link |
|---|---|
| US (3) | US6842730B1 (sv) |
| EP (1) | EP1311983A1 (sv) |
| AU (1) | AU2001266481A1 (sv) |
| SE (1) | SE517496C2 (sv) |
| WO (1) | WO2001098946A1 (sv) |
Families Citing this family (64)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7254773B2 (en) | 2000-12-29 | 2007-08-07 | International Business Machines Corporation | Automated spell analysis |
| US7831442B1 (en) * | 2001-05-16 | 2010-11-09 | Perot Systems Corporation | System and method for minimizing edits for medical insurance claims processing |
| US7822621B1 (en) | 2001-05-16 | 2010-10-26 | Perot Systems Corporation | Method of and system for populating knowledge bases using rule based systems and object-oriented software |
| US8380491B2 (en) * | 2002-04-19 | 2013-02-19 | Educational Testing Service | System for rating constructed responses based on concepts and a model answer |
| US7266553B1 (en) * | 2002-07-01 | 2007-09-04 | Microsoft Corporation | Content data indexing |
| US20040019478A1 (en) * | 2002-07-29 | 2004-01-29 | Electronic Data Systems Corporation | Interactive natural language query processing system and method |
| US7293005B2 (en) | 2004-01-26 | 2007-11-06 | International Business Machines Corporation | Pipelined architecture for global analysis and index building |
| US8296304B2 (en) | 2004-01-26 | 2012-10-23 | International Business Machines Corporation | Method, system, and program for handling redirects in a search engine |
| US7499913B2 (en) | 2004-01-26 | 2009-03-03 | International Business Machines Corporation | Method for handling anchor text |
| US7424467B2 (en) | 2004-01-26 | 2008-09-09 | International Business Machines Corporation | Architecture for an indexer with fixed width sort and variable width sort |
| US7461064B2 | 2004-09-24 | 2008-12-02 | International Business Machines Corporation | Method for searching documents for ranges of numeric values |
| US7877383B2 (en) * | 2005-04-27 | 2011-01-25 | Microsoft Corporation | Ranking and accessing definitions of terms |
| US8417693B2 (en) | 2005-07-14 | 2013-04-09 | International Business Machines Corporation | Enforcing native access control to indexed documents |
| US8209335B2 (en) * | 2005-09-20 | 2012-06-26 | International Business Machines Corporation | Extracting informative phrases from unstructured text |
| US7895193B2 (en) * | 2005-09-30 | 2011-02-22 | Microsoft Corporation | Arbitration of specialized content using search results |
| JP2007122509A (ja) * | 2005-10-28 | 2007-05-17 | Rozetta Corp | Device, method, and program for determining the naturalness of a phrase sequence |
| US7533089B2 (en) * | 2006-06-27 | 2009-05-12 | International Business Machines Corporation | Hybrid approach for query recommendation in conversation systems |
| US10796093B2 (en) | 2006-08-08 | 2020-10-06 | Elastic Minds, Llc | Automatic generation of statement-response sets from conversational text using natural language processing |
| US20080114737A1 (en) * | 2006-11-14 | 2008-05-15 | Daniel Neely | Method and system for automatically identifying users to participate in an electronic conversation |
| US20080154853A1 (en) * | 2006-12-22 | 2008-06-26 | International Business Machines Corporation | English-language translation of exact interpretations of keyword queries |
| US20080168049A1 (en) * | 2007-01-08 | 2008-07-10 | Microsoft Corporation | Automatic acquisition of a parallel corpus from a network |
| US8112402B2 (en) * | 2007-02-26 | 2012-02-07 | Microsoft Corporation | Automatic disambiguation based on a reference resource |
| US8001138B2 (en) * | 2007-04-11 | 2011-08-16 | Microsoft Corporation | Word relationship driven search |
| US8374844B2 (en) * | 2007-06-22 | 2013-02-12 | Xerox Corporation | Hybrid system for named entity resolution |
| US20090019032A1 (en) * | 2007-07-13 | 2009-01-15 | Siemens Aktiengesellschaft | Method and a system for semantic relation extraction |
| US8346756B2 (en) * | 2007-08-31 | 2013-01-01 | Microsoft Corporation | Calculating valence of expressions within documents for searching a document index |
| US8229730B2 (en) * | 2007-08-31 | 2012-07-24 | Microsoft Corporation | Indexing role hierarchies for words in a search index |
| US8209321B2 (en) * | 2007-08-31 | 2012-06-26 | Microsoft Corporation | Emphasizing search results according to conceptual meaning |
| US8229970B2 (en) * | 2007-08-31 | 2012-07-24 | Microsoft Corporation | Efficient storage and retrieval of posting lists |
| US8280721B2 (en) * | 2007-08-31 | 2012-10-02 | Microsoft Corporation | Efficiently representing word sense probabilities |
| US8316036B2 (en) | 2007-08-31 | 2012-11-20 | Microsoft Corporation | Checkpointing iterators during search |
| US8868562B2 (en) * | 2007-08-31 | 2014-10-21 | Microsoft Corporation | Identification of semantic relationships within reported speech |
| US20090070322A1 (en) * | 2007-08-31 | 2009-03-12 | Powerset, Inc. | Browsing knowledge on the basis of semantic relations |
| US8712758B2 (en) * | 2007-08-31 | 2014-04-29 | Microsoft Corporation | Coreference resolution in an ambiguity-sensitive natural language processing system |
| US8463593B2 (en) * | 2007-08-31 | 2013-06-11 | Microsoft Corporation | Natural language hypernym weighting for word sense disambiguation |
| US20090198488A1 (en) * | 2008-02-05 | 2009-08-06 | Eric Arno Vigen | System and method for analyzing communications using multi-placement hierarchical structures |
| US7925743B2 (en) * | 2008-02-29 | 2011-04-12 | Networked Insights, Llc | Method and system for qualifying user engagement with a website |
| US8224843B2 (en) | 2008-08-12 | 2012-07-17 | Morphism Llc | Collaborative, incremental specification of identities |
| US8135580B1 (en) * | 2008-08-20 | 2012-03-13 | Amazon Technologies, Inc. | Multi-language relevance-based indexing and search |
| US8370128B2 (en) * | 2008-09-30 | 2013-02-05 | Xerox Corporation | Semantically-driven extraction of relations between named entities |
| US8949265B2 (en) | 2009-03-05 | 2015-02-03 | Ebay Inc. | System and method to provide query linguistic service |
| US8843476B1 (en) * | 2009-03-16 | 2014-09-23 | Guangsheng Zhang | System and methods for automated document topic discovery, browsable search and document categorization |
| US8447632B2 (en) * | 2009-05-29 | 2013-05-21 | Hyperquest, Inc. | Automation of auditing claims |
| US8255205B2 (en) | 2009-05-29 | 2012-08-28 | Hyperquest, Inc. | Automation of auditing claims |
| US8346577B2 (en) * | 2009-05-29 | 2013-01-01 | Hyperquest, Inc. | Automation of auditing claims |
| US8073718B2 (en) | 2009-05-29 | 2011-12-06 | Hyperquest, Inc. | Automation of auditing claims |
| US9836460B2 (en) * | 2010-06-11 | 2017-12-05 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for analyzing patent-related documents |
| WO2012045492A1 (en) * | 2010-10-07 | 2012-04-12 | Dublin Institute Of Technology | Content retrieval system |
| CN101950309A (zh) * | 2010-10-08 | 2011-01-19 | 华中师范大学 | Method for recognizing new technical terms in a subject domain |
| US8498972B2 (en) * | 2010-12-16 | 2013-07-30 | Sap Ag | String and sub-string searching using inverted indexes |
| US9244902B2 (en) | 2011-10-20 | 2016-01-26 | Zynga, Inc. | Localization framework for dynamic text |
| US10068024B2 (en) * | 2012-02-01 | 2018-09-04 | Sri International | Method and apparatus for correlating and viewing disparate data |
| EP2856344A1 (de) * | 2012-05-24 | 2015-04-08 | IQser IP AG | Generation of queries to a data processing system |
| US9298754B2 (en) * | 2012-11-15 | 2016-03-29 | Ecole Polytechnique Federale de Lausanne (EPFL) (027559) | Query management system and engine allowing for efficient query execution on raw details |
| JP5882241B2 (ja) * | 2013-01-08 | 2016-03-09 | 日本電信電話株式会社 | Method, device, and program for generating search keywords for question answering |
| US10073835B2 (en) * | 2013-12-03 | 2018-09-11 | International Business Machines Corporation | Detecting literary elements in literature and their importance through semantic analysis and literary correlation |
| US9721004B2 (en) | 2014-11-12 | 2017-08-01 | International Business Machines Corporation | Answering questions via a persona-based natural language processing (NLP) system |
| US10146751B1 (en) * | 2014-12-31 | 2018-12-04 | Guangsheng Zhang | Methods for information extraction, search, and structured representation of text data |
| JP6447161B2 (ja) | 2015-01-20 | 2019-01-09 | 富士通株式会社 | Semantic structure search program, semantic structure search device, and semantic structure search method |
| US10289680B2 (en) * | 2016-05-31 | 2019-05-14 | Oath Inc. | Real time parsing and suggestions from pre-generated corpus with hypernyms |
| US12210824B1 (en) * | 2021-04-30 | 2025-01-28 | Now Insurance Services, Inc. | Automated information extraction from electronic documents using machine learning |
| CN114510933B (zh) * | 2022-01-13 | 2025-07-22 | 北京华通人商用信息有限公司 | Method and device for matching text content |
| WO2024075086A1 (en) * | 2022-10-07 | 2024-04-11 | Open Text Corporation | System and method for hybrid multilingual search indexing |
| US12254032B2 (en) | 2022-10-07 | 2025-03-18 | Open Text Corporation | System and method for hybrid multilingual search indexing |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5309359A (en) | 1990-08-16 | 1994-05-03 | Boris Katz | Method and apparatus for generating and utlizing annotations to facilitate computer text retrieval |
| JPH0756933A (ja) | 1993-06-24 | 1995-03-03 | Xerox Corp | Document retrieval method |
| US5519608A (en) | 1993-06-24 | 1996-05-21 | Xerox Corporation | Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation |
| US5331556A (en) * | 1993-06-28 | 1994-07-19 | General Electric Company | Method for natural language data processing using morphological and part-of-speech information |
| US5794050A (en) | 1995-01-04 | 1998-08-11 | Intelligent Text Processing, Inc. | Natural language understanding system |
| US5963940A (en) | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
| JP2000507008A (ja) | 1996-04-04 | 2000-06-06 | フレア・テクノロジーズ・リミテッド | System, software, and method for locating information in a collection of text-based information sources |
| GB9713019D0 (en) | 1997-06-20 | 1997-08-27 | Xerox Corp | Linguistic search system |
| EP0998714A1 (en) | 1997-07-22 | 2000-05-10 | Microsoft Corporation | System for processing textual inputs using natural language processing techniques |
| US5933822A (en) | 1997-07-22 | 1999-08-03 | Microsoft Corporation | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision |
| US6965857B1 (en) * | 2000-06-02 | 2005-11-15 | Cogilex Recherches & Developpement Inc. | Method and apparatus for deriving information from written text |
2000
- 2000-06-22: SE application SE0002368A, patent SE517496C2 (sv); not active, IP Right Cessation
- 2000-06-23: US application US09/599,563, patent US6842730B1 (en); not active, Expired - Lifetime
2001
- 2001-06-20: WO application PCT/SE2001/001409, patent WO2001098946A1 (en); not active, Ceased
- 2001-06-20: EP application EP01944033A, patent EP1311983A1 (en); not active, Ceased
- 2001-06-20: AU application AU2001266481A, patent AU2001266481A1 (en); not active, Abandoned
2005
- 2005-01-11: US application US11/032,075, patent US7194406B2 (en); not active, Expired - Lifetime
2007
- 2007-03-16: US application US11/723,079, patent US7657425B2 (en); not active, Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| US7194406B2 (en) | 2007-03-20 |
| WO2001098946A1 (en) | 2001-12-27 |
| US20050131886A1 (en) | 2005-06-16 |
| SE517496C2 (sv) | 2002-06-11 |
| EP1311983A1 (en) | 2003-05-21 |
| SE0002368L (sv) | 2001-12-23 |
| US6842730B1 (en) | 2005-01-11 |
| US20070168181A1 (en) | 2007-07-19 |
| AU2001266481A1 (en) | 2002-01-02 |
| US7657425B2 (en) | 2010-02-02 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| SE0002368D0 (sv) | Method and system for information extraction | |
| SE0101127D0 (sv) | Method of finding answers to questions | |
| Davies | Making Google Books n-grams useful for a wide range of research on language change | |
| BR0312120A (pt) | Método para inserir texto em um dispositivo eletrônico, e, dispositivo eletrônico | |
| WO2005052727A3 (en) | Extraction of facts from text | |
| Pettersson et al. | A multilingual evaluation of three spelling normalisation methods for historical text | |
| WO2003098370A3 (en) | Document structure identifier | |
| WO2002056196A3 (en) | Creation of structured data from plain text | |
| WO2007035186A3 (en) | A method and system for the automatic recognition of deceptive language | |
| WO2001042981A3 (en) | Natural english language search and retrieval system and method | |
| Przepiórkowski et al. | Recent developments in the National Corpus of Polish | |
| Aït-Mokhtar et al. | Subject and object dependency extraction using finite-state transducers | |
| Sinha | Stepwise mining of multi-word expressions in Hindi | |
| Al-Shalabi et al. | Proper noun extracting algorithm for arabic language | |
| Bal et al. | A morphological analyzer and a stemmer for Nepali | |
| Van Peursen | A Computational Approach to Syntactic Diversity in the Hebrew Bible | |
| Isacson | To each their own letter: structure, themes, and rhetorical strategies in the letters of Ignatius of Antioch | |
| Tedla et al. | The effect of shallow segmentation on English-Tigrinya statistical machine translation | |
| Yusof et al. | Qur'anic words stemming | |
| Tripathi | Problems and prospects of Hindi language search and text processing | |
| Pettersson et al. | Rule-based normalisation of historical text–a diachronic study | |
| Rydholm | In search of the generic identity of ci poetry | |
| Uddin et al. | A step towards Torwali machine translation: an analysis of morphosyntactic challenges in a low-resource language | |
| Thao et al. | Vietnamese noun phrase chunking based on conditional random fields | |
| Authier | The Origin of Differential Object Marking and Tripartite Alignment in Udi (East Caucasian) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NUG | Patent has lapsed | |