CN101169780A - A Semantic Ontology-Based Retrieval System and Method - Google Patents
- Publication number
- CN101169780A (application CN200610149803A, CNA2006101498039)
- Authority
- CN
- China
- Prior art keywords
- semantic
- text
- index
- ontology
- semantic ontology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention disclose a semantic ontology-based retrieval system that includes a semantic ontology index database and a semantic ontology index processing unit. A semantic ontology search processing unit obtains a list of text hit files and matches it against the semantic ontology index held in the semantic ontology index database, producing a document semantic classification table. The retrieval system can therefore recognize the semantic information of the files being searched and present search results grouped by semantic category. Embodiments of the invention also disclose a semantic ontology-based retrieval method: a semantic ontology index is first built for files that already have a text index; when a user searches, the text matching results are matched against the semantic ontology index, so that the final output adds a semantic classification on top of the conventional text matching results and makes querying easier for the user.
Description
Technical Field
The invention relates to information retrieval technology, and in particular to a semantic ontology-based retrieval system and method.
Background
With the rapid development of retrieval technology, text-based information retrieval has matured into a complete body of methods and well-developed algorithms, and it is widely used in search engines such as Google, AltaVista, Lycos and Yahoo.
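Fig. 1 is a structural block diagram of an existing text search engine. As shown in Fig. 1, the existing text search engine comprises a spider control module 101, a uniform resource locator (URL) database 102, a web spider 103, a URL extraction module 104, a web page database 105, a link information extraction module 106, a text indexing module 107, a link database 108, an index database 109, a page ranking module 110 and a query server 111.
The web spider 103 fetches web pages from the Internet and stores them in the web page database 105. The URL extraction module 104 extracts URLs from the pages fetched by the web spider 103 and stores them in the URL database 102. The spider control module 101 obtains page URLs from the URL database 102 and directs the web spider 103 to fetch further pages; these steps repeat until all pages have been fetched.
The system obtains text information from the web page database 105 and passes it to the text indexing module 107, which builds an index and stores it in the index database 109. At the same time, the link information extraction module 106 obtains link information from the web page database 105 and stores it in the link database 108. The link information in the link database 108 gives the page ranking module 110 the basis for ranking pages.
When a user submits a query through the query server 111, the query server 111 looks up the pages related to the query in the index database 109. At the same time, the page ranking module 110 combines the query with the link information in the link database 108 to evaluate the relevance of the search results. The query server 111 then sorts the results by relevance and assembles the final page returned to the user.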
Existing text retrieval technology can find files that contain the user's query text, but it cannot recognize the content and meaning of the files it finds. The reason is that it relies on text string matching. When different words express the same meaning, or one word has different meanings in different contexts, string matching limits both precision and recall, and the results fall far short of what the user needs. For example, when the search keyword is "paradise", the engine cannot tell whether a matching file is about "Paradise games" or "Paradise music". The Semantic Web offers an opportunity to solve these problems.
The Semantic Web is a network of web pages whose content can be processed and recognized automatically by computers. Building on the existing Internet, it extends web pages with machine-readable data and adds documents intended for machine use; that is, pages are annotated in an ontology language so that their semantics are explicit, and page information can be understood by people and also processed and recognized automatically by computers. Semantically annotated pages generally mark up data with Extensible Markup Language (XML) or Hypertext Markup Language (HTML), use the Resource Description Framework (RDF) as the data description model, and draw on a semantic ontology to give the annotated data well-defined semantics. "Ontology" is a concept that originated in philosophy, where it refers to the study of existence and its nature and laws; it was later adopted by the artificial intelligence field, where it denotes an explicit specification of a conceptualization. An ontology expresses the concepts of a domain and their interrelationships explicitly and formally, which makes the semantics of terms explicit, so it plays an important role in semantic querying. The semantic ontology referred to here defines the basic terms that make up the concepts of a subject domain and the relationships among them, and it specifies the rules for combining these terms and relationships to extend the vocabulary.
The purpose of semantic retrieval is to enhance and improve conventional search results with data obtained from the Semantic Web. Fig. 2 is a structural block diagram of an existing semantic search system. As shown in Fig. 2, the existing semantic search system comprises a query interface 201, a query preprocessing module 202, a semantic ontology reasoning engine 203, an annotation ontology library 204, a conventional search module 205 and a result return interface 206.
The query interface 201 obtains the user's query and sends it to the query preprocessing module 202.
The query preprocessing module 202 analyzes the query, splits it into query keywords by word segmentation, and sends the keywords to the semantic ontology reasoning engine 203.
Using the ontology concept vocabulary and the concept-to-concept relationships defined in the annotation ontology library 204, the semantic ontology reasoning engine 203 infers the ontology concept terms that correspond to the query keywords and returns them to the query preprocessing module 202.
The query preprocessing module 202 sends the ontology concept terms returned by the semantic ontology reasoning engine 203 to the conventional search module 205 and specifies that the search be semantic. Here, a semantic search means that, for pages that have been semantically annotated, string matching is performed against the semantic concepts with which the pages are annotated rather than against the page content itself.
The conventional search module 205 performs the semantic search and sends the results to the result return interface 206, which returns them to the user.
As can be seen, this semantic search system matches the user's query keywords against the semantic concept vocabulary used to annotate web pages.
In summary, existing text retrieval technology can find files that contain the query keywords but cannot recognize their semantic information. Existing semantic retrieval technology, in turn, no longer performs keyword retrieval, so the files it finds include too many results that do not match the user's query, and the efficiency of matching query keywords against semantic concept vocabulary is also unsatisfactory. The search accuracy of existing retrieval technology is therefore low.
Summary of the Invention
In view of this, a main object of the embodiments of the invention is to provide a semantic ontology-based retrieval system that improves search accuracy.
Another object of the embodiments of the invention is to provide a semantic ontology-based retrieval method that improves search accuracy.
To achieve these objects, the technical solution of the invention is implemented as follows.
An embodiment of the invention discloses a semantic ontology-based retrieval system comprising:
a semantic ontology index database, configured to store a semantic ontology index; and
a semantic ontology search processing unit, configured to obtain a list of text hit files and match that list against the semantic ontology index in the semantic ontology index database to obtain a document semantic classification table.
An embodiment of the invention also discloses a semantic ontology-based retrieval method comprising the following steps:
A. obtaining files for which a text index has been built, and building a semantic ontology index for the obtained files;
B. obtaining a list of text hit files, and performing semantic ontology index matching on that list to obtain a document semantic classification table.
The semantic ontology-based retrieval system and method provided by the embodiments of the invention therefore have the following advantage: a semantic ontology index is first built for files that already have a text index; when a user searches, the text query entered by the user is matched against the text index to obtain a list of text hit files, and that list is then matched against the semantic ontology index to obtain a document semantic classification table. The text retrieval results thus carry semantic classification information, which improves search accuracy.
Brief Description of the Drawings
Fig. 1 is a structural block diagram of an existing text search engine;
Fig. 2 is a structural block diagram of an existing semantic search system;
Fig. 3 is a structural block diagram of a semantic ontology-based retrieval system according to an embodiment of the invention;
Fig. 4 is a flowchart of how the semantic ontology index processing unit builds the semantic ontology index in an embodiment of the invention;
Fig. 5 is a flowchart of the search process that the retrieval system of Fig. 3 performs for a user;
Fig. 6 is a schematic diagram of two resource descriptions defined in an embodiment of the invention;
Fig. 7 is a schematic diagram of the result inferred from Fig. 6;
Fig. 8 is a diagram of the relationships that the annotation ontology library in an embodiment of the invention defines for the semantic ontology vocabulary of the embodiment;
Fig. 9 is a diagram of the relationships among the semantic ontology vocabulary of Fig. 8 after reasoning.
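Detailed Description
To make the objects, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 3 is a structural block diagram of a semantic ontology-based retrieval system according to an embodiment of the invention. As shown in Fig. 3, the system comprises a search interface module 301, a document semantic classification rule engine 302, a search processing module 303, a semantic ontology reasoning engine 304, an annotation ontology library 305, an index database 306, an index processing module 307, a file database 308 and a network file crawling module 309. The search processing module 303 comprises a text search processing unit 310, a semantic ontology search processing unit 311 and a ranking processing unit 312; the index database 306 holds a text index 315 and a semantic ontology index 316; and the index processing module 307 comprises a text index processing unit 313 and a semantic ontology index processing unit 314.
The network file crawling module 309 fetches web pages from the Internet and saves them in the file database 308. It generally uses a crawling program, such as a "web robot" or "web spider", to traverse the web: it scans the websites within a given range of Internet Protocol (IP) addresses and follows links from page to page and from site to site, collecting network files.
The file database 308 stores the files available for retrieval, including audio, video and text files. These may be network files or non-network files. Every file in the file database 308 has a unique file identifier (DocID).
The index processing module 307 analyzes the files saved in the file database 308, extracts keywords from their content, eliminates duplicate files, and builds the different types of index information for the files in the file database 308. It comprises the text index processing unit 313 and the semantic ontology index processing unit 314.
The text index processing unit 313 is a conventional text indexing unit: it analyzes file content, extracts keywords and file identification information, and builds the text index. Conventional text indexing is mature prior art and is not described again here.
The semantic ontology index processing unit 314 builds the semantic ontology index for files that already have a text index. It first analyzes each such file to determine whether it contains semantic annotation; if it does, the unit extracts the relevant semantic annotation and the file identification information and builds the semantic ontology index for that file.
The index database 306 stores the index information built by the index processing module 307, that is, the text index 315 built by the text index processing unit 313 and the semantic ontology index 316 built by the semantic ontology index processing unit 314.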
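The search processing module 303 handles the user's query: by matching the user's text query against the index information of the files, it returns the files that satisfy the query to the user in a given order. It comprises the text search processing unit 310, the semantic ontology search processing unit 311 and the ranking processing unit 312.
The text search processing unit 310 matches the text query entered by the user against the text index 315 and finds the identifiers of the text hit files that satisfy the query.
The semantic ontology search processing unit 311 matches the text hit file identifiers produced by the text search processing unit 310 against the semantic ontology index 316, classifies those identifiers semantically, and produces the document semantic classification table.
The annotation ontology library 305 and the semantic ontology reasoning engine 304 perform semantic reasoning over the set of ontology concept terms in the document semantic classification table produced by the semantic ontology search processing unit 311, yielding an extended set of semantic ontology terms. The annotation ontology library 305 stores the defined semantic ontology concept terms and the relationships among them; the semantic ontology reasoning engine 304 defines the inference rules and carries out the reasoning.
The document semantic classification rule engine 302 uses the results inferred by the semantic ontology reasoning engine 304 to trigger the semantic classification rules it defines, and extends and consolidates the document semantic classification table.
The ranking processing unit 312 optimizes the order of the final results. For the document semantic classification table obtained after text index matching, semantic ontology index matching and semantic reasoning extension, it computes the relevance and importance of the documents and, based on the result, returns the files in ranked order to the search interface module 301.
The search interface module 301 handles the interaction between the system and the user: it forwards the text query entered by the user to the search processing module 303 and returns the ranking produced by the ranking processing unit 312 to the user.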
The text index 315 stored in the index database 306 comprises a text forward index and a text inverted index. Table 1 is the text forward index table and Table 2 is the text inverted index table:
Table 1
Table 2
As the two tables show, the text forward index is keyed by file identifier and maps each file identifier to its keywords, whereas the text inverted index is keyed by keyword and maps each keyword to the file identifiers in which it occurs.
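By way of illustration only, the relationship between the two text indexes can be sketched in Python. The toy corpus, the whitespace tokenization and the function names are assumptions made for this sketch; the specification does not prescribe how keywords are extracted.

```python
from collections import defaultdict


def build_forward_index(docs):
    """Map each DocID to the sorted set of keywords found in that document."""
    return {doc_id: sorted(set(text.lower().split()))
            for doc_id, text in docs.items()}


def invert(forward_index):
    """Turn a DocID -> keywords mapping into a keyword -> DocIDs mapping."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward_index):
        for keyword in forward_index[doc_id]:
            inverted[keyword].append(doc_id)
    return dict(inverted)


# Hypothetical corpus: DocIDs and contents are illustrative only.
docs = {1: "paradise game", 2: "paradise music", 3: "music application"}
forward = build_forward_index(docs)   # keyed by file identifier
inverted = invert(forward)            # keyed by keyword
```

The forward index answers "which keywords does file 2 contain", while the inverted index answers "which files contain the keyword paradise", which is the lookup needed at query time.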
Similarly, the semantic ontology index 316 stored in the index database 306 comprises a semantic ontology forward index and a semantic ontology inverted index. Table 3 is the semantic ontology forward index table and Table 4 is the semantic ontology inverted index table:
Table 3
Table 4
The semantic ontology forward index is keyed by file identifier and maps each file identifier to its semantic identifiers, whereas the semantic ontology inverted index is keyed by semantic identifier and maps each semantic identifier to the file identifiers annotated with it.
Fig. 4 is a flowchart of how the semantic ontology index processing unit 314 builds the semantic ontology index 316 in an embodiment of the invention. The semantic ontology index is built on top of the text index created by the text index processing unit 313; the procedure is triggered once the text index processing unit 313 has built a text index for a file. Referring to Fig. 4, the procedure comprises the following steps:
Step 401: the semantic ontology index processing unit 314 reads a file that has been processed by the text index processing unit 313 and therefore has a text index.
Step 402: the semantic ontology index processing unit 314 determines whether the file carries semantic annotation. If it does, step 403 is performed; otherwise the procedure for building a semantic ontology index for this file ends.
A semantically annotated file differs from an unannotated one in that it carries ontology concept mapping information. For example, suppose the web page with file identifier 9 and URL http://grids.ucs.indiana.edu/ptliupages/publications/index.html mainly describes matters to note when doing research; the page can then be annotated with the concept "Research". Existing semantic annotations are embedded in web pages either as comments or as XML packets. This example gives an annotation in comment form, produced with OntoMat, the text annotation tool from Stanford University:
<html>
<head>
<!--<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">
<Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>
</rdf:RDF>
-->
<title>Community Grids Publications</title>
This example states that the content of the web page http://grids.ucs.indiana.edu/ptliupages/publications/index.html is mainly about "Research". For pages annotated with OntoMat, the semantic annotation is placed in a comment in the HTML head, beginning with <rdf:RDF and ending with </rdf:RDF>. Accordingly, when the semantic ontology index processing unit 314 detects annotation information that begins with <rdf:RDF and ends with </rdf:RDF>, it determines that the page has been semantically annotated.
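The check in step 402 could be sketched as follows. This is a minimal sketch that assumes the annotation sits inside a single HTML comment; the regular expression and the function names are illustrative and are not taken from the specification, and the sketch does not verify that the comment lies inside the HTML head.

```python
import re

# Matches an RDF block wrapped in an HTML comment: it begins with <rdf:RDF
# and ends with </rdf:RDF>, as described for OntoMat-annotated pages.
RDF_BLOCK = re.compile(r"<!--\s*(<rdf:RDF\b.*?</rdf:RDF>)\s*-->", re.DOTALL)


def extract_annotation(html):
    """Return the embedded RDF block, or None if the page is not annotated."""
    match = RDF_BLOCK.search(html)
    return match.group(1) if match else None


def is_annotated(html):
    """Step 402: decide whether the file carries semantic annotation."""
    return extract_annotation(html) is not None
```

A file for which `is_annotated` returns false would simply skip steps 403 to 405.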
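Step 403: the semantic ontology index processing unit 314 reads the semantic annotation of the file.
In this embodiment, the semantic ontology index processing unit 314 reads the semantic annotation of the web page with file identifier 9, that is, the comment in the HTML head. Table 5 shows the format of the extracted semantic annotation:
Table 5
Step 404: the semantic ontology index processing unit 314 extracts the semantic ontology concept terms from the annotation it has read and builds the semantic ontology index.
In this embodiment, the semantic ontology index processing unit 314 calls an RDF document processing application programming interface (API) to extract the semantic ontology concept term "Research" from the annotation, builds the semantic ontology forward index for web page 9, and at the same time converts it into a semantic ontology inverted index. Table 6 is the semantic ontology forward index for web page 9 and Table 7 is its semantic ontology inverted index:
Table 6
Table 7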
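Steps 403 and 404 might look roughly like the sketch below. The specification only mentions "an RDF document processing API"; the use of Python's standard `xml.etree.ElementTree`, the helper names and the dictionary layout are assumptions made for illustration, and the annotation string is a well-formed rendering of the example annotation.

```python
import xml.etree.ElementTree as ET

# Well-formed rendering of the example annotation for web page 9.
ANNOTATION = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
    xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">
  <Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>
</rdf:RDF>"""


def extract_concepts(rdf_xml):
    """Return the local names of the typed nodes directly under rdf:RDF."""
    root = ET.fromstring(rdf_xml)
    # ElementTree reports tags as "{namespace}LocalName"; keep the local part.
    return [child.tag.rsplit("}", 1)[-1] for child in root]


def index_document(doc_id, rdf_xml, forward, inverted):
    """Add one file to the semantic forward index and to the inverted index."""
    concepts = extract_concepts(rdf_xml)
    forward[doc_id] = concepts
    for concept in concepts:
        inverted.setdefault(concept, []).append(doc_id)


forward, inverted = {}, {}
index_document(9, ANNOTATION, forward, inverted)
```

After the call, the forward index maps file 9 to the concept Research and the inverted index maps Research back to file 9, which mirrors the roles described for Tables 6 and 7.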
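Step 405: the semantic ontology index processing unit 314 stores the semantic ontology forward index and the semantic ontology inverted index in the index database 306, where they form the content of the semantic ontology index 316.
The text index processing unit 313 must process a file before its semantic ontology index is built because, at search time, the files that match the user's text query are found first, and semantic ontology index matching is then applied to those files. Processing by the text index processing unit 313 first ensures that every file that has a text index and carries semantic information has a corresponding entry in the semantic ontology index 316. It also avoids the situation in which a file read directly from the file database 308 for semantic ontology indexing ends up with a semantic ontology index but no text index.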
Fig. 5 is a flowchart of the search process that the retrieval system of Fig. 3 performs for a user. As shown in Fig. 5, the process comprises the following steps:
Step 501: the search interface module 301 obtains the text query entered by the user and sends it to the search processing module 303. In this embodiment the user's query is assumed to be "paradise".
Step 502: the search processing module 303 receives the text query from the search interface module 301, segments it, and sends the resulting query keywords to the text search processing unit 310.
Segmentation is described in the existing search engine literature and is not repeated here. In this embodiment, segmenting the text query "paradise" yields the single keyword "paradise".
Step 503: the text search processing unit 310 matches the segmented query keywords against the text inverted index and sends the resulting list of text hit files to the semantic ontology search processing unit 311.
On receiving the query keywords, the text search processing unit 310 sends the index database 306 a request to read the text inverted index, and the index database 306 returns the text inverted index from the text index 315. The text search processing unit 310 matches the query keyword "paradise" against the text inverted index to obtain the identifiers of the web page files that contain the keyword, that is, the list of text hit files, and sends this list to the semantic ontology search processing unit 311 for processing.
For simplicity, this embodiment assumes that only 20 files have been indexed. Table 8 is the text inverted index table that the index database 306 returns to the text search processing unit 310:
Table 8
In Table 8, each row pairs a keyword with the file identifier sequence of the files in which that keyword occurs. The sequence has 20 binary digits, one per indexed file, and the position of each bit equals the identifier of the file it represents: the first bit stands for the file with identifier 1, the second bit for the file with identifier 2, and so on. A bit of 0 means that the keyword does not occur in the corresponding file; a bit of 1 means that it does.
The text search processing unit 310 matches the user's query keyword "paradise" to the keyword "paradise" in Table 8, takes the file identifier sequence that follows it, namely the text hit file list 11011011111110001011, and sends it to the semantic ontology search processing unit 311. The files whose bits are 1 in the list are the hit files.
Similarly, if the user's text query is "paradise application", segmentation yields the keywords "paradise" and "application". Both keywords are matched in the text inverted index and their file identifier sequences are combined with a bitwise AND, giving 01011000100010001010; a bit of 1 indicates that both "paradise" and "application" occur in the corresponding file.
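The bit-sequence handling of step 503 can be illustrated with a short Python sketch. The "paradise" sequence is the one quoted in the text; the "application" sequence is hypothetical, because the text quotes only the result of the AND, and it was chosen here so that the AND reproduces that result. The function names are likewise illustrative.

```python
def doc_ids(bitmap):
    """List the 1-based file identifiers whose bit is set in a '0'/'1' string."""
    return [pos for pos, bit in enumerate(bitmap, start=1) if bit == "1"]


def bitwise_and(a, b):
    """Intersect two equal-length file identifier sequences bit by bit."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same set of files")
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


PARADISE = "11011011111110001011"     # quoted in the text for "paradise"
APPLICATION = "01111000100011001010"  # hypothetical row for "application"

both = bitwise_and(PARADISE, APPLICATION)
print(doc_ids(PARADISE))  # the 14 files that contain "paradise"
print(both)               # files that contain both keywords
```

Representing each row as a fixed-width bit sequence makes a multi-keyword query a single AND per additional keyword.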
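Step 504: after obtaining the list of text hit files, the semantic ontology search processing unit 311 first decides whether to perform semantic ontology inverted index matching.
The decision is based on the number of text hit files. If the number of hit files is greater than a threshold, semantic ontology inverted index matching is performed and step 505 follows; otherwise semantic ontology forward index matching is performed and step 506 follows. The threshold may be a predefined value stored in the semantic ontology search processing unit 311, or a value that the retrieval system adjusts dynamically according to statistics or other conditions.
On receiving the text hit file list 11011011111110001011, the semantic ontology search processing unit 311 counts the 1 bits in the binary sequence and obtains 14, so there are 14 text hit files. With a threshold of 10, 14 is greater than 10 and semantic ontology inverted index matching is performed. With a threshold of 15, 14 is less than 15 and semantic ontology forward index matching is performed.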
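The decision in step 504 amounts to a population count followed by a comparison. The sketch below is illustrative; the function name and the string labels for the two strategies are assumptions.

```python
def choose_strategy(hit_bitmap, threshold):
    """Pick the matching strategy from the number of text hit files."""
    hits = hit_bitmap.count("1")  # number of 1 bits = number of hit files
    return "inverted" if hits > threshold else "forward"


HITS = "11011011111110001011"          # 14 hit files, as in the text
print(choose_strategy(HITS, 10))       # threshold 10 from the text
print(choose_strategy(HITS, 15))       # threshold 15 from the text
```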
Step 505: the semantic ontology search processing unit 311 performs semantic ontology inverted index matching on the files in the text hit file list to obtain the document semantic classification table.
First, the semantic ontology search processing unit 311 sends the index database 306 a request to read the semantic ontology inverted index, and the index database 306 returns it. The semantic ontology search processing unit 311 then reads each record of the semantic ontology inverted index in turn and intersects the record's file identifier sequence with the text hit file list, that is, it performs a bitwise AND of the two binary sequences and overwrites the record's file identifier sequence with the result. Finally, records whose intersection is empty are filtered out, and what remains of the semantic ontology inverted index table is the document semantic classification table. Step 507 is then performed.
Table 9 is the semantic ontology inverted index table that the index database 306 returns to the semantic ontology search processing unit 311 in this embodiment:
Table 9
Table 9 assumes that the 20 indexed files involve only five semantic ontology concepts, that is, five distinct semantic identifiers occur across all files. The file identifier sequence after each semantic identifier records where that ontology concept occurs among the 20 files. It is represented in the same way as the file identifier sequences of the text inverted index: each bit stands for one file, and the position of the bit equals the identifier of the file. A bit of 0 means that the file is not annotated with the concept; a bit of 1 means that it is. For example, the file identifier sequence for pop music is 01011010110001100000, which means that the files with identifiers 2, 4, 5, 7, 9, 10, 14 and 15 are annotated with the pop music concept, so their content relates to pop music.
The semantic ontology search processing unit 311 reads each file identifier sequence in the semantic ontology inverted index of Table 9, performs a bitwise AND with the text hit file list 11011011111110001011, and writes the result back to the position of that sequence in Table 9, overwriting the original. It then filters out the semantic identifier entries whose intersection is empty, that is, whose AND result is all zeros, to produce the document semantic classification table, shown in Table 10:
Table 10
The text hit file list 11011011111110001011 has thus been classified by semantics.
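A possible rendering of step 505 in Python is given below. Only the hit list and the pop music sequence are quoted in the text; the other two rows ("computer games" and a "sports" concept with no overlap) are hypothetical stand-ins for rows of Table 9, and the function name is illustrative.

```python
def classify_by_inverted_index(hit_bitmap, semantic_inverted):
    """AND every concept row with the hit list and drop empty intersections."""
    hits = int(hit_bitmap, 2)
    width = len(hit_bitmap)
    table = {}
    for concept, bitmap in semantic_inverted.items():
        common = hits & int(bitmap, 2)
        if common:  # filter out concepts whose AND result is all zeros
            table[concept] = format(common, f"0{width}b")
    return table


HITS = "11011011111110001011"                  # quoted in the text
SEMANTIC_INVERTED = {
    "pop music": "01011010110001100000",       # quoted in the text
    "computer games": "10000001000000001001",  # hypothetical row
    "sports": "00100100000000010000",          # hypothetical, no hit files
}

classification = classify_by_inverted_index(HITS, SEMANTIC_INVERTED)
```

In this sketch the pop music row shrinks to the hit files 2, 4, 5, 7, 9 and 10, because files 14 and 15 are not in the hit list, and the hypothetical "sports" row disappears altogether.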
Step 506: the semantic ontology search processing unit 311 performs semantic ontology forward index matching on the files in the text hit file list to obtain the document semantic classification table.
First, the semantic ontology search processing unit 311 sends the index database 306 a request to read the semantic ontology forward index. Table 11 is the semantic ontology forward index table that the index database 306 returns in response:
Table 11
The semantic ontology search processing unit 311 converts the text hit file list 11011011111110001011 into explicit file identifiers, namely 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 17, 19 and 20, and looks up each identifier in the semantic ontology forward index. The result is a semantic ontology forward index that contains only these file identifiers, shown in Table 12:
Table 12
Finally, each semantic ontology concept that appears in Table 12 is taken as a key, and the file identifiers under which it appears are collected. This converts the forward index into an inverted index and produces the document semantic classification table, shown in Table 13:
Table 13
Step 507 is then performed.
Matching is split into semantic ontology inverted index matching and semantic ontology forward index matching for reasons of efficiency. Inverted index matching compares the text hit file list with every record of the semantic ontology inverted index in turn and computes an intersection each time; this full scan of the semantic ontology inverted index is computationally expensive. When there are few text hit files, forward index matching therefore reduces the amount of computation. Whichever method is used, the resulting document semantic classification table is the same: Table 13 is identical to Table 10.
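The forward index path of step 506 can be sketched in the same style. The forward index contents here are hypothetical, since the tables are not reproduced in the text, and the function name is illustrative; only the hit list is taken from the text.

```python
def classify_by_forward_index(hit_bitmap, semantic_forward):
    """Look up each hit file in the forward index, then regroup by concept."""
    width = len(hit_bitmap)
    hit_ids = [i for i, bit in enumerate(hit_bitmap, start=1) if bit == "1"]
    by_concept = {}
    for doc_id in hit_ids:                       # only hit files are visited
        for concept in semantic_forward.get(doc_id, []):
            by_concept.setdefault(concept, set()).add(doc_id)
    # Convert each set of file identifiers back to a file identifier sequence.
    return {
        concept: "".join("1" if i in ids else "0" for i in range(1, width + 1))
        for concept, ids in by_concept.items()
    }


HITS = "11011011111110001011"   # quoted in the text
SEMANTIC_FORWARD = {            # hypothetical forward index entries
    2: ["pop music"],
    3: ["sports"],              # file 3 is not a hit file
    4: ["pop music"],
    14: ["pop music"],          # file 14 is not a hit file
    20: ["computer games"],
}

classification = classify_by_forward_index(HITS, SEMANTIC_FORWARD)
```

The work done here grows with the number of hit files rather than with the number of concepts in the index, which is the efficiency argument made for small hit lists.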
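Step 507: the semantic ontology search processing unit 311 uses the semantic ontology reasoning engine 304, the annotation ontology library 305 and the document semantic classification rule engine 302 to reason over the semantic terms in the document semantic classification table, extends the table according to the reasoning results, and sends the extended document semantic classification table to the ranking processing unit 312.
After semantic ontology index matching, the semantic ontology search processing unit 311 first sends the semantic ontology concept terms in the document semantic classification table to the semantic ontology reasoning engine 304 for semantic reasoning. Based on the semantic ontology concepts and relationships defined in the annotation ontology library 305 and on its own inference rules, the semantic ontology reasoning engine 304 produces an RDF document that expresses the relationships among the semantic ontology terms and returns it to the semantic ontology search processing unit 311. The semantic ontology search processing unit 311 then matches this RDF document against the trigger conditions of the semantic classification rules defined in the document semantic classification rule engine 302, determines which rules need to fire, and fires them, producing the document semantic classification table extended by reasoning. Finally, the extended document semantic classification table is sent to the ranking processing unit 312.
In this embodiment, the semantic ontology search processing unit 311 sends the four semantic ontology concept terms in Table 10 or Table 13 (pop music, computer games, classical music and novels) to the semantic ontology reasoning engine 304 for reasoning. The reasoning engine works on the RDF triple representation of resources and applies the inference rules that have been defined. An RDF triple has the form (subject, predicate, object). For example, define the two resource descriptions shown in Fig. 6: Shenzhen 601 belongs to Guangdong 602, and Guangdong 602 belongs to China 603. Also define the inference rule (?a, belongs to, ?b), (?b, belongs to, ?c) → (?a, belongs to, ?c), which says that if a belongs to b and b belongs to c, then a belongs to c. From the relationships in Fig. 6, the result shown in Fig. 7 can then be inferred: Shenzhen 601 belongs to China 603.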
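The transitive rule in the Shenzhen example can be illustrated with a naive forward-chaining loop. This is a sketch under the assumption that triples are plain Python tuples; the specification does not say how the reasoning engine is implemented, and the function name is illustrative.

```python
def infer_transitive(triples, predicate):
    """Apply (?a, p, ?b), (?b, p, ?c) -> (?a, p, ?c) until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(facts):
            for b2, p2, c in list(facts):
                if p1 == p2 == predicate and b == b2:
                    derived = (a, predicate, c)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts


FIG6 = {
    ("Shenzhen", "belongs to", "Guangdong"),
    ("Guangdong", "belongs to", "China"),
}
inferred = infer_transitive(FIG6, "belongs to")
```

The loop re-scans the fact set until a pass adds nothing, so chains longer than two links are also closed.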
Assume that the annotation ontology library 305 defines the relationships shown in Fig. 8 for the four ontology concepts of this embodiment: the parent class of pop music 801 is popular music 802; the parent class of both popular music 802 and classical music 803 is music 804; the parent class of novels 805 is literature 806; and the parent class of computer games 807 is games 808. After the inference rules are applied, the RDF relationships of the four ontology concepts are as shown in Fig. 9: the parent class of both pop music 801 and classical music 803 is music 804; the parent class of novels 805 is literature 806; and the parent class of computer games 807 is games 808. The RDF triples are output as:
(pop music, parent class, music)
(classical music, parent class, music)
(novels, parent class, literature)
(computer games, parent class, games)
The document semantic classification rule engine 302 defines the following semantic classification rule: if several triples share a common object and their predicate is "parent class", a new document category is added to the document semantic classification table. The category is named after that object, and its file identifier sequence is the union of the file identifier sequences of the subject terms of those triples, that is, the result of a bitwise OR. Table 14 shows this semantic classification rule:
Table 14
Semantic reasoning followed by extension and consolidation under the semantic classification rules yields the extended document semantic classification table shown in Table 15:
Table 15
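One reading of the semantic classification rule is sketched below: a parent category is added only when at least two subject terms share the same object. The pop music sequence is the AND of the two sequences quoted in the text; the classical music and novels sequences are hypothetical, and the function name is illustrative.

```python
def apply_parent_rule(table, triples):
    """Add a category for each object shared by several 'parent class' triples."""
    if not table:
        return {}
    width = len(next(iter(table.values())))
    children_of = {}
    for subject, predicate, obj in triples:
        if predicate == "parent class" and subject in table:
            children_of.setdefault(obj, []).append(subject)
    extended = dict(table)
    for parent, children in children_of.items():
        if len(children) > 1:  # the rule fires only for a shared object
            merged = 0
            for child in children:
                merged |= int(table[child], 2)  # union = bitwise OR
            extended[parent] = format(merged, f"0{width}b")
    return extended


TABLE = {
    "pop music": "01011010110000000000",        # derived from quoted sequences
    "classical music": "10000000001000000000",  # hypothetical
    "novels": "00000001000000000000",           # hypothetical
}
TRIPLES = [
    ("pop music", "parent class", "music"),
    ("classical music", "parent class", "music"),
    ("novels", "parent class", "literature"),
]

extended = apply_parent_rule(TABLE, TRIPLES)
```

With these inputs a "music" category appears whose sequence is the OR of the pop music and classical music rows, while "literature" is not added because only one triple names it as object.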
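Step 508: the ranking processing unit 312 computes the relevance and importance of the files in the document semantic classification table produced by semantic reasoning, sorts the files according to the results, and sends the sorted results together with the document semantic classification information to the search interface module 301.
Step 509: the search interface module 301 returns the received ranking results and semantic classification information to the user as the search results.
The above are merely preferred embodiments of the invention and are not intended to limit its scope of protection.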
Claims (24)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNA2006101498039A CN101169780A (en) | 2006-10-25 | 2006-10-25 | A Semantic Ontology-Based Retrieval System and Method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNA2006101498039A CN101169780A (en) | 2006-10-25 | 2006-10-25 | A Semantic Ontology-Based Retrieval System and Method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN101169780A (en) | 2008-04-30 |
Family
ID=39390409
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNA2006101498039A Pending CN101169780A (en) | 2006-10-25 | 2006-10-25 | A Semantic Ontology-Based Retrieval System and Method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101169780A (en) |
Cited By (39)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101799835A (en) * | 2010-04-21 | 2010-08-11 | 中国测绘科学研究院 | Ontology-driven geographic information retrieval system and method |
| CN101917413A (en) * | 2010-07-29 | 2010-12-15 | 清华大学 | Service assembly system and method based on service quality optimization and semantic information integration |
| CN101944099A (en) * | 2010-06-24 | 2011-01-12 | 西北工业大学 | Method for automatically classifying text documents by utilizing body |
| CN101566984B (en) * | 2008-07-11 | 2011-02-09 | 博采林电子科技(深圳)有限公司 | Search engine used in personal hand-held equipment and resource search method |
| CN102063453A (en) * | 2010-05-31 | 2011-05-18 | 百度在线网络技术(北京)有限公司 | Method and device for searching based on demands of user |
| CN102073692A (en) * | 2010-12-16 | 2011-05-25 | 北京农业信息技术研究中心 | Agricultural field ontology library based semantic retrieval system and method |
| CN102725759A (en) * | 2010-02-05 | 2012-10-10 | 微软公司 | Semantic catalog for search results |
| CN102750277A (en) * | 2011-04-18 | 2012-10-24 | 腾讯科技(深圳)有限公司 | Method and device for obtaining information |
| CN102799677A (en) * | 2012-07-20 | 2012-11-28 | 河海大学 | Water conservation domain information retrieval system and method based on semanteme |
| CN102880645A (en) * | 2012-08-24 | 2013-01-16 | 上海云叟网络科技有限公司 | Semantic intelligent search method |
| CN103020283A (en) * | 2012-12-27 | 2013-04-03 | 华北电力大学 | Semantic search method based on dynamic reconfiguration of background knowledge |
| CN103136360A (en) * | 2013-03-07 | 2013-06-05 | 北京宽连十方数字技术有限公司 | Internet behavior markup engine and behavior markup method corresponding to same |
| CN103177123A (en) * | 2013-04-15 | 2013-06-26 | 昆明理工大学 | Method for improving database retrieval information relevancy |
| CN103440284A (en) * | 2013-08-14 | 2013-12-11 | 郭克华 | Multimedia storage and search method supporting cross-type semantic search |
| CN104462060A (en) * | 2014-12-03 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Method and device for calculating text similarity and realizing search processing through computer |
| CN104615729A (en) * | 2014-10-30 | 2015-05-13 | 南京源成语义软件科技有限公司 | Network searching method based on semantic net technology |
| CN104765779A (en) * | 2015-03-20 | 2015-07-08 | 浙江大学 | Patent document inquiry extension method based on YAGO2s |
| CN104866598A (en) * | 2015-06-01 | 2015-08-26 | 北京理工大学 | Heterogeneous database integrating method based on configurable templates |
| CN105160046A (en) * | 2015-10-30 | 2015-12-16 | 成都博睿德科技有限公司 | Text-based data retrieval method |
| WO2016009321A1 (en) * | 2014-07-14 | 2016-01-21 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations and inverted table for storing and querying conceptual indices |
| CN105335510A (en) * | 2015-10-30 | 2016-02-17 | 成都博睿德科技有限公司 | Text data efficient searching method |
| CN102750277B (en) * | 2011-04-18 | 2016-12-14 | 深圳市世纪光速信息技术有限公司 | Method and device for obtaining information |
| CN103886099B (en) * | 2014-04-09 | 2017-02-15 | 中国人民大学 | Semantic retrieval system and method of vague concepts |
| CN106951191A (en) * | 2017-03-22 | 2017-07-14 | 江苏金易达供应链管理有限公司 | Big data storage method for an auto service platform |
| CN107004158A (en) * | 2014-11-27 | 2017-08-01 | 爱克发医疗保健公司 | Data repository querying method |
| CN107590166A (en) * | 2016-12-20 | 2018-01-16 | 百度在线网络技术(北京)有限公司 | Method and device for generating data based on query content |
| CN108170739A (en) * | 2017-12-18 | 2018-06-15 | 深圳前海微众银行股份有限公司 | Problem matching method, terminal and computer-readable storage medium |
| WO2018141140A1 (en) * | 2017-02-06 | 2018-08-09 | 中兴通讯股份有限公司 | Method and device for semantic recognition |
| CN109522414A (en) * | 2018-11-26 | 2019-03-26 | 吉林大学 | A document delivery object selection system |
| CN110245215A (en) * | 2019-06-05 | 2019-09-17 | 阿里巴巴集团控股有限公司 | A text retrieval method and device |
| US10496683B2 (en) | 2014-07-14 | 2019-12-03 | International Business Machines Corporation | Automatically linking text to concepts in a knowledge base |
| US10503762B2 (en) | 2014-07-14 | 2019-12-10 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations |
| US10572521B2 (en) | 2014-07-14 | 2020-02-25 | International Business Machines Corporation | Automatic new concept definition |
| CN111199170A (en) * | 2018-11-16 | 2020-05-26 | 长鑫存储技术有限公司 | Formula file identification method and device, electronic equipment and storage medium |
| CN111353055A (en) * | 2020-03-02 | 2020-06-30 | 中国传媒大学 | Cataloging method and system for extended metadata based on smart tags |
| CN112182239A (en) * | 2020-09-22 | 2021-01-05 | 中国建设银行股份有限公司 | Information retrieval method and device |
| CN112559735A (en) * | 2019-09-10 | 2021-03-26 | 富士施乐株式会社 | Information processing apparatus and recording medium |
| CN113779032A (en) * | 2021-09-14 | 2021-12-10 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on recurrent neural network |
| CN114556969A (en) * | 2019-11-27 | 2022-05-27 | 深圳市欢太科技有限公司 | Data processing method, device and storage medium |
- 2006-10-25: CN application CNA2006101498039A (publication CN101169780A), status: Pending
Cited By (61)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101566984B (en) * | 2008-07-11 | 2011-02-09 | 博采林电子科技(深圳)有限公司 | Search engine used in personal hand-held equipment and resource search method |
| CN102725759A (en) * | 2010-02-05 | 2012-10-10 | 微软公司 | Semantic catalog for search results |
| CN102725759B (en) * | 2010-02-05 | 2015-11-25 | 微软技术许可有限责任公司 | Semantic catalog for search results |
| CN101799835A (en) * | 2010-04-21 | 2010-08-11 | 中国测绘科学研究院 | Ontology-driven geographic information retrieval system and method |
| CN101799835B (en) * | 2010-04-21 | 2012-07-04 | 中国测绘科学研究院 | Ontology-driven geographic information retrieval system and method |
| CN102063453A (en) * | 2010-05-31 | 2011-05-18 | 百度在线网络技术(北京)有限公司 | Method and device for searching based on demands of user |
| CN101944099A (en) * | 2010-06-24 | 2011-01-12 | 西北工业大学 | Method for automatically classifying text documents by utilizing ontology |
| CN101917413A (en) * | 2010-07-29 | 2010-12-15 | 清华大学 | Service assembly system and method based on service quality optimization and semantic information integration |
| CN101917413B (en) * | 2010-07-29 | 2013-07-17 | 清华大学 | Service assembly system and method based on service quality optimization and semantic information integration |
| CN102073692A (en) * | 2010-12-16 | 2011-05-25 | 北京农业信息技术研究中心 | Agricultural field ontology library based semantic retrieval system and method |
| CN102073692B (en) * | 2010-12-16 | 2016-04-27 | 北京农业信息技术研究中心 | Agricultural field ontology library based semantic retrieval system and method |
| CN102750277A (en) * | 2011-04-18 | 2012-10-24 | 腾讯科技(深圳)有限公司 | Method and device for obtaining information |
| CN102750277B (en) * | 2011-04-18 | 2016-12-14 | 深圳市世纪光速信息技术有限公司 | Method and device for obtaining information |
| CN102799677A (en) * | 2012-07-20 | 2012-11-28 | 河海大学 | Water conservation domain information retrieval system and method based on semantics |
| CN102799677B (en) * | 2012-07-20 | 2014-11-12 | 河海大学 | Water conservation domain information retrieval system and method based on semantics |
| CN102880645A (en) * | 2012-08-24 | 2013-01-16 | 上海云叟网络科技有限公司 | Semantic intelligent search method |
| CN102880645B (en) * | 2012-08-24 | 2015-12-16 | 上海云叟网络科技有限公司 | Semantic intelligent search method |
| CN103020283A (en) * | 2012-12-27 | 2013-04-03 | 华北电力大学 | Semantic search method based on dynamic reconfiguration of background knowledge |
| CN103020283B (en) * | 2012-12-27 | 2015-12-09 | 华北电力大学 | Semantic search method based on dynamic reconfiguration of background knowledge |
| CN103136360B (en) * | 2013-03-07 | 2016-09-07 | 北京宽连十方数字技术有限公司 | Internet behavior markup engine and behavior markup method corresponding to same |
| CN103136360A (en) * | 2013-03-07 | 2013-06-05 | 北京宽连十方数字技术有限公司 | Internet behavior markup engine and behavior markup method corresponding to same |
| CN103177123A (en) * | 2013-04-15 | 2013-06-26 | 昆明理工大学 | Method for improving database retrieval information relevancy |
| CN103177123B (en) * | 2013-04-15 | 2016-05-11 | 昆明理工大学 | Method for improving database retrieval information relevancy |
| CN103440284B (en) * | 2013-08-14 | 2016-04-20 | 郭克华 | Multimedia storage and search method supporting cross-type semantic search |
| CN103440284A (en) * | 2013-08-14 | 2013-12-11 | 郭克华 | Multimedia storage and search method supporting cross-type semantic search |
| CN103886099B (en) * | 2014-04-09 | 2017-02-15 | 中国人民大学 | Semantic retrieval system and method of vague concepts |
| WO2016009321A1 (en) * | 2014-07-14 | 2016-01-21 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations and inverted table for storing and querying conceptual indices |
| US10572521B2 (en) | 2014-07-14 | 2020-02-25 | International Business Machines Corporation | Automatic new concept definition |
| US10496683B2 (en) | 2014-07-14 | 2019-12-03 | International Business Machines Corporation | Automatically linking text to concepts in a knowledge base |
| US10496684B2 (en) | 2014-07-14 | 2019-12-03 | International Business Machines Corporation | Automatically linking text to concepts in a knowledge base |
| US10503762B2 (en) | 2014-07-14 | 2019-12-10 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations |
| US10503761B2 (en) | 2014-07-14 | 2019-12-10 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations |
| US10956461B2 (en) | 2014-07-14 | 2021-03-23 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations |
| CN104615729A (en) * | 2014-10-30 | 2015-05-13 | 南京源成语义软件科技有限公司 | Network searching method based on semantic net technology |
| CN107004158A (en) * | 2014-11-27 | 2017-08-01 | 爱克发医疗保健公司 | Data repository querying method |
| CN104462060B (en) * | 2014-12-03 | 2017-08-01 | 百度在线网络技术(北京)有限公司 | Method and device for calculating text similarity and realizing search processing through computer |
| CN104462060A (en) * | 2014-12-03 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Method and device for calculating text similarity and realizing search processing through computer |
| CN104765779A (en) * | 2015-03-20 | 2015-07-08 | 浙江大学 | Patent document inquiry extension method based on YAGO2s |
| CN104866598B (en) * | 2015-06-01 | 2018-05-08 | 北京理工大学 | Heterogeneous database integrating method based on configurable templates |
| CN104866598A (en) * | 2015-06-01 | 2015-08-26 | 北京理工大学 | Heterogeneous database integrating method based on configurable templates |
| CN105335510A (en) * | 2015-10-30 | 2016-02-17 | 成都博睿德科技有限公司 | Text data efficient searching method |
| CN105160046A (en) * | 2015-10-30 | 2015-12-16 | 成都博睿德科技有限公司 | Text-based data retrieval method |
| CN107590166B (en) * | 2016-12-20 | 2019-02-12 | 百度在线网络技术(北京)有限公司 | Method and device for generating data based on query content |
| CN107590166A (en) * | 2016-12-20 | 2018-01-16 | 百度在线网络技术(北京)有限公司 | Method and device for generating data based on query content |
| US11301515B2 (en) | 2016-12-20 | 2022-04-12 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for generating data based on query content |
| WO2018141140A1 (en) * | 2017-02-06 | 2018-08-09 | 中兴通讯股份有限公司 | Method and device for semantic recognition |
| CN106951191A (en) * | 2017-03-22 | 2017-07-14 | 江苏金易达供应链管理有限公司 | Big data storage method for an auto service platform |
| CN108170739A (en) * | 2017-12-18 | 2018-06-15 | 深圳前海微众银行股份有限公司 | Problem matching method, terminal and computer-readable storage medium |
| CN111199170B (en) * | 2018-11-16 | 2022-04-01 | 长鑫存储技术有限公司 | Formula file identification method and device, electronic equipment and storage medium |
| CN111199170A (en) * | 2018-11-16 | 2020-05-26 | 长鑫存储技术有限公司 | Formula file identification method and device, electronic equipment and storage medium |
| CN109522414A (en) * | 2018-11-26 | 2019-03-26 | 吉林大学 | A document delivery object selection system |
| CN109522414B (en) * | 2018-11-26 | 2021-06-04 | 吉林大学 | A Document Delivery Object Selection System |
| CN110245215A (en) * | 2019-06-05 | 2019-09-17 | 阿里巴巴集团控股有限公司 | A text retrieval method and device |
| CN110245215B (en) * | 2019-06-05 | 2023-10-20 | 创新先进技术有限公司 | A text retrieval method and device |
| CN112559735A (en) * | 2019-09-10 | 2021-03-26 | 富士施乐株式会社 | Information processing apparatus and recording medium |
| CN114556969A (en) * | 2019-11-27 | 2022-05-27 | 深圳市欢太科技有限公司 | Data processing method, device and storage medium |
| CN111353055A (en) * | 2020-03-02 | 2020-06-30 | 中国传媒大学 | Cataloging method and system for extended metadata based on smart tags |
| CN111353055B (en) * | 2020-03-02 | 2024-04-16 | 中国传媒大学 | Cataloging method and system based on intelligent tag extension metadata |
| CN112182239A (en) * | 2020-09-22 | 2021-01-05 | 中国建设银行股份有限公司 | Information retrieval method and device |
| CN113779032A (en) * | 2021-09-14 | 2021-12-10 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on recurrent neural network |
| CN113779032B (en) * | 2021-09-14 | 2024-03-12 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on recurrent neural network |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101169780A (en) | | A Semantic Ontology-Based Retrieval System and Method |
| CN100498790C (en) | | Retrieving method and system |
| CN103136360B (en) | | Internet behavior markup engine and behavior markup method corresponding to same |
| CN103365924B (en) | | Method, device and terminal for internet information search |
| CN105045875B (en) | | Personalized search and device |
| CN100394427C (en) | | Network searching system and method |
| US20090070322A1 (en) | | Browsing knowledge on the basis of semantic relations |
| CN100440224C (en) | | An automatic processing method for search engine performance evaluation |
| CN112115232A (en) | | A data error correction method, device and server |
| WO2008097856A2 (en) | | Search result delivery engine |
| CN101261629A (en) | | Specific Information Search Method Based on Automatic Classification Technology |
| CN100478962C (en) | | Method, device and system for searching web page and device for establishing index database |
| KR20060048778A (en) | | Phrases-based Search in Information Retrieval Systems |
| KR20060048777A (en) | | Phrase-based generation of document descriptions |
| WO2007132342A1 (en) | | Documentary search procedure in a distributed information system |
| US9971828B2 (en) | | Document tagging and retrieval using per-subject dictionaries including subject-determining-power scores for entries |
| CN102081660B (en) | | Method for searching and sequencing keywords of XML documents based on semantic correlation |
| CN103198136B (en) | | A PC file polling method based on sequential correlation |
| CN105912662A (en) | | Coreseek-based vertical search engine research and optimization method |
| CN101477527A (en) | | Multimedia resource retrieval method and apparatus |
| Remi et al. | | Domain ontology driven fuzzy semantic information retrieval |
| JP2005063432A (en) | | Multimedia object search device and multimedia object search method |
| CN103559258A (en) | | Webpage ranking method based on cloud computation |
| CN104598561A (en) | | Text-based intelligent agricultural video classification method and text-based intelligent agricultural video classification system |
| CN100462969C (en) | | Methods of using the Internet to provide and query information for the public |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C12 | Rejection of a patent application after its publication | |
| | RJ01 | Rejection of invention patent application after publication | Open date: 20080430 |