WO2012106941A1 - Method and device for full-text search - Google Patents

Method and device for full-text search

Info

Publication number
WO2012106941A1
WO2012106941A1 PCT/CN2011/077788 CN2011077788W WO2012106941A1 WO 2012106941 A1 WO2012106941 A1 WO 2012106941A1 CN 2011077788 W CN2011077788 W CN 2011077788W WO 2012106941 A1 WO2012106941 A1 WO 2012106941A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification
search
item
search result
index
Prior art date
Application number
PCT/CN2011/077788
Other languages
English (en)
Chinese (zh)
Inventor
樊彪
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2011/077788 (WO2012106941A1)
Priority to CN2011800013237A (CN102317943B)
Publication of WO2012106941A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/338: Presentation of query results

Definitions

  • The present invention relates to the field of information search, and in particular to a method and apparatus for full-text search.
  • Background art
  • search engines are an effective means of solving the above problems.
  • Most search engines use keyword matching methods to match the data in the repository with the keywords entered by the user to get the information the user needs.
  • Embodiments of the present invention provide a method and apparatus for full-text search to solve the problems in information search in the prior art.
  • a method for full-text search comprising:
  • a device for full-text search characterized in that the device comprises:
  • a search module configured to receive a keyword input by the user, and perform matching according to the keyword in the index library to obtain a search result
  • a classification information obtaining module configured to extract classification information of the search result in the index library
  • a classification display module configured to acquire, according to the classification information, the classification large item and the classification small item to which the search result belongs, and to display the search result by category according to that large item and small item;
  • the preset information model includes the classification information of all searchable documents and the title and body content of all searchable documents.
  • the beneficial effects of the technical solution provided by the embodiment of the present invention are as follows:
  • the content of the searchable documents is saved in the index library by using a preset information model, and classification information is added, so that when the search engine outputs the search results, the classification large item and classification small item to which each search result belongs can be obtained according to the classification information and the search results can be displayed by category. The user can then quickly obtain the desired result by filtering on the large and small items, which makes searching more convenient and faster and reduces the user's search workload.
  • FIG. 1 is a schematic flowchart of a method for full-text search provided in Embodiment 1 of the present invention
  • FIG. 2 is a schematic flowchart of a method for full-text search provided in Embodiment 2 of the present invention
  • FIG. 3 is a schematic structural diagram of a classifier provided in Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram showing a display result of a search result after full-text search provided in Embodiment 2 of the present invention
  • FIG. 5 is a schematic diagram showing a sorted search result display situation provided in Embodiment 2 of the present invention
  • FIG. 6 is a schematic structural diagram of a first full-text search apparatus according to Embodiment 3 of the present invention.
  • FIG. 7 is a schematic structural diagram of a second full-text search apparatus according to Embodiment 3 of the present invention.
  • FIG. 8 is a schematic structural diagram of a third full-text search apparatus provided in Embodiment 3 of the present invention.
  • FIG. 9 is a schematic structural diagram of a fourth full-text search apparatus according to Embodiment 3 of the present invention.
  • Detailed description
  • an embodiment of the present invention provides a method for full-text search, where the method includes:
  • All the searchable documents mapped by the preset information model are stored in the index library, and the preset information model includes the classification information of all searchable documents and the title and body content of all searchable documents.
  • the full-text search method provided by the embodiment of the present invention saves the content of the searchable documents in the index library by using a preset information model and adds classification information, so that when the search engine outputs the search results, the classification large item and classification small item to which each search result belongs can be obtained according to the classification information and the search results can be displayed by category. The user can then quickly obtain the desired result by filtering on the large and small items, which makes searching more convenient and faster and reduces the user's search workload.
  • Embodiment 2 is a refinement based on Embodiment 1, to explain the method provided by the present invention.
  • a method for full-text search specifically comprising:
  • the content of all searchable documents is stored in the index library according to a preset information model, specifically: the content of each searchable document is mapped into information consisting of a set of metadata together with the body content and title of the document.
  • the metadata includes at least classification information of the searchable document.
  • the preset information model defines a custom information format, and the content of a searchable document can be mapped to the format defined by the preset information model. The content of one searchable document saved according to the preset information model is used as an example to illustrate:
  • keywords content alarm box fault, alarm box reporting a critical alarm when no alarm occurs
  • in this example, the content of the searchable document mapped by the information model is saved in HTML format. In practice, the mapped content may also be saved in other formats.
  • the index library may be a database, and is used to save information obtained by mapping the content of the searchable document through a preset information model.
  • a summary field, a keyword field, and the like may be further included, for example, information obtained after mapping by a preset information model, specifically:
  • alarm box fault alarm box reporting a critical alarm when no alarm occurs.
  • field expansion may also be performed, for example, adding a custom field to add classification information and a rule indicating the display of the classification item.
  • the corresponding relationship between the classified large item, the classified small item, and the classification information is stored in the classifier.
  • the classifier is composed of one or more text files, and its structure has three parts: classification large items, classification small items, and metadata. Taking FIG. 3 as an example, the classification large items include large items A, B, ..., N; the classification small items include small items A1, A2, ..., AN, B1, B2, ..., BN, N1, ...; and the metadata includes VA1, VA2, ..., VAN, VB1, VB2, ..., VBN, VN1, ....
  • in the classifier, the classification large items are defined, the classification small items are defined under each classification large item, and the classification information in the metadata is made to correspond to the classification small items.
  • each classification small item corresponds to classification information in the metadata of the preset information model. The classification information is extracted from the metadata of the information model corresponding to each searchable document, and the classification small item and classification large item to which the searchable document belongs are determined according to the classifier.
  • a searchable document may be defined with multiple pieces of classification information, in which case it is assigned to multiple classification large items and small items. For example, if the classification information read from the information model of a searchable document is VA2 and VB2, then during classification the document belongs to small item A2 of large item A and to small item B2 of large item B.
  • the classifier supports custom extension, so that new classification large items and/or classification small items can be added.
  • the document index is established: the search engine automatically extracts the classification information from the metadata of all searchable documents saved according to the preset information model, obtains through the classifier the classification large items and small items to which all searchable documents belong, and saves them as a document index in the index library.
  • obtaining, through the classifier, the classification large items and small items to which all searchable documents belong specifically includes: obtaining through the classifier, according to the classification information in the metadata, the classification small items to which all searchable documents belong; determining the classification large items to which they belong according to the correspondence between large items and small items saved in the classifier; and saving the large items and small items to which all searchable documents belong as the document index in the index library.
  • obtaining from the index library the classification large item and small item to which a search result belongs specifically means reading, in the index library, the large item and small item saved in the document index for that search result.
  • the search results are classified and displayed according to the classification information, which specifically includes:
  • the search results are displayed by category according to the classification large item and classification small item to which each search result belongs.
  • for example, the searchable documents included in the search results fall under three classification large items: operation and maintenance process, document type, and product model. Each of these large items is divided into several small items.
  • the above method further includes:
  • the classification item selected by the user is received, a filtering search is performed according to the keyword among the searchable documents included in that classification item, and the search results of the filtering search are displayed.
  • for example, a filtering search may be performed based on the keyword among the 388 search results included in the classification item product A, and only the search results included in that item are displayed. Narrowing the range of search results makes it easier for the user to reach the closest search results.
  • the method for the full-text search described above may further include:
  • the key field includes a content field and a classification field defined in a preset information model, a content field such as a title, a keyword field, and the like, a classification field such as a summary, a variety of custom fields, and the like, and a body content field of the searchable document may be
  • the key segment weighter is specifically a text file, and the key segments can be weighted according to it. The greater the weighted result of a searchable document, the higher its relevance and the higher its priority in the search result output.
  • the title, keyword, and instruction command are all key fields defined in the preset information model.
  • body can be used as a standard field.
  • step 206 specifically includes: classifying and displaying all the search results according to the classification information, and sorting the displayed results by relevance. FIG. 5 shows the ranking of the search results after key segment weighting, where the top-ranked result is the most relevant one. This improves the accuracy of the search results, keeps the documents the user is looking for near the front, and avoids excessive page-turning.
  • the full-text search method provided by the embodiment of the present invention saves the content of the searchable documents in the index library by using a preset information model and adds classification information, so that when the search engine outputs the search results, the classification large item and classification small item to which each search result belongs can be obtained according to the classification information and the search results can be displayed by category. The user can then quickly obtain the desired result by filtering on the large and small items, which makes searching more convenient and faster and reduces the user's search workload.
  • Embodiment 3
  • An embodiment of the present invention provides a device for full-text search. As shown in FIG. 6, the device specifically includes:
  • the search module 301 is configured to receive a keyword input by the user, and perform matching according to the keyword in the index library to obtain a search result;
  • a classification information obtaining module 302 configured to extract classification information of the search result in the index library
  • the classification display module 303 is configured to obtain, according to the classification information, the classification large item and the classification small item to which the search result belongs, and to display the search result by category according to that large item and small item;
  • the foregoing apparatus further includes:
  • the document indexing module 304 is configured to: before the search module 301 receives the keyword input by the user, establish a classifier according to the preset information model, extract the classification information from all searchable documents mapped by the preset information model, obtain according to the classifier and the classification information the classification large items and small items to which all searchable documents belong, save them as a document index, and save the document index in the index library;
  • the classifier stores a correspondence relationship between the classification large item, the classification small item, and the classification information.
  • the classification display module 303 is specifically configured to: obtain from the document index, according to the classification information of the search result, the classification large item and classification small item to which the search result belongs, and display the search result by category according to that large item and small item.
  • the foregoing apparatus further includes:
  • the filtering search module 305 is configured to: after the classification display module 303 classifies and displays the search results according to the classification information, receive the classification item selected by the user, perform a filtering search according to the keyword among the searchable documents included in that classification item, and display the search results of the filtering search.
  • the foregoing apparatus further includes:
  • the weighted index establishing module 306 is configured to: before the search module 301 receives the keyword input by the user, establish a key segment weighter according to the preset information model, define different weights for different key segments in the key segment weighter, calculate the weighted results of all searchable documents mapped by the preset information model, and store the weighted results as a weighted index in the index library.
  • the classification display module 303 is specifically configured to classify and display the search results according to the classification information, obtain the weighted results of the search results saved in the weighted index, and display the search results in descending order of weighted result.
  • the apparatus for full-text search provided by the embodiment of the present invention saves the content of the searchable documents in the index library by using a preset information model and adds classification information, so that when the search engine outputs the search results, the classification large item and classification small item to which each search result belongs can be obtained according to the classification information and the search results can be displayed by category. The user can then quickly obtain the desired result by filtering on the large and small items, which makes searching more convenient and faster and reduces the user's search workload.
  • the apparatus for full-text search provided by the foregoing embodiment is illustrated only by the above division into functional modules. In actual applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the full-text search device may be divided into different functional modules to perform all or part of the functions described above.
  • the apparatus for full-text search provided by the foregoing embodiment is based on the same concept as the method for full-text search. The specific implementation process is described in detail in the method embodiment and is not repeated here.
  • All or part of the technical solutions provided by the above embodiments may be implemented by software programming, and the software program is stored in a readable storage medium such as a hard disk, an optical disk or a floppy disk in a computer.
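
The flow described in Embodiments 2 and 3 (classifier lookup, document index, classified display, filtering search, and key segment weighting) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `CLASSIFIER`, `FIELD_WEIGHTS`, `Document`, `build_document_index`, `weighted_score`, and `search`, the sample documents, and the weight values are all assumptions made for illustration. The sketch also computes the weighted result at query time with a simple substring match, whereas the description stores weighted results in a weighted index in advance.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical classifier contents: classification information found in the
# metadata -> (classification large item, classification small item).
CLASSIFIER: Dict[str, Tuple[str, str]] = {
    "VA1": ("A", "A1"),
    "VA2": ("A", "A2"),
    "VB1": ("B", "B1"),
    "VB2": ("B", "B2"),
}

# Hypothetical key segment weights; the body acts as the baseline field.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "keywords": 2.0, "body": 1.0}


@dataclass
class Document:
    """A searchable document mapped into an assumed information model."""
    title: str
    body: str
    keywords: str = ""
    classification: List[str] = field(default_factory=list)  # metadata


def build_document_index(docs: Dict[str, Document]) -> Dict[str, List[Tuple[str, str]]]:
    """Resolve each document's classification information through the classifier.

    Classification values unknown to the classifier are skipped.
    """
    return {
        doc_id: [CLASSIFIER[c] for c in doc.classification if c in CLASSIFIER]
        for doc_id, doc in docs.items()
    }


def weighted_score(doc: Document, keyword: str) -> float:
    """Sum the weights of the fields whose text contains the keyword."""
    kw = keyword.lower()
    return sum(
        weight
        for name, weight in FIELD_WEIGHTS.items()
        if kw in getattr(doc, name).lower()
    )


def search(docs, index, keyword: str, small_item: Optional[str] = None):
    """Match the keyword, optionally filter by one small item, group by category.

    Returns {large item: {small item: [doc ids, highest score first]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in docs.items():
        score = weighted_score(doc, keyword)
        if score == 0:
            continue  # keyword not found in any weighted field
        for large, small in index[doc_id]:
            if small_item is None or small == small_item:
                grouped[large][small].append((score, doc_id))
    return {
        large: {
            small: [d for _, d in sorted(hits, key=lambda h: (-h[0], h[1]))]
            for small, hits in smalls.items()
        }
        for large, smalls in grouped.items()
    }


# Illustrative data only; the documents and their classification are made up.
DOCS = {
    "d1": Document("Alarm box fault", "The alarm box reports a critical alarm.",
                   "alarm box", ["VA2", "VB2"]),
    "d2": Document("Board replacement", "Check the alarm box before replacing the board.",
                   "", ["VA2"]),
    "d3": Document("Power supply guide", "Routine inspection steps.", "", ["VB1"]),
}
INDEX = build_document_index(DOCS)

print(search(DOCS, INDEX, "alarm box"))                   # classified display
print(search(DOCS, INDEX, "alarm box", small_item="B2"))  # filtering search
```

In this sketch a document with two pieces of classification information, such as VA2 and VB2, appears under both large items, which mirrors the multi-classification case described above. Passing `small_item` narrows the output to one classification small item, in the spirit of the filtering search step.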

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and device for full-text search, belonging to the field of information search. In the invention, the contents of searchable documents are stored in an index library using a preset information model, and classification information is added, so that when the search engine outputs search results, the broad category and the specific category to which the search results belong can be obtained according to the classification information and the search results can be displayed by category. The user can quickly obtain the desired results by filtering on the broad category and the specific category and can search more conveniently and quickly, which reduces the user's search workload.
PCT/CN2011/077788 2011-07-29 2011-07-29 Method and device for full-text search WO2012106941A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/077788 WO2012106941A1 (fr) 2011-07-29 2011-07-29 Method and device for full-text search
CN2011800013237A CN102317943B (zh) 2011-07-29 2011-07-29 Method and device for full-text search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/077788 WO2012106941A1 (fr) 2011-07-29 2011-07-29 Method and device for full-text search

Publications (1)

Publication Number Publication Date
WO2012106941A1 (fr) 2012-08-16

Family

ID=45429420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/077788 WO2012106941A1 (fr) Method and device for full-text search 2011-07-29 2011-07-29

Country Status (2)

Country Link
CN (1) CN102317943B (fr)
WO (1) WO2012106941A1 (fr)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102651031A (zh) * 2012-03-31 2012-08-29 百度在线网络技术(北京)有限公司 Method and device for providing search results
CN103678350B (zh) * 2012-09-10 2018-01-05 腾讯科技(深圳)有限公司 Method and device for displaying social network search results
CN102968454B (zh) * 2012-10-26 2016-08-03 北京百度网讯科技有限公司 Method and device for obtaining search results of promotion objects
CN104123366A (zh) * 2014-07-23 2014-10-29 谢建平 Search method and search server
CN106815220A (zh) * 2015-11-27 2017-06-09 英业达科技有限公司 Data classification and search method
CN105843867B (zh) * 2016-03-17 2019-09-03 畅捷通信息技术股份有限公司 Retrieval method based on a metadata model and retrieval device based on a metadata model
WO2018023428A1 (fr) * 2016-08-02 2018-02-08 步晓芳 Search result display method and search engine
CN107391535B (zh) * 2017-04-20 2021-01-12 创新先进技术有限公司 Method and device for searching for documents in a document application
CN107423349A (zh) * 2017-05-18 2017-12-01 福建中金在线信息科技有限公司 Full-text search method and system
CN109657151A (zh) * 2018-12-25 2019-04-19 华联世纪工程咨询股份有限公司 Engineering material search method and device based on user usage scenarios
CN112004126A (zh) * 2020-08-24 2020-11-27 海信视像科技股份有限公司 Search result display method and display device
CN112445830B (zh) * 2020-11-26 2024-05-14 湖南智慧政务区块链科技有限公司 Data analysis system based on blockchain technology
CN113127629B (zh) * 2021-03-11 2024-05-14 维沃移动通信有限公司 Keyword search method, device, and electronic equipment
CN113378015B (zh) 2021-06-28 2023-06-20 北京百度网讯科技有限公司 Search method, device, electronic equipment, storage medium, and program product
CN114416920A (zh) * 2021-12-28 2022-04-29 北京像素软件科技股份有限公司 Text search method, device, electronic equipment, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082914A (zh) * 2005-12-30 2007-12-05 香港应用科技研究院有限公司 Classified retrieval of structured documents
WO2009136426A1 (fr) * 2008-05-08 2009-11-12 三菱電機株式会社 Equipment for performing a search query
CN101714172A (zh) * 2009-11-13 2010-05-26 华中科技大学 Index structure supporting access control and retrieval method thereof
CN101840400A (zh) * 2009-03-19 2010-09-22 北大方正集团有限公司 Multi-level classification retrieval method and system
US20100293174A1 (en) * 2009-05-12 2010-11-18 Microsoft Corporation Query classification

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100478962C (zh) * 2007-07-24 2009-04-15 华为技术有限公司 Method, device, and system for searching web pages, and device for building an index database

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082914A (zh) * 2005-12-30 2007-12-05 香港应用科技研究院有限公司 Classified retrieval of structured documents
WO2009136426A1 (fr) * 2008-05-08 2009-11-12 三菱電機株式会社 Equipment for performing a search query
CN101840400A (zh) * 2009-03-19 2010-09-22 北大方正集团有限公司 Multi-level classification retrieval method and system
US20100293174A1 (en) * 2009-05-12 2010-11-18 Microsoft Corporation Query classification
CN101714172A (zh) * 2009-11-13 2010-05-26 华中科技大学 Index structure supporting access control and retrieval method thereof

Also Published As

Publication number Publication date
CN102317943B (zh) 2013-10-02
CN102317943A (zh) 2012-01-11

Similar Documents

Publication Publication Date Title
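WO2012106941A1 (fr) Method and device for full-text search
CN110390044B (zh) Method and device for searching for similar web pages
CN103514183B (zh) Information retrieval method and system based on interactive document clustering
JP5316158B2 (ja) Information processing apparatus, full-text search method, full-text search program, and recording medium
CN103593336B (zh) Knowledge push system and method based on semantic analysis
KR102468930B1 (ko) System and method for filtering documents of interest
CN103559191B (zh) Cross-media ranking method based on latent space learning and bidirectional ranking learning
CN111026710A (zh) Data set retrieval method and system
WO2010105218A2 (fr) System and method for knowledge search
CN102129470A (zh) Tag clustering method and system
CN113779995B (zh) Method and system for automatic extraction of scientific and technical literature data based on text mining
CN104881398B (zh) Method for extracting author affiliation information from English-language documents published by Chinese authors
Ashok Kumar et al. An efficient text-based image retrieval using natural language processing (NLP) techniques
JP2008210024A (ja) Document set analysis device, document set analysis method, program implementing the method, and recording medium storing the program
CN110990676A (zh) Method and system for extracting hot topics from social media
CN114330329A (zh) Business content search method, device, electronic equipment, and storage medium
CN105373546A (zh) Information processing method and system for knowledge services
CN106897437B (zh) High-order rule multi-classification method for a knowledge system and system thereof
CN106776910A (zh) Method and device for displaying search results
CN114461783A (zh) Keyword generation method, device, computer equipment, storage medium, and product
CN102081604A (zh) Search method for a metasearch engine and device thereof
CN106503153A (zh) Computer text classification architecture, system, and text classification method thereof
CN110347922B (zh) Similarity-based recommendation method, device, equipment, and storage medium
CN106933993B (zh) Information processing method and device
CN118349621A (zh) Index building method, retrieval method, and electronic equipment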

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 201180001323.7; Country of ref document: CN
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 11858381; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 11858381; Country of ref document: EP; Kind code of ref document: A1