
CN114818706A - A text matching method, device and government service text matching method - Google Patents


Info

Publication number
CN114818706A
Authority
CN
China
Prior art keywords
text
entity name
matched
name text
entity
Prior art date
Legal status
Granted
Application number
CN202110130726.7A
Other languages
Chinese (zh)
Other versions
CN114818706B (en)
Inventor
王彬铸
郭立帆
李海军
丁菱
韩雨轩
韩喆
冉秋萍
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202110130726.7A
Publication of CN114818706A
Application granted
Publication of CN114818706B
Legal status: Active
Anticipated expiration



Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a text matching method comprising the following steps:
1. Obtain a target entity category corresponding to an entity name text to be matched.
2. According to the target entity category, obtain candidate standardized entity name texts whose entity category is the same as the target entity category.
3. Obtain, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched. The choice is based on the text similarity between the entity name text to be matched and each candidate.

Because the target standardized entity name text is found through text similarity, the entity name text to be matched no longer has to be paired with its standardized counterpart by manual aggregation. This improves the efficiency of standardizing entity name documents.

Description

A text matching method, device and government service text matching method

Technical field

The present application relates to the field of computer technology, and in particular to a text matching method. The present application also relates to a text matching apparatus, an electronic device, a storage medium, and a government service text matching method.

Background art

With the rapid development of Internet technology, more and more Internet-based business service systems have emerged or been deployed in different service fields. For example, construction of the "Internet + government services" technical system has begun. However, Internet-based business service systems also face many problems, such as the standardization of entity name documents.

Taking "Internet + government services" as an example, optimizing government service matters requires analyzing which materials can be digitized. The usual practice is to collect, from each government, the materials that government service matters rely on most, and then analyze which of them can be made electronic.

However, when local governments describe the materials required for a government service matter, the names they use are often non-standard. For example, "Resident Identity Card of the People's Republic of China" may be described as "personal ID card", "ID cards of both persons", "spouses' ID cards", and so on. Such non-standard naming degrades the user experience and makes subsequent optimization of government service matters much harder. How to standardize non-standardized entity name documents has therefore become an urgent problem in the development of Internet-based business service systems.

In the prior art, non-standardized entity name documents are generally standardized as follows. A database containing a large number of entity name documents of the same category is built. The non-standardized entity name documents in the database that describe the same entity are then manually aggregated and linked to the standardized entity name document describing that entity. Because these methods rely on manual aggregation, standardizing non-standardized entity name documents is inefficient.

Summary of the invention

The present application provides a text matching method and apparatus, an electronic device, and a storage medium, to improve the efficiency of standardizing entity name documents.

The present application provides a text matching method, including:

Obtaining a target entity category corresponding to an entity name text to be matched;

Obtaining, according to the target entity category, candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category; and

Obtaining, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidate standardized entity name texts.
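The three claimed steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The names `match_entity_name`, `category_of` and `similarity` are hypothetical. The claim fixes neither how the category is derived nor how similarity is computed, so both are passed in by the caller. The optional threshold from the dependent claims is left out here.

```python
from typing import Callable, Optional, Sequence


def match_entity_name(
    query: str,
    standard_names: Sequence[str],
    category_of: Callable[[str], Optional[str]],
    similarity: Callable[[str, str], float],
) -> Optional[str]:
    """Hypothetical sketch of the three claimed steps for one entity name text."""
    # Step 1: target entity category of the entity name text to be matched.
    target_category = category_of(query)

    # Step 2: candidate standardized names whose category equals the target category.
    candidates = [name for name in standard_names if category_of(name) == target_category]
    if not candidates:
        return None  # nothing in the same category to match against

    # Step 3: the candidate with the highest text similarity is taken as the target.
    return max(candidates, key=lambda name: similarity(query, name))
```

Filtering by category before scoring keeps the similarity computation limited to plausible candidates. For example, a "...证" (certificate) query is never scored against "...表" (form) names.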

Optionally, the method further includes: providing the target standardized entity name text to user equipment.

Optionally, the method further includes: associating the entity name text to be matched with the target standardized entity name text.

Optionally, associating the entity name text to be matched with the target standardized entity name text includes: establishing a correspondence between the two texts.

Optionally, obtaining the target entity category corresponding to the entity name text to be matched includes: obtaining a text matching instruction sent by the user equipment, the text matching instruction carrying the entity name text to be matched; and

providing the target standardized entity name text to the user equipment includes: in response to the text matching instruction, providing the target standardized entity name text to the user equipment.
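The optional client/server steps could be modeled as below:
- an instruction that carries the text,
- returning the result to the user equipment,
- recording the correspondence between the two texts.

`TextMatchInstruction`, `MatchServer` and its `associations` table are hypothetical names for illustration. The patent specifies no message format or storage, and any matching function can be plugged in as `matcher`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TextMatchInstruction:
    """Hypothetical instruction sent by user equipment; carries the text to match."""
    entity_name_text: str


@dataclass
class MatchServer:
    # Any function mapping an entity name text to its target standardized name, or None.
    matcher: Callable[[str], Optional[str]]
    # Correspondence between texts to be matched and their target standardized names.
    associations: Dict[str, str] = field(default_factory=dict)

    def handle(self, instruction: TextMatchInstruction) -> Optional[str]:
        text = instruction.entity_name_text
        target = self.matcher(text)
        if target is not None:
            # Associate the text to be matched with its target standardized name.
            self.associations[text] = target
        # Returned to the user equipment in response to the instruction.
        return target
```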

Optionally, the method further includes: displaying the target standardized entity name text.

Optionally, obtaining the target entity category corresponding to the entity name text to be matched includes:

Performing word segmentation on the entity name text to be matched using a preset word segmentation strategy, to obtain the category keyword in the entity name text to be matched; and

Obtaining the target entity category according to the category keyword in the entity name text to be matched.
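A minimal sketch of deriving the target entity category from a category keyword follows. It assumes that the category keyword is the final segment of the name, as in the patent's examples ("...申请书", "...证", "...表"). It therefore swaps the unspecified word segmentation strategy for a longest-suffix lookup. The keyword tuple `CATEGORY_KEYWORDS` and the function name `target_category` are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical preset category keywords, taken from the patent's examples:
# 申请书 (application), 认证书 (certification), 证 (certificate), 表 (form).
CATEGORY_KEYWORDS = ("申请书", "认证书", "证", "表")


def target_category(name: str, keywords: Iterable[str] = CATEGORY_KEYWORDS) -> Optional[str]:
    """Return the longest preset category keyword that ends `name`, or None."""
    # Suffix lookup stands in for the unspecified word segmentation strategy.
    matches = [kw for kw in keywords if name.endswith(kw)]
    # Prefer the longest keyword so that a more specific category wins.
    return max(matches, key=len) if matches else None
```

A production system would more likely run a real Chinese segmenter and look up the resulting tokens. The suffix rule only holds when category words appear at the end of the name.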

Optionally, obtaining, according to the target entity category, the candidate standardized entity name texts whose entity category is the same as the target entity category includes:

Obtaining, according to the keywords in the entity name text to be matched, associated standardized entity name texts that are associated with the entity name to be matched;

Obtaining the entity categories of the associated standardized entity name texts; and

Obtaining the candidate standardized entity name texts from the associated standardized entity name texts according to the target entity category and those entity categories.
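One way to realize the two-stage candidate lookup is sketched below:
1. Find standardized names that share at least one keyword with the query ("associated" texts).
2. Keep only those whose category equals the target category.

Two choices here are assumptions rather than the patent's design: the inverted index, and overlapping character bigrams as a stand-in keyword extractor (`char_bigrams`). `build_keyword_index` and `candidate_names` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set


def char_bigrams(text: str) -> Set[str]:
    """Hypothetical keyword extractor: overlapping character bigrams."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_keyword_index(
    standard_names: Iterable[str],
    keywords_of: Callable[[str], Iterable[str]] = char_bigrams,
) -> Dict[str, Set[str]]:
    """Inverted index from keyword to the standardized names containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name in standard_names:
        for kw in keywords_of(name):
            index[kw].add(name)
    return index


def candidate_names(
    query: str,
    index: Dict[str, Set[str]],
    category_of: Callable[[str], Optional[str]],
    keywords_of: Callable[[str], Iterable[str]] = char_bigrams,
) -> List[str]:
    # Associated standardized names: those sharing at least one keyword with the query.
    associated: Set[str] = set()
    for kw in keywords_of(query):
        associated |= index.get(kw, set())
    # Keep only associated names whose entity category equals the target category.
    target = category_of(query)
    return sorted(name for name in associated if category_of(name) == target)
```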

Optionally, obtaining the target standardized entity name text from the candidate standardized entity name texts, according to the text similarity between the entity name text to be matched and the candidates, includes:

Obtaining the weights of the keywords in the entity name text to be matched and the weights of the keywords in the candidate standardized entity name text;

Obtaining first word vectors corresponding to the keywords in the entity name text to be matched, and second word vectors corresponding to the keywords in the candidate standardized entity name text;

Obtaining the word vector similarity between the first word vectors and the second word vectors, according to the keyword weights of both texts, the first word vectors and the second word vectors; and

Obtaining the text similarity according to the word vector similarity.
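The claim combines keyword weights and word vectors but does not give a formula. A common choice, assumed here, has two parts:
- build one vector per text as the weight-averaged keyword vector;
- compare the two texts' vectors with cosine similarity.

The weights (for example TF-IDF) and vectors (for example from a trained embedding model) are supplied by the caller. The function names and the default weight of 1.0 are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def weighted_mean_vector(
    keywords: Sequence[str], weights: Dict[str, float], vectors: Dict[str, Vector]
) -> Optional[Vector]:
    """Weight-averaged word vector of a text's keywords, or None if none are in vocabulary."""
    acc: Optional[Vector] = None
    total = 0.0
    for kw in keywords:
        vec = vectors.get(kw)
        if vec is None:
            continue  # out-of-vocabulary keyword: skipped in this sketch
        w = weights.get(kw, 1.0)  # default weight of 1.0 is an assumption
        acc = [w * x for x in vec] if acc is None else [a + w * x for a, x in zip(acc, vec)]
        total += w
    if acc is None or total == 0.0:
        return None
    return [a / total for a in acc]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_vector_similarity(
    keywords_a: Sequence[str], weights_a: Dict[str, float],
    keywords_b: Sequence[str], weights_b: Dict[str, float],
    vectors: Dict[str, Vector],
) -> float:
    """Cosine similarity of the two texts' weighted mean keyword vectors (0.0 if undefined)."""
    ua = weighted_mean_vector(keywords_a, weights_a, vectors)
    ub = weighted_mean_vector(keywords_b, weights_b, vectors)
    if ua is None or ub is None:
        return 0.0
    return cosine(ua, ub)
```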

Optionally, obtaining the text similarity according to the word vector similarity includes:

Obtaining the character string corresponding to the entity name text to be matched;

Obtaining the character string corresponding to the candidate standardized entity name text;

Obtaining, from these two character strings, the character string similarity between the string corresponding to the entity name text to be matched and the string corresponding to the candidate standardized entity name text; and

Obtaining the text similarity according to the word vector similarity and the character string similarity.
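The patent does not name a character string similarity measure. As an assumed example, Python's standard `difflib.SequenceMatcher.ratio()` returns 2·M/T, where M is the number of matched characters and T is the total length of both strings. A normalized edit distance would be an equally reasonable choice. `string_similarity` is a hypothetical name.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], computed as 2 * matched characters / total characters.

    difflib's ratio is an assumed choice; the patent does not name a metric.
    """
    return SequenceMatcher(None, a, b).ratio()
```

For "人事档案原件" against "人事档案", four characters match out of ten in total, giving 0.8.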

Optionally, obtaining the text similarity according to the word vector similarity and the character string similarity includes: weighting the two similarities to obtain the text similarity. The word vector similarity is weighted by a preset first similarity weight, and the character string similarity by a preset second similarity weight.
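A sketch of the weighted combination follows. The default weights of 0.5 are hypothetical presets, since the patent gives no values. Dividing by the weight sum is an added design choice: it keeps the result in [0, 1] whenever both inputs are, so a single threshold stays meaningful if the weights change.

```python
def text_similarity(
    word_vector_sim: float,
    string_sim: float,
    first_weight: float = 0.5,
    second_weight: float = 0.5,
) -> float:
    """Weighted combination of word vector and character string similarity.

    The default weights are hypothetical presets, not values from the patent.
    """
    if first_weight < 0 or second_weight < 0:
        raise ValueError("similarity weights must be non-negative")
    total = first_weight + second_weight
    if total == 0:
        raise ValueError("at least one similarity weight must be positive")
    # Normalizing by the weight sum keeps the result in [0, 1] when both inputs are.
    return (first_weight * word_vector_sim + second_weight * string_sim) / total
```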

Optionally, the method further includes: determining whether the text similarity reaches a text similarity threshold;

Obtaining the target standardized entity name text from the candidate standardized entity name texts, according to the text similarity between the entity name text to be matched and the candidates, then includes: if the text similarity reaches the text similarity threshold, taking a candidate standardized entity name text whose text similarity reaches the threshold as the target standardized entity name text.

Optionally, taking a candidate standardized entity name text whose text similarity reaches the threshold as the target includes: among the candidates whose text similarity reaches the threshold, taking the one with the highest similarity as the target standardized entity name text.

Optionally, the method further includes: if the text similarity does not reach the text similarity threshold, determining that the target standardized entity name text does not exist among the candidate standardized entity name texts.
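The threshold rules can be expressed as a small selection function:
- candidates whose text similarity reaches the threshold qualify;
- the highest-scoring qualifying candidate becomes the target;
- if none qualifies, there is no target.

`select_target` is a hypothetical name. Treating "reaches" as `>=` is an interpretation. Ties are resolved by keeping the first candidate encountered, a detail the patent leaves open.

```python
from typing import Mapping, Optional


def select_target(scores: Mapping[str, float], threshold: float) -> Optional[str]:
    """Return the highest-scoring candidate among those reaching the threshold, else None."""
    # Candidates whose text similarity reaches the threshold ("reaches" read as >=).
    passing = {name: s for name, s in scores.items() if s >= threshold}
    if not passing:
        return None  # no target standardized entity name text among the candidates
    # Highest similarity wins; on ties, max keeps the first candidate encountered.
    return max(passing, key=passing.__getitem__)
```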

In another aspect, the present application further provides a text matching apparatus, including:

a target entity category obtaining unit, configured to obtain the target entity category corresponding to an entity name text to be matched;

a candidate text obtaining unit, configured to obtain, according to the target entity category, candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category; and

a target text matching unit, configured to obtain, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidates.

In another aspect, the present application further provides an electronic device, including:

a processor; and

a memory, configured to store a program of the text matching method. After the device is powered on and the processor runs the program, the following steps are performed:

Obtaining a target entity category corresponding to an entity name text to be matched;

Obtaining, according to the target entity category, candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category; and

Obtaining, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidates.

In another aspect, the present application further provides a storage medium storing a program of the text matching method. When run by a processor, the program performs the following steps: obtaining a target entity category corresponding to an entity name text to be matched;

Obtaining, according to the target entity category, candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category; and

Obtaining, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidates.

In another aspect, the present application further provides a government service text matching method, including:

Obtaining a target entity category corresponding to an entity name text to be matched that describes the name of a government service material;

Obtaining, according to the target entity category, candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category;

Obtaining, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidates; and

Associating the entity name text to be matched with the target standardized entity name text.

In another aspect, the present application further provides an address text matching method, including:

Obtaining a target entity category corresponding to an entity name text to be matched that describes a geographic location name;

Obtaining, according to the target entity category, candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category;

Obtaining, from the candidate standardized entity name texts, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidates; and

Associating the entity name text to be matched with the target standardized entity name text.

Compared with the prior art, the present application has the following advantages:

The text matching method provided by the present application works in three steps. It obtains the target entity category corresponding to an entity name text to be matched. It then obtains candidate standardized entity name texts whose entity category is the same as the target entity category. Finally, it obtains from those candidates the target standardized entity name text that matches the entity name text to be matched, according to text similarity. Because the match is found from text similarity, manual aggregation is no longer needed to pair an entity name text with its target standardized entity name text. This improves the efficiency of standardizing entity name documents.

Description of drawings

FIG. 1 is a schematic diagram of a first scenario of the text matching method provided by the first embodiment of the present application.

FIG. 2 is a flowchart of a text matching method provided by the first embodiment of the present application.

FIG. 3 is a schematic diagram of a first scenario of the text matching method provided by the first embodiment of the present application.

FIG. 4 is a schematic diagram of a text matching apparatus provided by the second embodiment of the present application.

FIG. 5 is a schematic diagram of an electronic device provided by an embodiment of the present application.

FIG. 6 is a flowchart of a text matching method provided by the fifth embodiment of the present application.

Detailed description

Many specific details are set forth in the following description to give a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The present application is therefore not limited by the specific implementations disclosed below.

To present the text matching method of the first embodiment more clearly, its application scenario is introduced first.

The method may be executed in any of three ways:
- by a server;
- by a client on which a relevant text recognition application is installed;
- jointly by the server and the client, that is, through interaction between them.

The client is an application program or software, installed on user equipment, that can implement the text matching method provided by the embodiments of the present application. The user equipment is typically a mobile phone, a PC (personal computer), a tablet computer, or the like. The application program (APP) or software may be a mobile phone application, web-based online text matching software, desktop computer software, and so on. The server is a computing device that provides services such as data processing to the client; it is typically a server or a server cluster.

The first embodiment is described in detail using one example scenario. In it, the text matching method is completed through interaction between the client and the server. The client is computer software, installed on a computer, that implements the text matching method provided by the embodiments of the present application. Refer to FIG. 1, which is a schematic diagram of a first scenario of the text matching method provided by the first embodiment of the present application.

First, after the user equipment receives a text matching trigger operation directed at the client 101, the client 101 obtains the entity name text to be matched via the user equipment.

In the first embodiment, an entity name text is the text in a target text that describes the name of an entity. The target text is generally a document, a paragraph or a sentence. Examples include a description of a government service matter, a network service matter other than government services, an equipment operating procedure, or the steps of a chemical experiment.

When the entity name text describes the name of a government service material, the entity is generally a government service material required by a government service matter. When the entity name text describes the name of a network service material, the entity may be a network service material required by a network service matter other than government services. The entity may also be of other types, such as a device in an equipment operating procedure, or a chemical or chemical reaction apparatus in the steps of a chemical experiment. In other words, the first embodiment places no specific limitation on the target text or the entity.

The entity name text to be matched is generally a non-standardized entity name text, but it may also be a standardized one. Standardized and non-standardized entity name texts are, respectively, the standard and non-standard descriptions of the same entity name. Examples:
- The standardized entity name text for the resident ID card is "Resident Identity Card of the People's Republic of China". Non-standardized entity name texts may be "personal ID card", "ID cards of both persons", "spouses' ID cards", and so on.
- The standardized entity name text is "Real Estate Registration Application Form" (不动产登记申请表). A non-standardized entity name text may be "Real Estate Registration Application Letter" (不动产登记申请书).
- The standardized entity name text is "Personnel File". Non-standardized entity name texts may be "Applicant's Personnel File", "Personnel File of the Insured Person", "Original Personnel File", and so on.

In "Internet + government services", non-standard naming of government service materials degrades the user experience and makes subsequent optimization of government service matters much harder. In other application scenarios it inconveniences users as well.

The text matching method of the first embodiment therefore matches the entity name text to be matched with its corresponding standardized entity name text. That is, it obtains the target standardized entity name text matching the entity name text to be matched, thereby standardizing the non-standardized entity name text. The target standardized entity name text is the standard description text that meets two conditions: its text similarity with the entity name text to be matched exceeds the text similarity threshold, and it describes the same entity name.

Then, after obtaining the entity name text to be matched, the client 101 sends a text matching instruction carrying that text to the server 102, based on the user's trigger operation on the user equipment. Alternatively, the client 101 may first send the entity name text to be matched to the server 102 and then send a text matching instruction for it. The first embodiment does not limit how the client 101 sends the entity name text to be matched to the server 102.

After the server 102 obtains the text matching instruction carrying the entity name text to be matched, it performs the following steps in sequence to obtain the matching target standardized entity name text. Refer to FIG. 2, which is a flowchart of a text matching method provided by the first embodiment of the present application.

In step S201, the target entity category corresponding to the entity name text to be matched is obtained.

An entity category is the category of the entity described by an entity name text; entity categories are divided in advance according to a preset entity category division strategy. In a specific implementation, the target entity category is obtained as follows: first, the entity name text to be matched is segmented with a preset word segmentation strategy to obtain the category keywords in it; then, the target entity category is obtained from those category keywords. A category keyword is a word in an entity name text that identifies the entity category. Taking government service materials as the example entity, when the entity name text to be matched is "XX Application Form" (某某申请书), "XX Certification" (某某认证书), "XX Certificate" (某某证), or "XX Form" (某某表), the words "application form" (申请书), "certification" (认证书), "certificate" (证), and "form" (表) are the category keywords in those texts. In the first embodiment of the present application, the categories pre-divided according to the preset entity category division strategy are the entity categories determined from category keywords collected in advance from entity name texts. Accordingly, the "application form", "certification", "certificate", and "form" categories are categories pre-divided according to the preset entity category division strategy.

In the first embodiment of the present application, the target entity category is obtained from the category keywords in the entity name text to be matched by looking them up in a predefined correspondence between category keywords and categories.
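A minimal sketch of the keyword-to-category lookup, assuming the segmenter has already produced a token list. The table CATEGORY_BY_KEYWORD and its English labels are hypothetical; only the four keywords come from the examples above. Scanning from the right-most token is an assumption that reflects how the example names end in their category keyword.

```python
# Hypothetical correspondence between category keywords and pre-divided categories.
# The keywords follow the examples in the text; the labels are illustrative.
CATEGORY_BY_KEYWORD = {
    "申请书": "application form",
    "认证书": "certification",
    "证": "certificate",
    "表": "form",
}


def target_entity_category(keywords):
    """Return the category of the right-most recognised category keyword, or None.

    `keywords` is the token list produced by the word segmentation step.
    """
    for word in reversed(keywords):
        category = CATEGORY_BY_KEYWORD.get(word)
        if category is not None:
            return category
    return None


# Usage:
# target_entity_category(["某某", "申请书"])  -> "application form"
# target_entity_category(["居民", "户口簿"])  -> None (no listed keyword)
```

A dictionary lookup keeps the category step constant-time per token, which matches the stated goal of making category acquisition simple and fast.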

Because entity categories are determined from category keywords collected in advance from entity name texts, the entity category of an entity name text can be obtained directly once its category keywords have been extracted, which makes obtaining the entity category simpler and faster.

It should be noted that when a new entity name text is encountered, the reverse maximum matching method can be used to confirm its entity category.
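The text names reverse maximum matching (RMM) without giving details. The sketch below is a textbook dictionary-based RMM segmenter plus a hypothetical helper, confirm_category_keyword, that treats the right-most token as the candidate category keyword. The dictionary contents are illustrative assumptions.

```python
def reverse_maximum_matching(text, dictionary, max_len=None):
    """Segment `text` with reverse maximum matching (RMM).

    Starting at the end of the string, take the longest suffix (up to
    `max_len` characters) found in `dictionary`; fall back to a single
    character when nothing matches, then repeat on the remaining prefix.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a dictionary word
        # or only one character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


# Category keywords taken from the examples in step S201.
CATEGORY_KEYWORDS = {"申请书", "认证书", "证", "表"}


def confirm_category_keyword(name, dictionary):
    """Hypothetical helper: return the right-most token if it is a category keyword."""
    tokens = reverse_maximum_matching(name, dictionary)
    if tokens and tokens[-1] in CATEGORY_KEYWORDS:
        return tokens[-1]
    return None
```

Matching from the right suits this use because the category keyword closes the name, so the first (longest) match RMM finds is the keyword itself. For example, "认证书" is preferred over the shorter "证" whenever both are in the dictionary.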

The preset word segmentation strategy may segment the entity name text to be matched with a Chinese word segmenter of the kind used in natural language processing. In the first embodiment of the present application, a personalized dictionary can also be introduced to control segmentation granularity. The entity name text is split, in order, into consecutive word sequences. Whether each consecutive word sequence becomes part of the final segmentation result is then decided by rules and by whether the sequence appears in the given personalized dictionary.
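The rules are not specified, so the following is only one possible sketch. It assumes a base segmenter has already produced fine-grained tokens. Runs of consecutive tokens are merged whenever their concatenation is an entry in a hypothetical personalized dictionary, and the longest run wins.

```python
def merge_with_user_dict(tokens, user_dict, max_span=4):
    """Merge consecutive tokens whose concatenation is a user-dictionary entry.

    `tokens` come from any base Chinese segmenter; `user_dict` is a
    hypothetical personalized dictionary that fixes segmentation granularity.
    At each position the longest matching run (up to `max_span` tokens) wins.
    """
    merged = []
    i = 0
    while i < len(tokens):
        best = 1  # default: keep the single token unchanged
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            if "".join(tokens[i:i + span]) in user_dict:
                best = span
                break
        merged.append("".join(tokens[i:i + best]))
        i += best
    return merged


# Usage: a base segmenter might split "中华人民共和国" into three pieces;
# listing it in the personalized dictionary keeps it as one keyword.
# merge_with_user_dict(["中华", "人民", "共和国", "居民", "身份证"], {"中华人民共和国"})
# -> ["中华人民共和国", "居民", "身份证"]
```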

In step S202, according to the target entity category, the candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category are obtained.

The candidate standardized entity name texts corresponding to the entity name text to be matched are standardized entity name texts that are obtained from, and associated with, the keywords in the entity name text to be matched. In a specific implementation, a BM25 (Best Match) recall strategy is applied to the entity name text to be matched. The ES (Elasticsearch, distributed full-text search) tool is then used to quickly recall, from preset standardized entity name text data, the associated standardized entity name texts related to the entity name to be matched.
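To show what the recall stage computes, here is a plain-Python Okapi BM25 scorer over pre-segmented names. It is not the Elasticsearch API. k1 = 1.2 and b = 0.75 are Elasticsearch's default BM25 parameters, and the IDF uses the Lucene form. The sample documents are illustrative.

```python
import math
from collections import Counter


def bm25_recall(query_terms, docs, top_k=3, k1=1.2, b=0.75):
    """Rank tokenised documents against `query_terms` with Okapi BM25.

    Returns (index, score) pairs for the top_k documents with a positive
    score. IDF uses the Lucene form ln(1 + (N - n + 0.5) / (n + 0.5)).
    `docs` must be non-empty.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        if score > 0:
            scores.append((idx, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Usage with illustrative standardized names, already segmented:
docs = [["中华人民共和国", "居民", "身份证"], ["营业", "执照"], ["居民", "户口簿"]]
ranking = [i for i, _ in bm25_recall(["居民", "身份证"], docs)]  # doc 0 first, then doc 2
```

Documents sharing no query term score zero and are not recalled, which is why this stage is cheap compared with the vector-based similarity in step S203.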

In the first embodiment of the present application, the candidate standardized entity name texts are obtained as follows. First, the associated standardized entity name texts related to the entity name to be matched are obtained from the keywords in the entity name text to be matched. Then, the entity category of each associated standardized entity name text is obtained. Finally, the candidate standardized entity name texts are selected from the associated standardized entity name texts according to the target entity category and those entity categories. The entity category of an associated standardized entity name text is obtained in the same way as the target entity category of the entity name text to be matched; see the description of step S201, which is not repeated here.
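A minimal sketch of the category filter. category_by_suffix is a hypothetical stand-in for the step S201 classifier: it returns the longest category keyword that ends the name. The example shows why the filter matters. "身份证申请书" (ID card application form) shares the keyword "身份证" with the query and would be recalled, but it belongs to a different category.

```python
def category_by_suffix(name, keywords=("申请书", "认证书", "证", "表")):
    """Toy classifier (assumption): longest category keyword that ends the name."""
    matches = [k for k in keywords if name.endswith(k)]
    return max(matches, key=len) if matches else None


def filter_by_category(associated, target_category, category_of=category_by_suffix):
    """Keep associated standardized names whose category equals the target category."""
    return [name for name in associated if category_of(name) == target_category]


# Usage: the target category of "个人身份证" is "证".
candidates = filter_by_category(
    ["中华人民共和国居民身份证", "身份证申请书", "居民户口簿"], "证"
)
# -> ["中华人民共和国居民身份证"]
```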

In step S203, the target standardized entity name text that matches the entity name text to be matched is obtained from the candidate standardized entity name texts according to the text similarity between the entity name text to be matched and each candidate standardized entity name text.

In a specific implementation, the text similarity is obtained as follows. First, obtain the weights of the keywords in the entity name text to be matched and in the candidate standardized entity name text. Second, obtain the first word vectors corresponding to the keywords in the entity name text to be matched and the second word vectors corresponding to the keywords in the candidate standardized entity name text. Third, obtain the word vector similarity between the first and second word vectors from the keyword weights of both texts and the first and second word vectors. Fourth, obtain the text similarity from the word vector similarity.

In the first embodiment of the present application, the first and second word vectors are obtained with a Word2vec (Word to Vector, a family of models for generating word vectors) model. The model maps each keyword in the entity name text to be matched and in the candidate standardized entity name text to a vector. A Word2vec model is a network model, trained on a given corpus, that quickly and efficiently represents a word as a vector.

In the first embodiment of the present application, the weights of the keywords in the entity name text to be matched and in the candidate standardized entity name text are obtained as follows:

First, the entity name text to be matched and the candidate standardized entity name texts are taken together as a target text set.

Then, the entity name text to be matched and the candidate standardized entity name texts are each segmented to obtain their distinct keywords.

Finally, TF-IDF (Term Frequency-Inverse Document Frequency, a weighting technique commonly used in information retrieval and data mining) is applied. For each distinct keyword in the entity name text to be matched, its TF (term frequency) within that text and its IDF (inverse document frequency) over the target text set are obtained. Likewise, for each distinct keyword in the candidate standardized entity name text, its term frequency within that text and its inverse document frequency over the target text set are obtained. The TF-IDF of each keyword in both texts is then computed from its term frequency and inverse document frequency. These TF-IDF values determine the weight each keyword carries when the word vector similarity is calculated.
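A minimal sketch of the keyword weighting, under two assumptions not stated in the text. First, IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1. This matters for a target set of only two texts, where the unsmoothed ln(N / df) would give every shared keyword a weight of zero. Second, weights are normalized to sum to 1 per text, as the worked example that follows suggests.

```python
import math
from collections import Counter


def tfidf_weights(text_tokens, text_set):
    """Normalized TF-IDF weight of each distinct keyword in one text.

    text_tokens: keywords of the text being weighted (must appear in text_set).
    text_set: token lists forming the target text set (the text to be matched
    plus its candidates). IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1,
    an assumption chosen so that shared keywords keep a non-zero weight.
    """
    n = len(text_set)
    counts = Counter(text_tokens)
    raw = {}
    for term, count in counts.items():
        tf = count / len(text_tokens)                     # term frequency in this text
        df = sum(1 for doc in text_set if term in doc)   # texts containing the term
        idf = math.log((1 + n) / (1 + df)) + 1
        raw[term] = tf * idf
    total = sum(raw.values())
    # Normalize so that the weights of one text sum to 1.
    return {term: value / total for term, value in raw.items()}


query = ["个人", "身份证"]
candidate = ["中华人民共和国", "居民", "身份证"]
weights = tfidf_weights(query, [query, candidate])
```

Here "个人" gets the larger weight because it occurs in only one text of the set, while "身份证" occurs in both.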

Take the government service material name text "People's Republic of China Resident Identity Card" (中华人民共和国居民身份证) as an example. Suppose that, over the target text set formed by this text and its candidate standardized entity name texts, the TF-IDF of "People's Republic of China" is 2, that of "resident" is 1, and that of "identity card" is 7. When the word vector similarity is calculated, the weight of "People's Republic of China" is then 0.2, the weight of "resident" is 0.1, and the weight of "identity card" is 0.7.
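The figures in this example are consistent with normalizing each raw TF-IDF score $s_i$ by the sum over the text's keywords. The normalization is inferred from the numbers rather than stated explicitly:

```latex
w_i = \frac{s_i}{\sum_j s_j}, \qquad \sum_j s_j = 2 + 1 + 7 = 10

w_{\text{PRC}} = \frac{2}{10} = 0.2, \quad
w_{\text{resident}} = \frac{1}{10} = 0.1, \quad
w_{\text{ID card}} = \frac{7}{10} = 0.7
```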

In the first embodiment of the present application, the word vector similarity is obtained from the keyword weights of both texts and the first and second word vectors by computing the cosine similarity between the first and second word vectors. Specifically, when the cosine similarity is computed, the elements of the first and second word vectors are multiplied by their respective weights.
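The text does not spell out how per-keyword vectors are combined into one vector per text. One plausible reading, assumed here, is a weighted sum of keyword vectors followed by cosine similarity. The embeddings are toy 2-dimensional values standing in for a trained Word2vec model, and the weights echo the worked example above.

```python
import math


def weighted_text_vector(keywords, weights, embeddings):
    """Sum of keyword vectors, each scaled by its keyword weight.

    `embeddings` maps a keyword to its word vector (e.g. from a trained
    Word2vec model) and must be non-empty; keywords without a vector are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word in keywords:
        if word in embeddings:
            w = weights.get(word, 0.0)
            vec = [v + w * e for v, e in zip(vec, embeddings[word])]
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy embeddings (assumption) and weights for the two texts.
emb = {"个人": [1.0, 0.0], "身份证": [0.0, 1.0], "居民": [0.8, 0.2], "中华人民共和国": [0.5, 0.5]}
q = weighted_text_vector(["个人", "身份证"], {"个人": 0.4, "身份证": 0.6}, emb)
c = weighted_text_vector(["中华人民共和国", "居民", "身份证"],
                         {"中华人民共和国": 0.2, "居民": 0.1, "身份证": 0.7}, emb)
word_vector_similarity = cosine(q, c)  # roughly 0.93 with these toy values
```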

To improve the accuracy of the text similarity between the entity name text to be matched and the candidate standardized entity name text, the first embodiment of the present application further introduces a string similarity when computing the text similarity. In a specific implementation, the steps are as follows. First, the character string corresponding to the entity name text to be matched is obtained. Second, the character string corresponding to the candidate standardized entity name text is obtained. Third, the string similarity between these two character strings is computed. Finally, the text similarity is obtained from the word vector similarity and the string similarity. Specifically, the two similarities are combined by weighting: a preset first similarity weight is applied to the word vector similarity and a preset second similarity weight to the string similarity, yielding the text similarity.
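The string similarity measure is not specified. The sketch below swaps in the ratio from Python's standard difflib.SequenceMatcher as one possible choice. w1 = 0.7 and w2 = 0.3 are hypothetical preset weights.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Character-level similarity in [0, 1] (difflib ratio; an assumed choice)."""
    return SequenceMatcher(None, a, b).ratio()


def text_similarity(word_vec_sim, str_sim, w1=0.7, w2=0.3):
    """Weighted combination of the two similarities; w1 and w2 are hypothetical presets."""
    return w1 * word_vec_sim + w2 * str_sim


s = string_similarity("个人身份证", "中华人民共和国居民身份证")  # 2 * 4 / 17, about 0.47
```

The ratio is 2M / T, where M is the number of matched characters ("身份证" plus "人") and T is the combined length. A purely character-based score like this rewards shared substrings even when word vectors disagree, which is the motivation the text gives for mixing the two.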

In the first embodiment of the present application, after the text similarity is obtained from the word vector similarity and the string similarity, it is determined whether the text similarity reaches the text similarity threshold. If it does, a candidate standardized entity name text whose text similarity reaches the threshold is taken as the target standardized entity name text. Specifically, among the candidates whose text similarity reaches the threshold, the one with the highest similarity is taken as the target standardized entity name text. For example, the text similarity between "Personal ID Card" (个人身份证) and "People's Republic of China Resident Identity Card" is 0.78. With a preset text similarity threshold of 0.7, "People's Republic of China Resident Identity Card" is taken as the target standardized entity name text for "Personal ID Card".

If the text similarity does not reach the text similarity threshold, it is determined that no target standardized entity name text exists among the candidate standardized entity name texts.
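The threshold rule can be sketched as a best-above-threshold selection. "Reaches" is read here as greater than or equal to, and the 0.7 default simply mirrors the example. A None result stands for the "no target standardized entity name text" branch.

```python
def select_target(scored_candidates, threshold=0.7):
    """Return the best (name, similarity) pair at or above `threshold`, else None.

    `scored_candidates` is an iterable of (standardized_name, text_similarity).
    None means no target standardized entity name text exists.
    """
    best = max(scored_candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


# Usage with the figures from the example above:
# select_target([("中华人民共和国居民身份证", 0.78), ("居民户口簿", 0.41)])
# -> ("中华人民共和国居民身份证", 0.78)
```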

In the first embodiment of the present application, when the server 102 determines that no target standardized entity name text exists among the candidate standardized entity name texts, it generates feedback information indicating this and returns it to the client 101. The client 101 displays the feedback information on the interactive interface of the user equipment.

After the server 102 obtains the target standardized entity name text, it may provide the target standardized entity name text to the client 101 in response to the text matching instruction. The client 101 then associates the entity name text to be matched with the target standardized entity name text, that is, it establishes a correspondence between the two. Alternatively, the server 102 may itself associate the entity name text to be matched with the target standardized entity name text and provide the association result to the client 101.

The text matching method provided in the first embodiment of the present application can also be applied in an application scenario in which the server is the execution subject. Refer to FIG. 3, which is a schematic diagram of the first scenario of the text matching method provided in the first embodiment of the present application.

Step S301: Obtain the entity name text to be matched.
Step S302: Obtain the target entity category, that is, the target entity category corresponding to the entity name text to be matched.
Step S303: Obtain the candidate standardized entity name texts. That is, according to the target entity category, obtain the candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category.
Step S304-1: Obtain the word vector similarity.
Step S304-2: Obtain the string similarity.
Step S305: Obtain the text similarity from the word vector similarity and the string similarity.
Step S306: Determine whether the text similarity reaches the text similarity threshold.
Step S306-1: If so, obtain the target standardized entity name text, that is, take the candidate standardized entity name text whose text similarity reaches the threshold as the target standardized entity name text.
Step S306-2: If not, obtain feedback information. That is, determine that no target standardized entity name text exists among the candidate standardized entity name texts and generate feedback indicating that none was found.
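Steps S301 to S306 can be wired together as below. This is an architectural sketch only. Every callable (get_category, recall, word_vec_sim, str_sim) is a hypothetical plug-in, and the threshold and weights are illustrative defaults.

```python
from typing import Callable, Iterable, Optional


def match_entity_name(
    name: str,                                        # S301: entity name text to be matched
    get_category: Callable[[str], Optional[str]],
    recall: Callable[[str], Iterable[str]],
    word_vec_sim: Callable[[str, str], float],
    str_sim: Callable[[str, str], float],
    threshold: float = 0.7,
    w1: float = 0.7,
    w2: float = 0.3,
) -> dict:
    """Run steps S302-S306; all callables are hypothetical plug-ins."""
    target_category = get_category(name)                              # S302
    candidates = [c for c in recall(name)
                  if get_category(c) == target_category]               # S303
    scored = [(c, w1 * word_vec_sim(name, c) + w2 * str_sim(name, c))  # S304-1/2, S305
              for c in candidates]
    best = max(scored, key=lambda p: p[1], default=None)
    if best is not None and best[1] >= threshold:                      # S306
        return {"found": True, "target": best[0], "similarity": best[1]}  # S306-1
    return {"found": False, "feedback": "no target standardized entity name text"}  # S306-2


# Usage with stub plug-ins:
result = match_entity_name(
    "个人身份证",
    get_category=lambda s: "证" if s.endswith("证") else None,
    recall=lambda s: ["中华人民共和国居民身份证", "身份证申请书"],
    word_vec_sim=lambda a, b: 0.9,
    str_sim=lambda a, b: 0.5,
)
```

Passing the stages in as callables keeps the same control flow usable whether the server or the client is the execution subject.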

The text matching method provided in the first embodiment of the present application can also be applied in an application scenario in which the client is the execution subject.

In a specific implementation, after obtaining the entity name text to be matched, the client performs the following steps in sequence. First, it obtains the target entity category corresponding to the entity name text to be matched. Then, according to the target entity category, it obtains the candidate standardized entity name texts whose entity category is the same as the target entity category. Finally, according to the text similarity between the entity name text to be matched and each candidate, it obtains from the candidates the target standardized entity name text that matches the entity name text to be matched.

The first embodiment of the present application does not specifically limit the application scenarios of the text matching method. For example, the method can also be applied in other scenarios, which are not enumerated here. The scenarios above are provided to aid understanding of the text matching method provided in the first embodiment of the present application, not to limit it.

The first embodiment of the present application provides a text matching method that performs the following steps. It obtains multiple location element texts in a location query text, where the location element texts are the texts in the location query text that describe locations. It obtains candidate location information corresponding to the multiple location element texts. It ranks that candidate location information according to at least one of the number of occurrences of the location element texts in the location query text and the clustering scores of the candidate location information. Finally, it obtains location recommendation information for the location query text from the ranking result. Because the candidate location information is ranked on the basis of multiple location element texts, the candidate location information in the location recommendation information is guaranteed to correspond to multiple location element texts. This improves the accuracy of the location recommendation information when the location query text contains multiple location element texts.

The first embodiment of the present application provides a text matching method that performs the following steps. It obtains the target entity category corresponding to the entity name text to be matched. According to the target entity category, it obtains the candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category. According to the text similarity between the entity name text to be matched and each candidate, it then obtains from the candidates the target standardized entity name text that matches the entity name text to be matched. Because the matching target is obtained from text similarity, the method does not rely on manual compilation to match entity name texts with standardized entity name texts, which improves the efficiency of entity name text standardization.

Second Embodiment

Corresponding to the application scenario of the text matching method and the text matching method provided in the first embodiment, the second embodiment of the present application further provides a text matching apparatus. Because the apparatus embodiment is substantially similar to the application scenario and the method of the first embodiment, it is described relatively briefly. For related details, refer to the corresponding parts of the description of the application scenario and of the text matching method provided in the first embodiment. The apparatus embodiment described below is merely illustrative.

Refer to FIG. 4, which is a schematic diagram of a text matching apparatus provided in the second embodiment of the present application.

The text matching apparatus includes:

a target entity category obtaining unit 401, configured to obtain the target entity category corresponding to the entity name text to be matched;

a candidate text obtaining unit 402, configured to obtain, according to the target entity category, the candidate standardized entity name texts corresponding to the entity name text to be matched whose entity category is the same as the target entity category; and

a target text matching unit 403, configured to obtain from the candidate standardized entity name texts, according to the text similarity between the entity name text to be matched and each candidate standardized entity name text, the target standardized entity name text that matches the entity name text to be matched.

Optionally, the text matching apparatus provided in the second embodiment of the present application further includes a text providing unit, configured to provide the target standardized entity name text to the user equipment.

Optionally, the text matching apparatus provided in the second embodiment of the present application further includes a text association unit, configured to associate the entity name text to be matched with the target standardized entity name text.

Optionally, the text association unit is specifically configured to establish a correspondence between the entity name text to be matched and the target standardized entity name text.

Optionally, the target entity category obtaining unit 401 is specifically configured to obtain a text matching instruction issued by the user equipment, the text matching instruction carrying the entity name text to be matched; and

the text providing unit is specifically configured to provide the target standardized entity name text to the user equipment in response to the text matching instruction.

Optionally, the text matching apparatus provided in the second embodiment of the present application further includes a text display unit, configured to display the target standardized entity name text.

Optionally, the target entity category obtaining unit 401 is specifically configured to segment the entity name text to be matched with a preset word segmentation strategy to obtain the category keywords in it, and to obtain the target entity category from those category keywords.

Optionally, the candidate text obtaining unit 402 is specifically configured to: obtain, from the keywords in the entity name text to be matched, the associated standardized entity name texts related to the entity name to be matched; obtain the entity category of each associated standardized entity name text; and obtain the candidate standardized entity name texts from the associated standardized entity name texts according to the target entity category and those entity categories.

Optionally, the target text matching unit 403 is specifically configured to: obtain the weights of the keywords in the entity name text to be matched and in the candidate standardized entity name text; obtain the first word vectors corresponding to the keywords in the entity name text to be matched and the second word vectors corresponding to the keywords in the candidate standardized entity name text; obtain the word vector similarity between the first and second word vectors from the keyword weights of both texts and the first and second word vectors; and obtain the text similarity from the word vector similarity.

Optionally, obtaining the text similarity from the word vector similarity includes: obtaining the character string corresponding to the entity name text to be matched;

obtaining the character string corresponding to the candidate standardized entity name text;

obtaining, from these two character strings, the string similarity between the character string corresponding to the entity name text to be matched and the character string corresponding to the candidate standardized entity name text; and

obtaining the text similarity according to the word vector similarity and the character string similarity.

Optionally, obtaining the text similarity according to the word vector similarity and the character string similarity includes: weighting the word vector similarity and the character string similarity to obtain the text similarity. The weighting uses a preset first similarity weight corresponding to the word vector similarity and a preset second similarity weight corresponding to the character string similarity.
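This is a minimal sketch of the weighted combination. The default weights 0.6 and 0.4 are hypothetical. The source only says the two similarities are weighted; dividing by the weight sum is an additional assumption that keeps the result in [0, 1] whenever both inputs are.

```python
def text_similarity(
    word_vec_sim: float,
    string_sim: float,
    first_weight: float = 0.6,   # hypothetical weight for the word vector similarity
    second_weight: float = 0.4,  # hypothetical weight for the character string similarity
) -> float:
    """Weighted combination of the word vector and character string similarities."""
    if first_weight < 0 or second_weight < 0:
        raise ValueError("similarity weights must be non-negative")
    total = first_weight + second_weight
    if total == 0:
        raise ValueError("at least one similarity weight must be positive")
    return (first_weight * word_vec_sim + second_weight * string_sim) / total
```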

Optionally, the text matching apparatus provided in the second embodiment of the present application further includes a similarity judging unit, configured to judge whether the text similarity reaches a text similarity threshold.

The target text matching unit 403 is specifically configured to perform the following when the judgment result of the similarity judging unit is yes: from the candidate standardized entity name texts, obtain the candidate standardized entity name text whose text similarity reaches the text similarity threshold, and use it as the target standardized entity name text.

Optionally, obtaining from the candidate standardized entity name texts the candidate whose text similarity reaches the text similarity threshold, as the target standardized entity name text, includes: selecting the candidate whose text similarity reaches the threshold and is the highest among all candidates, and using it as the target standardized entity name text.

Optionally, the text matching apparatus provided in the second embodiment of the present application further includes a result determination unit. When the judgment result of the similarity judging unit is no, this unit determines that the target standardized entity name text does not exist among the candidate standardized entity name texts.
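The threshold check, the highest-similarity rule, and the no-match case handled by the result determination unit can be sketched together as follows. The sketch assumes that a text similarity has already been computed for each candidate. It also assumes that "reaches the threshold" means greater than or equal to it. The function name is hypothetical.

```python
from typing import Dict, Optional


def select_target(similarities: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most similar candidate that reaches the threshold, or None."""
    qualified = {name: score for name, score in similarities.items() if score >= threshold}
    if not qualified:
        return None  # no target standardized entity name text among the candidates
    return max(qualified, key=qualified.get)
```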

Third Embodiment

The third embodiment of the present application provides an electronic device. It corresponds to the application scenario of the text matching method provided by the embodiments of the present application and to the text matching method provided by the first embodiment. The third embodiment is substantially similar to that application scenario and to the first embodiment, so it is described only briefly here. For related details, refer to the descriptions of the application scenario and of the text matching method of the first embodiment. The third embodiment described below is merely illustrative.

Refer to FIG. 5, which is a schematic diagram of an electronic device provided in an embodiment of the present application.

The electronic device includes: a processor 501;

and a memory 502, configured to store a program of the information processing method. After the device is powered on and runs this program through the processor, the following steps are performed:

obtaining the target entity category corresponding to the entity name text to be matched;

obtaining, according to the target entity category, the candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category; and

obtaining, from the candidate standardized entity name texts, the target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and each candidate standardized entity name text.
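The three steps can be chained as in the sketch below. The `classify` and `similarity` callables are hypothetical placeholders for the category extraction and the text similarity computation described elsewhere in the document. Returning `None` when the query has no recognizable category is an assumption. In a real system, the categories of the standardized names would probably be precomputed rather than derived on every query.

```python
from typing import Callable, List, Optional


def match_entity_name(
    query: str,
    standard_names: List[str],
    classify: Callable[[str], Optional[str]],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    # Step 1: target entity category of the entity name text to be matched.
    target_category = classify(query)
    if target_category is None:
        return None
    # Step 2: candidates whose entity category equals the target category.
    candidates = [name for name in standard_names if classify(name) == target_category]
    # Step 3: the most similar candidate, provided it reaches the threshold.
    best_name, best_score = None, float("-inf")
    for name in candidates:
        score = similarity(query, name)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name
```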

It should be noted that, for a detailed description of the electronic device provided by the third embodiment of the present application, reference may be made to the related descriptions of three items: the application scenario of the text matching method provided by the embodiments of the present application, the text matching method provided by the first embodiment, and the above method embodiments. These descriptions are not repeated here.

Fourth Embodiment

The fourth embodiment of the present application provides a storage medium. It corresponds to the application scenario of the text matching method provided by the embodiments of the present application and to the text matching method provided by the first embodiment. The fourth embodiment is substantially similar to that application scenario and to the first embodiment, so it is described only briefly here. For related details, refer to the descriptions of the application scenario and of the text matching method of the first embodiment. The fourth embodiment described below is merely illustrative.

The storage medium stores a computer program. When the program is run by a processor, it performs the following steps:

obtaining the target entity category corresponding to the entity name text to be matched;

obtaining, according to the target entity category, the candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category; and

obtaining, from the candidate standardized entity name texts, the target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and each candidate standardized entity name text.

It should be noted that, for a detailed description of the storage medium provided by the fourth embodiment of the present application, reference may be made to the related descriptions of the application scenario of the text matching method provided by the embodiments of the present application and of the text matching method provided by the first embodiment. These descriptions are not repeated here.

Fifth Embodiment

The fifth embodiment of the present application provides another text matching method. It corresponds to the application scenario of the text matching method provided by the embodiments of the present application and to the text matching method provided by the first embodiment. This embodiment is substantially similar to that application scenario and to the first embodiment, so it is described only briefly here. For related details, refer to the descriptions of the application scenario and of the text matching method of the first embodiment. The method embodiment described below is merely illustrative.

The text matching method provided in the fifth embodiment of the present application is described below with reference to FIG. 6.

FIG. 6 is a flowchart of the text matching method provided in the fifth embodiment of the present application. The method includes steps S601 to S604.

In step S601, the target entity category is obtained for the entity name text to be matched, which describes the name of a government service material.

In the fifth embodiment of the present application, the entity name text is the part of a target text that describes the name of an entity. The target text is generally a document, a paragraph, or a sentence. Examples include descriptions of government service matters, network service matters other than government services, equipment operating procedures, and chemical experiment steps. The entity is generally a government service material required for a government service matter, in which case the entity name text describes the name of that government service material. The entity may also be a network service material required for a network service matter other than government services, in which case the entity name text describes the name of that network service material. The entity may also be of other types, such as a device in an equipment operating procedure, or a chemical or chemical reaction apparatus in an introduction to chemical experiment steps. That is, the present application does not specifically limit the target text or the entity.

The entity name text to be matched is generally a non-standardized entity name text, but it may also be a standardized entity name text. A standardized entity name text is the standard description of an entity name, and a non-standardized entity name text is a non-standard description of the same entity name. Three examples follow.

- Resident identity card. The standardized entity name text is "Resident Identity Card of the People's Republic of China" (中华人民共和国居民身份证). Non-standardized entity name texts include "personal ID card" (个人身份证), "ID cards of both persons" (双人身份证), and "ID cards of husband and wife" (夫妻身份证).
- Real estate registration application form. The standardized entity name text is "Real Estate Registration Application Form" (不动产登记申请表). A non-standardized entity name text is "Real Estate Registration Application Letter" (不动产登记申请书).
- Personnel file. The standardized entity name text is "Personnel File" (人事档案). Non-standardized entity name texts include "Applicant's Personnel File" (申请人人事档案), "Personnel File of the Insured Person" (参保人员人事档案), and "Original Personnel File" (人事档案原件).

The entity category is the category of the entity described by the entity name text. Entity categories are divided in advance according to a preset entity category division strategy. In a specific implementation, the target entity category is obtained in two steps. First, the entity name text to be matched is segmented with a preset word segmentation strategy to obtain its category keywords. Second, the target entity category is obtained from those category keywords.

A category keyword is a word in an entity name text that identifies the entity category. Take government service materials as an example. When the entity name text to be matched is "XX application letter" (某某申请书), "XX certification" (某某认证书), "XX certificate" (某某证), or "XX form" (某某表), the category keywords are "application letter" (申请书), "certification" (认证书), "certificate" (证), and "form" (表), respectively.

In the fifth embodiment of the present application, the categories divided in advance under the preset entity category division strategy are the entity categories determined from the category keywords in entity name texts collected in advance. The "application letter", "certification", "certificate", and "form" categories are therefore categories divided in advance according to that strategy.
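The patent does not detail the preset word segmentation strategy. As a simplified stand-in, the sketch below treats the category keyword as a suffix of the entity name and checks a preset keyword list. The list is checked longest keyword first. This matters if the list ever contains overlapping suffixes, for example a hypothetical 书 alongside 申请书: the more specific keyword then wins. The keyword list comes from the examples above, and the function name is hypothetical.

```python
from typing import Optional, Sequence

# Preset category keywords, taken from the government service examples above.
CATEGORY_KEYWORDS = ("申请书", "认证书", "证", "表")


def extract_target_category(
    entity_name: str, keywords: Sequence[str] = CATEGORY_KEYWORDS
) -> Optional[str]:
    """Return the longest preset category keyword that ends the entity name, if any."""
    for keyword in sorted(keywords, key=len, reverse=True):
        if entity_name.endswith(keyword):
            return keyword
    return None
```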

In step S602, the candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category are obtained according to the target entity category.

In the fifth embodiment of the present application, a candidate standardized entity name text corresponding to the entity name text to be matched is a standardized entity name text that is obtained from, and associated with, the keywords in the entity name text to be matched. In a specific implementation, a BM25 recall strategy is applied to the entity name text to be matched. The ES tool then quickly recalls, from preset standardized entity name text data, the standardized entity name texts associated with the entity name text to be matched.
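The source recalls associated standardized names with BM25 through the ES tool. This is presumably Elasticsearch, whose default similarity is BM25. As a self-contained stand-in, the sketch below scores names with the Okapi BM25 formula in pure Python. Several choices are assumptions, not details from the source: character-level tokenization, k1 = 1.2, b = 0.75, the Lucene-style IDF, and the `top_k` cutoff.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_recall(
    query: str,
    docs: List[str],
    top_k: int = 10,
    k1: float = 1.2,
    b: float = 0.75,
) -> List[Tuple[str, float]]:
    """Return up to top_k (doc, score) pairs with a positive BM25 score, best first."""
    if not docs:
        return []
    tokenized = [list(doc) for doc in docs]  # character-level tokens (assumption)
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter()  # number of documents containing each token
    for toks in tokenized:
        df.update(set(toks))

    scored = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        if score > 0:
            scored.append((doc, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The recalled names would then be filtered by entity category, as described for step S602.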

In step S603, the target standardized entity name text that matches the entity name text to be matched is obtained from the candidate standardized entity name texts, according to the text similarity between the entity name text to be matched and each candidate.

In a specific implementation, the text similarity is first obtained from the word vector similarity and the character string similarity. It is then judged whether the text similarity reaches the text similarity threshold. If it does, the candidate standardized entity name text whose text similarity reaches the threshold is taken from the candidates as the target standardized entity name text. Specifically, the candidate whose text similarity reaches the threshold and is the highest is used as the target standardized entity name text.

If the text similarity does not reach the text similarity threshold, it is determined that the target standardized entity name text does not exist among the candidate standardized entity name texts.

In step S604, the entity name text to be matched is associated with the target standardized entity name text.

The association is generally implemented by establishing a correspondence between the entity name text to be matched and the target standardized entity name text.

Sixth Embodiment

The sixth embodiment of the present application provides another text matching method. It corresponds to the application scenario of the text matching method provided by the embodiments of the present application and to the text matching method provided by the first embodiment. This embodiment is substantially similar to that application scenario and to the first embodiment, so it is described only briefly here. For related details, refer to the descriptions of the application scenario and of the text matching method of the first embodiment. The method embodiment described below is merely illustrative.

First, the target entity category is obtained for the entity name text to be matched, which describes a geographic location name.

In the sixth embodiment of the present application, the entity name text is the part of a target text that describes a geographic location name. The target text is generally a document, a paragraph, or a sentence. Examples include documents collected during map creation that describe geographic locations, and documents used in social governance or urban management to describe the locations involved in a case. It should be noted that the present application does not specifically limit the target text or the entity.

The entity name text to be matched is generally a non-standardized entity name text, but it may also be a standardized entity name text. A standardized entity name text is the standard description of an entity name, and a non-standardized entity name text is a non-standard description of the same entity name. For example, the standardized entity name text of the People's Republic of China is "People's Republic of China" (中华人民共和国). Non-standardized entity name texts for it include "China" (中国) and "our country" (我国). Similarly, the standardized entity name text of Beijing Olympic Park is "Beijing Olympic Park" (北京奥林匹克公园), and a non-standardized entity name text is "Olympic Park" (奥林匹克公园).

The entity category is the category of the entity described by the entity name text. Entity categories are divided in advance according to a preset entity category division strategy. In a specific implementation, the target entity category is obtained in two steps. First, the entity name text to be matched is segmented with a preset word segmentation strategy to obtain its category keywords. Second, the target entity category is obtained from those category keywords.

A category keyword is a word in an entity name text that identifies the entity category. Take geographic location names as an example. When the entity name text to be matched is "XX country" (某某国), "XX province" (某某省), "XX city" (某某市), or "XX mountain" (某某山), the category keywords are "country" (国), "province" (省), "city" (市), and "mountain" (山), respectively.

In the sixth embodiment of the present application, the categories divided in advance under the preset entity category division strategy are the entity categories determined from the category keywords in entity name texts collected in advance. The "country" (国家), "province/municipality" (省直辖市), "mountain" (山), and "district" (区) categories are therefore categories divided in advance according to that strategy.
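For geographic names the same suffix idea can apply, except that several keywords may map to one pre-divided category. In the sketch below, mapping both 省 and 市 to the 省直辖市 (province/municipality) category is an assumption inferred from the category names above. The suffix-matching stand-in for word segmentation and the function name are also hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from category keyword to pre-divided category.
GEO_CATEGORY_KEYWORDS: Dict[str, str] = {
    "国": "国家",
    "省": "省直辖市",
    "市": "省直辖市",
    "山": "山",
    "区": "区",
}


def geo_entity_category(
    name: str, keyword_map: Dict[str, str] = GEO_CATEGORY_KEYWORDS
) -> Optional[str]:
    """Map the longest matching category keyword suffix of a place name to its category."""
    for keyword in sorted(keyword_map, key=len, reverse=True):
        if name.endswith(keyword):
            return keyword_map[keyword]
    return None
```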

Second, the candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category are obtained according to the target entity category.

In the sixth embodiment of the present application, a candidate standardized entity name text corresponding to the entity name text to be matched is a standardized entity name text that is obtained from, and associated with, the keywords in the entity name text to be matched. In a specific implementation, a BM25 recall strategy is applied to the entity name text to be matched. The ES tool then quickly recalls, from preset standardized entity name text data, the standardized entity name texts associated with the entity name text to be matched.

Third, the target standardized entity name text that matches the entity name text to be matched is obtained from the candidate standardized entity name texts, according to the text similarity between the entity name text to be matched and each candidate.

In a specific implementation, the text similarity is first obtained from the word vector similarity and the character string similarity. It is then judged whether the text similarity reaches the text similarity threshold. If it does, the candidate standardized entity name text whose text similarity reaches the threshold is taken from the candidates as the target standardized entity name text. Specifically, the candidate whose text similarity reaches the threshold and is the highest is used as the target standardized entity name text.

If the text similarity does not reach the text similarity threshold, it is determined that the target standardized entity name text does not exist among the candidate standardized entity name texts.

Finally, the entity name text to be matched is associated with the target standardized entity name text.

The association is generally implemented by establishing a correspondence between the entity name text to be matched and the target standardized entity name text.

Although the present application is disclosed above by way of preferred embodiments, they are not intended to limit the present application. Any person skilled in the art may make possible changes and modifications without departing from the spirit and scope of the present application. The scope of protection of the present application shall therefore be subject to the scope defined by its claims.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.

The memory may include computer-readable media in the form of non-persistent memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

1. Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
   - phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
   - read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
   - compact disc read-only memory (CD-ROM), digital versatile disc (DVD), or other optical storage;
   - magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices;
   - any other non-transmission medium that can be used to store information accessible by a computing device.

   As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

2. Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media that contain computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM, and optical storage.

Claims (19)

1. A text matching method, comprising:
obtaining a target entity category corresponding to an entity name text to be matched;
obtaining, according to the target entity category, a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category; and
obtaining, from the candidate standardized entity name text, a target standardized entity name text that matches the entity name text to be matched, according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.
2. The text matching method according to claim 1, further comprising: providing the target standardized entity name text to a user equipment.
3. The text matching method according to claim 1 or 2, further comprising: associating the entity name text to be matched with the target standardized entity name text.
4. The text matching method according to claim 3, wherein associating the entity name text to be matched with the target standardized entity name text comprises: establishing a correspondence between the entity name text to be matched and the target standardized entity name text.
5. The text matching method according to claim 2, wherein obtaining the target entity category corresponding to the entity name text to be matched comprises: obtaining a text matching instruction sent by the user equipment, wherein the text matching instruction carries the entity name text to be matched;
and providing the target standardized entity name text to the user equipment comprises: providing the target standardized entity name text to the user equipment in response to the text matching instruction.
6. The text matching method according to claim 1, further comprising: displaying the target standardized entity name text.
7. The text matching method according to claim 1, wherein the obtaining of the target entity category corresponding to the entity name text to be matched comprises:
adopting a preset word segmentation strategy to segment the entity name text to be matched to obtain category keywords in the entity name text to be matched;
and obtaining the target entity category according to the category key words in the entity name text to be matched.
8. The text matching method according to claim 1, wherein the obtaining, according to the target entity category, of a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category comprises:
obtaining, according to keywords in the entity name text to be matched, associated standardized entity name texts associated with the entity name text to be matched;
obtaining the entity category of each associated standardized entity name text; and
obtaining the candidate standardized entity name text from the associated standardized entity name texts according to the target entity category and the entity categories of the associated standardized entity name texts.
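Claim 8 first retrieves standardized names that share keywords with the query and then filters them by entity category. One way to sketch this, assuming an inverted index over whitespace tokens and a hypothetical in-memory store `STANDARD_NAMES` mapping each standardized name to its category:

```python
from collections import defaultdict

# Hypothetical store: standardized entity name text -> entity category.
STANDARD_NAMES = {
    "Business License": "license",
    "Business Registration Form": "form",
    "Driving License": "license",
    "Birth Certificate": "certificate",
}


def build_index(standard_names: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lowercase keyword -> standardized names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name in standard_names:
        for keyword in name.lower().split():
            index[keyword].add(name)
    return index


def candidate_names(query: str, target_cat: str, standard_names: dict[str, str],
                    index: dict[str, set[str]]) -> list[str]:
    # Associated texts share at least one keyword with the query.
    associated: set[str] = set()
    for keyword in query.lower().split():
        associated |= index.get(keyword, set())
    # Keep only associated texts whose category equals the target category.
    return sorted(n for n in associated if standard_names[n] == target_cat)


index = build_index(STANDARD_NAMES)
print(candidate_names("business license copy", "license", STANDARD_NAMES, index))
# ['Business License', 'Driving License']
```

"Business Registration Form" is associated through "business" but is dropped by the category filter, which is the point of restricting candidates by category before scoring them.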
9. The text matching method according to claim 1, wherein the obtaining of the target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text comprises:
obtaining the weights of the keywords in the entity name text to be matched and the weights of the keywords in the candidate standardized entity name text;
obtaining first word vectors corresponding to the keywords in the entity name text to be matched and second word vectors corresponding to the keywords in the candidate standardized entity name text;
obtaining a word vector similarity between the first word vectors and the second word vectors according to the keyword weights of the entity name text to be matched, the keyword weights of the candidate standardized entity name text, the first word vectors, and the second word vectors; and
obtaining the text similarity according to the word vector similarity.
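Claim 9 leaves open how the keyword weights and word vectors are combined. A minimal sketch, assuming the weights are supplied externally (for example TF-IDF scores) and the word vector similarity is the cosine similarity between the weight-averaged word vectors of the two texts; the toy 2-dimensional vectors in `VECTORS` are hypothetical:

```python
import math


def weighted_mean(words: list[str], weights: dict[str, float],
                  vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Weight-averaged word vector; keywords without a vector are skipped,
    keywords without a weight default to 1.0."""
    acc, total = [0.0] * dim, 0.0
    for w in words:
        if w in vectors:
            wt = weights.get(w, 1.0)
            acc = [a + wt * x for a, x in zip(acc, vectors[w])]
            total += wt
    return [a / total for a in acc] if total else acc


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_vector_similarity(words_a, weights_a, words_b, weights_b, vectors, dim=2):
    return cosine(weighted_mean(words_a, weights_a, vectors, dim),
                  weighted_mean(words_b, weights_b, vectors, dim))


# Toy vectors (hypothetical); a real system would use trained embeddings.
VECTORS = {"business": [1.0, 0.0], "license": [0.0, 1.0], "permit": [0.1, 0.9]}
print(round(word_vector_similarity(["license"], {}, ["permit"], {}, VECTORS), 3))  # 0.994
```

Raising a keyword's weight pulls the averaged vector toward that keyword, so texts that share the heavily weighted keyword score higher than texts that share only low-weight ones.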
10. The text matching method according to claim 9, wherein the obtaining of the text similarity according to the word vector similarity comprises:
obtaining a character string corresponding to the entity name text to be matched;
obtaining a character string corresponding to the candidate standardized entity name text;
obtaining a character string similarity between the character string corresponding to the entity name text to be matched and the character string corresponding to the candidate standardized entity name text; and
obtaining the text similarity according to the word vector similarity and the character string similarity.
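Claim 10 does not name the character string similarity measure. The sketch below swaps in a normalized Levenshtein (edit distance) similarity, 1 - distance / max(length), after a hypothetical normalization step that lowercases the text and keeps only letters and digits (CJK characters are kept, since `str.isalnum()` is true for them):

```python
def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) with a two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(string_similarity("Business-License", "business license"))  # 1.0
print(string_similarity("abcd", "abed"))  # 0.75
```

Unlike the word vector similarity, this score is sensitive to surface spelling, which is why the two are combined rather than used alone.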
11. The text matching method according to claim 10, wherein the obtaining of the text similarity according to the word vector similarity and the character string similarity comprises: weighting the word vector similarity and the character string similarity by a preset first similarity weight corresponding to the word vector similarity and a preset second similarity weight corresponding to the character string similarity, respectively, to obtain the text similarity.
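Written out, the weighting in claim 11 is a linear combination of the two similarities. The symbols below are ours, and constraining the weights to sum to 1 (so the text similarity stays in the same range as its inputs) is an assumption:

```latex
S_{\text{text}} = \lambda_1 \, S_{\text{vec}} + \lambda_2 \, S_{\text{str}},
\qquad \lambda_1, \lambda_2 \ge 0, \quad \lambda_1 + \lambda_2 = 1
```

For example, with hypothetical values λ1 = 0.6, λ2 = 0.4, S_vec = 0.9 and S_str = 0.75, the text similarity is 0.54 + 0.30 = 0.84.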
12. The text matching method according to claim 1, further comprising: determining whether the text similarity reaches a text similarity threshold;
wherein the obtaining of a target standardized entity name text matching the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text comprises: if the text similarity reaches the text similarity threshold, obtaining, from the candidate standardized entity name texts, a candidate standardized entity name text whose text similarity reaches the text similarity threshold as the target standardized entity name text.
13. The text matching method according to claim 12, wherein the obtaining, from the candidate standardized entity name texts, of a candidate standardized entity name text whose text similarity reaches the text similarity threshold as the target standardized entity name text comprises: obtaining, from the candidate standardized entity name texts, the candidate standardized entity name text whose text similarity reaches the text similarity threshold and is the highest as the target standardized entity name text.
14. The text matching method according to claim 12, further comprising: if the text similarity does not reach the text similarity threshold, determining that no target standardized entity name text exists among the candidate standardized entity name texts.
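Claims 12 to 14 together describe a threshold-then-maximum selection. A sketch, assuming "reaches" means greater than or equal to and that the text similarity of each candidate has already been computed; the function name `select_target` is hypothetical:

```python
from typing import Optional


def select_target(scores: dict[str, float], threshold: float) -> Optional[str]:
    """Return the highest-scoring candidate whose similarity reaches the threshold,
    or None when no candidate reaches it (no target text exists)."""
    passing = {name: s for name, s in scores.items() if s >= threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)


scores = {"Business License": 0.84, "Driving License": 0.62}
print(select_target(scores, threshold=0.8))  # Business License
print(select_target(scores, threshold=0.9))  # None
```

Returning `None` rather than the best sub-threshold candidate is what lets the caller distinguish "no standardized name exists" from a weak match.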
15. A text matching apparatus, comprising:
a target entity category obtaining unit, configured to obtain a target entity category corresponding to an entity name text to be matched;
a candidate text obtaining unit, configured to obtain, according to the target entity category, a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category; and
a target text matching unit, configured to obtain, from the candidate standardized entity name text, a target standardized entity name text matching the entity name text to be matched according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.
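The apparatus of claim 15 is a composition of three units that mirror the method steps. A minimal sketch with each unit as a pluggable callable; the class and field names are hypothetical, and returning `None` when no category can be determined is an assumption the claim does not state:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TextMatchingApparatus:
    """Hypothetical wiring of the three claimed units as pluggable callables."""
    obtain_target_category: Callable[[str], Optional[str]]        # target entity category obtaining unit
    obtain_candidates: Callable[[str, str], list[str]]             # candidate text obtaining unit
    match_target_text: Callable[[str, list[str]], Optional[str]]  # target text matching unit

    def run(self, text: str) -> Optional[str]:
        category = self.obtain_target_category(text)
        if category is None:
            return None  # assumption: no category means no match
        candidates = self.obtain_candidates(text, category)
        return self.match_target_text(text, candidates)


# Toy usage with trivial stand-in units.
STANDARD = {"Business License": "license", "Driving License": "license"}
app = TextMatchingApparatus(
    obtain_target_category=lambda t: "license" if "license" in t.lower() else None,
    obtain_candidates=lambda t, cat: [n for n, c in STANDARD.items() if c == cat],
    match_target_text=lambda t, cands: next(
        (c for c in cands if c.lower() == t.lower().strip()), None),
)
print(app.run("business license"))  # Business License
print(app.run("passport"))          # None
```

In practice the three callables would be replaced by the category, candidate retrieval, and similarity-plus-threshold steps of claims 7 to 14.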
16. An electronic device, comprising:
a processor; and
a memory for storing a program of a text matching method, wherein, after the device is powered on and the processor runs the program of the text matching method, the device performs the following steps:
obtaining a target entity category corresponding to an entity name text to be matched;
obtaining, according to the target entity category, a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category; and
obtaining, from the candidate standardized entity name text, a target standardized entity name text matching the entity name text to be matched according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.
17. A storage medium storing a program of a text matching method, the program, when executed by a processor, performing the following steps: obtaining a target entity category corresponding to an entity name text to be matched;
obtaining, according to the target entity category, a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category; and
obtaining, from the candidate standardized entity name text, a target standardized entity name text matching the entity name text to be matched according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.
18. A government affairs service text matching method, comprising:
obtaining a target entity category corresponding to an entity name text to be matched that describes the name of a government affairs service material;
obtaining, according to the target entity category, a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category;
obtaining, from the candidate standardized entity name text, a target standardized entity name text matching the entity name text to be matched according to the text similarity between the entity name text to be matched and the candidate standardized entity name text; and
associating the entity name text to be matched with the target standardized entity name text.
19. An address text matching method, comprising:
obtaining a target entity category corresponding to an entity name text to be matched that describes a geographical location name;
obtaining, according to the target entity category, a candidate standardized entity name text that corresponds to the entity name text to be matched and whose entity category is the same as the target entity category;
obtaining, from the candidate standardized entity name text, a target standardized entity name text matching the entity name text to be matched according to the text similarity between the entity name text to be matched and the candidate standardized entity name text; and
associating the entity name text to be matched with the target standardized entity name text.
CN202110130726.7A 2021-01-29 2021-01-29 A text matching method, device, electronic device and storage medium, and a government service text matching method Active CN114818706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110130726.7A CN114818706B (en) 2021-01-29 2021-01-29 A text matching method, device, electronic device and storage medium, and a government service text matching method

Publications (2)

Publication Number Publication Date
CN114818706A true CN114818706A (en) 2022-07-29
CN114818706B CN114818706B (en) 2025-01-17

Family

ID=82526849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110130726.7A Active CN114818706B (en) 2021-01-29 2021-01-29 A text matching method, device, electronic device and storage medium, and a government service text matching method

Country Status (1)

Country Link
CN (1) CN114818706B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640445A (en) * 2022-10-27 2023-01-24 上海喜马拉雅科技有限公司 Search request response method, device, computer equipment and storage medium
CN116244421A (en) * 2023-03-03 2023-06-09 广联达科技股份有限公司 Method, device, equipment and readable storage medium for item name matching

Citations (7)

Publication number Priority date Publication date Assignee Title
CN109885664A (en) * 2019-01-08 2019-06-14 厦门快商通信息咨询有限公司 A kind of Intelligent dialogue method, robot conversational system, server and storage medium
CN110377558A (en) * 2019-06-14 2019-10-25 平安科技(深圳)有限公司 Document searching method, device, computer equipment and storage medium
CN110413875A (en) * 2019-06-26 2019-11-05 腾讯科技(深圳)有限公司 A method and related device for pushing text information
CN110442869A (en) * 2019-08-01 2019-11-12 腾讯科技(深圳)有限公司 A kind of medical treatment text handling method and its device, equipment and storage medium
CN110825863A (en) * 2019-11-11 2020-02-21 腾讯科技(深圳)有限公司 Text pair fusion method and device
CN111126054A (en) * 2019-12-03 2020-05-08 东软集团股份有限公司 Method, device, storage medium and electronic equipment for determining similar texts
CN111259144A (en) * 2020-01-16 2020-06-09 中国平安人寿保险股份有限公司 Multi-model fusion text matching method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114818706B (en) 2025-01-17

Similar Documents

Publication Publication Date Title
CN108153901B (en) Knowledge graph-based information pushing method and device
CN111797214A (en) Question screening method, device, computer equipment and medium based on FAQ database
WO2023134057A1 (en) Affair information query method and apparatus, and computer device and storage medium
CN112148889A (en) Recommendation list generation method and device
US9720904B2 (en) Generating training data for disambiguation
WO2020057022A1 (en) Associative recommendation method and apparatus, computer device, and storage medium
CN108256070B (en) Method and apparatus for generating information
US9292579B2 (en) Method and system for document data extraction template management
JP2013541793A (en) Multi-mode search query input method
US10037381B2 (en) Apparatus and method for searching information based on Wikipedia's contents
US20210141822A1 (en) Systems and methods for identifying latent themes in textual data
US11361030B2 (en) Positive/negative facet identification in similar documents to search context
US12229780B2 (en) Embedding service for unstructured data
CN115795030A (en) Text classification method and device, computer equipment and storage medium
WO2022105119A1 (en) Training corpus generation method for intention recognition model, and related device thereof
CN117112595A (en) Information query method and device, electronic equipment and storage medium
CN106407316A (en) Topic model-based software question and answer recommendation method and device
CN114818706B (en) A text matching method, device, electronic device and storage medium, and a government service text matching method
CN111126073B (en) Semantic retrieval method and device
CN115062135B (en) A patent screening method and electronic equipment
CN119128150A (en) A text clustering method, device, computer equipment and storage medium
CN114254112B (en) Methods, systems, devices, and media for pre-classification of sensitive information
CN112947844A (en) Data storage method and device, electronic equipment and medium
CN110674383A (en) Public opinion query method, device and equipment
CN111401047A (en) Method, device and computer equipment for generating dispute focus of legal documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant