CN116069897A - Query error correction methods, devices, computer equipment, storage media and program products

Info

Publication number: CN116069897A
Application number: CN202111295015.1A
Authority: CN
Inventors: 陈小帅
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-11-03
Filing date: 2021-11-03
Publication date: 2023-05-05

Abstract

The application provides a query error correction method, apparatus, computer device, storage medium and program product, and relates to technical fields such as artificial intelligence, cloud technology, intelligent transportation and assisted driving. First error indication information of query information is determined based on a target model matched with a user, so that error judgment is personalized and its result better matches the user's actual interests. When the query information is wrong, at least one piece of candidate information is determined based on target error correction data matched with the user, and correction information is then selected from the candidates based on second error indication information that the target model determines for them. Because the target error correction data includes at least the user's interest data, the query information is corrected in a personalized way from the perspective of the user's interests. This can meet error correction needs that differ from user to user and improves the actual accuracy of both error judgment and correction, and thus of query error correction.

Description

Query error correction methods, devices, computer equipment, storage media and program products

Technical field

The present application relates to technical fields such as artificial intelligence, cloud technology, intelligent transportation and assisted driving, and in particular to a query error correction method, apparatus, computer device, storage medium and program product.

Background

Search engines are currently one of the most important ways for people to obtain information: a user only needs to enter keywords in a search box to find relevant information of interest. Because the keywords users enter often contain errors, query error correction emerged. Query error correction identifies and corrects errors in the keywords entered by a user, so that the search engine can return the query results the user expects based on the corrected keywords.

In the related art, query error correction usually relies on a single vocabulary shared across the search engine, which collects correct words that appear frequently in the search engine. For example, for the keyword "直拨" ("direct dial") entered by any user, the vocabulary uniformly corrects "直拨" to "直播" ("live streaming").

In essence, the above process applies one uniform correction standard to the query intent behind every user's keywords. For some users, however, the corrected keyword does not match their actual query intent, so the expected results cannot be obtained from it. The actual accuracy of such query error correction is therefore low.

Summary of the invention

The present application provides a query error correction method, apparatus, computer device, storage medium and program product, which can solve the problem of low actual error correction accuracy in the related art. The technical solution is as follows:

In one aspect, a query error correction method is provided, the method including:

in response to receiving query information entered by a user into a search box, determining first error indication information of the query information based on a target model matched with the user, the target model being used to determine, based on the user's interest data in a target application, whether information input into the target model is wrong;

in response to the first error indication information indicating that the query information is wrong, determining at least one piece of candidate information associated with the query information based on the query information and target error correction data matched with the user, the target error correction data including at least the user's interest data in the target application;

determining second error indication information of the at least one piece of candidate information based on the target model; and

determining correction information for the query information from the at least one piece of candidate information based on the second error indication information.
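As a non-limiting illustration, the four steps above can be sketched in Python as follows. The sketch is not part of the described solution: the names `correct_query`, `model` and `retrieve`, the 0.5 error threshold, and the rule of keeping the highest-scoring candidate are all assumptions made for this example. The target model is represented simply as a function that returns the probability that a string is correct.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for the target model: maps a string to the
# probability (0.0 to 1.0) that the string is correct for this user.
Model = Callable[[str], float]


def correct_query(
    query: str,
    model: Model,
    retrieve: Callable[[str], Iterable[str]],
    error_threshold: float = 0.5,  # assumed cut-off, not taken from the application
) -> Optional[str]:
    """Return correction information for `query`, or None if none is needed or found."""
    # Step 1: first error indication information for the query itself.
    query_prob = model(query)
    if query_prob >= error_threshold:
        return None  # the query is judged correct; nothing to correct

    # Step 2: candidate information drawn from the user's correction data.
    candidates = list(retrieve(query))

    # Step 3: second error indication information for every candidate.
    scored = [(model(candidate), candidate) for candidate in candidates]

    # Step 4: keep the candidate most likely to be correct, provided it is
    # more likely to be correct than the original query (assumed rule).
    better = [pair for pair in scored if pair[0] > query_prob]
    if not better:
        return None
    return max(better, key=lambda pair: pair[0])[1]
```

Passing the model and the retrieval function in as parameters keeps the sketch independent of how the model is trained and of where the error correction data is stored.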

In another aspect, a query error correction apparatus is provided, the apparatus including:

a first determination module, configured to, in response to receiving query information entered by a user into a search box, determine first error indication information of the query information based on a target model matched with the user, the target model being used to determine, based on the user's interest data in a target application, whether information input into the target model is wrong;

a candidate determination module, configured to, in response to the first error indication information indicating that the query information is wrong, determine at least one piece of candidate information associated with the query information based on the query information and target error correction data matched with the user, the target error correction data including at least the user's interest data in the target application;

a second determination module, configured to determine second error indication information of the at least one piece of candidate information based on the target model; and

a correction module, configured to determine correction information for the query information from the at least one piece of candidate information based on the second error indication information.

In another aspect, a computer device is provided, including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the above query error correction method.

In another aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the above query error correction method when executed by a processor.

In another aspect, a computer program product is provided, including a computer program that implements the above query error correction method when executed by a processor.

The beneficial effects of the technical solution provided by the present application are as follows:

The first error indication information of the query information is determined based on a target model matched with the user, and the target model determines, based on the user's interest data, whether information input into the model is wrong. Error judgment is thus personalized for the user based on the user's interest data, so that the judgment result better matches the user's actual interests. When the query information is wrong, at least one piece of candidate information associated with the query information is determined based on target error correction data matched with the user, and the correction information is then selected from the candidates based on the second error indication information that the target model determines for them. Because the target error correction data includes at least the user's interest data, the query information is corrected in a personalized way from the perspective of the user's interests, so that the resulting correction information matches the user's real query intent more precisely. This can meet error correction needs that differ from user to user and improves the actual accuracy of both the error judgment process and the correction process, and thus the actual accuracy of query error correction.

Description of drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.

FIG. 1 is a schematic diagram of an implementation environment of a query error correction method provided by an embodiment of the present application;

FIG. 2a is a schematic flowchart of a query error correction method provided by an embodiment of the present application;

FIG. 2b is a schematic flowchart of a query error correction method provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a user group model provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of query error correction provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a query error correction apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present application.

Detailed description

Embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the implementations set forth below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application and do not limit those technical solutions.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the terms "include" and "comprise" used in the embodiments of the present application mean that the corresponding feature may be implemented as the presented feature, information, data, step, operation, element and/or component, without excluding implementation as other features, information, data, steps, operations, elements, components and/or combinations thereof supported in the art. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or the connection between the two elements may be established through an intermediate element. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".

To make the purpose, technical solutions and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.

The query error correction method provided by the present application involves technologies such as cloud computing, cloud storage and big data, which are described below. For example, computing devices in a cloud computing resource pool may be used to invoke the model matched with a user to perform error judgment on the query information the user enters. As another example, cloud storage may be used to store the error correction data of each user, the user group model of each user group, and the general model and general error correction data of the target application, so that the model and the matching error correction data can be invoked when a user's query request is received. A distributed database, as used in big data, may also be used to store large volumes of data, such as the interest data of a large number of users and the interest data sets of user groups, in a distributed manner.

Cloud computing is a computing model that distributes computing tasks across a resource pool made up of a large number of computers, so that application systems can obtain computing power, storage space and information services as needed. The network that provides the resources is called the "cloud". From the user's point of view, the resources in the cloud can be expanded without limit, obtained at any time, used on demand, expanded at any time and paid for according to use.

A basic capability provider of cloud computing establishes a cloud computing resource pool (referred to as a cloud platform, and generally called an IaaS (Infrastructure as a Service) platform) and deploys multiple types of virtual resources in the pool for external customers to choose and use. The cloud computing resource pool mainly includes computing devices (virtualized machines, including operating systems), storage devices and network devices.

Divided by logical function, a PaaS (Platform as a Service) layer can be deployed on top of the IaaS layer, and a SaaS (Software as a Service) layer can be deployed on top of the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is the platform on which software runs, such as a database or a web container. SaaS covers all kinds of business software, such as web portals and bulk SMS senders. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.

Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that uses functions such as cluster applications, grid technology and distributed storage file systems to bring together, through application software or application interfaces, a large number of storage devices of different types in a network (storage devices are also called storage nodes), so that they work cooperatively and jointly provide data storage and service access functions to the outside.

Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within a certain time frame. It is a massive, fast-growing and diversified information asset that requires new processing modes in order to deliver stronger decision-making power, insight discovery and process optimization capabilities. With the advent of the cloud era, big data has attracted more and more attention, and it requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.

FIG. 1 is a schematic diagram of an implementation environment of a query error correction method provided by the present application. As shown in FIG. 1, the implementation environment includes a server 101 and a terminal 102. The server 101 may be a background server of an application program. The application program is installed on the terminal 102, and the terminal 102 and the server 101 can exchange data based on the application program.

The application program may be configured with a search function, which returns search results related to the query information entered by the user. In the present application, the application program may also be configured with a query error correction function. Query error correction means judging whether the query information entered by the user is wrong and correcting it, so that the server 101 returns the search results the user expects based on the corrected information. In the present application, personalized query error correction can be performed on the query information based on the user's interest data, so that the query error correction function of the application program can meet personalized needs that differ from user to user.

Personalized query error correction reflects the fact that different users make different types of errors and have different correction targets. The present application can correct the query information entered by a user in a personalized way, meeting the error correction needs of users with different interests and improving the search experience on a video platform. For example, suppose a user mistakenly enters the query information "李浅名场面合辑" ("Li Qian famous scenes compilation"). For a user interested in "庆余年" (Joy of Life), correcting it to "李沁名场面合辑" ("Li Qin famous scenes compilation") is more appropriate, whereas for a user interested in the period officialdom drama "李卫当官" (Li Wei Dang Guan), correcting the name to "李倩" (Li Qian) is more reasonable. As another example, suppose a user enters the query information "惊诧故事" (jingcha gushi, literally "startling story"). For a user who likes thriller videos, the query as entered is fine, but for a user who prefers police-and-gangster films or is a Jackie Chan fan, correcting it to "警察故事" (Police Story) is more reasonable and intelligent.

In one possible scenario, the terminal 102 sends the query information entered by the user to the server 101, and the server 101 performs error judgment on the query information and corrects it to obtain correction information. The server 101 then searches based on the correction information. In another possible scenario, the terminal 102 obtains the query information entered by the user, performs error judgment on it and corrects it, and sends the resulting correction information to the server 101, which searches based on the correction information.

The error judgment and correction of the query information may also be implemented through interaction between the server 101 and the terminal 102. For example, in another scenario, the terminal 102 performs error judgment on the query information and sends the query information and the judgment result to the server 101, and the server 101 corrects the query information based on the judgment result. Alternatively, both the server 101 and the terminal 102 may perform error judgment and correction: the terminal 102 sends the information it has judged and corrected to the server 101, and the server 101 performs error judgment and correction again on the information sent by the terminal 102. In the embodiments of the present application, the query error correction process and each of its steps may be executed by any single computer device, such as a terminal or a server, or through interaction between multiple computer devices. The embodiments of the present application do not limit which device executes query error correction.

The embodiments of the present application can be applied to various scenarios, including but not limited to big data search, cloud technology, artificial intelligence, intelligent transportation and assisted driving. The application program may be any application that supports the query error correction function, for example a video application, a live streaming application, a content interaction application, a social application, a search engine, a shopping application, a game application, or an enterprise communication and office tool platform. For example, when a user enters query information in the search box of a video application, the query information can first be corrected, and the desired video list is then returned to the user based on the corrected information. The application program may be a standalone application or a plug-in installed in a standalone application, for example a standalone video application, a video playback mini program installed in a social application, or a web-based video website opened in a browser.

The server 101 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server or server cluster that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The network may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network and a wide area network, and the wireless network includes Bluetooth, Wi-Fi and other networks that implement wireless communication. The terminal 102 may be a mobile phone, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal (such as a vehicle-mounted navigation terminal or a vehicle-mounted computer), a tablet computer, a notebook computer, a digital broadcast receiver, a MID (Mobile Internet Device), a desktop computer, a smart speaker, a smart watch, or the like. The terminal 102 and the server 101 may be connected directly or indirectly through wired or wireless communication, but are not limited thereto. In addition, there may be one or more terminals 102 and one or more servers 101. The embodiments of the present application do not limit the specific number of terminals or servers, which may be determined according to the requirements of the actual application scenario.

FIG. 2a is a schematic flowchart of a query error correction method provided by an embodiment of the present application. The method may be executed by a computer device, which may be a server or a terminal. As shown in FIG. 2a, the method includes the following steps.

Step 201: In response to receiving query information entered by a user into a search box, the computer device determines first error indication information of the query information based on a target model matched with the user.

The target model is used to determine, based on the user's interest data in the target application, whether information input into the target model is wrong. The computer device inputs the query information into the target model, and the target model performs error judgment on the query information and outputs the first error indication information. The first error indication information at least indicates whether the query information is wrong. For example, users with different user maturity may be matched with different target models: the computer device may be configured with an association between user maturity and error judgment models and, based on that association, use the error judgment model corresponding to the user's maturity as the target model. For example, the first error indication information includes at least the probability that the query information is correct.
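By way of illustration only, the lookup from user maturity to an error judgment model, and the derivation of the first error indication information, might be sketched as follows. The maturity labels, the fallback to a general model, the 0.5 threshold and all function names are assumptions introduced for this sketch; the application only states that an association between user maturity and error judgment models is configured.

```python
from typing import Callable, Dict

# Hypothetical model type: returns the probability that the input is correct.
ErrorModel = Callable[[str], float]


def select_target_model(
    maturity: str,
    models_by_maturity: Dict[str, ErrorModel],
    fallback: ErrorModel,
) -> ErrorModel:
    """Pick the error judgment model associated with the user's maturity.

    `fallback` is used when no model is associated with the given maturity
    (an assumption of this sketch, e.g. a general model).
    """
    return models_by_maturity.get(maturity, fallback)


def first_error_indication(
    query: str,
    target_model: ErrorModel,
    threshold: float = 0.5,  # assumed cut-off for "wrong"
) -> dict:
    """Return the correct probability and a wrong/not-wrong flag for the query."""
    probability = target_model(query)
    return {"correct_probability": probability, "is_wrong": probability < threshold}
```

Returning both the probability and the flag keeps the correct probability available for later comparison against the candidates' probabilities.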

Step 202: In response to the first error indication information indicating that the query information is wrong, the computer device determines at least one piece of candidate information associated with the query information based on the query information and target error correction data matched with the user.

The target error correction data includes at least the user's interest data in the target application. In the present application, the computer device may search the target error correction data for information similar to the query information, and use the similar information whose similarity exceeds a target similarity threshold as the at least one piece of candidate information. Because the candidate information is retrieved from the target error correction data, it likewise includes the user's interest data in the target application.

For example, the computer device may use at least one of the user's error correction data or the error correction data of the user's user group as the target error correction data. The error correction data of the user group includes the interest data set of the user group, and the user's error correction data includes the user's interest data in the target application. The target error correction data may also include general error correction data, which includes the interest data set of all users of the target application.
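As a hedged sketch of step 202, the snippet below merges the user's, the user group's and the general error correction data, then keeps the entries whose similarity to the query exceeds a threshold. The similarity measure (`difflib.SequenceMatcher.ratio` from the Python standard library), the 0.6 threshold, the function names and the choice to skip entries identical to the query are all assumptions of this example; the application does not specify how similarity is computed.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def build_target_correction_data(
    user_data: Iterable[str],
    group_data: Iterable[str] = (),
    general_data: Iterable[str] = (),
) -> List[str]:
    """Merge the three sources, user data first, dropping duplicates."""
    merged: List[str] = []
    seen = set()
    for source in (user_data, group_data, general_data):
        for entry in source:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


def find_candidates(
    query: str,
    correction_data: Iterable[str],
    min_similarity: float = 0.6,  # assumed target similarity threshold
) -> List[str]:
    """Return entries whose similarity to `query` exceeds the threshold, best first."""
    scored = []
    for entry in correction_data:
        if entry == query:
            continue  # an identical entry cannot serve as a correction
        similarity = SequenceMatcher(None, query, entry).ratio()
        if similarity > min_similarity:
            scored.append((similarity, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]
```

Placing the user's own data first in the merged list is a design choice of the sketch so that personal interests take precedence over group and general data when duplicates are removed.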

Step 203: The computer device determines second error indication information of the at least one piece of candidate information based on the target model.

The computer device inputs each piece of candidate information into the target model to obtain its second error indication information, which includes the probability that the candidate information is correct.

Step 204: The computer device determines correction information for the query information from the at least one piece of candidate information based on the second error indication information.

The computer device may select, from the at least one piece of candidate information, correction information that satisfies a specified probability condition. For example, the specified probability condition may be that the correct probability of the candidate information is greater than 0.8, or that the correct probability of the candidate information is greater than the correct probability of the query information.
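The selection in step 204 can be illustrated with the short sketch below. It assumes, for the example only, that both example conditions are applied together (a correct probability above 0.8 and above that of the query) and that the candidate with the highest correct probability is returned; the function and parameter names are hypothetical.

```python
from typing import Dict, Optional


def select_correction(
    candidate_probs: Dict[str, float],
    query_prob: float,
    min_prob: float = 0.8,  # example value mentioned in the text
) -> Optional[str]:
    """Return the most probable candidate that meets the probability condition."""
    eligible = {
        candidate: prob
        for candidate, prob in candidate_probs.items()
        if prob > min_prob and prob > query_prob
    }
    if not eligible:
        return None  # no candidate qualifies, so the query is left unchanged
    return max(eligible, key=eligible.get)
```

For instance, with candidate probabilities of 0.93 for "police story" and 0.85 for "horror story" and a query probability of 0.2, the function returns "police story".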

图2b为本申请实施例提供的一种查询纠错方法的流程示意图。该方法的执行主体可以为计算机设备，该计算机设备可以为服务器或终端。如图2b所示，该方法包括以下步骤。Fig. 2b is a schematic flowchart of a query error correction method provided by the embodiment of the present application. The execution body of the method may be a computer device, and the computer device may be a server or a terminal. As shown in Figure 2b, the method includes the following steps.

Step 301: The computer device receives query information input by the user into a search box.

The user may input query information through the search box of the target application. If the computer device is a terminal, the terminal acquires the query information when it detects that the search box contains query information, or acquires the query information in the search box when it detects a search instruction triggered by the user. If the computer device is a server, the terminal sends the query information to the server, and the server receives the query information sent by the terminal.

The query information may include at least one of a query string input by the user, a character string extracted from a query image, or text description information of an emoticon. Exemplarily, the user may type a character string into the search box. Alternatively, the user may input an image, an emoticon, or the like through the search box, and the computer device obtains the character string corresponding to the image or emoticon. Accordingly, the computer device may receive the query information in at least the following four ways, Mode 1 to Mode 4.

Mode 1: The computer device receives the query string that the user inputs in the search box of the target application, and uses the query string as the query information.

The character string may include, but is not limited to, text, pinyin, symbols, digits, punctuation marks, and operators. A page of the target application may include a search box, and the search box may include a string input plug-in. The user may trigger the string input plug-in to enter a query string in the search box, and the computer device receives the query string entered by the user.

Mode 2: The computer device receives a query image that the user inputs through the search box, extracts the character string contained in the query image, and uses the extracted character string as the query information.

The search box may also include an image input plug-in, which the user may trigger to input a query image. The computer device may perform character string extraction on the query image to obtain the character string contained in the query image. For example, the query image may be the cover image of a film or television drama that shows the title, and the computer device may extract the title from the cover image. The computer device may also perform image recognition on the query image to identify text information corresponding to the image content; for example, it may recognize a celebrity's face in the image and then obtain that celebrity's name.

Mode 3: The computer device receives an emoticon that the user inputs through the search box, obtains the text description information of the emoticon, and uses that text description information as the query information.

The search box may also include an emoticon input plug-in, which the user may trigger to input an emoticon. The computer device may obtain the text description information corresponding to the input emoticon from a stored association between emoticons and text description information. Exemplarily, when the user triggers the emoticon input plug-in of the search box on a page of the target application, the page may display multiple candidate emoticons, and the computer device may acquire the emoticon that the user selects from them. For example, the candidate emoticons may include emoticons corresponding to various characters in multiple film and television dramas, and the user may select the emoticon of a leading actress of interest to run a query.

Mode 4: The computer device receives a query string and an emoticon that the user inputs through the search box, obtains the text description information of the emoticon, and uses the query string together with the text description information of the emoticon as the query information.

The user may input both types of information, a character string and an emoticon, and the query information may then include the query string and the text description information of the emoticon. The computer device obtains the text description information of the emoticon in the same way as in Mode 3, which is not repeated here.

It should be noted that the computer device may obtain the query information by combining any two or all three of Mode 1 to Mode 3. Mode 4, which combines a user-input character string with an emoticon, is only one example. The query information may also be obtained by combining user-input characters with an image, or by combining user-input characters, an image, and an emoticon. The embodiments of the present application do not limit the specific combination.

In this step, configuring the search box has two effects. First, the character string that the user enters in the search box can be obtained accurately as the query information, and subsequent query error correction is performed on that string, which improves the accuracy of acquiring the query information. Second, a character string extracted from a query image or the text description information of an emoticon can also serve as the query information, which supports query error correction for information entered through multiple search channels and improves the applicability of the query error correction method of the present application.

Step 302: In response to receiving the query information input by the user into the search box, the computer device determines a target model matching the user based on the user maturity of the user.

The user maturity indicates how active the user is in the target application. The target model is used to determine, based on the user's interest data in the target application, whether information input into the target model is erroneous. Users with different user maturity may be matched with different target models. In this step, the computer device may first determine the user maturity of the user and then determine the target model based on that user maturity. The computer device may be configured with an association between user maturity and error judgment models; based on this association, the computer device may use the error judgment model corresponding to the user's user maturity as the target model.

In a possible implementation, this step may include the following steps 3021 and 3022 (not shown in the figure).

Step 3021: The computer device determines the user maturity of the user.

The computer device may determine the user maturity based on the user's active time, interactions, and the like in the target application. The computer device may determine the user maturity based on at least one of the user's active time in the target application or the user's number of interest tags. In a possible implementation, the user maturity may be the maturity of the user interest profile accumulated during the user's active time in the target application. The process by which the computer device determines the user maturity may include: the computer device determines the user's active time in the target application; the computer device counts the user's number of interest tags within the active time based on the tags associated with the operation objects of the user's interactive operations; and the computer device determines the user maturity based on the active time and the number of interest tags. Exemplarily, the number of interest tags may be the total count of the interest tags included in the user interest profile. Exemplarily, the computer device may take the tags associated with the operation object of an interactive operation as the user's interest tags and count the number of interest tags based on the number of interactive operations. For example, each time a new interest tag is added, the interest tag count of that tag is set to 1, and each additional interactive operation involving that tag increments the count by 1. Exemplarily, the active time may include the length of time during which the user performs interactive operations in the target application. The interactive operations include, but are not limited to, any operation related to the target application, such as playing a video, giving a like, sending a bullet comment, posting a comment, publishing a video, browsing posts in the community, downloading a video, interacting with other users of the target application, browsing a product detail page, or making a product transaction.

The process by which the computer device determines the user maturity based on the active time and the number of interest tags may include: when the active time does not exceed a target time threshold, the computer device determines that the user maturity is a first maturity; when the active time exceeds the target time threshold and the number of interest tags does not exceed a target tag number threshold, the computer device determines that the user maturity is a second maturity; when the active time exceeds the target time threshold and the number of interest tags exceeds the target tag number threshold, the computer device determines that the user maturity is a third maturity. Exemplarily, the target time threshold and the target tag number threshold may be configured as needed, and the embodiments of the present application do not specifically limit them. For example, the target time threshold may be 5 days or 10 days, and the target tag number threshold may be 300 or 500.

In a possible example, the computer device may be configured with three user maturity levels. In ascending order of maturity, they are: new (cold-start) user, user whose interest profile is not yet mature, and user whose interest profile is mature. Exemplarily, the maturity level of a user may be determined through the following process (1)-(3):

(1) Determine whether the user is a new (cold-start) user: check whether the user's active time is less than T (for example, T is 5 days); if it is less than T, the user is a new (cold-start) user;

(2) Determine whether the user is a user whose interest profile is not yet mature: if the user is not a new (cold-start) user, check whether the user's number of interest tags is less than or equal to K (for example, K is 300); if it is less than or equal to K, the user is a user whose interest profile is not yet mature;

(3) Determine whether the user is a user whose interest profile is mature: if the user is not a new (cold-start) user and the user's number of interest tags is greater than K, the user is a user whose interest profile is mature.
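The three checks above can be written as a short decision function. This is a sketch only: the enum and function names are hypothetical, the defaults reuse the example values T = 5 days and K = 300, and the comparison operators follow items (1)-(3) as listed.

```python
from enum import Enum


class Maturity(Enum):
    NEW = 1               # first maturity: new (cold-start) user
    IMMATURE_PROFILE = 2  # second maturity: interest profile not yet mature
    MATURE_PROFILE = 3    # third maturity: interest profile mature


def user_maturity(active_days: float, tag_count: int,
                  t_days: float = 5, k_tags: int = 300) -> Maturity:
    """Classify a user by active time first, then by number of interest tags."""
    if active_days < t_days:
        return Maturity.NEW
    if tag_count <= k_tags:
        return Maturity.IMMATURE_PROFILE
    return Maturity.MATURE_PROFILE
```

Note that the active-time check takes priority: a user with many tags but a short active time is still treated as new.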

In a possible example, the tags associated with an operation object may include, but are not limited to: tags associated with the operation object itself, tags associated with objects related to the operation object, and interest tags that the user configures through a tag configuration operation. For example, an object related to the operation object may be a film or television drama, or a creator, associated with a post that the user publishes or replies to. Exemplarily, the target application may be a video application configured with a query error correction function. The user's interactive operations in the video application may include, but are not limited to: video playback operations, video interaction operations, interaction operations on posts in the community, interaction operations with creators in the video application, and tag configuration operations. Exemplarily, the computer device may count the number of interest tags through the following process (1)-(3):

(1) When the user completes a valid playback of a video (the playback completion exceeds a certain threshold, for example the playback duration exceeds 30 minutes) or an interactive operation on a video (like, forward, comment, bullet comment, share, etc.), the tags of that video are accumulated into the user's interest profile. If the user interest profile does not contain the same interest tag, the interest tag count of that tag is set to 1; if the user interest profile already contains the same interest tag, the interest tag count of that tag in the profile is incremented by 1.

(2) Interest accumulation from the user's interactions with posts in the community: when a post that the user publishes or replies to in the community is associated with a certain person, film or television drama, or character, the associated tags (such as the actors, characters, person tags, and feature description tags of the corresponding drama) are accumulated into the user interest profile. If the user interest profile does not contain the same interest tag, the interest tag count of that tag is set to 1; if the user interest profile already contains the same interest tag, the interest tag count of that tag in the profile is incremented by 1.

(3) When the user completes an interactive operation with a creator (a follow operation, a like operation, sharing the creator's work, sharing the creator's home page, etc.), the tags of all of that creator's videos are accumulated into the user interest profile. If the user interest profile does not contain the same interest tag, the interest tag count of that tag is set to 1; if the user interest profile already contains the same interest tag, the interest tag count of that tag in the profile is incremented by 1.
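All three accumulation cases share the same update rule: a tag that is not yet in the profile gets a count of 1, and a tag already present is incremented by 1. A minimal sketch of that rule follows; the function name `accumulate_tags` and the use of `collections.Counter` to hold the profile are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable


def accumulate_tags(profile: Counter, tags: Iterable[str]) -> Counter:
    """Add the tags of one interaction to the user interest profile in place."""
    for tag in tags:
        # A missing key counts as 0 in a Counter, so a new tag becomes 1
        # and an existing tag is incremented by 1.
        profile[tag] += 1
    return profile
```

For example, accumulating `["drama", "actor"]` and then `["drama"]` into an empty profile leaves `drama` at 2 and `actor` at 1.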

In a possible example, the interest tags in the user interest profile may also be decayed. For example, if the interest tag count of one of the user's interest tags has not grown within a time D (for example, one week, one month, or half a year), the interest tag count of that tag is decayed. A half-life decay may be used, in which the interest tag count is halved each time the time D elapses. For example, if the original interest tag count is 100 and D is one month, the count decays to 50 after the first month, to 25 after the second month, and so on.
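The half-life decay can be sketched as follows. The function name `decayed_count` is hypothetical, and the sketch assumes that the count is halved only after each complete period D without growth, which matches the worked example (100, then 50, then 25).

```python
def decayed_count(count: float, idle_time: float, period: float) -> float:
    """Halve count once for every full period that passed without growth.

    idle_time and period must use the same unit (for example, months).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    full_periods = int(idle_time // period)
    return count / (2 ** full_periods)
```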

Step 3022: In response to the user maturity exceeding a maturity threshold, the computer device determines at least one target model matching the user based on a user group whose interests are similar to the user's.

The target model includes at least the user group model corresponding to the user group whose interests are similar to the user's. The user group model is used to determine, based on the interest data set of the user group, whether information input into the user group model is erroneous. The user group includes at least two users whose interests are similar to the user's, and the interest data set of the user group includes the user's interest data in the target application. In a possible example, this step may include: when the user maturity exceeds the maturity threshold, the computer device determines, based on the user group whose interests are similar to the user's, the user group model corresponding to that user group as the target model matching the user. In another possible example, the target model may include a user group model and a general model, where the general model is used to determine, based on the global user data set of the target application, whether information input into the general model is erroneous. Accordingly, this step may include: when the user maturity exceeds the maturity threshold, the computer device determines, based on the user group whose interests are similar to the user's, both the general model and the user group model as target models matching the user. Exemplarily, the maturity threshold may be the first maturity; that is, the user maturity exceeds the maturity threshold when it is the second maturity or the third maturity.

It should be noted that the computer device may divide the users of the target application into multiple user groups based on the users' interest tags, with each user group including multiple users with similar interests. In a possible example, the computer device may cluster the users whose user maturity is the third maturity; for example, it clusters users with mature interests by interest similarity to build user groups with similar interests. The interest similarity between two users is the cumulative count of the interest tags shared by the two users' interest tag sets, divided by the two users' number of interest tags. For example, the KMeans (k-means clustering) method may be used to cluster the users with mature interests into multiple user groups with similar interests and to obtain the cluster center of each group. The cluster center of each user group may be the center point of that user group; for example, the cluster center may include the interest tag count of each interest tag at the center point. Exemplarily, the goal of KMeans clustering is that users in the same cluster have similar interests while users in different clusters have clearly different interests. The target number of clusters may be C, where C is the quotient of the number of users with mature interests divided by a standard user number U; for example, U may be configured as 100,000.
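The similarity measure and the target cluster count can be sketched as below. The wording of the similarity definition admits more than one reading; this sketch assumes that the numerator sums both users' counts over the tags they share and that the denominator is the two users' total tag counts, which keeps the value in [0, 1]. The function names, the integer division, and the floor of one cluster are likewise assumptions.

```python
from collections import Counter


def interest_similarity(a: Counter, b: Counter) -> float:
    """Share of the two users' tag counts that falls on tags they have in common."""
    shared = set(a) & set(b)
    overlap = sum(a[t] + b[t] for t in shared)
    total = sum(a.values()) + sum(b.values())
    return overlap / total if total else 0.0


def target_cluster_count(mature_users: int, standard_users: int = 100_000) -> int:
    """C = number of mature users / U, rounded down, with at least one cluster."""
    return max(1, mature_users // standard_users)
```

With 2,500,000 mature users and U = 100,000, this gives C = 25 clusters for KMeans.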

Building user groups with similar interests assists query error correction for users whose interests are not yet mature; for example, mapping such a user to a user group with mature interests enriches the personalized data available for that user. In addition, no separate error judgment model has to be built for each user, which avoids the high training and storage costs of per-user models. Building an error judgment model per user group keeps training and storage costs under control while still meeting the need for error correction personalized to user interests.

Exemplarily, after the users of the third maturity are clustered into multiple user groups, the user group with interests similar to a user of the third maturity is simply the user group to which that user belongs. For a user of the second maturity, the similarity between the user and each user group may be calculated from the cluster center of each user group and the count of each of the user's interest tags, and the user group with the highest similarity is taken as the user group whose interests are similar to the user's.
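Mapping a second-maturity user to a group can be sketched as a nearest-center lookup. The text does not name the similarity function between a user and a cluster center, so cosine similarity over tag counts is used here purely as an assumption; `cosine` and `closest_group` are hypothetical names.

```python
import math
from typing import Dict


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(x * v.get(tag, 0.0) for tag, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_group(user_tags: Dict[str, float],
                  centers: Dict[str, Dict[str, float]]) -> str:
    """Return the id of the user group whose cluster center is most similar."""
    return max(centers, key=lambda gid: cosine(user_tags, centers[gid]))
```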

In a possible example, when the user maturity does not exceed the maturity threshold, for example for a new (cold-start) user whose user maturity is the first maturity, the computer device may determine the general model as the target model matching the user.
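The model selection of step 302 can be summarized in a small sketch. Maturity levels are represented as the integers 1 to 3, the maturity threshold is taken to be the first maturity as in the example above, and models are referred to by placeholder string keys; all names here are hypothetical.

```python
from typing import List


def select_target_models(maturity_level: int, group_id: str,
                         include_general: bool = False) -> List[str]:
    """Return the keys of the error judgment models to use for this user."""
    if maturity_level <= 1:
        # New (cold-start) user: only the general model applies.
        return ["general"]
    models = [f"group:{group_id}"]
    if include_general:
        # Optional variant: use the user group model together with the general model.
        models.append("general")
    return models
```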

It should be noted that the interest tag accumulation method and the user maturity determination method described above divide users into different personalization levels along multiple dimensions, such as active time and number of interest tags, which improves the accuracy of the user maturity classification. Furthermore, users at different maturity levels are matched with different error judgment models and later use different error correction data, which enables personalized query error correction for each maturity level and improves the accuracy of query error correction.

Step 303: The computer device determines first error indication information of the query information based on the target model matching the user.

The computer device inputs the query information into the target model, and the target model performs error judgment on the query information and outputs the first error indication information of the query information. The first error indication information at least indicates whether the query information is erroneous. Exemplarily, the first error indication information includes at least the correct probability of the query information.

The first error indication information may indicate both whether the query information is erroneous and the error type of the query information. In a possible implementation, the first error indication information includes the correct probability and the error type probability of the query information, where the error type probability indicates the likelihood that the query information contains an error of at least one error type. The process of step 303 may then include: the computer device extracts, through the target model, at least one kind of description information of the query information, where the description information includes at least one of the pinyin, strokes, or position encoding of the query information, the characters included in the query information, or the words included in the query information; and the computer device performs error judgment on the query information based on the at least one kind of description information to obtain the correct probability and the error type probability of the query information.

Example 1: the target model consists of either the user group model or the general model. Exemplarily, when the user maturity is the second maturity or the third maturity, the correct probability of the query information may be output by the user group model. When the user maturity is the first maturity, the correct probability of the query information may be output by the general model.

Example 2: when the target model includes both the user group model and the general model, the correct probability of the query information is the average of the two correct probabilities output by the two models. Exemplarily, when the user maturity is the second maturity or the third maturity, the average of the first correct probability output by the user group model and the second correct probability output by the general model is determined as the correct probability of the query information.
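Both examples reduce to averaging whatever correct probabilities the selected models output: a single model yields its own value, and two models yield their mean. A minimal sketch, with the hypothetical function name `combined_correct_probability`:

```python
from typing import Sequence


def combined_correct_probability(probabilities: Sequence[float]) -> float:
    """Average the correct probabilities output by the selected target models."""
    if not probabilities:
        raise ValueError("at least one model output is required")
    return sum(probabilities) / len(probabilities)
```

For instance, outputs of 0.9 from the user group model and 0.7 from the general model give a combined correct probability of about 0.8.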

It should be noted that the error type probability is obtained from the target model in the same way as the correct probability in Example 1 and Example 2. As in Example 1, the error type probability output by a single target model may be used; or, as in Example 2, the average of the first error type probability output by the user group model and the second error type probability output by the general model may be used as the error type probability of the query information. The details are not repeated here.

The target model may be a BERT (Bidirectional Encoder Representations from Transformers) model. As shown in Fig. 3, the query information may be denoted as the user Query. The user Query is input into the trained BERT model, and multiple kinds of description information are obtained through the BERT model, such as the word representation, character representation, character position encoding, character stroke representation, and character pinyin representation of the user Query. In Fig. 3, CLS denotes a placeholder token that may be placed at the beginning of the sentence; the output at the CLS position represents the context features of the whole sentence. Based on the features of each position obtained from the multiple kinds of description information, the BERT model outputs a deep feature for each position, which may be, for example, a 768-dimensional feature vector. A max-pooling operation (maxpool) is then applied to the feature vectors of the multiple positions to obtain the maximum deep feature values. The resulting max-pooled vector is concatenated with the context feature vector output at the CLS position, and the concatenated vector is passed through a fully connected layer and the activation function of the output layer (for example, the softmax normalized exponential function) to output the correct probability and the error type probability.
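The output head described above (max pooling over positions, concatenation with the CLS vector, a fully connected layer, then softmax) can be sketched in plain Python on toy dimensions. This is not the BERT encoder itself: the per-position feature vectors are taken as given, the max pooling is assumed to be element-wise across positions, a single softmax over all output classes is assumed, and the weights are placeholders.

```python
import math
from typing import List, Sequence


def max_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over all positions."""
    return [max(column) for column in zip(*token_vectors)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_vector: Sequence[float],
             token_vectors: Sequence[Sequence[float]],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """Concatenate [max-pooled features, CLS features], apply one linear layer, then softmax."""
    features = max_pool(token_vectors) + list(cls_vector)
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

In a real model the feature vectors would be 768-dimensional, and `weights` would have one row per output class (for example, "correct" plus each error type).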

In a possible implementation, the training process of the user group model includes: the computer device obtains historical query information of at least two users in the user group and applies a noise generation operation to part of the historical query information to generate negative sample data that includes noise data; the computer device predicts on the negative sample data through an initial model to obtain predicted sample data; and the computer device adjusts the model parameters of the initial model based on the similarity between the predicted sample data and the positive sample data of the historical query information, stopping when the initial model meets the model training condition, to obtain the user group model. The noise generation operation includes at least one of a word fragment replacement operation, a word fragment deletion operation, or a word fragment addition operation. The word fragment replacement operation includes at least one of replacing with a word fragment of similar pinyin, replacing with a word fragment of similar glyph shape, or replacing with a semantically similar word fragment.

Exemplarily, for the user group model of any user group, the set of historical query information (for example, Query data) previously entered by multiple users in the user group may be obtained, a portion of that set may be selected at random, and noise generation may be performed on the selected portion to automatically construct negative sample data that includes noise data. Exemplarily, noise may be generated in the following ways (1)-(3):

(1) Generate noise data with replacement-type errors through the word-segment replacement operation, randomly replacing words in the Query data with other words, in the following four ways (a)-(d):

(a) randomly replace the words at certain positions with other words;

(b) randomly replace the words at certain positions with other words of similar glyph shape;

(c) randomly replace the words at certain positions with other words of similar pronunciation;

(d) randomly replace the words at certain positions with other words of similar character content.

(2) Generate noise data with missing-type errors through the word-segment deletion operation, randomly deleting the words at certain positions in the Query data.

(3) Generate noise data with redundancy-type errors through the word-segment addition operation, randomly adding other words at certain positions in the Query data.
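The three noise generation operations can be sketched as below. This is an illustrative sketch only: the function name `add_noise`, the confusion tables, the filler words, and the label strings are hypothetical, and a real system would derive the confusable words from pinyin, stroke, and character data rather than from hand-written tables.

```python
import random

# Hypothetical confusion tables: correct word segment -> confusable segments.
CONFUSION = {
    "similar_sound": {"优惠券": ["优惠卷"]},
    "similar_shape": {"钓鱼": ["钩鱼"]},
    "similar_content": {"忽然": ["突然"]},
}
# Hypothetical pool of unrelated words for random replacement and insertion.
FILLER = ["视频", "攻略", "教程"]

def add_noise(tokens, operation, rng):
    """Corrupt one segmented query at a random position.

    Returns (noisy_tokens, error_label), or None when the chosen position
    has no confusable word for the requested replacement type."""
    noisy = list(tokens)
    pos = rng.randrange(len(noisy))
    if operation == "delete":            # missing-type error
        if len(noisy) < 2:
            raise ValueError("deletion needs at least two word segments")
        del noisy[pos]
        return noisy, "missing"
    if operation == "insert":            # redundancy-type error
        noisy.insert(pos, rng.choice(FILLER))
        return noisy, "redundant"
    if operation == "random_replace":    # replacement with an arbitrary word
        noisy[pos] = rng.choice([w for w in FILLER if w != noisy[pos]])
        return noisy, "random_replace"
    similar = CONFUSION[operation].get(noisy[pos])
    if similar is None:
        return None
    noisy[pos] = rng.choice(similar)     # replacement with a confusable word
    return noisy, operation

rng = random.Random(0)
query = ["附近", "钓鱼", "地点"]
deleted = add_noise(query, "delete", rng)
inserted = add_noise(query, "insert", rng)
shape = add_noise(["钓鱼"], "similar_shape", rng)
```

Returning the error label together with the corrupted query means the ground truth for each negative sample is recorded at the moment the noise is generated.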

The random noise data generated in ways (1)-(3) above serves as the noise data included in the error-class negative-sample Query data, and the original form of the Query data, after a certain amount of manual verification, serves as the correct-class positive-sample Query data; the initial error judgment model is then trained on the positive and negative samples. In one possible example, the computer device may predict the correct form of each piece of Query data in the negative samples, that is, predict the correct data corresponding to each piece of Query data in the negative samples, to obtain predicted sample data; the initial error judgment model is trained based on the similarity between the predicted sample data and the positive sample data, and the user group model is obtained once training of the initial error judgment model converges. In another possible example, the computer device may instead predict sample indication information for each piece of Query data in the negative samples, where the sample indication information may include a sample correct probability and a sample error type probability, and may obtain a ground-truth label for each piece of Query data during noise generation for the negative samples, where the ground-truth label may include the true values of whether each piece of Query data in the negative samples is erroneous and of its error type; the initial error judgment model is trained based on the similarity between the sample indication information and the ground-truth labels, and the user group model is obtained once training of the initial error judgment model converges.

Exemplarily, for the general model, the global set of historical query information (for example, Query data) previously entered by all users of the target application may be obtained, a portion of that set may be selected at random, and noise generation may be performed on the selected portion to automatically construct negative sample data that includes noise data. The positive sample data may be the global set of historical query information after a certain amount of manual verification. The negative sample data is generated from the global set of historical query information in the same way as the negative sample data for the user group model described above, and the details are not repeated here. The computer device may run the initial general model on the negative samples, train the initial general model based on the similarity between the predicted sample data and the positive sample data, and obtain the general model once training of the initial general model converges. The general model is trained in the same way as the user group model described above, and the details are not repeated here.

In this step, a portion of the historical Query data entered by users in the user group is selected at random, noise generation is performed on that portion to automatically construct erroneous Query data, and the user group model is obtained by training on the positive and negative samples. This makes it possible to subsequently perform personalized error judgment for users with different interests based on the user group model, which improves the accuracy of error judgment and in turn the accuracy of query error correction.

Further, when the first error indication information indicates that the query information is erroneous, the query information is corrected starting from step 304 below. When the first error indication information indicates that the query information contains no error, the subsequent search may be performed directly on the query information to return search results, and step 304 and the steps after it need not be performed.

Step 304: In response to the first error indication information indicating that the query information is erroneous, the computer device determines, based on the maturity of the user, target error correction data that matches the user.

The target error correction data includes at least the interest data of the user in the target application. Exemplarily, the computer device may use at least one of the error correction data of the user or the error correction data of the user group as the target error correction data. The error correction data of the user group includes the interest data set of the user group, and the error correction data of the user includes the interest data of the user in the target application.

In a possible implementation, the target error correction data may further include general error correction data, where the general error correction data includes the interest data set of the global users of the target application. Depending on what the target error correction data includes, this step may be implemented in the following four manners.

First manner: based on the user group of the user, the computer device determines the error correction data of the user group as the target error correction data that matches the user.

In a possible example, when the maturity of the user is the second maturity, the error correction data of the user group whose interests are similar to those of the user may be used as the target error correction data of the user. The error correction data of the user group may include the interest data set of the user group in the target application.

The interest data set of the user group may include behavior data of the user group in the target application. In a possible example, the behavior data of the user group in the target application includes, but is not limited to: the video titles and video descriptions of videos that the user group has effectively watched or of creators that the user group follows, the video comments and bullet-screen comments of the user group, the posts published and the comments replied to by the user group in the community, and the historical query information of the users in the user group. It should be noted that the behavior data may be configured as needed, and the present application uses only the kinds of data listed above as examples. The behavior data may of course also include the names of products that users have favorited or purchased, the titles of articles that users have favorited or forwarded, the titles of live rooms in which users watched live broadcasts, the titles of videos that users have downloaded, and so on; the embodiments of the present application do not limit what data the behavior data may include.

In a possible implementation, the target error correction data further includes a query index of the interest data, and the query index is used to index similar segments in the interest data. The computer device may generate the error correction data of the user group based on the interest data set of the user group, and the process may include: the computer device performs word segmentation on the interest data set of the user group and, based on the similarity between the multiple segments obtained by word segmentation, generates a query index for indexing similar segments; and the computer device generates the error correction data of the user group based on the multiple segments and the query index of the similar segments. In one example, the computer device performs word segmentation on the interest data set, selects from the segmentation result the segments whose frequency of occurrence exceeds a target frequency threshold, and generates the query index based on the similarity between the selected segments. The target frequency threshold may be configured as needed and is not limited in the embodiments of the present application; for example, it may be 5, 3, or 10 occurrences.

In one possible example, the error correction data may be constructed based on the pinyin similarity between the multiple segments, in which case the query index includes a pinyin query index used to index word segments of similar pinyin in the interest data. In another possible example, the error correction data may be constructed based on the glyph similarity between the multiple segments, in which case the query index includes a stroke query index used to index word segments of similar strokes in the interest data. In yet another possible example, the error correction data may be constructed based on the semantic similarity between the multiple segments, in which case the query index includes a semantic query index used to index word segments in the interest data whose keywords are semantically similar. Correspondingly, (1)-(3) below show how the error correction data is constructed based on pinyin similarity, glyph similarity, and semantic similarity, respectively.

(1) Constructing error correction data based on pinyin similarity: perform word segmentation on the interest data set, build a user group vocabulary from the word segments whose frequency of occurrence exceeds the target frequency threshold, and annotate the word segments in the user group vocabulary with pinyin; then, based on the pinyin of the word segments, obtain the word segments of similar pinyin among them and use the pinyin as the pinyin index of those word segments, to obtain error correction data of the user group that includes the word segments and the pinyin index. For example, the pinyin of an entry is used as the query index for similarity retrieval in Elasticsearch (used for distributed full-text retrieval); entering the pinyin of an entry then returns one or more entries whose pinyin is similar to that of the entry, together with the pinyin similarity between the entry and each similar entry.

(2) Constructing error correction data based on glyph similarity: decompose the word segments in the constructed user group vocabulary into strokes; then, based on the stroke sequences of the word segments, obtain the word segments of similar glyph shape among them and use the stroke sequence as the glyph index of those word segments, to obtain error correction data of the user group that includes the word segments and the glyph index. For example, the stroke sequence of an entry is used as the query index for similarity retrieval in Elasticsearch; entering the stroke sequence of an entry then returns one or more entries whose glyph shape is similar to that of the entry, together with the glyph similarity between the entry and each similar entry.

(3) Constructing error correction data based on semantic similarity: split the word segments in the constructed user group vocabulary at character granularity and, based on the keywords obtained by splitting and the keywords semantically similar to them, determine multiple semantically similar word segments that include the similar keywords; then build a semantic query index for the semantically similar word segments. For example, the semantically similar word segments are used as the semantic query index. Semantic similarity may mean that the character content of the keywords included in the word segments is similar, so that entering a query entry returns other entries of similar character content; for example, entering "满城全带黄金甲" returns, by character similarity, the entry "满城尽带黄金甲", whose character content is similar. The character content similarity between the entry and each similar entry can also be obtained.

The user group vocabulary in (2) and (3) above is constructed in the same way as in (1), and the details are not repeated for (2) and (3).
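The vocabulary and pinyin index construction in (1) can be sketched as follows. The sketch is an assumption-laden stand-in: an in-memory dictionary replaces the Elasticsearch index named in the text, the `PINYIN` table is a hypothetical toy lookup covering only the example characters, and `difflib.SequenceMatcher` is used as one possible pinyin similarity measure, which the application does not specify.

```python
import difflib
from collections import Counter

# Hypothetical toy pinyin table; a real system would use a full pinyin lexicon.
PINYIN = {"优": "you", "惠": "hui", "券": "quan", "领": "ling",
          "取": "qu", "钓": "diao", "鱼": "yu"}

def build_vocab(segmented_texts, min_count):
    """Keep the word segments that occur more than min_count times."""
    counts = Counter(seg for text in segmented_texts for seg in text)
    return [seg for seg, n in counts.items() if n > min_count]

def build_pinyin_index(vocab):
    """Map each pinyin string to the word segments annotated with it."""
    index = {}
    for seg in vocab:
        key = " ".join(PINYIN[ch] for ch in seg)
        index.setdefault(key, []).append(seg)
    return index

def similar_by_pinyin(index, pinyin, threshold=0.8):
    """Return (segment, similarity) pairs whose pinyin is close to the input."""
    hits = []
    for key, segs in index.items():
        score = difflib.SequenceMatcher(None, pinyin, key).ratio()
        if score >= threshold:
            hits.extend((seg, score) for seg in segs)
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Already-segmented interest data; the threshold of 5 is one of the text's examples.
texts = [["优惠券", "领取"]] * 6 + [["钓鱼"]] * 2
vocab = build_vocab(texts, min_count=5)
index = build_pinyin_index(vocab)
hits = similar_by_pinyin(index, "you hui juan")
```

A stroke index could be built the same way by swapping the pinyin string for a stroke sequence as the key; only the key function changes.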

Second manner: based on the user group of the user, the computer device determines the error correction data of the user and the error correction data of the user group as the target error correction data.

In a possible example, when the maturity of the user is the third maturity, the error correction data of the user, together with the error correction data of the user group whose interests are similar to those of the user, may be used as the target error correction data of the user. The error correction data of the user may include the interest data of the user in the target application.

The error correction data of the user may include a query index, and the computer device may likewise obtain error correction data that includes multiple segments and a query index by performing word segmentation on the interest data of the user. The interest data of the user may also include behavior data of the user in the target application, and that behavior data may include the kinds of behavior data given as examples in the first manner. In addition, the query index may include at least one of a pinyin index, a glyph index, or a semantic index. The way the error correction data of the user is constructed from the interest data of the user, and specifically from pinyin similarity, glyph similarity, or semantic similarity, is the same as the process of constructing the error correction data of the user group in the first manner above. The behavior data and the specific implementation of constructing the error correction data of the user are not repeated here.

Third manner: based on the user group of the user, the computer device determines the error correction data of the user group and the general error correction data as the target error correction data.

In a possible example, when the maturity of the user is the second maturity, the general error correction data and the error correction data of the user group whose interests are similar to those of the user may be used as the target error correction data of the user.

The general error correction data includes at least the interest data set of the global users of the target application, that is, the interest data of all users of the target application. The general error correction data may of course also include a query index, and the computer device may likewise obtain error correction data that includes multiple segments and a query index by performing word segmentation on the interest data set of the global users. The interest data set of the global users may also include behavior data of all users in the target application, and the query index may include at least one of a pinyin index, a glyph index, or a semantic index. The way the general error correction data is constructed from the interest data set of the global users, and specifically from pinyin similarity, glyph similarity, or semantic similarity, is the same as the process of constructing the error correction data of the user group in the first manner above. The specific implementation of constructing the general error correction data is not repeated here.

Fourth manner: based on the user group of the user, the computer device determines the error correction data of the user, the error correction data of the user group, and the general error correction data as the target error correction data.

In a possible example, when the maturity of the user is the third maturity, the general error correction data, the error correction data of the user, and the error correction data of the user group whose interests are similar to those of the user may be used as the target error correction data of the user. The error correction data of the user, the general error correction data, and the error correction data of the user group are constructed in the same way as in the three manners above, and the details are not repeated here.
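The four manners can be read as one selection rule: the user group data is always included, the user's own data is added at the third maturity, and the general data is added when it is available. A minimal sketch under that reading follows; the function name `select_target_data` and the string maturity labels are hypothetical, and maturity levels other than the two discussed in this step are deliberately left out.

```python
def select_target_data(maturity, user_data, group_data, general_data=None):
    """Return the list of error correction data sources to query for a user."""
    if maturity not in ("second", "third"):
        raise ValueError("this sketch covers only the second and third maturity levels")
    sources = []
    if maturity == "third":
        sources.append(user_data)       # second and fourth manners
    sources.append(group_data)          # all four manners
    if general_data is not None:
        sources.append(general_data)    # third and fourth manners
    return sources

first_manner = select_target_data("second", "user", "group")
fourth_manner = select_target_data("third", "user", "group", "general")
```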

In this step, when the query information is erroneous, the target error correction data that matches the user is obtained based on the maturity of the user. Because the target error correction data includes the interest data of the user, the query information can subsequently be corrected toward the user's interests based on that interest data, which meets the personalized error correction needs of users with different interests, further improves the actual accuracy of query error correction, and improves the user's search experience.

Step 305: The computer device determines at least one piece of candidate information associated with the query information, based on the target error correction data that matches the user and on the query information.

The target error correction data includes at least the interest data of the user in the target application. In the present application, the computer device may search the target error correction data for information similar to the query information and use the similar information whose similarity exceeds a target similarity threshold as the at least one piece of candidate information. Because the candidate information is found in the target error correction data, the candidate information also includes interest data of the user in the target application.

In a possible implementation, the query information includes a query string, and associated word segments may be looked up based on the word segments included in the string, so that the candidate information is constructed from the associated word segments. Correspondingly, this step may include: the computer device obtains, from the target error correction data, at least one associated word segment corresponding to at least one word segment included in the query string; and the computer device replaces the corresponding word segment in the query information with the at least one associated word segment to obtain the at least one piece of candidate information. For example, for the first word segment "AB" in the string "ABXXX", if the word segments associated with "AB" in the target error correction data include "CD" and "EF", replacing "AB" yields the candidate information "CDXXX" and "EFXXX" for "ABXXX". Here "AB" may stand for an English word, a Chinese phrase, one or more Chinese characters, symbols, or other forms.
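The replacement step can be sketched as a short function that substitutes one word segment at a time. The function name `build_candidates` and the dictionary representation of the associated word segments are assumptions made for illustration.

```python
def build_candidates(segments, associated):
    """segments: the query string split into word segments.
    associated: word segment -> list of associated word segments
    found in the target error correction data."""
    candidates = []
    for i, seg in enumerate(segments):
        for alt in associated.get(seg, []):
            # Replace only the i-th segment and keep the rest of the query.
            candidates.append("".join(segments[:i] + [alt] + segments[i + 1:]))
    return candidates

candidates = build_candidates(["AB", "XXX"], {"AB": ["CD", "EF"]})
```

With the example from the text, `candidates` holds "CDXXX" and "EFXXX".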

In a possible implementation, the target error correction data further includes a query index of the interest data, and for each word segment included in the query string, the computer device uses the query index to look up, in the target error correction data, at least one associated word segment similar to that word segment. The associated word segment may include at least one of a word segment of similar pinyin, a word segment of similar glyph shape, or a semantically similar word segment. Based on the pinyin query index, the stroke query index, and the semantic query index shown in step 304 above, in this step the computer device may obtain the associated word segments of each word segment in at least one of Modes 1 to 3 below.

Mode 1: The query index includes a pinyin query index, and the computer device looks up, in the target error correction data, at least one word segment whose pinyin is similar to that of the word segment, based on the pinyin of the word segment and the pinyin query index.

The computer device may extract the pinyin of the word segment and look up, in the target error correction data, at least one word segment of similar pinyin indexed by that pinyin. For example, for the word segment "优惠卷" in the information entered by the user, the corresponding pinyin "you hui juan" may be entered, and the output is the word segment of similar pinyin "优惠券" ("coupon") found through the pinyin index.

Mode 2: The query index includes a stroke query index, and the computer device looks up, in the target error correction data, at least one word segment whose glyph shape is similar to that of the word segment, based on the stroke sequence of the word segment and the stroke query index.

The computer device may extract the stroke sequence of the word segment and look up, in the target error correction data, at least one word segment of similar glyph shape indexed by that stroke sequence. For example, for the word segment "钩鱼" in the information entered by the user, the corresponding stroke sequence may be entered, and the output is the word segment of similar glyph shape "钓鱼" ("fishing") found through the stroke sequence index.

Mode 3: The query index includes a semantic query index, and the computer device looks up, in the target error correction data, at least one word segment semantically similar to the word segment, based on the word segment and the semantic query index.

The computer device may use the word segment and the semantic query index to look up, in the target error correction data, at least one semantically similar word segment indexed by the word segment. For example, entering "满城全带黄金甲" returns, by character similarity, the entry "满城尽带黄金甲", whose character content is similar; and entering "忽然" returns the semantically similar word segments "突然" and "猛然" (all meaning "suddenly").

It should be noted that, corresponding to the four manners of determining the target error correction data shown in step 304, this step may look up the at least one associated word segment in each of the one or more kinds of error correction data that the target error correction data includes. Taking target error correction data that includes the error correction data of the user group and the error correction data of the user as an example, for Mode 1 in step 305, the process by which the computer device obtains at least one word segment of similar pinyin may include: the computer device looks up at least one first word segment of similar pinyin in the error correction data of the user group, based on the word segment and the pinyin index of the error correction data of the user group; and looks up at least one second word segment of similar pinyin in the error correction data of the user, based on the word segment and the pinyin index of the error correction data of the user. When the target error correction data includes one kind, three kinds, or a different pair of kinds of error correction data, and likewise for word segments of similar glyph shape or semantically similar word segments, the associated word segments are obtained in the same way as in the above example based on two kinds of error correction data, and the cases are not enumerated here.
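Querying every kind of error correction data in turn can be sketched as below. This is a simplified illustration: each data source is modeled as a dictionary from index kind to a lookup table keyed by the word segment itself, whereas the text keys the pinyin and stroke indexes by pinyin and stroke sequence; the function name and data layout are hypothetical.

```python
def lookup_associated(segment, sources, kinds=("pinyin", "stroke", "semantic")):
    """Collect (associated_segment, index_kind) pairs from every source.

    sources: list of dicts, each mapping an index kind to a table
    {word segment: [similar word segments]}."""
    found = []
    for source in sources:
        for kind in kinds:
            for hit in source.get(kind, {}).get(segment, []):
                if (hit, kind) not in found:   # drop duplicates across sources
                    found.append((hit, kind))
    return found

group_data = {"pinyin": {"优惠卷": ["优惠券"]}}
user_data = {"pinyin": {"优惠卷": ["优惠券"]}, "stroke": {"钩鱼": ["钓鱼"]}}
coupon_hits = lookup_associated("优惠卷", [group_data, user_data])
fishing_hits = lookup_associated("钩鱼", [group_data, user_data])
```

Keeping the index kind next to each hit records the source of the associated word segment, which is what later determines its similarity type.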

It should be noted that, for each word segment in the query information, a similar word segment of that word segment (for example, any of the three kinds of similar word segments in Modes 1 to 3 above) may be used to replace the word segment in the query information, which constructs one piece of candidate information. For example, one word segment in the query information may be replaced, so that the resulting piece of candidate information includes one similar word segment. As another example, two or more word segments in the query information may be replaced at the same time, so that the resulting piece of candidate information includes two or more similar word segments. In the embodiments of the present application, candidate information may be constructed by replacing one word segment or multiple word segments at a time; the above description uses the replacement of one word segment at a time only as an example, and the present application does not limit the specific way of constructing candidate information.

Step 306: The computer device determines second error indication information of the at least one piece of candidate information based on the target model.

The computer device inputs each piece of candidate information into the target model to obtain the second error indication information of that candidate information, where the second error indication information includes the correct probability of the candidate information. This is the same as the process of obtaining the correct probability of the query information in step 303. Correspondingly, in this step, following Example 1 in step 303, the correct probability of the candidate information output by the user group model may be obtained, or the correct probability of the candidate information output by the general model may be obtained. Alternatively, following Example 2 in step 303, the average of a third correct probability output by the user group model and a fourth correct probability output by the general model may be used as the correct probability of the candidate information. The specific implementation of obtaining the correct probability of the candidate information is the same as the process in Example 1 or Example 2 of step 303 above, and the details are not repeated here.
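The two ways of obtaining the correct probability of a candidate can be sketched in a few lines. Models are represented here as plain callables that return a correct probability, which is an assumption of the sketch; the function name and the stand-in lambdas are hypothetical.

```python
def candidate_correct_probability(candidate, group_model=None, general_model=None):
    """Use the single available model, or average the two when both are given."""
    probs = [model(candidate) for model in (group_model, general_model)
             if model is not None]
    if not probs:
        raise ValueError("at least one model is required")
    return sum(probs) / len(probs)

# Stand-in models with fixed outputs, for illustration only.
third_prob_model = lambda text: 0.8    # user group model
fourth_prob_model = lambda text: 0.6   # general model
averaged = candidate_correct_probability("CDXXX", third_prob_model, fourth_prob_model)
single = candidate_correct_probability("CDXXX", group_model=third_prob_model)
```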

In steps 305-306, constructing multiple pieces of candidate information for the query information based on similarity increases the likelihood that a candidate hits the correct information. Moreover, candidate information can be constructed in several ways, such as from a pinyin query index, a glyph query index, or an entry query index, which enriches the sources of candidate information and makes the subsequent correction of the user's query information more precise. This in turn reduces the cost of deleting and re-entering a query that the user typed incorrectly, and improves the actual accuracy and efficiency of subsequent searches.

步骤307、计算机设备基于该第二错误指示信息，从该至少一个候选信息中确定该查询信息的纠正信息。Step 307, the computer device determines correction information of the query information from the at least one candidate information based on the second error indication information.

In a possible implementation, the computer device may first adjust the correct probability of the candidate information based on the error type probability, and then determine the correction information based on the adjusted correct probability. Accordingly, this step can be implemented through the following steps 3071-3072 (not shown in the figure).

Step 3071: the computer device adjusts the correct probability of the at least one piece of candidate information based on the error type probability of the query information, obtaining a correct score for the at least one piece of candidate information.

In a possible implementation, the candidate information includes an associated word segment that replaces a word segment in the query information. Based on the similarity type between the associated word segment and the replaced word segment in the query information, the computer device may obtain, from the probabilities of the at least one error type included in the error type probability, the type probability of the matching error type that corresponds to that similarity type. The computer device then determines the correct score of the candidate information based on the correct probability of the candidate information and the type probability. Exemplarily, the similarity type includes at least one of phonetic similarity, glyph similarity, and semantic similarity. The matching error type for phonetic similarity includes a phonetic similarity error; the matching error type for glyph similarity includes a glyph similarity error; and the matching error type for semantic similarity includes at least one of a word content similarity error, a missing-type error, or a redundant-type error. If the associated word segment has more characters than the replaced word segment at the original replacement position, the error is a missing-type error; if it has fewer characters, the error is a redundant-type error.

The similarity type between the associated word segment and the replaced word segment indicates the source of the associated word segment. For example, if the associated word segment is a pinyin-similar word segment obtained from the pinyin query index, the similarity type is phonetic similarity; if it is a glyph-similar word segment, the similarity type is glyph similarity; if it is a semantically similar word segment, the similarity type is semantic similarity.

In a possible example, the computer device may take the product of the type probability and the correct probability as the correct score of the candidate information. For example, if the correct probability of the candidate information obtained from the target model is P3 and the error type probability of the query information is P2, the type probability of the matching error type is extracted from P2, and the correct score P4 can be expressed as P4 = P3 × P2[matching error type of the candidate information], where P2[matching error type of the candidate information] denotes the type probability of the matching error type in P2.
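The mapping from similarity type to matching error type, followed by the product P4 = P3 × P2[matching error type], can be sketched as follows. The string labels and function names are hypothetical. Treating an equal-length semantic replacement as a word content similarity error is an assumption of this sketch, since the text only defines the longer and shorter cases.

```python
def matching_error_type(similarity_type, replaced, associated):
    """Map a similarity type to its matching error type (labels are hypothetical)."""
    if similarity_type == "phonetic":
        return "phonetic_error"
    if similarity_type == "glyph":
        return "glyph_error"
    if similarity_type == "semantic":
        if len(associated) > len(replaced):
            return "missing_error"      # the query was missing characters
        if len(associated) < len(replaced):
            return "redundant_error"    # the query had extra characters
        return "content_error"          # assumption: same length -> content error
    raise ValueError(f"unknown similarity type: {similarity_type}")


def correct_score(p3, p2, error_type):
    """P4 = P3 * P2[matching error type]."""
    return p3 * p2[error_type]


# P2: error type probabilities of the query; P3: correct probability of a candidate.
p2 = {"phonetic_error": 0.5, "glyph_error": 0.25, "missing_error": 0.125}
error_type = matching_error_type("phonetic", "AB", "CD")
p4 = correct_score(0.5, p2, error_type)
```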

There may be one or more similarity types between an associated word segment in the candidate information and the replaced word segment. When there are several, at least two of the pinyin-similar, glyph-similar, and semantically similar word segments of the replaced word segment coincide. For example, the pinyin-similar word segment and the glyph-similar word segment of the replaced word segment may both be the associated word segment M; for instance, for the query information "优惠卷" (a misspelling of "优惠券", meaning "coupon"), both its pinyin-similar word segments and its glyph-similar word segments include "优惠券".

In a possible example, the candidate information includes one associated word segment, and the pinyin-similar, glyph-similar, and semantically similar word segments of the replaced word segment do not overlap. The candidate information then corresponds to a single matching error type, and the type probability of that matching error type is obtained directly from the query information. The associated word segment is exactly one of a pinyin-similar, glyph-similar, or semantically similar word segment of the replaced word segment. For example, for the query information "ABXXX" and its candidate information "CDXXX", if the only source of "CD" is the pinyin-similar word segments obtained from the pinyin query index, that is, "CD" is a pinyin-similar word segment of the first word segment "AB", then the probability of a phonetic similarity error is taken from the error type probability of the query information "ABXXX".

In another possible example, a piece of candidate information includes one associated word segment that has at least two similarity types with respect to the replaced word segment; that is, at least two of the pinyin-similar, glyph-similar, and semantically similar word segments of the replaced word segment coincide. In this case, the computer device determines the similarity value for each of the at least two similarity types between the associated word segment and the replaced word segment, selects the target similarity type whose similarity value satisfies a threshold condition, and obtains the type probability of the matching error type that corresponds to the target similarity type. The threshold condition may include, but is not limited to: the similarity is above a given threshold, the similarity is the largest, or the similarity is above a minimum threshold and does not exceed a maximum threshold. For example, the similarity is above a threshold of 60%.

For example, for the query information "ABXXX" and its candidate information "JKXXX", "JK" is both a pinyin-similar and a glyph-similar word segment of the first word segment "AB". If the pinyin similarity between "JK" and "AB" is 91% and the glyph similarity is 80%, then, based on the pinyin similarity of 91%, the type probability of the phonetic similarity error, which is the matching error type for phonetic similarity, is obtained.
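One way to read the threshold condition is "above a threshold, then take the largest", which reproduces the 91% versus 80% example. The sketch below implements only that reading; the function name and the default threshold of 0.6 (taken from the 60% example) are assumptions, and other threshold conditions listed in the text would need different logic.

```python
def pick_similarity_type(similarities, threshold=0.6):
    """Return the similarity type with the highest similarity above `threshold`.

    similarities: mapping from similarity type to a similarity value in [0, 1].
    Returns None when no type exceeds the threshold.
    """
    eligible = {t: s for t, s in similarities.items() if s > threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# "JK" vs "AB": pinyin similarity 91%, glyph similarity 80%.
chosen = pick_similarity_type({"phonetic": 0.91, "glyph": 0.80})
```

The chosen type would then be mapped to its matching error type, and the corresponding type probability read from the error type probability of the query.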

The two examples above only cover candidate information that includes a single associated word segment. A piece of candidate information may of course include at least two associated word segments, each of which has one or more similarity types with respect to the word segment it replaces. In that case, the type probability can be determined by combining the similarity types between each of the at least two associated word segments and its replaced word segment. In a possible example, the type probability P0 can be calculated based on Formula 1:

Formula 1: $P0 = \sum_{i} x_i \, p_i$;

Here, P0 is the type probability; i indexes the i-th associated word segment among the at least two associated word segments of the candidate information; x_i is the similarity value for the similarity type between the i-th associated word segment and its replaced word segment (such as pinyin similarity, glyph similarity, or word content similarity); and p_i is the type probability of the matching error type that corresponds to the similarity type of the i-th associated word segment.
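Formula 1 is a similarity-weighted sum over the associated word segments. A minimal sketch, assuming each associated word segment has already been reduced to a pair (x_i, p_i); the function name and the numbers are illustrative only:

```python
def type_probability(pairs):
    """Formula 1: P0 = sum over i of x_i * p_i.

    pairs: iterable of (x_i, p_i), where x_i is the similarity of the i-th
    associated word segment to its replaced segment and p_i is the type
    probability of its matching error type.
    """
    return sum(x * p for x, p in pairs)


# Two associated word segments: one phonetic match, one glyph match.
p0 = type_probability([(0.5, 0.5), (0.25, 1.0)])
```

Note that the formula as stated does not normalize by the sum of the similarities, so the sketch does not either.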

In another possible implementation, the correct score can also be calculated with a correction factor. Users differ in their error habits; for example, a particular user may frequently make certain fixed types of errors. A personalized target association can therefore be configured for each user based on that user's error habits, where the target association includes the correspondence between error types and correction factors. This step may then further include: based on the matching error type, the computer device obtains the correction factor corresponding to the matching error type from the target association matched with the user, and takes the product of the type probability, the correction factor, and the correct probability as the correct score of the candidate information.
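The per-user correction factor can be sketched as a lookup followed by a three-way product. The dictionary layout, the labels, and the default factor of 1.0 for error types that have no configured factor are assumptions of this sketch, not details given in the text.

```python
def corrected_score(correct_prob, type_prob, error_type, user_factors):
    """Correct score = type probability * correction factor * correct probability.

    user_factors: hypothetical per-user mapping from error type to correction
    factor (the "target association"). Unlisted types default to 1.0, which is
    an assumption made here so that they leave the score unchanged.
    """
    factor = user_factors.get(error_type, 1.0)
    return type_prob * factor * correct_prob


# A user who often makes phonetic errors gets a larger factor for that type.
factors = {"phonetic_error": 2.0}
boosted = corrected_score(0.25, 0.5, "phonetic_error", factors)
unchanged = corrected_score(0.25, 0.5, "glyph_error", factors)
```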

Step 3072: based on the correct score of the at least one piece of candidate information and the first error indication information, the computer device selects, from the at least one piece of candidate information, the correction information whose correct score meets a target condition.

The target condition may include, but is not limited to: the correct score is greater than the correct probability of the query information; the correct score exceeds a target score threshold; the correct score is the largest; or the correct score is greater than the correct probability corresponding to the error type probability of the query information (that is, 1 - P2). The computer device may take the candidate information whose final correct score satisfies the target condition as the correction information, and subsequently perform the search based on the correction information.
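The selection in step 3072 can be sketched by combining two of the listed conditions: the correct score must exceed the correct probability of the query, and among the qualifying candidates the one with the largest score is chosen. This combination, the function name, and the numbers are assumptions made for illustration.

```python
def select_correction(scored_candidates, query_correct_prob):
    """Pick the best candidate whose score beats the query's correct probability.

    scored_candidates: list of (candidate_text, correct_score) pairs.
    Returns None when no candidate qualifies, in which case the original
    query would be kept.
    """
    eligible = [(c, s) for c, s in scored_candidates if s > query_correct_prob]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[1])[0]


best = select_correction([("CDXXX", 0.45), ("JKXXX", 0.3)], query_correct_prob=0.2)
none_found = select_correction([("CDXXX", 0.1)], query_correct_prob=0.2)
```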

To describe the execution flow of the embodiments of the present application more clearly, FIG. 4 is used as an example below. FIG. 4 is a flowchart of query error correction provided by an embodiment of the present application. As shown in FIG. 4, taking a video application as the target application, when a user searches on the video platform and enters a user query, the computer device performs personalized error determination on that query, for example by using the user group model to determine the correct probability and the error type probability of the user query. When the user query contains an error, the computer device performs personalized error correction on it and returns the corrected user query for searching. Personalized data can be used to build the user group model; for example, the user group model can be trained on the users' historical query information. During personalized error correction, for immature and mature users, at least one of the personalized error correction data of the user group and the personalized error correction data of the user can be used, optionally combined with general error correction data. For users whose user maturity is "new/cold", general error correction data can be used for correction.

It should be noted that the present application builds personalized query error correction models, error correction data, and error correction strategies for users with different interests. This improves the ability of the error correction function in target applications such as video platforms, search engines, social applications, and live-streaming applications to satisfy users' personalized query correction needs, corrects the query information entered by users more intelligently and precisely, further reduces the cost of re-entering a query when a user mistypes it while searching for videos, articles, products, and other information, and improves the usability of the target application's search function. Video platforms in particular have large user bases, and the interest distributions of different users or user groups differ: users vary considerably in the celebrities they follow, the video genres they like, and the topics they discuss. Searching on a video platform is how users actively look for video-related information, so it is necessarily shaped by their subjective interests. Some error correction schemes for video search queries in the related art fail to make full use of the personalized interests of video platform users, and therefore cannot handle error correction differently for users with different interests. The personalized video search query error correction method proposed in the present application builds personalized error determination models and error correction data for users with different interests, based on the interests of video platform users. As a result, the query error correction function of the video platform can meet each user's individual needs, determine and correct errors in the video search queries entered by users more accurately, largely spare users from deleting and re-entering mistyped video search queries, and improve both personalized search efficiency and the actual accuracy of subsequent searches.

Step 307 above achieves personalized error determination and personalized correction of query information for users at different personalization levels, improving the user experience of video search. In addition, the similarity type between the associated word segment and the replaced word segment is matched against the error type probability, so that the correct probability of the candidate information can be adjusted in a targeted, personalized way using the type probability of the matching error type. This improves the accuracy of the resulting correction information and further improves the accuracy of the query.

图5为本申请实施例提供的一种查询纠错装置的结构示意图。如图5所示，该装置包括：FIG. 5 is a schematic structural diagram of a query error correction device provided by an embodiment of the present application. As shown in Figure 5, the device includes:

A first determination module 501, configured to, in response to receiving query information entered by a user into a search box, determine first error indication information of the query information based on a target model matched with the user, where the target model is used to determine, based on the user's interest data in a target application, whether information input into the target model is erroneous;

候选确定模块502，用于响应于该第一错误指示信息指示该查询信息错误，基于与该用户匹配的目标纠错数据以及该查询信息，确定与该查询信息关联的至少一个候选信息，该目标纠错数据至少包括该用户在该目标应用的兴趣数据；A candidate determination module 502, configured to respond to the first error indication information indicating that the query information is wrong, and determine at least one candidate information associated with the query information based on the target error correction data matching the user and the query information, the target The error correction data includes at least the interest data of the user in the target application;

第二确定模块503，用于基于该目标模型，确定该至少一个候选信息的第二错误指示信息；A second determining module 503, configured to determine second error indication information of the at least one candidate information based on the target model;

纠正模块504，用于基于该第二错误指示信息，从该至少一个候选信息中确定该查询信息的纠正信息。A correction module 504, configured to determine correction information of the query information from the at least one candidate information based on the second error indication information.

在一种可能实施方式中，该候选确定模块502，包括：In a possible implementation manner, the candidate determination module 502 includes:

关联词获取单元，用于基于该查询字符串包括的至少一个词片段，从该目标纠错数据中获取该至少一个词片段对应的至少一个关联词片段；An associated word acquisition unit, configured to acquire at least one associated word segment corresponding to the at least one word segment from the target error correction data based on at least one word segment included in the query string;

候选获取单元，用于基于该至少一个关联词片段，替换该查询信息中对应的至少一个词片段，得到该至少一个候选信息。The candidate acquisition unit is configured to replace at least one corresponding word segment in the query information based on the at least one associated word segment, to obtain the at least one candidate information.

在一种可能实施方式中，该目标纠错数据还包括该兴趣数据的查询索引，该查询索引用于索引该兴趣数据中的相似片段；In a possible implementation manner, the target error correction data further includes a query index of the interest data, and the query index is used to index similar segments in the interest data;

该关联词获取单元，用于以下至少一项：The associated word acquisition unit is used for at least one of the following:

对于每个词片段，基于该词片段的拼音和拼音查询索引，从该目标纠错数据中查询该词片段的至少一个拼音相似词片段，该查询索引包括拼音查询索引，该拼音查询索引用于索引该兴趣数据中拼音相似的词片段；For each word segment, based on the pinyin and the pinyin query index of the word segment, query at least one pinyin similar word segment of the word segment from the target error correction data, the query index includes a pinyin query index, and the pinyin query index is used for Index the word fragments with similar pinyin in the interest data;

Based on the stroke sequence of the word segment and a stroke query index, query at least one glyph-similar word segment of the word segment from the target error correction data, where the query index includes the stroke query index, and the stroke query index is used to index word segments with similar strokes in the interest data;

基于该词片段和语义查询索引，从该目标纠错数据中查询该词片段的至少一个语义相似词片段，该查询索引包括语义查询索引，该语义查询索引用于索引该兴趣数据中关键字语义相似的词片段。Based on the word segment and the semantic query index, at least one semantically similar word segment of the word segment is queried from the target error correction data, the query index includes a semantic query index, and the semantic query index is used to index the keyword semantics in the interest data Similar word fragments.
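The three lookups above share one pattern: derive a key from a word segment (its pinyin, its stroke sequence, or a semantic key) and retrieve other segments filed under a similar key. The sketch below shows that pattern with exact key matching only, which is a simplification; a real index would also need fuzzy matching for merely similar keys. The toy pinyin table and all function names are hypothetical.

```python
from collections import defaultdict


def build_index(segments, key_fn):
    """Group word segments from the interest data by a derived key."""
    index = defaultdict(set)
    for seg in segments:
        index[key_fn(seg)].add(seg)
    return index


def lookup_similar(segment, index, key_fn):
    """Return other segments that share the key of `segment`, sorted."""
    return sorted(index.get(key_fn(segment), set()) - {segment})


# Hypothetical pinyin table standing in for a real pinyin converter.
toy_pinyin = {"再见": "zai jian", "在见": "zai jian", "视频": "shi pin"}
pinyin_index = build_index(toy_pinyin, toy_pinyin.get)

homophones = lookup_similar("在见", pinyin_index, toy_pinyin.get)
no_match = lookup_similar("视频", pinyin_index, toy_pinyin.get)
```

Swapping `key_fn` for a stroke-sequence or semantic key function would give the stroke and semantic variants of the same lookup.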

In a possible implementation, the first error indication information includes the correct probability and the error type probability of the query information, where the error type probability indicates the likelihood that the query information contains an error of at least one error type; the second error indication information includes the correct probability of the candidate information;

该纠正模块504，包括：The correction module 504 includes:

An adjustment unit, configured to adjust the correct probability of the at least one piece of candidate information based on the error type probability of the query information, to obtain the correct score of the at least one piece of candidate information;

筛选单元，用于基于该至少一个候选信息的正确得分和该第一错误指示信息，从该至少一个候选信息中筛选出正确得分符合目标条件的纠正信息。A screening unit, configured to, based on the correct score of the at least one candidate information and the first error indication information, filter out corrected information whose correct score meets the target condition from the at least one candidate information.

在一种可能实施方式中，该候选信息包括替换该查询信息中词片段的关联词片段；In a possible implementation manner, the candidate information includes an associated word segment replacing a word segment in the query information;

The adjustment unit is configured to obtain, based on the similarity type between the associated word segment and the replaced word segment in the query information, the type probability of the matching error type that corresponds to the similarity type from the probabilities of the at least one error type included in the error type probability; and to determine the correct score of the candidate information based on the correct probability of the candidate information and the type probability.

在一种可能实施方式中，该相似类型包括：字音相似、字形相似、语义相似中的至少一项；In a possible implementation manner, the similarity type includes: at least one of similarity in pronunciation, similarity in shape, and similarity in semantics;

其中，该字音相似的匹配错误类型包括字音相似错误，该字形相似的匹配错误类型包括字形相似错误，该语义相似的匹配错误类型包括字内容相似错误、缺失型错误或者冗余型错误中的至少一项。Wherein, the matching error types with similar phonetics include similar phonetic errors, the matching error types with similar fonts include similar font errors, and the matching error types with similar semantics include at least one of similar word content errors, missing errors, or redundant errors. one item.

在一种可能实施方式中，该修正单元，用于以下任一项：In a possible implementation manner, the correction unit is used for any of the following:

将该类型概率与该正确概率之间的乘积值，确定为该候选信息的正确得分；determining the product value between the type probability and the correct probability as the correct score of the candidate information;

基于该匹配错误类型，从与该用户匹配的目标关联关系中，获取该匹配错误类型对应的修正因子，将该类型概率、修正因子以及正确概率之间的乘积值，确定为该候选信息的正确得分，该目标关联关系包括错误类型和修正因子之间的对应关系。Based on the matching error type, the correction factor corresponding to the matching error type is obtained from the target association relationship matched with the user, and the product value among the type probability, correction factor and correct probability is determined as the correctness of the candidate information. score, the target correlation includes the correspondence between error types and correction factors.

In a possible implementation, the first error indication information includes the correct probability and the error type probability of the query information. The first determination module is configured to extract, through the target model, at least one kind of description information of the query information, where the at least one kind of description information includes at least one of the pinyin, strokes, or position encoding of the query information, the keywords included in the query information, or the phrases included in the query information; and to perform error determination on the query information based on the at least one kind of description information, obtaining the correct probability and the error type probability of the query information.

在一种可能实施方式中，该目标模型包括用户群模型，该用户群模型用于基于用户群的兴趣数据集判定输入该用户群模型的信息是否错误，该用户群包括该用户的至少两个兴趣相似用户；In a possible implementation manner, the target model includes a user group model, and the user group model is used to determine whether the information input to the user group model is wrong based on the interest data set of the user group, and the user group includes at least two of the user. users with similar interests;

该装置还包括模型训练模块，该模型训练模块，用于获取该用户群中至少两个用户的历史查询信息，并通过对历史查询信息中部分信息的噪音生成操作，生成包括噪音数据的负样本数据；通过该初始模型对该负样本数据进行预测，得到预测样本数据；基于该预测样本数据和该历史查询信息的正样本数据之间的相似度，对该初始模型的模型参数进行调整，直至该初始模型符合模型训练条件时停止调整，得到该用户群模型；The device also includes a model training module, the model training module is used to obtain the historical query information of at least two users in the user group, and generate a negative sample including noise data by performing a noise generation operation on part of the information in the historical query information data; the negative sample data is predicted by the initial model to obtain the predicted sample data; based on the similarity between the predicted sample data and the positive sample data of the historical query information, the model parameters of the initial model are adjusted until Stop adjusting when the initial model meets the model training conditions, and obtain the user group model;

其中，该噪音生成操作包括替换词片段操作、删除词片段操作或者增加词片段操作中的至少一项，该替换词片段操作包括替换为拼音相似词片段的操作、替换为字形相似词片段的操作或者替换为语义相似词片段中的至少一项。Wherein, the noise generation operation includes at least one of replacing a word segment operation, deleting a word segment operation or adding a word segment operation, and the replacing word segment operation includes replacing with a phonetic similar word segment, replacing with a font similar word segment Or replace it with at least one item in the semantically similar word segment.
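The noise generation operations used to build negative samples can be sketched as below. The function name, the random choice among operations, the fallback to insertion when replacement or deletion is not possible, and the use of a duplicated existing segment as the inserted noise are all assumptions of this sketch; the text only names the three kinds of operation.

```python
import random


def make_negative_sample(segments, similar, rng):
    """Corrupt a correct historical query by one replace, delete, or add step.

    segments: word segments of a correct historical query (positive sample).
    similar:  hypothetical mapping from a segment to pinyin-, glyph-, or
              semantically similar segments.
    rng:      a random.Random instance, passed in for reproducibility.
    """
    noisy = list(segments)
    op = rng.choice(["replace", "delete", "add"])
    i = rng.randrange(len(noisy))
    if op == "replace" and similar.get(noisy[i]):
        noisy[i] = rng.choice(similar[noisy[i]])
    elif op == "delete" and len(noisy) > 1:
        del noisy[i]
    else:
        # Fallback and "add" case: insert a duplicate of an existing segment.
        noisy.insert(i, rng.choice(noisy))
    return noisy


positive = ["AB", "XXX"]
samples = [
    make_negative_sample(positive, {"AB": ["CD"]}, random.Random(seed))
    for seed in range(20)
]
```

Each (negative sample, positive sample) pair can then serve as a training example for the initial model described above.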

在一种可能实施方式中，该装置还包括：In a possible implementation manner, the device also includes:

用户成熟度确定模块，用于响应于接收到用户输入搜索框的查询信息，确定该用户的用户成熟度，该用户成熟度用于指示该用户在该目标应用的活跃程度；The user maturity determination module is configured to determine the user maturity of the user in response to receiving the query information input by the user into the search box, and the user maturity is used to indicate the activity degree of the user in the target application;

目标模型确定模块，用于响应于该用户成熟度超过成熟度阈值时，基于与该用户的兴趣相似的用户群，确定与该用户匹配的至少一个目标模型。The target model determining module is configured to determine at least one target model matching the user based on a user group having interests similar to the user in response to the user's maturity exceeding a maturity threshold.

在一种可能实施方式中，该目标模型确定模块，用于以下至少一项：In a possible implementation manner, the target model determination module is used for at least one of the following:

基于与该用户的兴趣相似的用户群，将该用户群对应的用户群模型确定为与该用户匹配的目标模型；Based on a user group with similar interests to the user, determine a user group model corresponding to the user group as a target model matching the user;

基于与该用户的兴趣相似的用户群，将通用模型和该用户群模型确定为与该用户匹配的目标模型；Based on a user group with similar interests to the user, determine the general model and the user group model as target models matching the user;

其中，该通用模型用于基于该目标应用的全局用户数据集判定输入该通用模型的信息是否错误。Wherein, the general model is used to determine whether the information input into the general model is wrong based on the global user data set of the target application.

In a possible implementation, the user maturity determination module is configured to determine the active time of the user in the target application; count the number of interest tags of the user within the active time, based on the tags associated with the objects of the user's interactive operations; and determine the user maturity based on the active time and the number of interest tags.

在一种可能实施方式中，该装置还包括以下至少一项：In a possible implementation manner, the device further includes at least one of the following:

第一查询模块，用于接收该用户在该目标应用的搜索框中输入的查询字符串，将该查询字符串作为该查询信息；The first query module is configured to receive a query string input by the user in the search box of the target application, and use the query string as the query information;

第二查询模块，用于接收该用户通过该搜索框输入的查询图像，并提取该查询图像包括的字符串，将从该查询图像中提取的字符串作为该查询信息；The second query module is used to receive the query image input by the user through the search box, and extract the character string included in the query image, and use the character string extracted from the query image as the query information;

第三查询模块，用于接收该用户通过该搜索框输入的表情图标，并获取该表情图标的文字描述信息，将该表情图标的文字描述信息作为该查询信息；The third query module is used to receive the emoticon input by the user through the search box, and obtain the text description information of the emoticon, and use the text description information of the emoticon as the query information;

第四查询模块，用于接收该用户通过该搜索框输入的查询字符串和表情图标，并获取该表情图标的文字描述信息，将该查询字符串和和该表情图标的文字描述信息作为该查询信息。The fourth query module is used to receive the query string and the emoticon input by the user through the search box, and obtain the text description information of the emoticon, and use the query string and the text description information of the emoticon as the query information.

In a possible implementation, the device further includes an error correction data determination module, configured to perform any one of the following:

Based on the user group of the user, determining the error correction data of the user group as the target error correction data matching the user, where the error correction data of the user group includes the interest data set of the user group;

Based on the user group of the user, determining the error correction data of the user and the error correction data of the user group as the target error correction data, where the error correction data of the user includes the interest data of the user;

Based on the user group of the user, determining the error correction data of the user group and general error correction data as the target error correction data, where the general error correction data includes the interest data set of the global users of the target application;

Based on the user group of the user, determining the error correction data of the user, the error correction data of the user group, and the general error correction data as the target error correction data.
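The four alternatives above can be sketched as a simple union of data sources. This is only an illustration under stated assumptions: error correction data is modeled here as a set of interest terms, and the names `Combination` and `select_target_correction_data` are hypothetical and not taken from the application.

```python
from enum import Enum
from typing import Set


class Combination(Enum):
    """The four alternatives listed for the determination module."""
    GROUP_ONLY = 1
    USER_AND_GROUP = 2
    GROUP_AND_GENERAL = 3
    USER_GROUP_AND_GENERAL = 4


def select_target_correction_data(
    combination: Combination,
    user_data: Set[str],
    group_data: Set[str],
    general_data: Set[str],
) -> Set[str]:
    """Return the target error correction data for the chosen alternative."""
    # Every alternative includes the user group's error correction data.
    target = set(group_data)
    if combination in (Combination.USER_AND_GROUP,
                       Combination.USER_GROUP_AND_GENERAL):
        target |= user_data      # add the user's own interest data
    if combination in (Combination.GROUP_AND_GENERAL,
                       Combination.USER_GROUP_AND_GENERAL):
        target |= general_data   # add the global users' interest data
    return target
```

A real system would more likely hold indexed interest data than plain sets; the set union is used only to make the four combinations easy to compare.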

The query error correction device of this embodiment can execute the query error correction method shown in the foregoing embodiments of the present application. Its implementation principle is similar and is not repeated here.

FIG. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present application. As shown in FIG. 6, the computer device includes a memory and a processor, and at least one program stored in the memory which, when executed by the processor, can realize the following compared with the related art:

In an optional embodiment, a computer device is provided. As shown in FIG. 6, the computer device 600 includes a processor 601 and a memory 603. The processor 601 is connected to the memory 603, for example through a bus 602. Optionally, the computer device 600 may further include a transceiver 604, which may be used for data interaction between the computer device and other computer devices, such as sending and/or receiving data. It should be noted that, in practical applications, the transceiver 604 is not limited to one, and the structure of the computer device 600 does not limit the embodiments of the present application.

The processor 601 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 601 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 602 may include a path for transferring information between the above components. The bus 602 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 602 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 6, but this does not mean that there is only one bus or one type of bus.

The memory 603 may be a ROM (Read-Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, or may be an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other compact disc storage, optical disc storage (including compressed discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.

The memory 603 is used to store the application program code (computer program) for executing the solution of the present application, and execution is controlled by the processor 601. The processor 601 is configured to execute the application program code stored in the memory 603, so as to implement the content shown in the foregoing method embodiments.

The computer device includes, but is not limited to, a server, a terminal, or any electronic device that supports the query error correction function.

An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program runs on a computer device, the computer device can execute the corresponding content of the query error correction method in the foregoing method embodiments.

An embodiment of the present application provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the above query error correction method.

The terms "first", "second", "third", "fourth", "1", "2", and the like (if any) in the description and claims of the present application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described.

It should be understood that although the steps in the flowcharts of the accompanying drawings are displayed sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the accompanying drawings may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Their execution order is not necessarily sequential either; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

The above descriptions are only some of the embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A query error correction method, characterized in that the method comprises:

In response to receiving query information input by a user into a search box, determining first error indication information of the query information based on a target model matched with the user, the target model being used to determine, based on the user's interest data in a target application, whether information input into the target model is wrong;

In response to the first error indication information indicating that the query information is wrong, determining at least one candidate information associated with the query information based on the query information and target error correction data matched with the user, the target error correction data including at least the interest data of the user in the target application;

determining second error indication information of the at least one candidate information based on the target model;

determining correction information of the query information from the at least one candidate information based on the second error indication information.

2. The query error correction method according to claim 1, wherein the query information includes a query string, and the determining at least one candidate information associated with the query information based on the target error correction data matched with the user and the query information comprises:

obtaining, based on at least one word segment included in the query string, at least one associated word segment corresponding to the at least one word segment from the target error correction data;

replacing, based on the at least one associated word segment, the corresponding at least one word segment in the query information to obtain the at least one candidate information.

3. The query error correction method according to claim 2, wherein the target error correction data also includes a query index of the interest data, and the query index is used to index similar word segments in the interest data;

The obtaining at least one associated word segment corresponding to the at least one word segment from the target error correction data based on at least one word segment included in the query string includes at least one of the following:

For each word segment, based on the pinyin of the word segment and a pinyin query index, querying at least one word segment with similar pinyin from the target error correction data, where the query index includes the pinyin query index, and the pinyin query index is used to index word segments with similar pinyin in the interest data;

Based on the stroke sequence of the word segment and a stroke query index, querying at least one font-similar word segment of the word segment from the target error correction data, where the query index includes the stroke query index, and the stroke query index is used to index word segments with similar strokes in the interest data;

Based on the word segment and a semantic query index, querying at least one semantically similar word segment of the word segment from the target error correction data, where the query index includes the semantic query index, and the semantic query index is used to index semantically similar word segments in the interest data.

4. The query error correction method according to claim 1, wherein the first error indication information includes the correct probability and the error type probability of the query information, the error type probability indicating the probability that the query information contains an error of at least one error type; and the second error indication information includes the correct probability of the candidate information;

The determining correction information of the query information from the at least one candidate information based on the second error indication information includes:

Correcting the correct probability of the at least one candidate information based on the error type probability of the query information to obtain the correct score of the at least one candidate information;

Based on the correct score of the at least one candidate information and the first error indication information, screening out, from the at least one candidate information, the correction information whose correct score meets the target condition.

5. The query error correction method according to claim 4, wherein the candidate information comprises an associated word segment replacing the word segment in the query information;

The correcting the correct probability of the at least one candidate information based on the error type probability of the query information to obtain the correct score of the at least one candidate information includes:

Based on the similarity type between the associated word segment and the replaced word segment in the query information, obtaining, from the probability of at least one error type included in the error type probability, the type probability of the matching error type that matches the similarity type;

determining the correct score of the candidate information based on the correct probability of the candidate information and the type probability.

6. The query error correction method according to claim 5, wherein the similarity types include: at least one of similarity in pronunciation, similarity in font shape, and similarity in semantics;

Wherein the matching error type for similarity in pronunciation includes a similar-pronunciation error, the matching error type for similarity in font shape includes a similar-font-shape error, and the matching error type for similarity in semantics includes at least one of a similar-word content error, a missing error, or a redundant error.

7. The query error correction method according to claim 5, wherein said determining the correct score of said candidate information based on the correct probability of said candidate information and said type probability comprises any of the following:

determining the product value between the type probability and the correct probability as the correct score of the candidate information;

Based on the matching error type, obtaining a correction factor corresponding to the matching error type from a target association relationship matched with the user, and determining the product of the type probability, the correction factor, and the correct probability as the correct score of the candidate information, where the target association relationship includes the correspondence between error types and correction factors.

8. The query error correction method according to claim 1, wherein the first error indication information includes the correct probability and the error type probability of the query information, and the determining first error indication information of the query information based on a target model matched with the user comprises:

extracting, through the target model, at least one kind of description information of the query information, the at least one kind of description information including at least one of the pinyin, strokes, or position codes of the query information, keywords included in the query information, or phrases included in the query information;

performing, based on the at least one kind of description information of the query information, error judgment on the query information to obtain the correct probability and the error type probability of the query information.

9. The query error correction method according to claim 8, wherein the target model includes a user group model, the user group model is used to determine, based on the interest data set of a user group, whether information input into the user group model is wrong, and the user group includes at least two users whose interests are similar to those of the user;

The training process of the user group model includes:

Obtain historical query information of at least two users in the user group, and generate negative sample data including noise data by performing a noise generation operation on part of the information in the historical query information;

Wherein the noise generation operation includes at least one of a word segment replacement operation, a word segment deletion operation, or a word segment addition operation, and the word segment replacement operation includes at least one of replacement with a phonetically similar word segment, replacement with a font-similar word segment, or replacement with a semantically similar word segment;

Predicting the negative sample data through the initial model to obtain predicted sample data;

Based on the similarity between the predicted sample data and the positive sample data of the historical query information, adjusting the model parameters of the initial model, and stopping the adjustment when the initial model meets the model training condition, to obtain the user group model.

10. The query error correction method according to claim 1, wherein, before determining the first error indication information of the query information based on the target model matched with the user, the method further comprises:

In response to receiving query information input by the user into the search box, determine the user maturity of the user, where the user maturity is used to indicate the activity of the user in the target application;

In response to the user maturity exceeding a maturity threshold, determining at least one target model matching the user based on a user group having interests similar to those of the user.

11. The query error correction method according to claim 10, wherein said determining at least one target model matching said user based on a user group similar in interest to said user comprises at least one of the following:

Determining a user group model corresponding to the user group as a target model matching the user based on a user group with similar interests to the user;

determining a general model and the user group model as a target model matching the user based on a user group having interests similar to the user;

Wherein, the general model is used to determine whether the information input into the general model is wrong based on the global user data set of the target application.

12. The query error correction method according to claim 10, wherein said determining the user maturity of said user comprises:

determining the active time of the user in the target application;

Based on the tags associated with the operation objects of the user's interactive operations, counting the number of interest tags of the user within the active time;

Based on the active time and the number of interest tags, the user maturity is determined.

13. The query error correction method according to claim 1, characterized in that, before the determining, in response to receiving query information input by the user into the search box, first error indication information of the query information based on the target model matched with the user, the method further comprises at least one of the following:

receiving a query string input by the user in the search box of the target application, and using the query string as the query information;

receiving a query image input by the user through the search box, extracting a character string included in the query image, and using the character string extracted from the query image as the query information;

receiving the emoticon input by the user through the search box, and obtaining the text description information of the emoticon, and using the text description information of the emoticon as the query information;

receiving the query string and the emoticon input by the user through the search box, obtaining the text description information of the emoticon, and using the query string and the text description information of the emoticon as the query information.

14. The query error correction method according to claim 1, characterized in that, before the determining at least one candidate information associated with the query information based on the query information and the target error correction data matched with the user, the method further comprises any one of the following:

Based on the user group of the user, determining the error correction data of the user group as the target error correction data matching the user, where the error correction data of the user group includes the interest data set of the user group;

Based on the user group of the user, determining the error correction data of the user and the error correction data of the user group as the target error correction data, where the error correction data of the user includes the interest data of the user;

Based on the user group of the user, determining the error correction data of the user group and general error correction data as the target error correction data, where the general error correction data includes the interest data set of global users of the target application;

Based on the user group of the user, determining the error correction data of the user, the error correction data of the user group, and the general error correction data as the target error correction data.

15. A query error correction device, characterized in that the device comprises:

A first determining module, configured to determine, in response to receiving query information input by a user into a search box, first error indication information of the query information based on a target model matched with the user, the target model being used to determine, based on the user's interest data in a target application, whether information input into the target model is wrong;

A candidate determining module, configured to determine, in response to the first error indication information indicating that the query information is wrong, at least one candidate information associated with the query information based on the query information and target error correction data matched with the user, the target error correction data including at least the interest data of the user in the target application;

A second determining module, configured to determine second error indication information of the at least one candidate information based on the target model;

A correction module, configured to determine correction information of the query information from the at least one candidate information based on the second error indication information.

16. A computer device comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the query error correction method according to any one of claims 1 to 14.

17. A computer-readable storage medium, on which a computer program is stored, wherein, when the computer program is executed by a processor, the query error correction method according to any one of claims 1 to 14 is implemented.

18. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the query error correction method according to any one of claims 1 to 14 is implemented.
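For illustration only, the candidate scoring recited in claims 4 to 7 can be sketched in Python as follows. The sketch makes several assumptions that the claims leave open: the names `Candidate`, `MATCHING_ERROR_TYPE`, `correct_score`, and `pick_correction` are hypothetical; each similarity type is mapped to a single error type (claim 6 permits several for semantic similarity); and the unspecified "target condition" is taken to be "highest correct score, provided it exceeds the correct probability of the original query".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    text: str
    correct_prob: float    # second error indication: P(candidate is correct)
    similarity_type: str   # "pronunciation", "font", or "semantic"


# Assumed one-to-one mapping from similarity type to matching error type.
MATCHING_ERROR_TYPE = {
    "pronunciation": "similar_pronunciation_error",
    "font": "similar_font_error",
    "semantic": "similar_word_content_error",
}


def correct_score(
    cand: Candidate,
    error_type_probs: Dict[str, float],
    correction_factors: Optional[Dict[str, float]] = None,
) -> float:
    """Product of type probability, optional correction factor, and P(correct)."""
    error_type = MATCHING_ERROR_TYPE[cand.similarity_type]
    type_prob = error_type_probs.get(error_type, 0.0)
    # Without a per-user factor table this reduces to the plain product.
    factor = 1.0
    if correction_factors is not None:
        factor = correction_factors.get(error_type, 1.0)
    return type_prob * factor * cand.correct_prob


def pick_correction(
    candidates: List[Candidate],
    query_correct_prob: float,
    error_type_probs: Dict[str, float],
    correction_factors: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Return the best candidate text, or None if none beats the query."""
    best_text, best_score = None, query_correct_prob
    for cand in candidates:
        score = correct_score(cand, error_type_probs, correction_factors)
        if score > best_score:
            best_text, best_score = cand.text, score
    return best_text
```

Under these assumptions, a candidate obtained by a similar-pronunciation replacement is favored when the first error indication information assigns a high probability to a similar-pronunciation error, and the optional factor table lets that weighting differ per user.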