WO2019153613A1

WO2019153613A1 - Chat response method, electronic device and storage medium

Info

Publication number: WO2019153613A1
Application number: PCT/CN2018/090643
Authority: WO
Inventors: 于凤英; 王健宗; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-02-09
Filing date: 2018-06-11
Publication date: 2019-08-15
Anticipated expiration: 2020-08-09
Also published as: CN108491433B; CN108491433A

Abstract

Provided is a chat response method, comprising: acquiring a session question; querying a candidate question set related to the session question in a question-answer knowledge base; calculating the text similarity between the session question and each candidate question; determining whether a question similar to the session question exists; if so, searching for and outputting an answer associated with the similar question; if not, querying a candidate answer set related to the session question in the question-answer knowledge base; calculating the topic similarity between the session question and each candidate answer; determining whether an answer similar to the session question exists; if so, outputting the similar answer; if not, establishing a sequence prediction model, inputting the session question into the sequence prediction model to generate a response answer, and outputting the response answer as a target answer. The described method may provide customers with accurate and response feedback regarding the session question, thus improving the quality of service.

Description

Chat response method, electronic device and storage medium

本申请要求于2018年2月9日提交中国专利局，申请号为201810135747.6、发明名称为“聊天应答方法、电子装置及存储介质”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims priority to Chinese Patent Application No. 201101135747.6, entitled "Chat Response Method, Electronic Device, and Storage Media", filed on February 9, 2018, the entire contents of which are incorporated herein by reference. In the application.

Technical field

本申请涉及计算机技术领域，尤其涉及一种聊天应答方法、电子装置及存储介质。The present application relates to the field of computer technologies, and in particular, to a chat response method, an electronic device, and a storage medium.

Background technique

随着科技的发展，AI(Artificial Intelligence，人工智能)正逐步改变着我们的生活方式，例如智能问答就是其中一种。当客户通过文字或语音在线咨询时，可以由线上的智能客服为客户进行智能应答。智能问答可以有效缓解客户服务的等待状况，提升服务质量，因而有着非常广阔的前景。With the development of technology, AI (Artificial Intelligence) is gradually changing our way of life. For example, smart question and answer is one of them. When the customer consults online via text or voice, the customer can be intelligently answered by the online intelligent customer service. Intelligent Q&A can effectively alleviate the waiting situation of customer service and improve service quality, so it has a very broad prospect.

然而，即使是在特定的服务领域，例如金融、银行、证券、保险等垂直的领域中，在线咨询的过程中也会包含一些纯闲聊的内容。此时针对客户输入的聊天会话内容，若无法快速准确和有效应变地响应客户，则会降低智能客服的服务质量，无法为客户带来人性化的高质量体验。However, even in a specific service area, such as financial, banking, securities, insurance and other vertical areas, the online consultation process will contain some pure chat content. At this time, if the chat session content input by the customer cannot respond to the customer quickly, accurately, and effectively, the service quality of the intelligent customer service will be reduced, and the humanized high quality experience cannot be brought to the customer.

发明内容Summary of the invention

鉴于以上原因，有必要提供一种聊天应答方法、电子装置及存储介质，可以针对会话问题为客户做出准确和应变的反馈，从而提高服务质量。In view of the above reasons, it is necessary to provide a chat response method, an electronic device and a storage medium, which can provide accurate and responsive feedback to customers for conversational problems, thereby improving service quality.

为实现上述目的，本申请提供一种聊天应答方法，该方法包括：预处理步骤：获取客户输入的会话问题，对所述会话问题进行预处理，得到会话问题的文本特征信息，所述文本特征信息包括各词条在所述会话问题中的词性、位置和词类归属信息，所述词类归属包括归属于关键词或命名实体；第一计算步骤：为问答知识库构建倒排索引，所述问答知识库包括预先整理的多个问题以及每个问题关联的一个或多个答案，根据所述文本特征信息，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度；问题检索步骤：根据预设规则及所述文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若所述候选问题集合中存在所述会话问题的近似问题，则在问答知识库中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出；第二计算步骤：若所述候选问题集合中不存在所述会话问题的近似问题，则根据所述文本特征信息，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度；答案检索步骤：根据预设规则及所述主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若所述候选答案集合中存在所述会话问题的近似答案，则将所述近似答案作为所述会话问题的目标答案输出；答案预测步骤：若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出。To achieve the above objective, the present application provides a chat response method, the method comprising: a pre-processing step of: acquiring a session question input by a client, pre-processing the session problem, and obtaining text feature information of the session problem, the text feature The information includes part of speech, location and part-of-speech attribution information of each term in the conversation question, the word class attribution includes belonging to the keyword or the named entity; and the first calculating step: constructing an inverted index for the question-and-answer knowledge base, the question and answer The knowledge base includes a plurality of questions pre-arranged and one or more answers associated with each question, and according to the text feature information, querying candidate questions related to the conversation problem from the question-and-answer knowledge base by means of inverted index query Collecting, and separately calculating text similarity between the conversation problem and each candidate question in the candidate question set; a question retrieval step: determining whether the session exists in the candidate question set according to the preset rule and the text similarity An approximation problem of the problem, if there is an approximation of the conversation problem in the set of candidate questions And searching for the associated answer of the approximate question in the question and answer knowledge base, and outputting the associated answer as the target answer of the conversation question; and second calculating step: if the session problem does not exist in the candidate question set Approximating the problem, according to the text feature information, querying a candidate answer set related to the conversation question from the question and answer knowledge base by means of an inverted index query, and calculating each of the conversation problem and the candidate answer set separately The topic similarity of the candidate answers; the answer retrieval step: determining, according to the preset rule and the topic similarity, whether there is an approximate answer of the conversation question in the candidate answer set, if the conversation problem exists in the candidate answer set An approximate answer, the approximate answer is output as the target answer of the conversation question; the answer prediction step: if the approximate answer of the conversation problem does not exist in the candidate answer set, the seq2seq model is used in the question and answer knowledge base Each question and answer is iteratively trained in encoding and decoding to construct a sequence prediction module Type, inputting the conversation question into the sequence prediction model to generate a strain answer, and outputting the strain answer as a target answer of the conversation question.

为实现上述目的，本申请还提供一种电子装置，该电子装置包括存储器和处理器，所述存储器中包括聊天应答程序，该聊天应答程序被所述处理器执行时实现如下步骤：预处理步骤：获取客户输入的会话问题，对所述会话问题进行预处理，得到会话问题的文本特征信息，所述文本特征信息包括各词条在所述会话问题中的词性、位置和词类归属信息，所述词类归属包括归属于关键词或命名实体；第一计算步骤：为问答知识库构建倒排索引，所述问答知识库包括预先整理的多个问题以及每个问题关联的一个或多个答案，根据所述文本特征信息，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度；问题检索步骤：根据预设规则及所述文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若所述候选问题集合中存在所述会话问题的近似问题，则在问答知识库中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出；第二计算步骤：若所述候选问题集合中不存在所述会话问题的近似问题，则根据所述文本特征信息，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度；答案检索步骤：根据预设规则及所述主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若所述候选答案集合中存在所述会话问题的近似答案，则将所述近似答案作为所述会话问题的目标答案输出；答案预测步骤：若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出。To achieve the above object, the present application further provides an electronic device including a memory and a processor, wherein the memory includes a chat response program, and the chat response program is executed by the processor to implement the following steps: a pre-processing step Obtaining a session problem input by the client, pre-processing the session problem, and obtaining text feature information of the session problem, where the text feature information includes part of speech, location, and word class attribution information of each term in the conversation question, The term class attribution includes attribution to a keyword or a named entity; a first calculation step: constructing an inverted index for the question and answer knowledge base, the question and answer knowledge base including a plurality of questions pre-arranged and one or more answers associated with each question, Determining, according to the text feature information, a candidate question set related to the session problem from a question and answer knowledge base by means of an inverted index query, and respectively calculating the session problem and each candidate question in the candidate question set Text similarity; question retrieval step: according to preset rules and the similarity of the text, Whether there is an approximation problem of the conversation problem in the set of candidate questions, if there is an approximation problem of the conversation problem in the candidate question set, the associated answer of the approximation question is searched in the question and answer knowledge base, and the associated answer is a target answer output as the conversation problem; a second calculation step: if there is no approximation problem of the conversation problem in the candidate question set, according to the text feature information, the question and answer knowledge is obtained by means of an inverted index query Querying, in the library, a set of candidate answers related to the conversation question, and separately calculating a topic similarity between the conversation problem and each candidate answer in the candidate answer set; an answer retrieval step: according to a preset rule and the topic is similar Degree, determining whether there is an approximate answer of the conversation question in the candidate answer set, if an approximate answer of the conversation question exists in the candidate answer set, outputting the approximate answer as a target answer of the conversation question; Prediction step: if there is no approximate answer to the conversation question in the candidate answer set, An iterative training of encoding and decoding each question and answer in the Q&A knowledge base by the seq2seq model, thereby constructing a sequence prediction model, inputting the conversation problem into the sequence prediction model to generate a strain answer, and using the strain answer as The target answer output of the conversation question.

此外，为实现上述目的，本申请还提供一种计算机可读存储介质，所述计算机可读存储介质中包括聊天应答程序，该聊天应答程序被处理器执行时，实现如上所述的聊天应答方法的任意步骤。In addition, in order to achieve the above object, the present application further provides a computer readable storage medium including a chat response program, when the chat response program is executed by a processor, implementing the chat response method as described above Any step.

本申请提出的聊天应答方法、电子装置及存储介质，在获取会话问题并进行预处理后，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若是，则在问答知识库中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出，若所述候选问题集合中不存在所述会话问题的近似问题，则通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若是，则将所述近似答案作为所述会话问题的目标答案输出，若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出,可以针对会话问题为客户做出准确和应变的反馈，从而提高服务质量。The chat response method, the electronic device and the storage medium proposed by the present application, after acquiring the session problem and performing pre-processing, query the candidate question set related to the conversation problem from the question-and-answer knowledge base by means of the inverted index query, and respectively Calculating a text similarity between the conversation problem and each candidate question in the candidate question set, determining whether there is an approximation problem of the conversation problem in the candidate question set, and if so, searching for an association of the approximation problem in the Q&A knowledge base The answer is that the associated answer is output as the target answer of the conversation question. If there is no approximation problem of the conversation problem in the candidate question set, the query and the query are obtained from the question and answer knowledge base by means of an inverted index query. Determining a set of candidate answers related to the conversation problem, and separately calculating a topic similarity of the conversation question and each candidate answer in the candidate answer set, and determining whether an approximate answer of the conversation question exists in the candidate answer set, and if so, Outputting the approximate answer as a target answer to the conversation question, if the candidate If there is no approximate answer to the conversation problem in the answer set, the iterative training of encoding and decoding each question and answer in the question and answer knowledge base is performed by the seq2seq model, thereby constructing a sequence prediction model, and inputting the conversation problem The sequence prediction model generates a strain answer, and outputs the strain answer as a target answer of the conversation question, and can provide accurate and responsive feedback to the client for the conversation problem, thereby improving service quality.

DRAWINGS

图1为本申请电子装置较佳实施例的运行环境示意图；1 is a schematic diagram of an operating environment of a preferred embodiment of an electronic device of the present application;

图2为本申请电子装置与客户端较佳实施例的交互示意图；2 is a schematic diagram of interaction between an electronic device and a client according to a preferred embodiment of the present application;

图3为本申请聊天应答方法较佳实施例的流程图；3 is a flow chart of a preferred embodiment of a chat response method of the present application;

图4为图1中聊天应答程序的程序模块图。4 is a program block diagram of the chat response program of FIG. 1.

本申请目的的实现、功能特点及优点将结合实施例，参照附图做进一步说明。The implementation, functional features and advantages of the present application will be further described with reference to the accompanying drawings.

Detailed ways

下面将参考若干具体实施例来描述本申请的原理和精神。应当理解，此处所描述的具体实施例仅仅用以解释本申请，并不用于限定本申请。The principles and spirit of the present application are described below with reference to a number of specific embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.

本领域的技术人员知道，本申请的实施方式可以实现为一种方法、装置、设备、系统或计算机程序产品。因此，本申请可以具体实现为完全的硬件、完全的软件(包括固件、驻留软件、微代码等)，或者硬件和软件结合的形式。Those skilled in the art will appreciate that embodiments of the present application can be implemented as a method, apparatus, device, system, or computer program product. Accordingly, the application can be embodied in a complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

根据本申请的实施例，提出了一种聊天应答方法、电子装置及存储介质。According to an embodiment of the present application, a chat response method, an electronic device, and a storage medium are proposed.

参照图1所示，为本申请电子装置较佳实施例的运行环境示意图。1 is a schematic diagram of an operating environment of a preferred embodiment of an electronic device of the present application.

该电子装置1可以是服务器、便携式计算机、桌上型计算机等具有存储和运算功能的终端设备。The electronic device 1 may be a terminal device having a storage and computing function such as a server, a portable computer, or a desktop computer.

该电子装置1包括存储器11、处理器12、网络接口13及通信总线14。所述网络接口13可选地可以包括标准的有线接口和无线接口(如WI-FI接口)。通信总线14用于实现上述组件之间的连接通信。The electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14. The network interface 13 can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The communication bus 14 is used to implement connection communication between the above components.

存储器11包括至少一种类型的可读存储介质。所述至少一种类型的可读存储介质可为如闪存、硬盘、多媒体卡、卡型存储器等的非易失性存储介质。在一些实施例中，所述可读存储介质可以是所述电子装置1的内部存储单元，例如该电子装置1的硬盘。在另一些实施例中，所述可读存储介质也可以是所述电子装置1的外部存储器11，例如所述电子装置1上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)等。The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory, or the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory 11 of the electronic device 1, such as a plug-in hard disk equipped on the electronic device 1, a smart memory card (SMC). , Secure Digital (SD) card, Flash Card, etc.

在本实施例中，所述存储器11的可读存储介质通常用于存储安装于所述电子装置1的聊天应答程序10及问答知识库4等。所述存储器11还可以用于暂时地存储已经输出或者将要输出的数据。In the present embodiment, the readable storage medium of the memory 11 is generally used to store the chat response program 10, the Q&A knowledge base 4, and the like installed in the electronic device 1. The memory 11 can also be used to temporarily store data that has been output or is about to be output.

处理器12在一些实施例中可以是一中央处理器(Central Processing Unit,CPU)，微处理器或其他数据处理芯片，用于运行存储器11中存储的程序代码或处理数据，例如执行聊天应答程序10等。The processor 12, in some embodiments, may be a Central Processing Unit (CPU), microprocessor or other data processing chip for running program code or processing data stored in the memory 11, such as executing a chat response program. 10 and so on.

图1仅示出了具有组件11-14以及聊天应答程序10的电子装置1，但是应理解的是，并不要求实施所有示出的组件，可以替代的实施更多或者更少的组件。1 shows only the electronic device 1 having the components 11-14 and the chat response program 10, but it should be understood that not all illustrated components may be implemented, and more or fewer components may be implemented instead.

可选地，该电子装置1还可以包括用户接口，用户接口可以包括输入单元比如键盘(Keyboard)、语音输入装置比如麦克风(microphone)等具有语音识别功能的设备、语音输出装置比如音响、耳机等。可选地，用户接口还可以包括标准的有线接口、无线接口。Optionally, the electronic device 1 may further include a user interface, and the user interface may include an input unit such as a keyboard, a voice input device such as a microphone, a device with a voice recognition function, a voice output device such as an audio, a headphone, and the like. . Optionally, the user interface may also include a standard wired interface and a wireless interface.

可选地，该电子装置1还可以包括显示器，显示器也可以称为显示屏或显示单元。在一些实施例中可以是LED显示器、液晶显示器、触控式液晶显示器以及有机发光二极管(Organic Light-Emitting Diode，OLED)显示器等。显示器用于显示在电子装置1中处理的信息以及用于显示可视化的用户界面。Optionally, the electronic device 1 may further include a display, which may also be referred to as a display screen or a display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch liquid crystal display, and an Organic Light-Emitting Diode (OLED) display. The display is used to display information processed in the electronic device 1 and a user interface for displaying visualizations.

可选地，该电子装置1还包括触摸传感器。所述触摸传感器所提供的供用户进行触摸操作的区域称为触控区域。此外，这里所述的触摸传感器可以为电阻式触摸传感器、电容式触摸传感器等。而且，所述触摸传感器不仅包括接触式的触摸传感器，也可包括接近式的触摸传感器等。此外，所述触摸传感器可以为单个传感器，也可以为例如阵列布置的多个传感器。用户可以通过触摸所述触控区域启动聊天应答程序10。Optionally, the electronic device 1 further comprises a touch sensor. The area provided by the touch sensor for the user to perform a touch operation is referred to as a touch area. Further, the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like. Moreover, the touch sensor includes not only a contact type touch sensor but also a proximity type touch sensor or the like. Furthermore, the touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array. The user can initiate the chat response program 10 by touching the touch area.

此外，该电子装置1的显示器的面积可以与所述触摸传感器的面积相同，也可以不同。可选地，将显示器与所述触摸传感器层叠设置，以形成触摸显示屏。该装置基于触摸显示屏侦测用户触发的触控操作。In addition, the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, a display is stacked with the touch sensor to form a touch display. The device detects a user-triggered touch operation based on a touch screen display.

该电子装置1还可以包括射频(Radio Frequency，RF)电路、传感器和音频电路等等，在此不再赘述。The electronic device 1 may further include a radio frequency (RF) circuit, a sensor, an audio circuit, and the like, and details are not described herein.

参阅图2所示，为本申请电子装置1与客户端2较佳实施例的交互示意图。所述聊天应答程序10运行于电子装置1中，在图2中所述电子装置1的较佳实施例为服务器。所述电子装置1通过网络3与客户端2通信连接。所述客户端2可以运行于各类终端设备中，例如智能手机、便携式计算机等。用户通过客户端2登录至所述电子装置1后，可以向聊天应答程序10输入会话问题，所述会话问题可以为对特定领域的会话问题，也可以为聊天会话内容。聊天应答程序10可以采用所述聊天应答方法，根据所述会话问题确定合适的响应内容，并将所述响应内容反馈给客户端2。Referring to FIG. 2, it is a schematic diagram of interaction between the electronic device 1 and the client 2 according to a preferred embodiment of the present application. The chat response program 10 runs in the electronic device 1. In Fig. 2, the preferred embodiment of the electronic device 1 is a server. The electronic device 1 is communicatively coupled to the client 2 via a network 3. The client 2 can run in various types of terminal devices, such as smart phones, portable computers, and the like. After the user logs in to the electronic device 1 through the client 2, the session question can be input to the chat answering program 10, and the session question can be a session problem for a specific domain or a chat session content. The chat response program 10 can adopt the chat response method, determine an appropriate response content according to the session problem, and feed back the response content to the client 2.

参阅图3所示，为本申请聊天应答方法较佳实施例的流程图。电子装置1的处理器12执行存储器11中存储的聊天应答程序10时实现聊天应答方法的如下步骤：Referring to FIG. 3, it is a flowchart of a preferred embodiment of the chat response method of the present application. The following steps of implementing the chat response method when the processor 12 of the electronic device 1 executes the chat response program 10 stored in the memory 11:

步骤S1，获取客户输入的会话问题，对所述会话问题进行预处理，得到会话问题的文本特征信息，所述文本特征信息包括各词条在所述会话问题中的词性、位置和词类归属信息，所述词类归属包括归属于关键词或命名实体。所述会话问题例如可以为对特定领域的会话问题，例如“保修期是多久”，也可以为聊天会话内容，例如“今天天气很不错”。为了便于后续对所述会话问题的处理，步骤S1可以先对所述会话问题进行一些预处理。Step S1: Obtain a session question input by the client, perform pre-processing on the session problem, and obtain text feature information of the session problem, where the text feature information includes part of speech, location, and word class attribution information of each term in the conversation question. The word class attribution includes attribution to a keyword or a named entity. The session question can be, for example, a conversational question for a particular domain, such as "how long the warranty period is," or it can be a chat session content, such as "The weather is very good today." In order to facilitate subsequent processing of the session problem, step S1 may first perform some pre-processing on the session problem.

具体地，步骤S1进行的预处理可以包括如下处理：Specifically, the pre-processing performed in step S1 may include the following processing:

对所述会话问题进行分词处理，从而切分出会话问题的各词条，例如，所述会话问题为“保修期是多久”，则分词后得到的词条是“保修期”、“是”、“多”、“久”，所述分词处理的方法包括基于词典进行正向最大匹配和/或基于词典进行逆向最大匹配；Performing word segmentation on the conversation problem, thereby segmenting the terms of the conversation problem. For example, the conversation question is “how long the warranty period is”, and the terms obtained after the word segmentation are “warranty period” and “yes”. , "multi", "long", the method of word segmentation includes performing a forward maximum match based on a dictionary and/or performing a reverse maximum match based on a dictionary;

对经所述分词处理得到的各词条进行词性解析，并对各词条的词性进行标注，例如对上述会话问题的示例，按照预设规则进行词性标注后的结果为“保修期/名词”、“是/动词”、“多/副词”、“久/形容词”，所述词性解析通过经预设大规模语料库训练得到的词性标注模型实现；Performing part-of-speech analysis on each term obtained by the word segmentation, and marking the part-of-speech of each term. For example, for the example of the above-mentioned conversation problem, the result of the part-of-speech tagging according to the preset rule is “warranty period/noun”. "Yes/verbs", "multiple/adverbs", "long/adjectives", the part of speech analysis is realized by a part-of-speech tagging model obtained through a preset large-scale corpus training;

对所述会话问题进行命名实体识别，从而识别出具有特定意义的命名实体，所述命名实体包括人名、地名、组织机构、专有名词，所述命名实体识别的方法包括基于词典和规则的方法，以及基于统计学习的方法；Named entity identification is performed on the session problem, thereby identifying a named entity having a specific meaning, the named entity includes a person name, a place name, an organization, a proper noun, and the method for identifying the named entity includes a dictionary-based and rule-based method And methods based on statistical learning;

根据所述各词条以及所述命名实体，从所述会话问题中提取关键词，所述关键词为字符数量多于第一预设阈值的词组，或者为存在于预设词典中的命名实体，所述预设词典包括业务场景专有词典。Extracting a keyword from the conversation question according to the term and the named entity, the keyword being a phrase whose number of characters is greater than a first preset threshold, or a named entity existing in a preset dictionary The preset dictionary includes a business scenario-specific dictionary.

步骤S2，为问答知识库4构建倒排索引，所述问答知识库4包括预先整理的多个问题以及每个问题关联的一个或多个答案，根据所述文本特征信息，通过倒排索引查询的方式从问答知识库4中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度。Step S2, constructing an inverted index for the question and answer knowledge base 4, the question and answer knowledge base 4 includes a plurality of questions arranged in advance and one or more answers associated with each question, and querying through the inverted index according to the text feature information The method queries the candidate question set related to the conversation question from the question and answer knowledge base 4, and separately calculates the text similarity of the session question and each candidate question in the candidate question set.

在一个实施例中，所述为问答知识库4构建倒排索引包括：In one embodiment, the constructing the inverted index for the Q&A knowledge base 4 includes:

对问答知识库4中的每个问题和答案分别进行分词、词性标注、关键词提取、关键词出现位置记录、分配ID号的操作，以及为每个问题和答案分词后得到的各词条分配ID号；Each question and answer in the Q&A Knowledge Base 4 is divided into word segmentation, part-of-speech tagging, keyword extraction, keyword occurrence location record, assignment of ID number, and assignment of each term after each word and answer segmentation. ID number;

对问答知识库4中每个问题和答案根据相应的ID号进行排序，对所述每个问题和答案分词后得到的各词条根据相应的ID号进行排序，并将具有同一词条ID的所有问题ID和答案ID放到该词条对应的倒排记录表中；Each question and answer in the Q&A knowledge base 4 is sorted according to the corresponding ID number, and each term obtained after each word and answer word segmentation is sorted according to the corresponding ID number, and will have the same term ID. All question IDs and answer IDs are placed in the inverted record table corresponding to the entry;

将所有倒排记录表合并为最终的倒排索引。Combine all inverted record tables into the final inverted index.

所述候选问题集合中包括至少一个候选问题，且由于采用的是倒排索引查询的方式，每个候选问题都与所述会话问题存在一定程度的联系。每个候选问题与所述会话问题的所述联系可以通过所述文本相似度来反映，若会话问题与相应的候选问题之间的文本相似度越高，则认为会话问题与该候选问题越相似。At least one candidate question is included in the candidate question set, and each candidate question has a certain degree of association with the session problem due to the manner of using an inverted index query. The association between each candidate question and the conversation question may be reflected by the text similarity. If the text similarity between the conversation problem and the corresponding candidate question is higher, the conversation problem is considered to be similar to the candidate problem. .

具体地，步骤S2分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度的方法可以包括:Specifically, the method for calculating the text similarity between the session problem and each candidate question in the candidate question set in step S2 may include:

构建卷积神经网络，通过所述卷积神经网络对所述问答知识库4中的所有问题语句进行样本训练，得到所述问答知识库4中问题语句对应的卷积神经网络模型；Constructing a convolutional neural network, and performing sample training on all the problem sentences in the Q&A knowledge base 4 through the convolutional neural network, and obtaining a convolutional neural network model corresponding to the problem statement in the Q&A knowledge base 4;

将所述会话问题和所述候选问题集合中的每个候选问题分别输入所述卷积神经网络模型，通过所述卷积神经网络模型的卷积核卷积得到所述会话问题和所述候选问题集合中的每个候选问题各自对应的特征向量；Entering each of the conversation problem and the candidate question set into the convolutional neural network model, respectively, and obtaining the conversation problem and the candidate by convolution kernel convolution of the convolutional neural network model a feature vector corresponding to each candidate question in the question set;

分别计算所述会话问题对应的特征向量与所述候选问题集合中的每个候选问题对应的特征向量之间的余弦距离，从而得到所述会话问题与所述候选问题集合中每个候选问题的文本相似度。Calculating a cosine distance between the feature vector corresponding to the session problem and the feature vector corresponding to each candidate question in the candidate question set, respectively, to obtain the session problem and each candidate problem in the candidate question set Text similarity.

步骤S3，根据预设规则及所述文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若所述候选问题集合中存在所述会话问题的近似问题，则在问答知识库中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出。Step S3: determining, according to the preset rule and the text similarity, whether there is an approximation problem of the conversation problem in the candidate question set, and if there is an approximation problem of the conversation problem in the candidate question set, then the Q&A knowledge base Finding the associated answer of the approximation question, and outputting the associated answer as the target answer of the conversation question.

具体地，所述预设规则可以包括：判断是否存在与会话问题的文本相似度大于第二预设阈值的候选问题，若存在与会话问题的文本相似度大于第二预设阈值的候选问题，则判定候选问题集合中存在所述会话问题的近似问题。若不存在与会话问题的文本相似度大于第二预设阈值的候选问题，则判定候选问题集合中不存在所述会话问题的近似问题。Specifically, the preset rule may include: determining whether there is a candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, and if there is a candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, Then, an approximation problem of the conversation problem exists in the candidate problem set. If there is no candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, then it is determined that there is no approximation problem of the conversation problem in the candidate problem set.

若存在与会话问题的文本相似度大于第二预设阈值的候选问题，则步骤S3从所述与会话问题的文本相似度大于第二预设阈值的候选问题中选择最大文本相似度对应的候选问题作为所述近似问题，并在问答知识库4中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出。值得注意的是，所述近似问题在问答知识库4中也可能有不止一个关联答案，当近似问题在问答知识库4中有多个关联答案时，步骤S3可以取所述多个关联答案中，在预设时间段(例如最近一周)内输出频率最高的关联答案作为所述会话问题的目标答案输出。If there is a candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, step S3 selects a candidate corresponding to the maximum text similarity from the candidate questions with the text similarity of the conversation problem being greater than the second preset threshold. The problem is taken as the approximation question, and the associated answer of the approximation question is searched in the Q&A knowledge base 4, and the associated answer is output as the target answer of the conversation question. It should be noted that the approximation question may also have more than one associated answer in the Q&A knowledge base 4. When the approximation question has multiple associated answers in the Q&A knowledge base 4, step S3 may take the plurality of associated answers. And outputting the associated answer with the highest frequency in the preset time period (for example, the most recent week) as the target answer of the conversation question.

步骤S4，若所述候选问题集合中不存在所述会话问题的近似问题，则根据所述文本特征信息，通过倒排索引查询的方式从问答知识库4中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度。Step S4: If there is no approximation problem of the conversation problem in the candidate question set, query the candidate related to the conversation problem from the question and answer knowledge base 4 by using an inverted index query according to the text feature information. An answer set, and separately calculating a topic similarity of the conversation question and each candidate answer in the candidate answer set.

所述候选答案集合中包括至少一个候选答案，且由于采用的是倒排索引查询的方式，每个候选答案都与所述会话问题存在一定程度的联系。每个候选答案与所述会话问题的所述联系可以通过所述主题相似度来反映，若会话问题与相应的候选答案之间的主题相似度越高，则认为会话问题与该候选答案的主题越相似，从而认为该候选答案越有可能是该会话问题对应的答案。At least one candidate answer is included in the set of candidate answers, and each candidate answer has a certain degree of association with the session question due to the manner of using an inverted index query. The association of each candidate answer with the conversation question may be reflected by the topic similarity, and if the topic similarity between the conversation question and the corresponding candidate answer is higher, the conversation problem and the topic of the candidate answer are considered The more similar, the more likely the candidate answer is to be the answer to the conversation question.

具体地，步骤S4分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度的方法可以包括：Specifically, the method for calculating the topic similarity of each of the candidate answers in the candidate answer set in step S4 may include:

所述分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度包括：The calculating the similarity degree between the conversation problem and each candidate answer in the candidate answer set respectively includes:

采用线性判别分析(Linear Discriminant Analysis，LDA)模型分别提取所述会话问题和所述候选答案集合中每个候选答案的主题向量；Extracting the conversation problem and the topic vector of each candidate answer in the candidate answer set by using a Linear Discriminant Analysis (LDA) model;

分别计算所述会话问题的主题向量与所述候选答案集合中每个候选答案的主题向量之间的余弦距离，从而得到所述会话问题与所述候选答案集合中每个候选答案的主题相似度。Calculating a cosine distance between a topic vector of the conversation question and a topic vector of each candidate answer in the candidate answer set, respectively, to obtain a topic similarity between the conversation question and each candidate answer in the candidate answer set .

步骤S5，根据预设规则及所述主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若所述候选答案集合中存在所述会话问题的近似答案，则将所述近似答案作为所述会话问题的目标答案输出。Step S5: determining, according to the preset rule and the topic similarity, whether an approximate answer of the conversation problem exists in the candidate answer set, and if the approximate answer of the conversation problem exists in the candidate answer set, the approximation is The answer is output as the target answer to the conversation question.

具体地，所述预设规则可以包括：判断是否存在与会话问题的主题相似度大于第三预设阈值的候选答案，若存在与会话问题的主题相似度大于第三预设阈值的候选答案，则判定候选答案集合中存在所述会话问题的近似答案。若不存在与会话问题的主题相似度大于第三预设阈值的候选答案，则判定候选答案集合中不存在所述会话问题的近似答案。Specifically, the preset rule may include: determining whether there is a candidate answer whose topic similarity with the session problem is greater than a third preset threshold, and if there is a candidate answer with a topic similarity of the conversation problem being greater than a third preset threshold, Then, an approximate answer to the conversation question exists in the candidate answer set. If there is no candidate answer with the topic similarity of the conversation problem being greater than the third preset threshold, it is determined that there is no approximate answer of the conversation question in the candidate answer set.

若存在与会话问题的主题相似度大于第三预设阈值的候选答案，则将所述候选答案作为会话问题的近似答案，步骤S5将所述近似答案作为所述会话问题的目标答案输出。值得注意的是，与会话问题的主题相似度大于第三预设阈值的候选答案在问答知识库4中也可能有不止一个，当与会话问题的主题相似度大于第三预设阈值的候选答案在问答知识库4中有多个时，步骤S5可以取所述多个候选答案中，在预设时间段(例如最近一周)内输出频率最高的作为所述会话问题的近似答案。If there is a candidate answer whose topic similarity to the conversation question is greater than the third preset threshold, the candidate answer is taken as an approximate answer to the conversation question, and step S5 outputs the approximate answer as the target answer of the conversation question. It is worth noting that the candidate answers with the topic similarity of the conversation problem greater than the third preset threshold may also have more than one in the Q&A knowledge base 4, when the similarity with the topic of the conversation problem is greater than the third predetermined threshold. When there are multiple in the question and answer knowledge base 4, step S5 may take the approximate answer which is the highest frequency of the conversation question in the preset time period (for example, the most recent week).

步骤S6，若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库4中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出。所述seq2seq模型由用于进行所述编码和解码迭代训练的前向长短记忆网络LSTM模型和后向LSTM模型，以及用于计算每次编码和解码的隐藏层信息权重的注意力机制构成。Step S6, if there is no approximate answer of the conversation question in the candidate answer set, the iterative training of encoding and decoding each question and answer in the Q&A knowledge base 4 by using the seq2seq model, thereby constructing a sequence prediction model, The conversation question is input to the sequence prediction model to generate a strain answer, and the strain answer is output as a target answer of the conversation question. The seq2seq model consists of a forward long and short memory network LSTM model and a backward LSTM model for performing the encoding and decoding iterative training, and an attention mechanism for calculating hidden layer information weights for each encoding and decoding.

根据本实施例提供的聊天应答方法，在获取会话问题并进行预处理后，通过倒排索引查询的方式从问答知识库4中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若是，则在问答知识库4中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出，若所述候选问题集合中不存在所述会话问题的近似问题，则通过倒排索引查询的方式从问答知识库4中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若是，则将所述近似答案作为所述会话问题的目标答案输出，若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出。通过本实施例提供的聊天应答方法可以针对会话问题为客户做出准确和应变的反馈，从而提高服务质量。According to the chat response method provided in this embodiment, after the session problem is acquired and pre-processed, the candidate question set related to the session problem is queried from the question-and-answer knowledge base 4 by means of an inverted index query, and the said a textual similarity between the conversation problem and each candidate question in the candidate question set, determining whether there is an approximation problem of the conversation problem in the candidate question set, and if so, searching for the associated answer of the approximation question in the Q&A knowledge base 4, Outputting the associated answer as a target answer of the conversation question. If there is no approximation problem of the conversation problem in the candidate question set, querying and speaking from the Q&A knowledge base 4 by means of an inverted index query a set of candidate answers related to the conversation problem, and respectively calculating a topic similarity of the conversation question and each candidate answer in the candidate answer set, and determining whether there is an approximate answer of the conversation question in the candidate answer set, and if so, The approximate answer is output as the target answer of the conversation question, if not in the candidate answer set An approximate answer to the conversation problem exists, and iterative training of encoding and decoding each question and answer in the question and answer knowledge base by the seq2seq model, thereby constructing a sequence prediction model, and inputting the conversation problem into the sequence prediction model A strain answer is generated, and the strain answer is output as a target answer to the conversation question. The chat response method provided by the embodiment can provide accurate and responsive feedback for the client for the session problem, thereby improving the service quality.

参阅图4所示，为图1中聊天应答程序10的程序模块图。在本实施例中，聊天应答程序10被分割为多个模块，该多个模块被存储于存储器11中，并由处理器12执行，以完成本申请。本申请所称的模块是指能够完成特定功能的一系列计算机程序指令段。Referring to FIG. 4, it is a program module diagram of the chat response program 10 in FIG. In the present embodiment, the chat response program 10 is divided into a plurality of modules that are stored in the memory 11 and executed by the processor 12 to complete the present application. A module as referred to in this application refers to a series of computer program instructions that are capable of performing a particular function.

所述聊天应答程序10可以被分割为：预处理模块110、第一计算模块120、问题检索模块130、第二计算模块140、答案检索模块150和答案预测模块160。The chat response program 10 can be divided into: a pre-processing module 110, a first calculation module 120, a question retrieval module 130, a second calculation module 140, an answer retrieval module 150, and an answer prediction module 160.

预处理模块110，用于获取客户输入的会话问题，对所述会话问题进行预处理，得到会话问题的文本特征信息，所述文本特征信息包括各词条在所述会话问题中的词性、位置和词类归属信息，所述词类归属包括归属于关键词或命名实体。The pre-processing module 110 is configured to obtain a session problem input by the client, and perform pre-processing on the session problem to obtain text feature information of the session problem, where the text feature information includes the part of speech and location of each term in the conversation problem. And word class attribution information, the word class attribution includes attribution to a keyword or a named entity.

具体地，预处理模块110用于对所述会话问题进行以下预处理：Specifically, the pre-processing module 110 is configured to perform the following pre-processing on the session problem:

对所述会话问题进行分词处理，从而切分出会话问题的各词条，所述分词处理的方法包括基于词典进行正向最大匹配和/或基于词典进行逆向最大匹配；Performing word segmentation on the conversation problem, thereby segmenting the terms of the conversation problem, the method of word segmentation includes performing forward maximum matching based on the dictionary and/or performing reverse maximum matching based on the dictionary;

对经所述分词处理得到的各词条进行词性解析，并对各词条的词性进行标注，所述词性解析通过经预设大规模语料库训练得到的词性标注模型实现；Performing part-of-speech analysis on each term obtained by the word segmentation process, and labeling the part-of-speech of each term, and the part-of-speech analysis is realized by a part-of-speech tagging model obtained through a preset large-scale corpus training;

第一计算模块120，用于为问答知识库4构建倒排索引，所述问答知识库包括预先整理的多个问题以及每个问题关联的一个或多个答案，根据所述文本特征信息，通过倒排索引查询的方式从问答知识库4中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度。a first calculation module 120, configured to build an inverted index for the Q&A knowledge base 4, the Q&A knowledge base includes a plurality of questions pre-arranged and one or more answers associated with each question, according to the text feature information, The manner of inverting the index query queries the candidate question set related to the session question from the question and answer knowledge base 4, and separately calculates the text similarity of the session question and each candidate question in the candidate question set.

具体地，第一计算模块120用于通过以下方式为问答知识库4构建倒排索引：Specifically, the first calculating module 120 is configured to construct an inverted index for the question and answer knowledge base 4 by:

第一计算模块120计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度包括：The first calculation module 120 calculates a text similarity between the conversation problem and each candidate question in the candidate question set, including:

问题检索模块130，用于根据预设规则及所述文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若所述候选问题集合中存在所述会话问题的近似问题，则在问答知识库中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出。The problem retrieving module 130 is configured to determine, according to the preset rule and the text similarity, whether there is an approximation problem of the session problem in the candidate question set, and if there is an approximation problem of the session problem in the candidate question set, Finding the associated answer of the approximate question in the question and answer knowledge base, and outputting the associated answer as the target answer of the conversation question.

具体地，问题检索模块130判断是否存在与会话问题的文本相似度大于第二预设阈值的候选问题，若是，则从所述与会话问题的文本相似度大于第二预设阈值的候选问题中选择最大文本相似度对应的候选问题作为所述近似问题；若不存在与会话问题的文本相似度大于第二预设阈值的候选问题，则判定所述候选问题集合中不存在所述会话问题的近似问题。Specifically, the problem retrieval module 130 determines whether there is a candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, and if so, from the candidate problem that the text similarity with the conversation problem is greater than the second preset threshold Selecting a candidate question corresponding to the maximum text similarity as the approximation question; if there is no candidate problem that the text similarity with the session problem is greater than the second preset threshold, determining that the session problem does not exist in the candidate question set Approximate problem.

第二计算模块140，用于若所述候选问题集合中不存在所述会话问题的近似问题，则根据所述文本特征信息，通过倒排索引查询的方式从问答知识库4中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度。The second calculating module 140 is configured to: if the approximation problem of the session problem does not exist in the candidate question set, query and query from the Q&A knowledge base 4 by using an inverted index query according to the text feature information A set of candidate answers related to the conversation question, and calculating a topic similarity of the conversation question and each candidate answer in the candidate answer set, respectively.

第二计算模块140计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度包括：The second calculation module 140 calculates a topic similarity between the conversation question and each candidate answer in the candidate answer set, including:

采用线性判别分析模型分别提取所述会话问题和所述候选答案集合中每个候选答案的主题向量；Extracting the conversation problem and the topic vector of each candidate answer in the candidate answer set by using a linear discriminant analysis model;

答案检索模块150，用于根据预设规则及所述主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若所述候选答案集合中存在所述会话问题的近似答案，则将所述近似答案作为所述会话问题的目标答案输出。The answer retrieval module 150 is configured to determine, according to the preset rule and the topic similarity, whether an approximate answer of the conversation problem exists in the candidate answer set, if an approximate answer of the conversation problem exists in the candidate answer set, The approximate answer is output as a target answer to the conversation question.

具体地，答案检索模块150判断是否存在与会话问题的主题相似度大于第三预设阈值的候选答案，若是，则从所述与会话问题的主题相似度大于第三预设阈值的候选答案中选择最大主题相似度对应的候选答案作为所述近似答案；若不存在与会话问题的主题相似度大于第三预设阈值的候选答案，则判定所述候选答案集合中不存在所述会话问题的近似答案。Specifically, the answer retrieval module 150 determines whether there is a candidate answer whose topic similarity to the conversation problem is greater than the third preset threshold, and if so, from the candidate answers that the topic similarity with the conversation question is greater than the third preset threshold Selecting a candidate answer corresponding to the maximum topic similarity as the approximate answer; if there is no candidate answer with a topic similarity of the conversation problem greater than the third preset threshold, determining that the session problem does not exist in the candidate answer set Approximate answer.

答案预测模块160，用于若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库4中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出。答案预测模块160所述seq2seq模型由用于进行所述编码和解码迭代训练的前向长短记忆网络LSTM模型和后向LSTM模型，以及用于计算每次编码和解码的隐藏层信息权重的注意力机制构成。The answer prediction module 160 is configured to: if the approximate answer of the conversation problem does not exist in the candidate answer set, perform iterative training on encoding and decoding each question and answer in the Q&A knowledge base 4 by using the seq2seq model, thereby constructing a sequence And predicting a model, inputting the conversation question into the sequence prediction model to generate a strain answer, and outputting the strain answer as a target answer of the conversation question. The answer prediction module 160 describes the seq2seq model by a forward-long memory network LSTM model and a backward LSTM model for performing the encoding and decoding iterative training, and attention for calculating hidden layer information weights for each encoding and decoding. Mechanism composition.

在图1所示的电子装置1较佳实施例的运行环境示意图中，包含可读存储介质的存储器11中可以包括操作系统、聊天应答程序10及问答知识库4。处理器12执行存储器11中存储的聊天应答程序10时实现如下步骤：In the operating environment diagram of the preferred embodiment of the electronic device 1 shown in FIG. 1, the memory 11 including the readable storage medium may include an operating system, a chat response program 10, and a question and answer knowledge base 4. When the processor 12 executes the chat response program 10 stored in the memory 11, the following steps are implemented:

预处理步骤：获取客户输入的会话问题，对所述会话问题进行预处理，得到会话问题的文本特征信息，所述文本特征信息包括各词条在所述会话问题中的词性、位置和词类归属信息，所述词类归属包括归属于关键词或命名实体；a pre-processing step: obtaining a session question input by the client, pre-processing the session problem, and obtaining text feature information of the session problem, where the text feature information includes part of speech, location, and part-of-speech attribution of each term in the conversation problem Information, the word class attribution includes belonging to a keyword or a named entity;

第一计算步骤：为问答知识库构建倒排索引，所述问答知识库包括预先整理的多个问题以及每个问题关联的一个或多个答案，根据所述文本特征信息，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选问题集合，并分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度；a first calculating step: constructing an inverted index for the question and answer knowledge base, the question and answer knowledge base including a plurality of questions arranged in advance and one or more answers associated with each question, and querying through the inverted index according to the text feature information Means querying a candidate question set related to the conversation problem from a question and answer knowledge base, and separately calculating text similarity between the conversation problem and each candidate question in the candidate question set;

问题检索步骤：根据预设规则及所述文本相似度，判断候选问题集合中是否存在所述会话问题的近似问题，若所述候选问题集合中存在所述会话问题的近似问题，则在问答知识库中查找该近似问题的关联答案，将所述关联答案作为所述会话问题的目标答案输出；a problem retrieval step: determining, according to a preset rule and the text similarity, whether there is an approximation problem of the conversation problem in the candidate question set, and if there is an approximation problem of the conversation problem in the candidate question set, then the question and answer knowledge Finding an associated answer of the approximate problem in the library, and outputting the associated answer as a target answer of the conversation question;

第二计算步骤：若所述候选问题集合中不存在所述会话问题的近似问题，则根据所述文本特征信息，通过倒排索引查询的方式从问答知识库中查询与所述会话问题相关的候选答案集合，并分别计算所述会话问题与所述候选答案集合中每个候选答案的主题相似度；a second calculating step: if there is no approximation problem of the session problem in the candidate question set, querying, according to the text feature information, a query related to the session problem from the Q&A knowledge base by means of an inverted index query a set of candidate answers, and separately calculating a topic similarity of the conversation question and each candidate answer in the candidate answer set;

答案检索步骤：根据预设规则及所述主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案，若所述候选答案集合中存在所述会话问题的近似答案，则将所述近似答案作为所述会话问题的目标答案输出；An answer retrieval step: determining, according to a preset rule and the topic similarity, whether an approximate answer of the conversation question exists in the candidate answer set, and if an approximate answer of the conversation problem exists in the candidate answer set, The approximate answer is output as the target answer to the conversation question;

答案预测步骤：若候选答案集合中不存在所述会话问题的近似答案，则通过seq2seq模型对所述问答知识库中的各个问题和答案进行编码和解码的迭代训练，从而构建序列预测模型，将所述会话问题输入所述序列预测模型生成应变答案，将所述应变答案作为所述会话问题的目标答案输出。An answer prediction step: if an approximate answer of the conversation problem does not exist in the candidate answer set, iteratively trains and decodes each question and answer in the question and answer knowledge base through the seq2seq model, thereby constructing a sequence prediction model, The conversation question is input to the sequence prediction model to generate a strain answer, and the strain answer is output as a target answer of the conversation question.

其中，所述对所述会话问题进行预处理包括：The pre-processing of the session problem includes:

所述分别计算所述会话问题与所述候选问题集合中每个候选问题的文本相似度包括：And calculating, respectively, the text similarity between the conversation problem and each candidate question in the candidate question set includes:

构建卷积神经网络，通过所述卷积神经网络对所述问答知识库中的所有问题语句进行样本训练，得到所述问答知识库中问题语句对应的卷积神经网络模型；Constructing a convolutional neural network, and performing sample training on all problem sentences in the question and answer knowledge base through the convolutional neural network, and obtaining a convolutional neural network model corresponding to the problem statement in the question and answer knowledge base;

分别计算所述会话问题对应的特征向量与所述候选问题集合中的每个候选问题对应的特征向量之间的余弦距离，从而得到所述会话问题与所述候选问题集合中每个候选问题的文本相似度；Calculating a cosine distance between the feature vector corresponding to the session problem and the feature vector corresponding to each candidate question in the candidate question set, respectively, to obtain the session problem and each candidate problem in the candidate question set Text similarity

所述根据预设规则及所述问题相似度，判断候选问题集合中是否存在所述会话问题的近似问题包括：The approximating the problem of determining whether the session problem exists in the candidate question set according to the preset rule and the problem similarity includes:

判断是否存在与会话问题的文本相似度大于第二预设阈值的候选问题，若是，则从所述与会话问题的文本相似度大于第二预设阈值的候选问题中选择最大文本相似度对应的候选问题作为所述近似问题；Determining whether there is a candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, and if yes, selecting the maximum text similarity corresponding to the candidate problem that the text similarity with the conversation problem is greater than the second preset threshold Candidate questions as the approximation problem;

若不存在与会话问题的文本相似度大于第二预设阈值的候选问题，则判定所述候选问题集合中不存在所述会话问题的近似问题；If there is no candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, determining that there is no approximation problem of the conversation problem in the candidate problem set;

所述根据预设规则及所述主题相似度，判断候选答案集合中是否存在所述会话问题的近似答案包括：The determining, according to the preset rule and the topic similarity, determining whether the candidate answer exists in the candidate answer set includes:

判断是否存在与会话问题的主题相似度大于第三预设阈值的候选答案，若是，则从所述与会话问题的主题相似度大于第三预设阈值的候选答案中选择最大主题相似度对应的候选答案作为所述近似答案；Determining whether there is a candidate answer whose topic similarity to the conversation problem is greater than a third preset threshold, and if yes, selecting a maximum topic similarity corresponding to the candidate answers having the topic similarity of the conversation problem being greater than the third preset threshold a candidate answer as the approximate answer;

若不存在与会话问题的主题相似度大于第三预设阈值的候选答案，则判定所述候选答案集合中不存在所述会话问题的近似答案。If there is no candidate answer with the topic similarity of the conversation problem being greater than the third preset threshold, it is determined that the approximate answer of the conversation question does not exist in the candidate answer set.

所述为问答知识库构建倒排索引包括：The constructing the inverted index for the Q&A knowledge base includes:

对问答知识库中的每个问题和答案分别进行分词、词性标注、关键词提取、关键词出现位置记录、分配ID号的操作，以及为每个问题和答案分词后得到的各词条分配ID号；Each question and answer in the Q&A knowledge base is divided into word segmentation, part-of-speech tagging, keyword extraction, keyword location location record, assignment ID number, and ID assigned to each term after each word and answer segmentation. number;

对问答知识库中每个问题和答案根据相应的ID号进行排序，对所述每个问题和答案分词后得到的各词条根据相应的ID号进行排序，并将具有同一词条ID的所有问题ID和答案ID放到该词条对应的倒排记录表中；Each question and answer in the Q&A knowledge base is sorted according to the corresponding ID number, and each term obtained after each word and answer word segmentation is sorted according to the corresponding ID number, and all the items with the same item ID are The question ID and the answer ID are placed in the inverted record table corresponding to the entry;

所述seq2seq模型由用于进行所述编码和解码迭代训练的前向长短记忆网络LSTM模型和后向LSTM模型，以及用于计算每次编码和解码的隐藏层信息权重的注意力机制构成。The seq2seq model consists of a forward long and short memory network LSTM model and a backward LSTM model for performing the encoding and decoding iterative training, and an attention mechanism for calculating hidden layer information weights for each encoding and decoding.

具体原理请参照上述图4关于聊天应答程序10的程序模块图及图3关于聊天应答方法较佳实施例的流程图的介绍。For specific principles, please refer to the program module diagram of the chat response program 10 in FIG. 4 and the flowchart of the preferred embodiment of the chat response method in FIG.

此外，本申请实施例还提出一种计算机可读存储介质，所述计算机可读存储介质可以是硬盘、多媒体卡、SD卡、闪存卡、SMC、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器等等中的任意一种或者几种的任意组合。所述计算机可读存储介质中包括存储有问答知识库4及聊天应答程序10等，所述聊天应答程序10被所述处理器12执行时实现如下操作：In addition, the embodiment of the present application further provides a computer readable storage medium, which may be a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read only memory (ROM), and an erasable programmable Any combination or combination of any one or more of read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, and the like. The computer readable storage medium includes a Q&A knowledge base 4, a chat response program 10, and the like. When the chat response program 10 is executed by the processor 12, the following operations are implemented:

本申请之计算机可读存储介质的具体实施方式与上述聊天应答方法以及电子装置1的具体实施方式大致相同，在此不再赘述。The specific implementation of the computer readable storage medium of the present application is substantially the same as the above-described chat response method and the specific embodiment of the electronic device 1, and details are not described herein again.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、装置、物品或者方法所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、装置、物品或者方法中还存在另外的相同要素。It is to be understood that the term "comprises", "comprising", or any other variants thereof, is intended to encompass a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a series of elements includes those elements. It also includes other elements not explicitly listed, or elements that are inherent to such a process, device, item, or method. An element that is defined by the phrase "comprising a ..." does not exclude the presence of additional equivalent elements in the process, the device, the item, or the method that comprises the element.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，或者网络设备等)执行本申请各个实施例所述的方法。Through the description of the above embodiments, those skilled in the art can clearly understand that the foregoing embodiment method can be implemented by means of software plus a necessary general hardware platform, and of course, can also be through hardware, but in many cases, the former is better. Implementation. Based on such understanding, portions of the technical solution of the present application that contribute substantially or to the prior art may be embodied in the form of a software product stored in a storage medium as described above, including a number of instructions. To enable a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the various embodiments of the present application.

以上仅为本申请的优选实施例，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。The above is only a preferred embodiment of the present application, and is not intended to limit the scope of the patent application, and the equivalent structure or equivalent process transformations made by the specification and the drawings of the present application, or directly or indirectly applied to other related technical fields. The same is included in the scope of patent protection of this application.

Claims

A chat response method, the method comprising:

a pre-processing step: obtaining a session question input by the client, pre-processing the session problem, and obtaining text feature information of the session problem, where the text feature information includes part of speech, location, and part-of-speech attribution of each term in the conversation problem Information, the word class attribution includes belonging to a keyword or a named entity;

a first calculating step: constructing an inverted index for the question and answer knowledge base, the question and answer knowledge base including a plurality of questions arranged in advance and one or more answers associated with each question, and querying through the inverted index according to the text feature information Means querying a candidate question set related to the conversation problem from a question and answer knowledge base, and separately calculating text similarity between the conversation problem and each candidate question in the candidate question set;

a problem retrieval step: determining, according to a preset rule and the text similarity, whether there is an approximation problem of the conversation problem in the candidate question set, and if there is an approximation problem of the conversation problem in the candidate question set, then the question and answer knowledge Finding an associated answer of the approximate problem in the library, and outputting the associated answer as a target answer of the conversation question;

a second calculating step: if there is no approximation problem of the session problem in the candidate question set, querying, according to the text feature information, a query related to the session problem from the Q&A knowledge base by means of an inverted index query a set of candidate answers, and separately calculating a topic similarity of the conversation question and each candidate answer in the candidate answer set;

An answer retrieval step: determining, according to a preset rule and the topic similarity, whether an approximate answer of the conversation question exists in the candidate answer set, and if an approximate answer of the conversation problem exists in the candidate answer set, The approximate answer is output as the target answer to the conversation question;

An answer prediction step: if an approximate answer of the conversation problem does not exist in the candidate answer set, iteratively trains and decodes each question and answer in the question and answer knowledge base through the seq2seq model, thereby constructing a sequence prediction model, The conversation question is input to the sequence prediction model to generate a strain answer, and the strain answer is output as a target answer of the conversation question.

The chat response method according to claim 1, wherein the preprocessing the session problem comprises:

Performing word segmentation on the conversation problem, thereby segmenting the terms of the conversation problem, the method of word segmentation includes performing forward maximum matching based on the dictionary and/or performing reverse maximum matching based on the dictionary;

Performing part-of-speech analysis on each term obtained by the word segmentation process, and labeling the part-of-speech of each term, and the part-of-speech analysis is realized by a part-of-speech tagging model obtained through a preset large-scale corpus training;

Named entity identification is performed on the session problem, thereby identifying a named entity having a specific meaning, the named entity includes a person name, a place name, an organization, a proper noun, and the method for identifying the named entity includes a dictionary-based and rule-based method And methods based on statistical learning;

Extracting a keyword from the conversation question according to the term and the named entity, the keyword being a phrase whose number of characters is greater than a first preset threshold, or a named entity existing in a preset dictionary The preset dictionary includes a business scenario-specific dictionary.

The chat response method according to claim 1, wherein the calculating the text similarity of each of the session problem and each candidate question in the candidate question set separately comprises:

Constructing a convolutional neural network, and performing sample training on all problem sentences in the question and answer knowledge base through the convolutional neural network, and obtaining a convolutional neural network model corresponding to the problem statement in the question and answer knowledge base;

Entering each of the conversation problem and the candidate question set into the convolutional neural network model, respectively, and obtaining the conversation problem and the candidate by convolution kernel convolution of the convolutional neural network model a feature vector corresponding to each candidate question in the question set;

Calculating a cosine distance between the feature vector corresponding to the session problem and the feature vector corresponding to each candidate question in the candidate question set, respectively, to obtain the session problem and each candidate problem in the candidate question set Text similarity

The calculating the similarity degree between the conversation problem and each candidate answer in the candidate answer set respectively includes:

Extracting the conversation problem and the topic vector of each candidate answer in the candidate answer set by using a linear discriminant analysis model;

Calculating a cosine distance between a topic vector of the conversation question and a topic vector of each candidate answer in the candidate answer set, respectively, to obtain a topic similarity between the conversation question and each candidate answer in the candidate answer set .

The chat response method according to claim 1, wherein the approximating the problem of whether the session problem exists in the candidate question set according to the preset rule and the problem similarity includes:

Determining whether there is a candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, and if yes, selecting the maximum text similarity corresponding to the candidate problem that the text similarity with the conversation problem is greater than the second preset threshold Candidate questions as the approximation problem;

If there is no candidate problem that the text similarity with the conversation problem is greater than the second preset threshold, determining that there is no approximation problem of the conversation problem in the candidate problem set;

The determining, according to the preset rule and the topic similarity, determining whether the candidate answer exists in the candidate answer set includes:

Determining whether there is a candidate answer whose topic similarity to the conversation problem is greater than a third preset threshold, and if yes, selecting a maximum topic similarity corresponding to the candidate answers having the topic similarity of the conversation problem being greater than the third preset threshold a candidate answer as the approximate answer;

If there is no candidate answer with the topic similarity of the conversation problem being greater than the third preset threshold, it is determined that the approximate answer of the conversation question does not exist in the candidate answer set.

The chat response method according to claim 1, wherein the constructing the inverted index for the question and answer knowledge base comprises:

Each question and answer in the Q&A knowledge base is divided into word segmentation, part-of-speech tagging, keyword extraction, keyword location location record, assignment ID number, and ID assigned to each term after each word and answer segmentation. number;

Each question and answer in the Q&A knowledge base is sorted according to the corresponding ID number, and each term obtained after each word and answer word segmentation is sorted according to the corresponding ID number, and all the items with the same item ID are The question ID and the answer ID are placed in the inverted record table corresponding to the entry;

Combine all inverted record tables into the final inverted index.

The chat response method according to claim 1, wherein said seq2seq model is used by a forward-long memory network LSTM model and a backward LSTM model for performing said encoding and decoding iterative training, and for calculating each time The attention and mechanism of the hidden layer information weighting of encoding and decoding.

A chat response method according to any one of claims 2 to 5, wherein said seq2seq model is composed of a forward long and short memory network LSTM model and a backward LSTM model for performing said coding and decoding iterative training, and A attention mechanism for calculating the weight of hidden layer information for each encoding and decoding.

An electronic device includes a memory and a processor, wherein the memory includes a chat response program, and the chat response program is executed by the processor to implement the following steps:

The electronic device of claim 8 wherein said pre-processing said session problem comprises:

The electronic device according to claim 9, wherein the calculating the text similarity of each of the conversation problem and each candidate question in the candidate question set respectively comprises:

The electronic device according to claim 8, wherein the approximating the problem of whether the session problem exists in the candidate question set according to the preset rule and the problem similarity includes:

According to the preset rule and the topic similarity, determining whether the candidate answer has an approximate answer of the conversation problem includes:

The electronic device according to claim 8, wherein the constructing the inverted index for the Q&A knowledge base comprises:

Combine all inverted record tables into the final inverted index.

The electronic device of claim 8 wherein said seq2seq model is comprised of a forward long and short memory network LSTM model and a backward LSTM model for performing said encoding and decoding iterative training, and for calculating each encoding And the attention mechanism of the decoded hidden layer information weights.

The electronic device according to any one of claims 9 to 12, wherein the seq2seq model is used by a forward-long memory network LSTM model and a backward LSTM model for performing the encoding and decoding iterative training, and The attention mechanism is constructed to calculate the hidden layer information weights for each encoding and decoding.

A computer readable storage medium, comprising: a chat response program, wherein when the chat response program is executed by a processor, the chat response program is executed by the processor to implement the following steps :

The computer readable storage medium of claim 15 wherein said preprocessing said session problem comprises:

The computer readable storage medium of claim 16, wherein the calculating the text similarity of the session question and each candidate question in the candidate question set respectively comprises:

The computer readable storage medium according to claim 15, wherein the approximating the problem of whether the session problem exists in the candidate question set according to the preset rule and the problem similarity includes:

The computer readable storage medium of claim 15, wherein the constructing the inverted index for the question and answer knowledge base comprises:

Combine all inverted record tables into the final inverted index.

The computer readable storage medium of claim 15 wherein said seq2seq model is comprised of a forward long and short memory network LSTM model and a backward LSTM model for performing said encoding and decoding iterative training, and for computing Each time the encoding and decoding of the hidden layer information weights attention mechanism is constructed.