
CN111813907A - A Question Intention Recognition Method in Natural Language Question Answering Technology - Google Patents

A Question Intention Recognition Method in Natural Language Question Answering Technology

Info

Publication number
CN111813907A
CN111813907A (application CN202010557964.1A)
Authority
CN
China
Prior art keywords
question
model
input
layer
paragraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010557964.1A
Other languages
Chinese (zh)
Inventor
李伟
卢心陶
郭佳月
张奎明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202010557964.1A
Publication of CN111813907A
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention proposes a question intent recognition method for natural language question answering technology. The method comprises an end-to-end executable revised-question generation model consisting of four parts (an input layer, an encoding layer, a matching layer, and a decoding layer), which produces revised questions by combining sentence generation, machine reading comprehension, and a copy mechanism, together with a method that automatically generates training data through passage compression. The method improves machine reading answer accuracy when the question entered into an existing question answering system is too short for the answer to be identified; at the same time, even when the input question is long enough, a candidate set of revised questions is provided, from which the question matching the questioner's intent can be selected, so that higher-precision answers are obtained from the question answering system.

Description

A Question Intention Recognition Method in Natural Language Question Answering Technology

Technical Field

The invention discloses a question intent recognition method for natural language question answering technology, and relates to the fields of natural language processing, question answering systems, and artificial intelligence.

Background Art

With the development of the mobile Internet, voice-enabled devices such as smartphones, voice robots, and smart speakers have become widespread, and demand for question answering technology that uses artificial intelligence to answer user questions keeps growing. In recent years, question answering based on machine reading comprehension has attracted particular attention. Machine reading comprehension is a question answering technique that enables a system to read and understand natural language documents or passages and to locate and extract answer information from those passages in order to answer questions.

Although current question answering based on machine reading comprehension can reach high answer accuracy, problems remain in practical use. In a real question answering system, when the questioner enters an ambiguous question, or a short question that lacks the information the system needs, it is difficult to achieve high answer accuracy. For example, in interactions between a machine customer service agent and a questioner, a questioner unfamiliar with the relevant business knowledge sometimes asks the agent questions whose intent is unclear. It is therefore necessary to introduce a mechanism that infers and confirms the questioner's intent into current machine-reading question answering systems, so that vague questions can be answered accurately.

Summary of the Invention

To overcome the above shortcomings of existing machine reading comprehension question answering technology, the purpose of the present invention is to provide a question intent recognition method for natural language question answering technology.

To achieve the above purpose, the present invention adopts the following technical scheme:

A question intent recognition method in natural language question answering technology, wherein the method adopts an end-to-end executable revised-question generation model that combines a sentence generation model, a machine reading model, and a copy mechanism. The method comprises the following steps:

Step 1: when an input question and its corresponding passage are fed into the generation model, the model uses the machine reading model to read the input question and the passage and extract the relevant information from their content;

Step 2: based on an attention observation and supervision mechanism, while the information is being extracted the sentence generation model generates a set of revised questions;

Step 3: the copy mechanism then copies important words and expressions from the input question and the passage into the revised question set, producing the final revised question from the content of the passage.

The revised-question generation model consists of the following four parts:

Input layer: represents the input question sentence and the passage as sequences of one-hot encoded vectors;

Encoding layer: uses a word embedding model to convert the passage token sequence and the question sentence from the input layer into continuous-valued vectors, and creates the vector sequences while taking into account the contextual relationship between the passage and the question sentence;

Matching layer: captures the relationship between the words in the passage and the input question sentence, and models the vector sequence of the passage;

Decoding layer: uses a GRU-RNN (gated recurrent unit recurrent neural network) with an attention mechanism and two copy mechanisms to generate the sequence of word tokens that constitutes the revised question.

Further, in the decoding layer, the GRU-RNN structure is as follows:

Activation function: PReLU is selected as the activation function of the GRU-RNN. As a parametric rectified linear unit, it adjusts the slope applied to the negative values of the input vector according to the training data while leaving the remaining values unchanged, which increases contrast and improves learning efficiency;

Neuron selection: the gated recurrent unit (GRU) is a type of RNN cell; in the hidden layer it controls the retention of information through two gate structures, filtering important from unimportant information;

Other parameters: the model consists of one GRU layer with an attention mechanism followed by a softmax layer, trained with adaptive gradient descent at a learning rate of 0.0007.
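For illustration only, the following minimal PyTorch sketch (not part of the patent) configures a decoder matching the parameters listed above: one GRU layer, a PReLU activation, a softmax output layer, and adaptive gradient descent (read here as Adagrad, an assumption) at a learning rate of 0.0007. Dimensions are illustrative, and the attention and copy mechanisms described later are omitted.

```python
# Minimal sketch of the decoder configuration described above (illustrative only).
import torch
import torch.nn as nn

class RevisedQuestionDecoder(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.prelu = nn.PReLU()                       # parametric ReLU activation
        self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)  # followed by softmax

    def forward(self, tokens, h0=None):
        e = self.prelu(self.embed(tokens))            # (batch, steps, emb_dim)
        h, hn = self.gru(e, h0)
        return torch.softmax(self.out(h), dim=-1), hn

model = RevisedQuestionDecoder()
# "Adaptive gradient descent, learning rate 0.0007" interpreted as Adagrad (assumption).
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.0007)
```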

Still further, a method of automatically generating training data through passage sentence compression is adopted: through sentence compression, short questions with ambiguous intent are produced from the question bank of a machine reading corpus, so that training data for the question revision model is generated automatically.

The beneficial effects of the present invention are as follows: introducing a mechanism that infers the questioner's intent into current machine-reading question answering systems improves answer accuracy when the input question is too short for the existing system to identify an answer; at the same time, even when the input question is long enough, a candidate set of revised questions is provided, from which a question matching the intent of the questioner's question can be selected, yielding higher-precision answers from the question answering system.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the structure of a machine reading question answering system incorporating the revised-question generation model;

Figure 2 is a schematic diagram of the structure of the revised-question generation model of the present invention;

Figure 3 is a diagram of the core structure of the residual highway network;

Figure 4 is an unrolled view of the single-layer GRU-RNN structure;

Figure 5 is a diagram of the basic structure of the GRU unit used in the present invention.

Detailed Description of Embodiments

It should be noted that the following description is exemplary and is intended to provide further explanation of the present application. Unless otherwise specified, all scientific and technical terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which this application belongs.

It should also be noted that the terminology used here is intended only to describe specific embodiments, not to limit the exemplary embodiments of the present application. The invention is further described below with reference to the accompanying drawings and examples.

Referring to Figures 1 to 4, a question intent recognition method in natural language question answering technology adopts an end-to-end executable revised-question generation model that combines a passage generation model, a machine reading model, and a copy mechanism. The task of the model is to take the input question and the corresponding passage as input and to output questions revised toward the relevant intent; the questioner can then select, from the candidate set of revised questions, the one that matches his or her actual intent. The method comprises the following steps:

Step 1: as shown in Figure 1, when the input question and its corresponding passage are fed into the generation model, the model uses the machine reading model to read the input question and the passage and extract the relevant answer information from their content.

The model takes the word token sequence of the input question and the word token sequence of the passage as input, estimates the likely start and end positions of the answer within the passage, and outputs a word token sequence containing the estimated answer span extracted from the passage. The input question sentence is first segmented and part-of-speech tagged through morphological and syntactic analysis, and is represented as a sequence of one-hot vectors q = {q1, q2, ..., qJ}, where each one-hot vector is a V-dimensional vector whose value is 1 at the dimension corresponding to the word's index in the dictionary and 0 elsewhere. The passage is processed similarly to the input question; since the passage contains more words, it is represented by the one-hot sequence x = {x1, x2, ..., xT}, and the passage should contain the information relevant to answering the question.
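As a concrete illustration of the one-hot representation described above, the sketch below (not from the patent; the toy vocabulary and tokenization are assumptions) maps a tokenized question q and passage x to V-dimensional one-hot vectors.

```python
# Toy illustration of one-hot encoding of question and passage tokens (assumed vocabulary).
import numpy as np

vocab = {"<unk>": 0, "how": 1, "do": 2, "i": 3, "open": 4, "an": 5, "account": 6, "the": 7, "bank": 8}
V = len(vocab)

def one_hot_sequence(tokens):
    """Return a (len(tokens), V) matrix; row t is the one-hot vector of token t."""
    seq = np.zeros((len(tokens), V), dtype=np.float32)
    for t, tok in enumerate(tokens):
        seq[t, vocab.get(tok, vocab["<unk>"])] = 1.0
    return seq

q = one_hot_sequence(["how", "do", "i", "open", "an", "account"])  # question q = {q1..qJ}
x = one_hot_sequence(["the", "bank", "account", "open"])           # passage  x = {x1..xT}
print(q.shape, x.shape)  # (J, V) and (T, V)
```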

Step 2: based on the attention observation and supervision mechanism, while the information is being extracted the passage generation model generates a set of revised questions.

Step 3: the copy mechanism then copies important words and expressions from the input question and the passage into the revised question set, producing the final revised question from the content of the passage. The final revised question is a sentence in which the content of the input question is made concrete, y = {y1, y2, ..., yK}.

Further, as shown in Figure 2, the structure of the revised-question generation model specifically comprises:

Input layer: represents the input question sentence and the passage as sequences of one-hot encoded vectors;

Encoding layer: uses a word embedding model to convert the passage token sequence and the question sentence from the input layer into continuous-valued vectors, and creates the vector sequences while taking into account the contextual relationship between the passage and the question sentence;

The x and q from the input layer become the input of the encoding layer and are represented as V-dimensional one-hot vectors. The one-hot vectors are first converted into continuous-valued vectors of dimension v; to convert the words, a trained vector transformation matrix (Figure BDA0002545170210000041) is used. The converted vectors are then passed through a two-layer residual highway network to obtain the passage vector sequence (Figure BDA0002545170210000042) and the question vector sequence (Figure BDA0002545170210000043). The residual highway network is shown in Figure 3: when the input vector x first enters, weights are applied as in a conventional neural network and the result passes through the ReLU activation function; after this weighted transformation, the input information is added to the current output, and the sum is passed through the activation function again.
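The sketch below is one plausible reading of the residual highway block just described; the exact formulation sits behind the patent's figures, so the layer below (a linear transform plus ReLU, a residual connection back to the input, then the activation again) is an assumption, with illustrative dimensions.

```python
# One plausible reading of the residual highway block described above (assumption).
import torch
import torch.nn as nn

class ResidualHighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.linear(x))   # conventional weighted transform + ReLU
        return self.act(h + x)         # add the original input, pass through the activation again

# Two stacked layers, as used for both the passage and the question embeddings.
highway = nn.Sequential(ResidualHighwayLayer(128), ResidualHighwayLayer(128))
e = torch.randn(4, 30, 128)            # (batch, sequence length, embedding dim v)
print(highway(e).shape)                # torch.Size([4, 30, 128])
```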

The passage and the question, now converted into sequences of continuous-valued vectors, are then fed into the GRU-RNN network. In the present invention one layer of GRU units is used, in which the parametric rectified linear unit values of the GRU change dynamically with the passage and the input question during training. In addition, to prevent the influence of the input vectors from shrinking during training, a bidirectional GRU is used to obtain the context matrix of the passage (Figure BDA0002545170210000044) and the context matrix of the input question (Figure BDA0002545170210000045), where d is the hidden layer dimension.
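A minimal sketch of the encoding-layer bidirectional GRU follows, producing a passage context matrix H of shape (T, 2d) and a question context matrix U of shape (J, 2d); the input size, hidden size d, and the sharing of one GRU for both sequences are illustrative assumptions.

```python
# Sketch of the bidirectional GRU encoder that yields H (T x 2d) and U (J x 2d).
import torch
import torch.nn as nn

d = 128                                     # hidden layer dimension d
bigru = nn.GRU(input_size=100, hidden_size=d, num_layers=1,
               batch_first=True, bidirectional=True)

passage_emb = torch.randn(1, 60, 100)       # (batch, T, v) after embedding + highway layers
question_emb = torch.randn(1, 12, 100)      # (batch, J, v)

H, _ = bigru(passage_emb)                   # (1, T, 2d): passage context matrix
U, _ = bigru(question_emb)                  # (1, J, 2d): question context matrix
print(H.shape, U.shape)
```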

The RNN states in the encoding layer are also used to determine the initial state of the RNN in the decoding layer, which is computed as follows:

(Figure BDA0002545170210000046)

This formula places attention on the context matrix of the input question, where (Figure BDA0002545170210000047) and (Figure BDA0002545170210000048) denotes the final state of the context matrix of the input question.

Matching layer: captures the relationship between the words in the passage and the input question sentence, and models the vector sequence of the passage.

In the matching layer, the passage from the encoding layer is matched against the input question, and the regions of the passage related to the input question are located. The matching layer uses a bidirectional attention flow (BiDAF) model to capture the relationship between the passage and the input question: the bidirectional attention flow computes attention values in both directions, from the passage to the input question and from the input question to the passage, and finally creates a context matrix for the passage conditioned on the input question.

In BiDAF, the similarity matrix (Figure BDA0002545170210000049) is first computed from the context matrix H of the passage and the context matrix U of the input question according to the following formula:

(Figure BDA00025451702100000410)

where ωs is a learned parameter and [;] denotes row-vector concatenation. Next, based on the similarity matrix, the attention values between the passage and the input question are computed in both directions. For the passage-to-question attention, a weighted vector over the words of the input question is computed for each word of the passage; the attention vector of the t-th passage word (Figure BDA0002545170210000051) is computed as:

at = softmaxj(St)

(Figure BDA0002545170210000052)

In the question-to-passage direction, the vectors (Figure BDA0002545170210000054) of the matrix (Figure BDA0002545170210000053) are obtained, over the passage sequence length T, by weighting the passage words that are most closely related to the input question:

(Figure BDA0002545170210000055)

The bidirectional attention vector between each word of the passage and the input question is then computed:

(Figure BDA0002545170210000056)

Finally, the bidirectional attention vectors G are fed into the first bidirectional GRU layer, and the context matrix M is obtained.
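Because the formulas above sit behind figure placeholders, the sketch below follows the standard BiDAF formulation that they appear to instantiate (an assumption of this rewrite): a similarity matrix over [h; u; h∘u], passage-to-question and question-to-passage attention, the combined vectors G, and a bidirectional GRU producing M.

```python
# Sketch of the BiDAF-style matching layer (standard formulation; an assumption here).
import torch
import torch.nn as nn

def bidaf_attention(H, U, w_s):
    # H: (T, 2d) passage context, U: (J, 2d) question context, w_s: (6d,) learned parameter
    T, J = H.size(0), U.size(0)
    h = H.unsqueeze(1).expand(T, J, -1)            # (T, J, 2d)
    u = U.unsqueeze(0).expand(T, J, -1)            # (T, J, 2d)
    S = torch.cat([h, u, h * u], dim=-1) @ w_s     # similarity matrix S, shape (T, J)

    a = torch.softmax(S, dim=1)                    # passage-to-question attention
    U_tilde = a @ U                                # (T, 2d): attended question per passage word

    b = torch.softmax(S.max(dim=1).values, dim=0)  # question-to-passage attention
    h_tilde = (b.unsqueeze(0) @ H).expand(T, -1)   # (T, 2d), tiled over the passage length

    # Bidirectional attention vectors G, one per passage word
    return torch.cat([H, U_tilde, H * U_tilde, H * h_tilde], dim=-1)   # (T, 8d)

d = 4
H, U = torch.randn(7, 2 * d), torch.randn(3, 2 * d)
w_s = torch.randn(6 * d)
G = bidaf_attention(H, U, w_s)
M, _ = nn.GRU(8 * d, d, bidirectional=True, batch_first=True)(G.unsqueeze(0))  # context matrix M
print(G.shape, M.shape)
```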

Decoding layer: uses a GRU-RNN with an attention mechanism and two copy mechanisms to generate the sequence of word tokens that constitutes the revised question.

In this layer, the revised question is generated from the information in the encoding layer and the matching layer. These components are finally integrated into one network that combines an RNN-based language generation model with copy mechanisms for the passage and the input question.

Further, the RNN language generation model comprises:

The generation model consists of one GRU layer with an attention mechanism and a softmax layer. Given the revised word sequence generated so far, y = {y1, ..., ys}, the probability distribution Pg over the next word of the revised question output by the generation model can be expressed as:

(Figure BDA0002545170210000057)

where Wg and bg are learned parameters, Vg is the number of words that can be generated by the generation model, and V > Vg. The generation model produces only high-frequency words, while low-frequency words are extracted by the copy mechanism; this reduces the size of the generation model and speeds up learning. hs+1 is the hidden state of the GRU and is updated by (Figure BDA0002545170210000058) and (Figure BDA0002545170210000059).

The input vector of the GRU (Figure BDA00025451702100000510) is determined from the previous word ys output by the generation model as follows: as in the encoding layer, after one-hot encoding, the word embedding layer and the two residual highway network layers convert ys into a continuous vector (Figure BDA0002545170210000061); then es and the previous GRU hidden state hs are used to compute the vector (Figure BDA0002545170210000062), which is used in the computation of the attention values.

The attention value αst over the passage and the attention value βsj over the input question are then computed:

(Figure BDA0002545170210000063)

(Figure BDA0002545170210000064)

Finally, the input value of the GRU (Figure BDA0002545170210000065) can be expressed as follows:

(Figure BDA0002545170210000066)

where (Figure BDA0002545170210000067) and (Figure BDA0002545170210000068) are learned parameters and f is the PReLU nonlinear activation function.
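The exact projections in the formulas above are behind figure placeholders, so the following single decoding step is a hedged sketch under assumed shapes: attention weights over the passage and question contexts are computed from the previous hidden state, combined with the previous word embedding, passed through a PReLU, and fed into a GRU cell.

```python
# Sketch of one decoding step with attention over the passage and question contexts (assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

d2, emb = 16, 8                                  # encoder output size (2d) and embedding size, assumed
combine = nn.Linear(emb + 2 * d2, emb)           # mixes [e_s; passage context; question context]
prelu = nn.PReLU()                               # f: the PReLU nonlinearity
cell = nn.GRUCell(emb, d2)

def decode_step(e_s, h_s, H, U):
    """One decoder step: attention values alpha_st / beta_sj, then the GRU input and next state."""
    alpha = F.softmax(H @ h_s, dim=0)            # alpha_s over passage positions
    beta = F.softmax(U @ h_s, dim=0)             # beta_s over question positions
    g_s = prelu(combine(torch.cat([e_s, alpha @ H, beta @ U])))
    h_next = cell(g_s.unsqueeze(0), h_s.unsqueeze(0)).squeeze(0)
    return alpha, beta, h_next

H, U = torch.randn(60, d2), torch.randn(12, d2)  # passage / question context matrices
alpha, beta, h1 = decode_step(torch.randn(emb), torch.zeros(d2), H, U)
print(alpha.shape, beta.shape, h1.shape)         # (60,), (12,), (16,)
```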

Further, the copy mechanism comprises:

This mechanism generates consistent sentences in the context of sentences and dialogue by copying part of the input words during sentence generation. Its purpose here is to generate revised questions from the content of the input question or passage, with both the input question and the passage used as copy sources. The copy mechanism computes the word generation probability distributions as:

Pcp(ys+1 | y≤s, x, q) = Σt f(ys+1 = xt) α(s+1)t ;

Pcq(ys+1 | y≤s, x, q) = Σj f(ys+1 = qj) β(s+1)j ;

where f(ys+1 = xt) is an indicator function whose value is 1 when ys+1 = xt and 0 otherwise, and f(ys+1 = qj) is defined analogously.
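The two copy distributions above simply accumulate, for each candidate word, the attention weight of every source position holding that word. The sketch below (vocabulary size and tensors illustrative) shows this scatter-add directly.

```python
# Sketch of the copy distributions P_cp (from the passage) and P_cq (from the question).
import torch

def copy_distribution(token_ids, attention, vocab_size):
    """token_ids: (N,) word indices of the copy source; attention: (N,) weights."""
    p = torch.zeros(vocab_size)
    p.index_add_(0, token_ids, attention)   # sum over all positions t with y_{s+1} = x_t
    return p

V = 50
passage_ids = torch.randint(0, V, (30,))             # x
question_ids = torch.randint(0, V, (8,))             # q
alpha = torch.softmax(torch.randn(30), dim=0)        # alpha_{(s+1)t}
beta = torch.softmax(torch.randn(8), dim=0)          # beta_{(s+1)j}

P_cp = copy_distribution(passage_ids, alpha, V)      # P_cp(y_{s+1} | y<=s, x, q)
P_cq = copy_distribution(question_ids, beta, V)      # P_cq(y_{s+1} | y<=s, x, q)
print(P_cp.sum(), P_cq.sum())                        # each sums to 1
```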

Further, the decoding-layer network that combines the RNN-based language generation model with the copy mechanisms for the passage and the input question comprises:

The word probability distribution P of the final output word is determined as a weighted sum of the generation probabilities computed for each word:

P(ys+1 | y≤s, x, q) = λs Pg(ys+1 | y≤s, x, q) + μs Pcp(ys+1 | y≤s, x, q) + υs Pcq(ys+1 | y≤s, x, q);

where λs, μs, and υs are weight parameters with λs, μs, υs ∈ [0, 1] and λs + μs + υs = 1. The values of the weight parameters are determined by the output of the softmax layer (Figure BDA0002545170210000071) shown below:

(Figure BDA0002545170210000072)

λs = γs0, μs = γs1, υs = γs2

where Highway2() denotes a two-layer residual highway network, and Wc and bc denote the weights and biases among the learned parameters. Then, according to the probability distribution produced by the decoding layer, beam search is introduced into the generation process, and multiple candidate sets of revised questions are generated by searching the generation space with beam width b.
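A hedged sketch of the final mixture follows: a softmax over a small projection of the decoder features stands in for the Highway2 / Wc, bc computation (a simplification, and Pg is taken over the full vocabulary rather than only the Vg high-frequency words), and the three distributions are blended with the resulting weights.

```python
# Sketch of the final output distribution mixing P_g, P_cp and P_cq (simplified assumptions).
import torch
import torch.nn as nn

V = 50
mix_proj = nn.Linear(32, 3)                           # stands in for Highway2 + W_c, b_c

def final_distribution(state, P_g, P_cp, P_cq):
    gamma = torch.softmax(mix_proj(state), dim=-1)    # (3,): lambda_s, mu_s, upsilon_s
    lam, mu, ups = gamma[0], gamma[1], gamma[2]
    return lam * P_g + mu * P_cp + ups * P_cq         # P(y_{s+1} | y<=s, x, q)

state = torch.randn(32)                               # decoder features at step s
P_g = torch.softmax(torch.randn(V), dim=0)
P_cp = torch.softmax(torch.randn(V), dim=0)
P_cq = torch.softmax(torch.randn(V), dim=0)
P = final_distribution(state, P_g, P_cp, P_cq)
print(P.sum())                                        # 1.0: a valid distribution
```

In generation, this distribution would feed a beam search of width b to produce several candidate revised questions.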

During training of the revised-question generation model, the error L is computed with a negative log-likelihood loss, and the generation model is continuously optimized by updating the parameters to minimize the error function, which is defined as:

(Figure BDA0002545170210000073)

where N is the training batch size and i is the index of the i-th sample in each batch.
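A short sketch of a batch-averaged negative log-likelihood of this form is shown below; the tensor contents are illustrative, and the exact normalization in the patent's formula is behind a figure placeholder.

```python
# Sketch of the negative log-likelihood training loss over a batch of N samples.
import torch

def nll_loss(step_probs):
    """step_probs: list over the batch; each entry holds the model probabilities
    assigned to the gold next word at every step of that sample."""
    N = len(step_probs)
    return -sum(torch.log(p).sum() for p in step_probs) / N

batch = [torch.tensor([0.9, 0.7, 0.8]), torch.tensor([0.6, 0.95])]
print(nll_loss(batch))
```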

Further, the GRU-RNN gated recurrent unit recurrent neural network structure of the decoding layer comprises:

Activation function: PReLU is selected as the activation function of the GRU-RNN. As a parametric rectified linear unit, it adjusts the slope applied to the negative values of the input vector according to the training data while leaving the remaining values unchanged, which increases contrast and improves learning efficiency;

Neuron selection: Figure 4 shows an unrolled view of the single-layer GRU-RNN structure, which models the sequence over time and can effectively capture contextual dependencies. The gated recurrent unit (GRU) is a type of RNN cell; in the hidden layer it controls the retention of information through two gate structures and can filter important from unimportant information. As shown in Figure 5, the GRU unit is structured as follows:

At sequence step t the input is xt, and the other input is the previous hidden state ht-1, which carries the information retained from the preceding data. These two inputs are processed through the reset gate rt, the update gate zt, and the output gate to form the unit output ht and the candidate hidden output (Figure BDA0002545170210000074); Wz, Wr, and W are weight matrices:

Update gate: zt = σ(Wz · [ht-1, xt])

Reset gate: rt = σ(Wr · [ht-1, xt])

Hidden layer: (Figure BDA0002545170210000075)

Output gate: (Figure BDA0002545170210000076)

Other parameters: the model consists of one GRU layer with an attention mechanism followed by a softmax layer, trained with adaptive gradient descent at a learning rate of 0.0007.

Further, the method of automatically generating training data through passage sentence compression comprises:

The compression method is an unsupervised sentence compression method based on syntactic dependency structure and integer programming. First, the syntactic dependency structure of the input sentence is obtained; then the words of the sentence are assigned index numbers in order from the beginning of the sentence, and an integer programming problem is formulated on the basis of these indices. The integer programming expression is defined as follows:

(Figure BDA0002545170210000081)

When ai = 1, the i-th word of the sentence is selected; when ai = 0, the word is not selected and is removed by sentence compression. L is the length of the sentence before compression, l is the maximum number of words in the sentence after compression, parent(i) denotes the word corresponding to the parent node of the i-th word in the syntactic dependency structure, and wi denotes the word weight, computed as (Figure BDA0002545170210000082), where F is the total frequency of all words in the training corpus and F(ai) is the frequency of the individual word. During passage sentence compression, the compressed length is determined by the constraint Σi≤L ai ≤ l. Since the purpose of the method of the present invention is to find the best intent of the question, words regarded as especially important by the system must be suppressed to some extent; the constraint (Figure BDA0002545170210000083) prevents the words with the largest weights from always being selected, ensuring that questions with more precise intent are generated.

Although the specific implementation of the present invention has been described above with reference to the accompanying drawings, it is only a preferred embodiment of the present invention. Those skilled in the art should understand that any alteration, equivalent, or modification made on the basis of the technical solution of the present invention without creative effort shall fall within the protection scope of the claims of the present invention.

Claims (3)

1. A question intent recognition method in natural language question answering technology, characterized in that the method adopts an end-to-end executable revised-question generation model that combines a sentence generation model, a machine reading model, and a copy mechanism, the method comprising the following steps:

Step 1: when an input question and its corresponding passage are fed into the generation model, the model uses the machine reading model to read the input question and the passage and extract the relevant information from their content;

Step 2: based on an attention observation and supervision mechanism, while the information is being extracted the sentence generation model generates a set of revised questions;

Step 3: the copy mechanism then copies important words and expressions from the input question and the passage into the revised question set, producing the final revised question from the content of the passage;

wherein the revised-question generation model consists of the following four parts:

Input layer: represents the input question sentence and the passage as sequences of one-hot encoded vectors;

Encoding layer: uses a word embedding model to convert the passage token sequence and the question sentence from the input layer into continuous-valued vectors, and creates the vector sequences while taking into account the contextual relationship between the passage and the question sentence;

Matching layer: captures the relationship between the words in the passage and the input question sentence, and models the vector sequence of the passage;

Decoding layer: uses a GRU-RNN (gated recurrent unit recurrent neural network) with an attention mechanism and two copy mechanisms to generate the sequence of word tokens that constitutes the revised question.

2. The question intent recognition method in natural language question answering technology according to claim 1, characterized in that, in the decoding layer, the GRU-RNN structure is as follows:

Activation function: PReLU is selected as the activation function of the GRU-RNN; as a parametric rectified linear unit, it adjusts the slope applied to the negative values of the input vector according to the training data while leaving the remaining values unchanged, which increases contrast and improves learning efficiency;

Neuron selection: the gated recurrent unit (GRU) is a type of RNN cell; in the hidden layer it controls the retention of information through two gate structures, filtering important from unimportant information;

Other parameters: the model consists of one GRU layer with an attention mechanism followed by a softmax layer, trained with adaptive gradient descent at a learning rate of 0.0007.

3. The question intent recognition method in natural language question answering technology according to claim 1 or 2, characterized in that a method of automatically generating training data through passage sentence compression is adopted: through sentence compression, short questions with ambiguous intent are produced from the question bank of a machine reading corpus, so that training data for the question revision model is generated automatically.
CN202010557964.1A 2020-06-18 2020-06-18 A Question Intention Recognition Method in Natural Language Question Answering Technology Pending CN111813907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010557964.1A CN111813907A (en) 2020-06-18 2020-06-18 A Question Intention Recognition Method in Natural Language Question Answering Technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010557964.1A CN111813907A (en) 2020-06-18 2020-06-18 A Question Intention Recognition Method in Natural Language Question Answering Technology

Publications (1)

Publication Number Publication Date
CN111813907A true CN111813907A (en) 2020-10-23

Family

ID=72846263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010557964.1A Pending CN111813907A (en) 2020-06-18 2020-06-18 A Question Intention Recognition Method in Natural Language Question Answering Technology

Country Status (1)

Country Link
CN (1) CN111813907A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160180728A1 (en) * 2014-12-23 2016-06-23 International Business Machines Corporation Managing answer feasibility
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 An Implementation Method of Fusion Network Question Answering System Based on Multi-Attention Mechanism
CN111061851A (en) * 2019-12-12 2020-04-24 中国科学院自动化研究所 Given fact-based question generation method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541356A (en) * 2020-12-21 2021-03-23 山东师范大学 Method and system for recognizing biomedical named entities
CN112541356B (en) * 2020-12-21 2022-12-06 山东师范大学 Method and system for recognizing biomedical named entities
CN113255344A (en) * 2021-05-13 2021-08-13 淮阴工学院 Keyword generation method fusing topic information
CN113255344B (en) * 2021-05-13 2024-05-17 淮阴工学院 A keyword generation method integrating topic information
CN119577522A (en) * 2025-01-27 2025-03-07 南京小冻梨智能科技有限公司 A method for correcting conversation intention information of the elderly based on deep neural network

Similar Documents

Publication Publication Date Title
CN114429132B (en) A method and device for named entity recognition based on hybrid lattice self-attention network
CN113297364B (en) Natural language understanding method and device in dialogue-oriented system
CN112115687B (en) A generative problem method combining triples and entity types in knowledge base
CN111966812B (en) An automatic question answering method and storage medium based on dynamic word vector
CN115688879B (en) An intelligent customer service voice processing system and method based on knowledge graph
CN112699216A (en) End-to-end language model pre-training method, system, device and storage medium
CN109657239A (en) The Chinese name entity recognition method learnt based on attention mechanism and language model
CN114492441B (en) BiLSTM-BiDAF named entity recognition method based on machine reading comprehension
CN109977199B (en) A reading comprehension method based on attention pooling mechanism
CN115659242B (en) A multimodal sentiment classification method based on modality-enhanced convolutional graph
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN112232053A (en) A text similarity calculation system, method, and storage medium based on multi-keyword pair matching
Wang et al. A deep reinforcement learning based multi-step coarse to fine question answering (MSCQA) system
Zhao et al. Knowledge-aware bayesian co-attention for multimodal emotion recognition
CN115510230A (en) A Mongolian Sentiment Analysis Method Based on Multidimensional Feature Fusion and Comparative Enhancement Learning Mechanism
CN112307179A (en) Text matching method, apparatus, device and storage medium
CN116595953A (en) A Summary Generation Method Based on Knowledge and Semantic Information Enhancement
CN118227769A (en) Knowledge graph enhancement-based large language model question-answer generation method
CN111813907A (en) A Question Intention Recognition Method in Natural Language Question Answering Technology
CN111145914A (en) Method and device for determining lung cancer clinical disease library text entity
CN112613316A (en) Method and system for generating ancient Chinese marking model
CN118332116A (en) Text classification method based on weighted fusion of multi-layer features of mask and causal language model
CN117291187A (en) Text processing method, device, equipment and medium based on attention enhancement
CN117034950A (en) Long sentence embedding method and system for introducing condition mask comparison learning
CN119646221B (en) Aspect-level sentiment analysis method, device and medium based on graph convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201023