Disclosure of Invention
The invention aims to provide a multi-round automatic question-answering method that can interact with software developers and maintainers over multiple rounds and fully understand the real intention of the questioner.
The invention also provides a software defect-oriented multi-round automatic question-answering system.
To achieve this object, the software defect-oriented multi-round automatic question-answering method provided by the invention adopts the following technical solution:
A software defect-oriented multi-round automatic question-answering method comprises the following steps:
step 1, crawling defect reports from an open-source Bug management repository, extracting the information in each report that helps defect understanding, extracting entities and relations from long texts, performing knowledge fusion and quality detection, and constructing a software defect knowledge graph;
step 2, recording the multiple rounds of interaction between a software developer or maintainer and the system, and constructing a multi-round dialogue memory module;
step 3, constructing a user portrait of the software developer or maintainer according to the questions in the software defect field that he or she asks;
step 4, constructing a guided multi-round question-answering module according to the dialogue memory and the user portrait.
Further, the construction of the software defect knowledge graph comprises the following three stages:
first stage: crawling defect reports from an open-source Bug management repository and preprocessing them. A crawled report contains a large number of attributes and Description texts, from which the information important for defect analysis and understanding is extracted, namely the defect number (Bug ID), defect title (Title), product (Product), component (Component), severity (Severity), modification state (Modified), defect handler (identifier), defect reporter (Reporter) and defect description (Description);
second stage: information extraction, i.e., extracting entities, attributes and the relations among entities from the Title and Description information obtained in the first stage, and forming an ontological knowledge representation on this basis;
third stage: knowledge fusion and quality detection. After the defect knowledge is acquired, knowledge fusion is performed to eliminate contradictions and ambiguities; after quality evaluation, the qualified parts, namely defect knowledge entities whose semantic expression is complete and free of ambiguity or contradiction, are added to the knowledge base.
Further, the knowledge fusion specifically comprises: calculating a similarity sim1 between two entities using the Levenshtein distance (minimum edit distance), calculating a similarity sim2 between the two entities using the Dice coefficient, and calculating the simple arithmetic mean sim3 of sim1 and sim2:
$\mathrm{sim}_3 = (\mathrm{sim}_1 + \mathrm{sim}_2)/2$
If sim3 is greater than the set threshold, the two entities are regarded as the same entity.
Further, the multi-round dialogue memory module constructed in step 2 specifically comprises a user dialogue state tracking module and a dialogue policy module. The state tracking module predicts the user's goal in each round of interaction, manages the input and the question-answer history of each round, and outputs the current dialogue state; the dialogue policy module takes the optimal action according to the dialogue state to help the user obtain answers.
Further, constructing the user portrait in step 3 according to the questions in the software defect field asked by the software developer or maintainer specifically includes collecting, through the system, the day-to-day questions asked by the software developer or maintainer and summarizing their needs when analyzing software defect problems.
Further, constructing the guided multi-round question-answering module in step 4 according to the dialogue memory and the user portrait of the software developer or maintainer specifically includes:
step 4-1, preprocessing the user question: parsing and completing the question with the question-answer context stored in the multi-round dialogue memory module so as to normalize it, extracting the defect entities and relations in the question, and deleting words that are meaningless for defect understanding;
step 4-2, graph search and reasoning: mapping the extracted defect entities and relations into Cypher, the structured query language of the Neo4j graph database, to perform a subgraph search;
step 4-3, answer ranking: scoring the candidate answers with a LambdaRank model that combines the user features obtained from the user portrait with the candidate answer list, and ranking the candidate answers by score.
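Steps 4-1 and 4-2 can be illustrated together with a deliberately simplified sketch. Instead of the learned entity and relation extractor the method relies on, it uses hypothetical word lists (`STOP_WORDS`, `RELATION_WORDS`) and a set of known entity names, fills a missing entity from the dialogue history, and then builds a parameterised Cypher query. The `:Entity` label, the `name` property and the relationship type names are assumptions about the graph schema, not taken from the source.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical lexicons; in the described method these would come from the
# defect knowledge graph and a learned entity/relation extractor.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "is", "are",
              "what", "which", "about", "its", "it", "does", "do"}
RELATION_WORDS = {"component": "HAS_COMPONENT",
                  "severity": "HAS_SEVERITY",
                  "reporter": "REPORTED_BY"}


def preprocess(question: str, known_entities: Set[str],
               history_entities: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Step 4-1 (simplified): drop stop words, look up an entity and a
    relation, and fill a missing entity from earlier dialogue rounds."""
    tokens = [t for t in re.findall(r"[a-z0-9_]+", question.lower())
              if t not in STOP_WORDS]
    entity = next((t for t in tokens if t in known_entities), None)
    relation = next((RELATION_WORDS[t] for t in tokens if t in RELATION_WORDS), None)
    if entity is None and history_entities:
        # Elliptical follow-up such as "what about its severity?":
        # reuse the most recently mentioned entity.
        entity = history_entities[-1]
    return entity, relation


def build_cypher(entity: str, relation: str) -> Tuple[str, Dict[str, str]]:
    """Step 4-2 (simplified): turn an (entity, relation) pair into a
    parameterised Cypher query under the assumed schema."""
    if not relation.isidentifier():
        # Relationship types cannot be passed as Cypher parameters, so only
        # plain identifiers are interpolated into the query text.
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (f"MATCH (e:Entity {{name: $name}})-[:{relation}]->(t) "
             "RETURN t.name AS answer")
    return query, {"name": entity}
```

The returned query and parameter dict could then be executed with the official Neo4j Python driver, for example via `session.run(query, params)`; this requires a Neo4j instance whose schema matches the assumed labels and relationship types.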
Correspondingly, the software defect-oriented multi-round automatic question-answering system provided by the invention can adopt the following technical solution:
A software defect-oriented multi-round automatic question-answering system comprises:
a first module for crawling defect reports from an open-source Bug management repository, extracting the information in each report that helps defect understanding, extracting entities and relations from long texts, performing knowledge fusion and quality detection, and constructing a software defect knowledge graph;
a second module for recording the multiple rounds of interaction between a software developer or maintainer and the system and constructing a multi-round dialogue memory module;
a third module for constructing a user portrait of the software developer or maintainer according to the questions in the software defect field that he or she asks;
and a fourth module for constructing a guided multi-round question-answering module according to the dialogue memory and the user portrait.
Compared with the prior art, the invention has the following notable advantages: 1) a knowledge graph of the software defect field is constructed to support question answering, which gives better results and higher reliability than conventional approaches; 2) because the method is based on the software defect knowledge graph and uses database query statements, questions can be matched and answers retrieved efficiently; 3) the multi-round question-answering mode allows the user's question to be understood accurately and answered precisely, improving user satisfaction.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, referring to Fig. 1, the present invention provides a software defect-oriented multi-round automatic question-answering method, which includes the following steps:
step 1, crawling defect reports from an open-source Bug management repository, extracting the information in each report that helps defect understanding, extracting entities and relations from long texts, performing knowledge fusion and quality detection, and constructing a software defect knowledge graph;
step 2, recording the multiple rounds of interaction between a software developer or maintainer and the system, and constructing a multi-round dialogue memory module;
step 3, constructing a user portrait of the software developer or maintainer according to the questions in the software defect field that he or she asks;
step 4, constructing a guided multi-round question-answering module according to the dialogue memory and the user portrait of the software developer or maintainer.
Further, the specific process of constructing the software defect knowledge graph in step 1 comprises:
step 1-1, crawling defect reports from an open-source Bug management repository and preprocessing them. A crawled report contains a large number of attributes and Description texts, from which the information important for defect analysis and understanding is extracted: the defect number (Bug ID), defect title (Title), product (Product), component (Component), severity (Severity), modification state (Modified), defect handler (identifier), defect reporter (Reporter) and defect description (Description).
Step 1-2, information extraction: extracting entities, attributes and the relations among entities from the Title and Description information obtained in step 1-1, and forming an ontological knowledge representation on this basis;
step 1-3, knowledge fusion and quality detection: after the defect knowledge is acquired, knowledge fusion is performed to eliminate contradictions and ambiguities; quality evaluation is then carried out, and the qualified parts, namely defect knowledge entities whose semantic expression is complete and free of ambiguity and contradiction, are added to the knowledge base.
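The field selection of step 1-1 can be sketched as follows. This is a minimal illustration that assumes (hypothetically) the crawler already returns each report as a dict keyed by the English field names given in parentheses above; the HTML-tag stripping and whitespace collapsing stand in for the unspecified preprocessing.

```python
import re
from typing import Any, Dict

# Fields kept by step 1-1, named as in the text (hypothetical dict keys).
KEPT_FIELDS = ("Bug ID", "Title", "Product", "Component", "Severity",
               "Modified", "identifier",  # "identifier" = defect handler
               "Reporter", "Description")


def extract_report_fields(raw: Dict[str, Any]) -> Dict[str, str]:
    """Keep only the fields useful for defect understanding and clean them."""
    record = {}
    for field in KEPT_FIELDS:
        text = str(raw.get(field, ""))
        text = re.sub(r"<[^>]+>", " ", text)           # drop leftover HTML tags
        record[field] = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return record
```

Attributes outside the nine listed fields are simply discarded, and a missing field becomes an empty string so that later stages always see the same keys.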
Further, the information extraction in step 1-2 specifically includes: applying a Bi-LSTM deep neural network combined with an Attention mechanism to classify defect entities and identify the relations between entities.
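The attention part of the Bi-LSTM + Attention classifier can be sketched in plain Python. The sketch assumes the Bi-LSTM hidden states (forward and backward states concatenated per token) have already been computed; the attention query vector, class weights and entity labels are hypothetical parameters. Scoring uses a plain dot product, a simplification of the additive attention often used with Bi-LSTMs.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(hidden: Sequence[Vector], query: Vector) -> Tuple[List[float], List[float]]:
    """Pool per-token Bi-LSTM states into one sentence vector."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden]
    weights = softmax(scores)            # one attention weight per token
    dim = len(hidden[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(dim)]
    return weights, pooled


def classify(pooled: Vector, class_weights: Sequence[Vector], labels: Sequence[str]) -> str:
    """Linear layer + softmax over (hypothetical) entity classes."""
    logits = [sum(w_k * x_k for w_k, x_k in zip(w, pooled)) for w in class_weights]
    probs = softmax(logits)
    return labels[probs.index(max(probs))]
```

In a trained model the query vector and class weights would be learned jointly with the Bi-LSTM; here they are supplied by hand only to show the data flow.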
Further, the knowledge fusion and quality detection in step 1-3 specifically include:
step 1-3-1, calculating the similarity between every two entities using the Levenshtein distance (minimum edit distance), with the following formula:
$\mathrm{sim}_1 = 1 - \dfrac{\mathrm{lev}_{a,b}(|a|,|b|)}{\max(|a|,|b|)}$
where sim1 is the similarity between two entities computed from the Levenshtein distance, a and b are the two entity strings, |a| and |b| are their lengths, and $\mathrm{lev}_{a,b}(|a|,|b|)$ is the Levenshtein distance between the whole of a and the whole of b;
step 1-3-2, calculating the similarity sim2 between every two entities using the Dice coefficient, with the following formula:
$\mathrm{sim}_2 = \dfrac{2|A \cap B|}{|A| + |B|}$
where sim2 is the entity similarity computed with the Dice coefficient, and A and B respectively represent the two entities, treated as sets;
step 1-3-3, calculating the simple arithmetic mean of sim1 and sim2, sim3 = (sim1 + sim2)/2; if sim3 is greater than the set threshold, the two entities are regarded as the same entity;
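Steps 1-3-1 to 1-3-3 can be sketched directly from the formulas. Two choices here are assumptions not fixed by the text: the Dice coefficient is computed over the sets of characters in each entity string, and the threshold defaults to a hypothetical 0.8.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def sim1(a: str, b: str) -> float:
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def sim2(a: str, b: str) -> float:
    # Dice coefficient over character sets (assumed unit of comparison).
    A, B = set(a), set(b)
    if not A and not B:
        return 1.0
    return 2 * len(A & B) / (len(A) + len(B))


def same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    sim3 = (sim1(a, b) + sim2(a, b)) / 2     # simple arithmetic mean
    return sim3 > threshold
```

Averaging the two measures balances an order-sensitive score (edit distance) against an order-insensitive one (set overlap), so that near-duplicate spellings merge while unrelated names stay apart.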
Further, the multi-round dialogue memory module constructed in step 2 specifically comprises a user dialogue state tracking module and a dialogue policy module. The state tracking module predicts the user's goal in each round of interaction, manages the input and the question-answer history of each round, and outputs the current dialogue state; the dialogue policy module takes the optimal action (such as providing results or confirming requirements) according to the dialogue state, thereby effectively helping the user obtain answers.
Further, constructing the user portrait in step 3 according to the questions in the software defect field asked by the software developer or maintainer specifically includes collecting, through the system, the day-to-day questions asked by the software developer or maintainer, so as to learn their needs when analyzing software defect problems.
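The text does not say how the collected questions are summarized, so the following is only a minimal sketch under one assumption: each logged question has already been tagged with a topic (for example, the type of defect field it asks about), and a user's portrait is the relative frequency of those topics. The topic labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_portraits(question_log: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """question_log: (user_id, topic) pairs collected by the system."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, topic in question_log:
        counts[user][topic] += 1
    portraits = {}
    for user, c in counts.items():
        total = sum(c.values())
        # Normalise counts so portraits of heavy and light users are comparable.
        portraits[user] = {topic: n / total for topic, n in c.items()}
    return portraits
```

Such a frequency vector could later serve as part of the user features fed to the answer ranking model.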
Further, constructing the guided multi-round question-answering module in step 4 according to the dialogue memory and the user portrait of the software developer or maintainer specifically includes:
step 4-1, preprocessing the user question: parsing and completing the question with the question-answer context stored in the multi-round dialogue memory module to normalize it, extracting the defect entities and relations in the question, and deleting stop words, prepositions and other words that are meaningless for defect understanding;
step 4-2, graph search and reasoning: mapping the extracted defect entities and relations into Cypher, the structured query language of the Neo4j graph database, to perform a subgraph search;
step 4-3, answer ranking: scoring the candidate answers with a LambdaRank model that combines the user features obtained from the user portrait with the candidate answer list, and ranking the candidate answers by score;
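LambdaRank trains a scoring model with "lambda" gradients rather than an explicit loss: for every pair of candidates where i is more relevant than j, the score of i is pushed up and that of j pushed down by $\sigma / (1 + e^{\sigma(s_i - s_j)})$ weighted by the change in NDCG that swapping them would cause. The sketch below computes these lambdas for one question's candidate list, given current model scores and graded relevance labels. The scoring model itself (which, per the text, also consumes user-portrait features) is not shown, and sigma = 1.0 is an assumed default.

```python
import math
from typing import List, Sequence


def gain(rel: int) -> float:
    return 2 ** rel - 1


def discount(rank: int) -> float:
    return 1 / math.log2(rank + 1)          # rank is 1-based


def lambdarank_lambdas(scores: Sequence[float], rels: Sequence[int],
                       sigma: float = 1.0) -> List[float]:
    """Positive lambda = push that candidate's score up."""
    n = len(scores)
    order = sorted(range(n), key=lambda k: -scores[k])
    rank = {doc: pos + 1 for pos, doc in enumerate(order)}
    idcg = sum(gain(r) * discount(p + 1)
               for p, r in enumerate(sorted(rels, reverse=True)))
    lambdas = [0.0] * n
    if idcg == 0:
        return lambdas                      # no relevant candidate: no signal
    for i in range(n):
        for j in range(n):
            if rels[i] <= rels[j]:
                continue                    # only pairs where i should beat j
            delta_ndcg = abs((gain(rels[i]) - gain(rels[j]))
                             * (discount(rank[i]) - discount(rank[j]))) / idcg
            rho = sigma / (1 + math.exp(sigma * (scores[i] - scores[j])))
            lambdas[i] += rho * delta_ndcg
            lambdas[j] -= rho * delta_ndcg
    return lambdas
```

During training, each lambda is used as the gradient signal for the corresponding candidate's score; at inference time only the learned scores are needed to sort the candidates.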
As a specific example, the software defect-oriented multi-round automatic question-answering method of the present invention is further verified and explained with reference to Fig. 1, as follows:
1. Crawling defect reports from an open-source Bug management repository, extracting the information in each report that helps defect understanding, extracting entities and relations from long texts, performing knowledge fusion and quality detection, and constructing a software defect knowledge graph. In this embodiment, one crawled bug report has the Title "Fix dimensions used by XUL syntax change" and the Description "This is a clinical bug location I has data loss less cause I can be found to be more and less than find. when the term I is not the Page and Properties are found, the term com up and documents the term search is not the same as that of the term book with third view and property on machine and tool work". After entity and relation extraction, knowledge fusion and quality detection, the resulting entities and relations are shown in the following table:
2. Recording the multiple rounds of interaction between a software developer or maintainer and the system, and constructing a multi-round dialogue memory module. The multi-round dialogue management module takes triples of the software defect knowledge graph as input. It consists of two main parts, a state tracking module and a dialogue policy module. The state tracking module estimates the user's goal in each dialogue round, manages the dialogue input and history of each round, and produces the current dialogue state. The main function of the dialogue policy module is to determine the best action according to the previous dialogue state so as to help the user obtain information or services. Based on the semantic representation of the user input and the current dialogue state, the module outputs the next system action and the updated dialogue state. The main dialogue management tasks are as follows:
Dialogue state maintenance: the dialogue state at time t+1 depends on the state at time t, the system action at time t, and the user action at time t+1.
System decision generation: based on the state produced by dialogue state maintenance, the system generates its next action, deciding how to respond to the observed user input and feedback. After receiving the user's question, the system queries the knowledge graph and at the same time asks follow-up questions about any parts that are vague or unclear, allowing the user to keep completing and refining the question. The system can obtain context information for multiple dialogue rounds from the multi-round dialogue management module, including knowledge graph context, question entity and relation context, and the semantic context of the question. Based on this information, the answer to the user's question can be found accurately.
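The two tasks above can be sketched as a slot-based tracker and a rule-based policy. The sketch assumes, hypothetically, that each user turn has already been reduced to a dict of slots (`entity`, `relation`); the update implements $s_{t+1} = f(s_t, a_t, u_{t+1})$ by carrying slots over from the previous state and overwriting them with whatever the new turn supplies. A learned policy could replace `choose_action`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DialogState:
    entity: Optional[str] = None          # defect entity being asked about
    relation: Optional[str] = None        # attribute/relation the user wants
    history: List[Tuple[Any, Dict[str, str]]] = field(default_factory=list)


def update_state(state: DialogState, system_action: Any,
                 user_input: Dict[str, str]) -> DialogState:
    """New state from previous state, previous system action and new user turn."""
    new = DialogState(state.entity, state.relation,
                      state.history + [(system_action, user_input)])
    if user_input.get("entity"):
        new.entity = user_input["entity"]
    if user_input.get("relation"):
        new.relation = user_input["relation"]
    return new


def choose_action(state: DialogState) -> Tuple[str, Any]:
    """Ask for whatever is still missing; otherwise query the knowledge graph."""
    if state.entity is None:
        return ("ask", "entity")
    if state.relation is None:
        return ("ask", "relation")
    return ("query", (state.entity, state.relation))
```

Asking for the missing slot before querying is what makes the interaction "guided": the user is prompted to complete an underspecified question rather than receiving a poor answer.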
3. According to the questions in the software defect field asked by the software developer or maintainer, the user's emotion is analyzed to identify the user's relevant characteristics and construct a user portrait of the software developer or maintainer, which is then used to rank the candidate answers in question answering.
Emotion computation can be represented by a triple, as follows:
$ST = \langle T, C, I \rangle$
where T denotes the set of user information, i.e., $T = \{t_1, t_2, \ldots, t_n\}$, the questions about software defects posed by the user.
C denotes the set of emotion categories or tendency categories, i.e., $C = \{c_1, c_2, \ldots, c_n\}$. It can express discrete emotional characteristics, and more complex emotions can be composed from basic ones; emotional characteristics can therefore be divided into two or more categories depending on the application, yielding different emotion classification models. The model directly reflects the basic understanding of emotion granularity.
I denotes the set of intensities of the different emotional features, i.e., $I = \{i_1, i_2, \ldots, i_n\}$. Intensity is generally divided into 3 grades (high, medium and low) or 5 grades (extremely high, high, medium, low and extremely low); intensity combined with emotional features forms the core and foundation of emotion computation.
According to this definition, computing the user's emotion amounts to acquiring and identifying the software defect knowledge in the question the user inputs, which allows the user emotion function to be computed along different dimensions. Emotion computation can thus be expressed as the state space formed by the Cartesian product of the three elements above:
$ST = T \times C \times I$
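The state space $ST = T \times C \times I$ can be enumerated directly with a Cartesian product. In this sketch the questions are placeholders and the three emotion categories are an assumed example; the intensity grades follow the 3-grade scheme described above.

```python
from itertools import product
from typing import List, Sequence, Tuple


def emotion_state_space(T: Sequence[str], C: Sequence[str],
                        I: Sequence[str]) -> List[Tuple[str, str, str]]:
    """Enumerate ST = T x C x I as (question, category, intensity) tuples."""
    return list(product(T, C, I))


questions = ["q1", "q2"]                           # placeholder user questions
categories = ["negative", "neutral", "positive"]   # assumed category set
intensities = ["low", "medium", "high"]            # 3-grade scheme from the text
space = emotion_state_space(questions, categories, intensities)
```

Here the space holds 2 × 3 × 3 = 18 states; an emotion classifier would then assign each question one (category, intensity) pair, i.e. select one point of this space.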
Through this emotion computation, the system can extract the user's emotional characteristics and build the user portrait through qualitative and quantitative analysis and behavior modeling, in preparation for ranking the candidate answers in question answering.
4. Constructing the guided multi-round question answering based on the dialogue memory and the user portrait of the software developer or maintainer is further described with reference to Fig. 2:
(1) the software developer or maintainer inputs the software defect question to be asked;
(2) obtaining the multi-round dialogue context: the system obtains multi-round dialogue context information from the multi-round dialogue memory module, including the context of the constructed knowledge graph, the context of software defect question entities and relations, semantic context, user emotion context, and so on;
(3) user emotion analysis: the emotion value of the user's question is calculated with an emotion computation model based on the subject-predicate pattern and the user's emotion context, and is added to the user's emotion context for use in constructing the user portrait and generating subsequent answers;
(4) user question preprocessing: the question input by the user is preprocessed, including coreference resolution and sentence completion based on the multi-round dialogue context, automatic grammatical error correction based on a Bi-LSTM + CRF model, and extraction of the defect entities and relations in the question, in preparation for the subsequent semantic full-text search and knowledge graph search;
(5) knowledge graph search: using the entities and relations extracted from the question as conditions, a graph search is performed in the Neo4j graph database with Cypher queries to match nodes of the defect knowledge graph, yielding a list of candidate answers to the question;
(6) answer generation: the candidate answer list and the preprocessed user question are fed into a trained LambdaRank deep learning ranking model to obtain the similarity ranking between the candidate answers and the user question; candidate answers whose similarity exceeds a specified threshold are output as the answers to the question, otherwise the user is prompted to rephrase the question.
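The thresholding in step (6) can be sketched as follows. The scores are assumed to come from the ranking model, and the default threshold of 0.5 and the fallback message are hypothetical.

```python
from typing import List, Sequence, Tuple


def generate_answers(scored: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Return candidates whose ranking score exceeds the threshold, best first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [answer for answer, score in ranked if score > threshold]


def respond(scored: Sequence[Tuple[str, float]], threshold: float = 0.5) -> str:
    answers = generate_answers(scored, threshold)
    if answers:
        return answers[0]
    # No candidate is similar enough: ask the user to rephrase.
    return "No confident answer found; please try rephrasing the question."
```

Returning an empty list rather than the best low-scoring candidate keeps the system from answering confidently when the knowledge graph holds no good match.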
In addition, the present invention further provides an embodiment of a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above software defect-oriented multi-round automatic question-answering method when executing the computer program.
The present invention also provides an embodiment of a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above software defect-oriented multi-round automatic question-answering method.
Corresponding to the above software defect-oriented multi-round automatic question-answering method, this embodiment provides a software defect-oriented multi-round automatic question-answering system, which includes:
a first module for crawling defect reports from an open-source Bug management repository, extracting the information in each report that helps defect understanding, extracting entities and relations from long texts, performing knowledge fusion and quality detection, and constructing a software defect knowledge graph;
a second module for recording the multiple rounds of interaction between a software developer or maintainer and the system and constructing a multi-round dialogue memory module;
a third module for constructing a user portrait of the software developer or maintainer according to the questions in the software defect field that he or she asks;
and a fourth module for constructing a guided multi-round question-answering module according to the dialogue memory and the user portrait.
The foregoing shows and describes the principles, general features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above, which, together with the description, merely illustrate its principle; various changes and modifications may be made without departing from the spirit and scope of the present invention, and all such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.