Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention are described in detail below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.
The following further describes specific implementations of the embodiments of the present invention with reference to the drawings.
With the continuous development of computer technology and the continuous progress of artificial intelligence technology, intelligent conversation systems such as Conversational Information Retrieval (CIR) systems have emerged. A CIR system is an information retrieval system with a conversational interface that allows a user to interact with the system, in spoken or written form, through multiple rounds of natural-language dialog in order to find information, greatly improving the efficiency of human-computer interaction.
In a conversational information retrieval system, a rewrite model trained on the conversational question rewriting task helps convert simplified query data into the corresponding complete query data with no contextual omissions, so that the query data can be better processed by the information retrieval system.
Fig. 1 is a schematic block diagram of a dialog system according to an example. The dialog system of the present example includes a front-end device 110, a dialog server 120, and a database 130. The front-end device 110 may be a user device including a human-machine interaction module. The front-end device 110 generates query data, e.g., simplified query data, through the human-machine interaction module. The front-end device 110 then transmits the simplified query data to the dialog server 120, such as through a communication module (not shown).
Specifically, the front-end device 110 may be a terminal device such as an embedded device or an Internet of Things device, or a non-embedded device such as a desktop computer or a server. An embedded operating system, such as a real-time operating system, may be installed in the embedded device to communicate with the dialog server 120 through a network communication module. As an Internet of Things device, the front-end device 110 may be implemented as a smart device such as a smart appliance, including but not limited to a smart watch, a smart speaker, a smart air conditioner, a smart doorbell, and the like. The smart device can carry out an intelligent conversation with a user, such as voice interaction or computer-vision interaction, through the human-machine interaction module, and can either perform initial processing of the user's conversation instruction and send the result to the dialog server 120 for further processing, or forward the instruction directly to the dialog server 120 for further processing.
The dialog server 120 includes a rewrite module, a query module, and a text generation module. The rewrite module generates contextual query data from the simplified query data. The query module queries a database, such as the conversation database 130, for information based on the contextual query data, and transmits the queried information to the text generation module. The text generation module generates a text as the query result and returns it to the human-machine interaction module of the front-end device 110.
It should be appreciated that the query module of the present example may be configured with different functions. For example, based on the contextual query data, a Structured Query Language (SQL) statement may be generated to perform the query (i.e., a database query scenario). The query module may also obtain a corresponding reply text or reply keyword based on the contextual query data, and the text generation module then generates, based on the reply keyword or reply text, a text that better conforms to natural language (i.e., a client dialog scenario). Further, the reply text or reply keywords may be based on a target language while the contextual query data is based on a source language, so that the generated text is a translation of the contextual query data (i.e., a translation query scenario). Alternatively, the reply text or reply keywords and the contextual query data may each be based on the source language, with the generated text based on the target language, which is another example of a translation query scenario.
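By way of illustration only, the module flow described above can be sketched as follows; the class structure and the rewrite(), lookup(), and generate() interfaces are assumptions of this sketch and are not fixed by the embodiments.

```python
class DialogServer:
    """Sketch of dialog server 120 with its rewrite, query, and text generation modules."""

    def __init__(self, rewrite_model, database, text_generator):
        self.rewrite_model = rewrite_model    # rewrite module (query rewrite model)
        self.database = database              # e.g., the conversation database 130
        self.text_generator = text_generator  # text generation module

    def handle(self, simplified_query: str, dialog_history: list[str]) -> str:
        # Rewrite module: complete the simplified query with its dialog context.
        contextual_query = self.rewrite_model.rewrite(simplified_query, dialog_history)
        # Query module: retrieve information for the contextual query, e.g., by
        # generating and executing an SQL statement or by looking up reply keywords.
        retrieved = self.database.lookup(contextual_query)
        # Text generation module: turn the retrieved information into a natural-language
        # reply returned to the front-end device 110.
        return self.text_generator.generate(contextual_query, retrieved)
```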
Since the contextual query data is generated based on the simplified query data, the reliability of the generated contextual query data depends on the configuration of the rewrite module, and the performance of the rewrite module is particularly important when it is implemented by a rewrite model. In addition, using simplified query data as input avoids requiring the user to provide excessive information and improves the intelligence of the rewrite module, so the reasoning capability of the rewrite module is particularly important in this case. Similarly, because simplified query data is used as input, the number of samples of contextual query data is often small, and it is quite difficult to train a rewrite model with strong reasoning and generalization ability while ensuring the intelligence of the rewrite model.
The present invention adopts the inventive concept of contrastive learning to optimize the loss function as far as possible (for example, to minimize the loss function), so that the trained task model has better performance, such as higher prediction accuracy and stronger generalization capability. FIG. 2 is a flow diagram of the steps of a contrastive learning method according to one embodiment of the invention. The solution of the present embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to a server, a mobile terminal (such as a mobile phone or a PAD), a PC, and the like. For example, in the model training phase, a codec (encoder-decoder) model may be trained based on training samples using a computing device (e.g., a data center) configured with a CPU (an example of a processing unit) + GPU (an example of an acceleration unit) architecture. Computing devices such as data centers may be deployed in cloud servers such as a private cloud or a hybrid cloud. Accordingly, in the inference phase, the inference operation may also be performed using a computing device configured with a CPU (an example of a processing unit) + GPU (an example of an acceleration unit) architecture.
The contrastive learning method of this embodiment includes the following steps:
S210: Construct similar sample pairs and dissimilar sample pairs based at least on similar enhancement samples and labeled samples of the training samples.
It should be understood that the similar enhancement samples may be obtained by performing any suitable enhancement processing on the training samples. For example, coarse-grained enhancement may be performed; when the training samples are text samples, a text sample may be translated into another language and then translated back. Sample enhancement processing may be performed on each training sample, so that each training sample and its corresponding similar enhancement sample form a similar sample pair, while any other two different samples form a dissimilar sample pair.
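As a minimal sketch of this step, the code below builds similar and dissimilar sample pairs from a batch of text samples; the back_translate helper is a placeholder for any coarse-grained enhancement (a real system would translate the text to a pivot language and back), and all names here are illustrative only.

```python
import itertools

def back_translate(text: str) -> str:
    # Placeholder enhancement: a real system would translate to a pivot language
    # and back; the identity is returned here only so the sketch runs as-is.
    return text

def build_sample_pairs(training_samples: list[str]):
    enhanced = [back_translate(s) for s in training_samples]
    # Each training sample and its own enhancement form a similar sample pair.
    similar_pairs = list(zip(training_samples, enhanced))
    # Any other two different samples form a dissimilar sample pair.
    all_samples = training_samples + enhanced
    n = len(training_samples)
    dissimilar_pairs = [
        (all_samples[i], all_samples[j])
        for i, j in itertools.combinations(range(len(all_samples)), 2)
        if not (i < n and j == i + n)   # skip each sample paired with its own enhancement
    ]
    return similar_pairs, dissimilar_pairs

similar, dissimilar = build_sample_pairs(["when was it built", "how tall is it"])
```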
S220: Construct a loss function for the encoder of the task model based on the similarity of the similar sample pairs and the similarity of the dissimilar sample pairs, where the function value of the loss function decreases as the similarity of the similar sample pairs increases and increases as the similarity of the dissimilar sample pairs increases.
It should be understood that the task model is a model based on an encoder-decoder structure, e.g., a query rewrite model. The query rewrite model is used to generate contextual query data from simplified query data, where the query data may be text data or other conforming data.
S230: Train the encoder of the task model based on the loss function.
It should be understood that this loss function corresponds to the encoder. The encoder and the decoder of the task model may additionally be trained based on a corresponding generation loss function (e.g., the third loss function described below), and the overall loss function of the task model may be composed of the generation loss function and the encoder loss function, e.g., by weighting the two, with the encoder loss function serving as a supplement and adjustment to the generation loss function. When training the encoder and the decoder of the task model, the parameters of the task model may be adjusted based on the total loss function so that the total loss function is minimized.
In the solution of this embodiment of the invention, similar sample pairs and dissimilar sample pairs are constructed based at least on the similar enhancement samples and the labeled samples of the training samples, so that the constructed similar sample pairs and dissimilar sample pairs are of higher quality.
In other examples, the training sample and its similar enhancement sample can be obtained by inputting the initial sample into the encoder with a first dropout (random inactivation) probability and with a second dropout probability, respectively, so that the generalization capability of the task model is improved through this dropout-based enhancement.
In other examples, constructing similar sample pairs and dissimilar sample pairs based at least on the similar enhancement samples and the labeled samples of the training samples includes: constructing a first similar sample pair and a first dissimilar sample pair based on a training sample and its similar enhancement sample; and constructing a second similar sample pair and a second dissimilar sample pair based on the similar enhancement sample and the labeled sample of the training sample. That is, the similar sample pairs include first similar sample pairs and second similar sample pairs, and the dissimilar sample pairs include first dissimilar sample pairs and second dissimilar sample pairs.
In other examples, constructing the loss function for the encoder of the task model based on the similarity of the similar sample pairs and the similarity of the dissimilar sample pairs includes: determining a first loss function based on the similarity of the first similar sample pair and the similarity of the first dissimilar sample pair; determining a second loss function based on the similarity of the second similar sample pair and the similarity of the second dissimilar sample pair; and determining the loss function of the encoder of the task model based on the first loss function and the second loss function. The first loss function mainly describes an internal contrast loss and the second loss function mainly describes an external contrast loss, which improves the effectiveness of the loss function.
In some examples, constructing the first similar sample pair and the first dissimilar sample pair based on the training sample and its similar enhancement sample includes: determining a first training sample and its corresponding first similar enhancement sample as a first similar sample pair; and determining the first training sample and the second similar enhancement sample corresponding to a second training sample as a first dissimilar sample pair. This further improves the quality of the similar and dissimilar sample pairs and helps construct a more effective loss function.
Further, constructing the similar sample pair and the dissimilar sample pair based on the training sample and its similar enhancement sample may also include: determining the first training sample and the second training sample as a first dissimilar sample pair.
Specifically, for a text sample, an initial text sample (an example of the initial sample) may be input to the encoder a first time and a second time, respectively, to obtain a first similar sample pair formed by the two resulting similar text samples, where the initial text sample may be a paragraph, a sentence, a clause, or the like. The internal contrast loss is a loss constructed based on the similarity of the first similar sample pairs and the similarity of the first dissimilar sample pairs.
Alternatively, the first similar sample pair may also be composed of the initial text sample and the similar text sample obtained from its first input to the encoder.
Alternatively, the first similar sample pair may also be composed of the initial text sample and the similar text sample obtained from its second input to the encoder.
Further, the first similar sample pair may also be taken as a weighted combination of three pairs: the pair formed by the two similar text samples, the pair formed by the initial text sample and the similar text sample obtained from its first input to the encoder, and the pair formed by the initial text sample and the similar text sample obtained from its second input to the encoder.
In other examples, constructing the second similar sample pair and the second dissimilar sample pair based on the similar enhancement samples and the labeled samples of the training samples includes: determining a first fusion sample of the first training sample and the first similar enhancement sample, together with the labeled sample of the first fusion sample, as a second similar sample pair; and determining the first fusion sample and the labeled sample of the second fusion sample as a second dissimilar sample pair. This further improves the quality of the similar and dissimilar sample pairs and helps construct a more effective loss function.
In particular, for text samples, the external contrast loss is constructed based on the similarity of the second similar sample pairs and the similarity of the second dissimilar sample pairs, i.e., on the pairwise distance in the sample space between the fused sample of the similar text samples and the ground-truth rewrite.
Similar sample pairs and dissimilar sample pairs, and the loss functions constructed based on them, are described below in conjunction with Fig. 3.
FIG. 3 is a schematic diagram of the contrastive learning process of the embodiment of FIG. 2. The task model in this example may be a query rewrite model and is a model based on the encoder-decoder structure. The task model of this example includes a word embedding layer 310 (word embedding), an encoder 320 (encoder), and a decoder 330 (decoder). This example focuses on training the encoder 320, i.e., training the encoder based on the training samples and their enhancement samples.
Specifically, a first initial sample output by the word embedding layer 310 is input into the encoder 320. With the encoder 320 applying a first dropout probability, a first training sample is output; with the encoder applying a second dropout probability, a first similar enhancement sample is output. Accordingly, a second initial sample is input into the encoder 320 to obtain a second training sample and a second similar enhancement sample.
The first training sample and the first similar enhancement sample form a first similar sample pair, the first training sample and the second training sample form a first dissimilar sample pair, and the first training sample and the second similar enhancement sample also form a first dissimilar sample pair.
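A minimal sketch of this dual-pass construction is given below (PyTorch); the toy encoder stands in for the encoder 320, and all class, function, and variable names are illustrative assumptions rather than part of the embodiments.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for encoder 320: a dropout layer followed by a linear projection."""
    def __init__(self, dim: int = 32, dropout_p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout_p)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.dropout(x))

def encode_twice(encoder: ToyEncoder, word_embeddings: torch.Tensor,
                 p1: float = 0.1, p2: float = 0.2):
    """Feed the same word-embedding output through the encoder twice with two
    different dropout probabilities; the two outputs form a similar sample pair."""
    encoder.train()                              # dropout must be active
    encoder.dropout.p = p1
    training_sample = encoder(word_embeddings)       # e.g., the first training sample
    encoder.dropout.p = p2
    similar_enhancement = encoder(word_embeddings)   # its similar enhancement sample
    return training_sample, similar_enhancement

# Two initial samples, e.g., outputs of the word embedding layer 310.
x1, x2 = torch.randn(1, 32), torch.randn(1, 32)
enc = ToyEncoder()
q1, q1_aug = encode_twice(enc, x1)   # first similar sample pair
q2, q2_aug = encode_twice(enc, x2)   # second similar sample pair
# (q1, q2) and (q1, q2_aug) are first dissimilar sample pairs.
```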
For a batch (batch) with N instances (instances) and the N corresponding enhanced instances, for each data sample the corresponding enhanced data is taken as the positive sample, while the remaining 2(N-1) data records in the batch are taken as negative samples. The contrastive loss for one positive pair $(i, j)$ in the batch can be expressed, for example in the normalized temperature-scaled form, as:

$$\ell_{i,j} = -\log \frac{\exp\big(\mathrm{sim}(x_i, x_j)/\tau\big)}{\sum_{k=1,\, k \neq i}^{2N} \exp\big(\mathrm{sim}(x_i, x_k)/\tau\big)},$$

where $\mathrm{sim}(\cdot,\cdot)$ is a similarity measure (for example, cosine similarity) and $\tau$ is a temperature coefficient. N is the batch size, and $X_{2N}$ is the word embedding matrix of the 2N samples, in which the positive example pairs are arranged one after another, i.e., each adjacent odd-even pair of rows of $X_{2N}$ constitutes a positive example pair. The loss function over all the samples of $X_{2N}$ is therefore:

$$L_{cl}(X_{2N}) = \frac{1}{2N} \sum_{k=1}^{N} \big(\ell_{2k-1,\,2k} + \ell_{2k,\,2k-1}\big).$$
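A compact sketch of this in-batch contrastive loss, assuming cosine similarity and a temperature coefficient as above and using PyTorch, could look as follows; the function name and the interleaved odd-even row layout are conventions of this sketch only.

```python
import torch
import torch.nn.functional as F

def batch_contrastive_loss(x: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """In-batch contrastive loss over a 2N x M matrix whose adjacent odd-even rows
    (0,1), (2,3), ... are positive pairs; the other 2(N-1) rows of each anchor
    serve as its negatives."""
    x = F.normalize(x, dim=-1)                 # so dot products are cosine similarities
    sim = x @ x.t() / temperature              # (2N, 2N) similarity matrix
    sim.fill_diagonal_(float('-inf'))          # exclude each anchor's self-similarity
    targets = torch.arange(x.size(0), device=x.device) ^ 1   # positive partner: 0<->1, 2<->3, ...
    return F.cross_entropy(sim, targets)       # averages the per-pair losses over all 2N anchors

# Example: N = 4 positive pairs, i.e., 8 rows of 32-dimensional embeddings.
z = torch.randn(8, 32)
loss = batch_contrastive_loss(z)
```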
Specifically, the first function value of the first loss function decreases as the similarity of the first similar sample pair increases and increases as the similarity of the first dissimilar sample pair increases. Likewise, the second function value of the second loss function decreases as the similarity of the second similar sample pair increases and increases as the similarity of the second dissimilar sample pair increases. This further helps construct a more effective loss function.
In other examples, the contrastive learning method further includes: training the task model based on a third loss function, thereby improving the overall training effect of the task model including the encoder and the decoder.
In one example, the loss function used for contrastive learning (the encoder loss function) is the sum of the first loss function and the second loss function:
L_C = L_icl + L_ecl,

where the first loss function is L_icl = L_cl(Combine[Q′; Q″]). The first loss function is constructed from the first similar sample pairs and the first dissimilar sample pairs: the first training sample and the first similar enhancement sample form a first similar sample pair, the first training sample and the second training sample form a first dissimilar sample pair, and the first training sample and the second similar enhancement sample also form a first dissimilar sample pair. Q′ and Q″ are two sample matrices (e.g., query embedding matrices) obtained from the same input samples, and the first similar sample pairs and first dissimilar sample pairs are constructed from Q′ and Q″. The Combine function connects the rows of two N×M embedding matrices one by one into a single 2N×M matrix, so that corresponding rows are adjacent. L_cl is the in-batch contrastive loss function defined above.
The second loss function is:

L_ecl = L_cl(Combine[(Q′ + Q″)/2; Q̄]),

where Q̄ denotes the labeled sample matrix corresponding to the sample matrices Q′ and Q″, and averaging Q′ and Q″ (an example of a weighting process) yields the fused sample matrix. Correspondingly, a first fusion sample of the first training sample and the first similar enhancement sample, together with the labeled sample of the first fusion sample, is determined as a second similar sample pair, and the first fusion sample and the labeled sample of the second fusion sample are determined as a second dissimilar sample pair; that is, the second similar sample pairs and the second dissimilar sample pairs are constructed from Q′, Q″, and Q̄.
When both the encoder and the decoder are trained, the total loss function L_all is:

L_all = L_G + w·L_C,

where L_G is the generation loss of the task model (encoder-decoder), L_C is the loss function for contrastive learning, and w is a loss weight. When the task model is trained based on L_all, the parameters of the task model are adjusted so that L_all is minimized.
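By way of illustration only, the sketch below combines an internal contrastive loss, an external contrastive loss, and a generation loss in the weighted form above. The interleave function plays the role of the Combine function, element-wise averaging stands in for the weighting process, and the ground-truth embedding matrix and the generation-loss value are placeholder inputs; none of these names or interfaces are prescribed by the embodiments.

```python
import torch
import torch.nn.functional as F

def batch_contrastive_loss(x: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    # Same in-batch loss L_cl as sketched earlier: adjacent odd-even rows are positive pairs.
    x = F.normalize(x, dim=-1)
    sim = x @ x.t() / temperature
    sim.fill_diagonal_(float('-inf'))                        # exclude self-similarity
    targets = torch.arange(x.size(0), device=x.device) ^ 1   # 0<->1, 2<->3, ...
    return F.cross_entropy(sim, targets)

def interleave(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Combine[a; b]: connect two N x M matrices row by row into a 2N x M matrix
    # so that rows (2k, 2k+1) form a positive pair.
    n, m = a.shape
    return torch.stack([a, b], dim=1).reshape(2 * n, m)

def encoder_contrastive_loss(q1, q2, q_gold, temperature: float = 0.05):
    # L_C = L_icl + L_ecl for Q' (q1), Q'' (q2) and the labeled rewrite matrix Q-bar (q_gold).
    l_icl = batch_contrastive_loss(interleave(q1, q2), temperature)          # internal
    fused = (q1 + q2) / 2                                                    # averaged fusion
    l_ecl = batch_contrastive_loss(interleave(fused, q_gold), temperature)   # external
    return l_icl + l_ecl

def total_loss(generation_loss, q1, q2, q_gold, w: float = 0.1):
    # L_all = L_G + w * L_C; model parameters are updated to reduce this value.
    return generation_loss + w * encoder_contrastive_loss(q1, q2, q_gold)

# Toy usage with random embeddings standing in for encoder outputs.
q1, q2 = torch.randn(4, 32), torch.randn(4, 32)
q_gold = torch.randn(4, 32)
loss = total_loss(torch.tensor(2.3), q1, q2, q_gold, w=0.1)
```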
FIG. 4 is a flow chart of the steps of a query method according to another embodiment of the invention. The solution of the present embodiment may be applied to any suitable electronic device with data processing capability, including but not limited to a server, a mobile terminal (such as a mobile phone or a PAD), a PC, and the like. For example, in the model training phase, a codec (encoder-decoder) model may be trained based on training samples using a computing device (e.g., a data center) configured with a CPU (an example of a processing unit) + GPU (an example of an acceleration unit) architecture. Computing devices such as data centers may be deployed in cloud servers such as a private cloud or a hybrid cloud. Accordingly, in the inference phase, the inference operation may also be performed using a computing device configured with a CPU (an example of a processing unit) + GPU (an example of an acceleration unit) architecture.
The query method of this embodiment includes the following steps:
s410: simplified query data is obtained.
S420: based on the simplified query data, the simplified query data is input into a query rewrite model to obtain contextual query data.
S430: Perform a query based on the contextual query data to obtain a query result.
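For illustration only, steps S410 to S430 can be strung together as in the sketch below; the rewrite() and search() interfaces are assumptions of this sketch rather than interfaces specified by the embodiments.

```python
def query(simplified_query: str, dialog_history: list[str],
          rewrite_model, retriever) -> str:
    """S410-S430: rewrite the simplified query into contextual query data, then query with it."""
    # S420: input the simplified query data into the query rewrite model.
    contextual_query = rewrite_model.rewrite(simplified_query, dialog_history)
    # S430: query based on the contextual query data to obtain the query result.
    return retriever.search(contextual_query)
```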
In the solution of this embodiment of the invention, similar sample pairs and dissimilar sample pairs are constructed based at least on the similar enhancement samples and the labeled samples of the training samples, so that the sample pairs used to train the query rewrite model are of higher quality.
Fig. 5 is a flowchart illustrating steps of a human-machine conversation method according to another embodiment of the present invention.
The human-machine conversation method of this embodiment includes the following steps:
s510: a dialog request is obtained.
S520: and analyzing based on the conversation request to obtain simplified query data.
S530: and querying based on the simplified data by using a query method to obtain a query result.
S540: based on the query results, a dialog reply to the dialog request is generated.
It should be understood that the query method may be the query method of the embodiment of Fig. 4.
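A corresponding sketch of steps S510 to S540, reusing the query function sketched above, is given below; the request fields and the generate() interface are likewise assumptions made only for this sketch.

```python
def handle_dialog_request(dialog_request: dict, rewrite_model, retriever,
                          text_generator) -> str:
    """S510-S540: parse the dialog request, query, and generate a dialog reply."""
    # S520: parse the dialog request into simplified query data and dialog history
    # (the field names "utterance" and "history" are assumed for this sketch).
    simplified_query = dialog_request["utterance"]
    dialog_history = dialog_request.get("history", [])
    # S530: query using the query method of the embodiment of Fig. 4.
    query_result = query(simplified_query, dialog_history, rewrite_model, retriever)
    # S540: generate a dialog reply to the dialog request based on the query result.
    return text_generator.generate(simplified_query, query_result)
```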
It should also be understood that, in the example of Fig. 1, the parsing of the dialog request may be performed either in the human-machine interaction module or in the dialog server 120.
According to the human-machine conversation method of this embodiment, a query rewrite model trained by the above contrastive learning method is used, which improves the accuracy of data query and further improves the efficiency of human-machine conversation.
Referring to Fig. 6, a schematic structural diagram of an electronic device according to another embodiment of the present invention is shown; the specific embodiments of the present invention do not limit the specific implementation of the electronic device.
As shown in fig. 6, the electronic device may include: a processor (processor) 602, a communication interface (communication interface) 604, a memory (memory) 606 in which a program 610 is stored, and a communication bus 608.
The processor, the communication interface, and the memory communicate with each other via a communication bus.
A communication interface for communicating with other electronic devices or servers.
The processor is configured to execute the program 610, and may specifically execute the relevant steps in the above method embodiments.
In particular, the program may include program code comprising computer operating instructions.
The processor may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory is configured to store the program. The memory may include a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
The program may specifically be configured to cause the processor to perform the following operations: constructing similar sample pairs and dissimilar sample pairs based on the training samples and their similar enhancement samples; constructing a loss function for the encoder of the task model based on the similarity of the similar sample pairs and the similarity of the dissimilar sample pairs, the function value of the loss function decreasing as the similarity of the similar sample pairs increases and increasing as the similarity of the dissimilar sample pairs increases; and training the encoder of the task model based on the loss function.
Alternatively, the program may be specifically adapted to cause a processor to perform the following operations: acquiring simplified query data; inputting the simplified query data into a query rewrite model to obtain context query data; and querying based on the context query data to obtain a query result.
Alternatively, the program may be specifically adapted to cause a processor to perform the following operations: acquiring a conversation request; analyzing based on the dialogue request to obtain simplified query data; querying based on the simplified data by utilizing a query method to obtain a query result; generating a dialog reply to the dialog request based on the query result.
In addition, for specific implementation of each step in the program, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described methods according to the embodiments of the present invention may be implemented in hardware or firmware, or as software or computer code that may be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the methods described herein may be carried out by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods described herein. Further, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the methods illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present invention.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.