CN111640425A - Model training and intention recognition method, device, equipment and storage medium

Info

Publication number: CN111640425A
Application number: CN202010444204.XA
Authority: CN
Inventors: 王晶; 彭程; 罗雪峰; 王健飞
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2020-09-08
Anticipated expiration: 2040-05-22
Also published as: CN111640425B

Abstract

The present application discloses a model training method, an intent recognition method, and corresponding apparatuses, devices and storage media, and relates to the technical field of artificial intelligence. The model training method comprises: performing, according to a training task data set, precipitation training on the bottom-layer network of a pre-trained model at least twice to obtain an enhanced model of the pre-trained model, wherein the training objects of each precipitation training include at least the bottom-layer network and the prediction layer network, and include a successively decreasing number of middle and high-level networks; taking at least two networks in the enhanced model as target networks and constructing a distillation model according to the target networks, wherein the target networks include a feature recognition network and the prediction layer network, and the feature recognition network includes at least the bottom-layer network; extracting target knowledge of the training task data set through the target networks of the enhanced model; and training the distillation model according to the target knowledge and the training task data set to obtain a target learning model, so as to improve the prediction efficiency and accuracy of the target learning model.

Description

Model training and intent recognition method, apparatus, device and storage medium

Technical Field

The embodiments of the present application relate to the field of computer technology, and in particular to artificial intelligence technology.

Background

With the development of artificial intelligence technology, deep learning models are increasingly widely used in the field of human-computer interaction. A pre-trained model is one kind of deep learning model; its structure is complex and its parameters are numerous, so it may be slow and time-consuming at run time. To improve the response speed of a pre-trained model, the prior art usually requires developers to manually select network layers with small weight values and prune them from the pre-trained model, thereby compressing the model and reducing its structural complexity. However, a pre-trained model pruned in this way is strongly affected by human factors and has low accuracy, which seriously degrades human-computer interaction and urgently needs improvement.

Summary of the Invention

A model training and intent recognition method, apparatus, device and storage medium are provided.

According to a first aspect, a model training method based on knowledge distillation is provided, the method comprising:

performing, according to a training task data set, precipitation training on the bottom-layer network of a pre-trained model at least twice to obtain an enhanced model of the pre-trained model; wherein the training objects of each precipitation training include at least the bottom-layer network and a prediction layer network, and include a successively decreasing number of middle and high-level networks, and the pre-trained model includes, from bottom to top, the bottom-layer network, at least one middle and high-level network, and the prediction layer network;

taking at least two networks in the enhanced model as target networks, and constructing a distillation model according to the target networks, wherein the target networks include a feature recognition network and the prediction layer network, and the feature recognition network includes at least the bottom-layer network;

extracting target knowledge of the training task data set through the target networks of the enhanced model; and

training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.

According to a second aspect, an intent recognition method is provided, the method comprising:

acquiring user voice data collected by a human-computer interaction device;

inputting the user voice data into a target learning model to obtain a user intent recognition result output by the target learning model, wherein the target learning model is obtained by training with the knowledge distillation-based model training method described in any embodiment of the present application; and

determining a response result of the human-computer interaction device according to the user intent recognition result.
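The three steps of the intent recognition method can be illustrated with a minimal Python sketch. It is not the implementation of the application: the `IntentModel` callable, the `RESPONSES` table, the intent names, and the toy model are all hypothetical stand-ins introduced only to show the data flow from voice data to a response result.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the trained target learning model: any callable
# that maps user voice data to an intent label.
IntentModel = Callable[[bytes], str]

# Hypothetical mapping from recognized intents to device responses.
RESPONSES: Dict[str, str] = {
    "query_order": "Here is the status of your order.",
    "play_music": "Playing music.",
}


def respond_to_user(voice_data: bytes, model: IntentModel) -> str:
    """Run intent recognition on the voice data and pick a response."""
    intent = model(voice_data)  # user intent recognition result
    return RESPONSES.get(intent, "Sorry, I did not understand that.")


def toy_model(voice_data: bytes) -> str:
    """Toy model for illustration only; a real model would process the audio."""
    return "play_music" if b"music" in voice_data else "unknown"
```

In this sketch the device logic depends only on the model's output label, so the compressed target learning model could replace a larger model without changing the response step.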

According to a third aspect, a model training apparatus based on knowledge distillation is provided, the apparatus comprising:

a precipitation training module, configured to perform, according to a training task data set, precipitation training on the bottom-layer network of a pre-trained model at least twice to obtain an enhanced model of the pre-trained model; wherein the training objects of each precipitation training include at least the bottom-layer network and a prediction layer network, and include a successively decreasing number of middle and high-level networks, and the pre-trained model includes, from bottom to top, the bottom-layer network, at least one middle and high-level network, and the prediction layer network;

a distillation model construction module, configured to take at least two networks in the enhanced model as target networks and construct a distillation model according to the target networks, wherein the target networks include a feature recognition network and the prediction layer network, and the feature recognition network includes at least the bottom-layer network;

a target knowledge extraction module, configured to extract target knowledge of the training task data set through the target networks of the enhanced model; and

a distillation model training module, configured to train the distillation model according to the target knowledge and the training task data set to obtain a target learning model.

According to a fourth aspect, an intent recognition apparatus is provided, the apparatus comprising:

a voice data acquisition module, configured to acquire user voice data collected by a human-computer interaction device;

an intent recognition module, configured to input the user voice data into a target learning model to obtain a user intent recognition result output by the target learning model, wherein the target learning model is obtained by training with the knowledge distillation-based model training method described in any embodiment of the present application; and

a response result determination module, configured to determine a response result of the human-computer interaction device according to the user intent recognition result.

According to a fifth aspect, an electronic device is provided, the electronic device comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the knowledge distillation-based model training method or the intent recognition method described in any embodiment of the present application.

According to a sixth aspect, a non-transitory computer-readable storage medium storing computer instructions is provided. The computer instructions are used to cause a computer to perform the knowledge distillation-based model training method or the intent recognition method described in any embodiment of the present application.

The technology according to the embodiments of the present application solves the low accuracy of manually compressing pre-trained models in the prior art, and can automatically compress and train a high-precision target learning model at low cost, thereby improving human-computer interaction.

It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:

FIG. 1A is a flowchart of a model training method based on knowledge distillation provided according to an embodiment of the present application;

FIG. 1B is a schematic diagram of the network structure of a pre-trained model provided according to an embodiment of the present application;

FIG. 2 is a flowchart of another model training method based on knowledge distillation provided according to an embodiment of the present application;

FIG. 3 is a flowchart of another model training method based on knowledge distillation provided according to an embodiment of the present application;

FIGS. 4-5 are flowcharts of two model training methods based on knowledge distillation provided according to embodiments of the present application;

FIG. 6A is a flowchart of another model training method based on knowledge distillation provided according to an embodiment of the present application;

FIG. 6B is a schematic diagram of the principle of training a distillation model provided according to an embodiment of the present application;

FIG. 7 is a flowchart of a model training method based on knowledge distillation provided according to an embodiment of the present application;

FIG. 8 is a flowchart of an intent recognition method provided according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a video processing apparatus provided according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an intent recognition apparatus provided according to an embodiment of the present application;

FIG. 11 is a block diagram of an electronic device used to implement the knowledge distillation-based model training method or the intent recognition method of the embodiments of the present application.

Detailed Description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

FIG. 1A is a flowchart of a model training method based on knowledge distillation provided according to an embodiment of the present application; FIG. 1B is a schematic diagram of the network structure of a pre-trained model provided according to an embodiment of the present application. This embodiment is applicable to the case where a pre-trained model with a complex network structure is compressed, by means of knowledge distillation, into a target learning model with a simple network structure. The method may be performed by a knowledge distillation-based model training apparatus configured in an electronic device, and the apparatus may be implemented in software and/or hardware. As shown in FIGS. 1A-1B, the method includes:

S101: according to the training task data set, perform precipitation training on the bottom-layer network of the pre-trained model at least twice to obtain an enhanced model of the pre-trained model.

The training task data set in the embodiments of the present application may consist of sample data related to the prediction task that the pre-trained model needs to perform. For example, if the prediction task is intent recognition on user voice data in shopping platform A, all historical user voice data in shopping platform A may be obtained and processed (for example, labeled and cleaned of invalid data) to obtain the training task data set corresponding to that prediction task.

The pre-trained model in the embodiments of the present application may be a high-precision model that is built on a deep learning architecture, has already been trained on massive data, and can perform a certain learning task. Such a model usually has a deep stack of network layers, wide layers, and many parameters. The pre-trained model may be trained by the user on a large amount of sample data, or obtained directly from a pre-trained model database; this embodiment does not limit this. Optionally, the pre-trained model may include, from bottom to top, a bottom-layer network, at least one middle and high-level network, and a prediction layer network. The bottom-layer network and the middle and high-level networks perform feature recognition; the prediction layer network performs task prediction based on the recognized features. The bottom-layer network is usually used to recognize simple features, and the middle and high-level networks are usually used to abstract complex features from the simple ones. For example, if the pre-trained model is a BERT model for intent recognition, its bottom-layer network usually recognizes relatively simple grammatical features, and its middle and high-level networks usually abstract complex features from those grammatical features. The prediction layer network performs task prediction based on the features recognized by the bottom-layer network and the middle and high-level networks. Optionally, the pre-trained model in the embodiments of the present application may be a BERT model.

For example, the pre-trained model 1 shown in FIG. 1B consists of 12 network layers: network layers 1 to 3 form the bottom-layer network 10, network layers 4 to 11 form the middle and high-level network 11, and network layer 12 is the prediction layer network 12. The middle and high-level network 11 in turn includes a middle-level network 110 (network layers 4 to 7) and a high-level network 111 (network layers 8 to 11).
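The 12-layer example of FIG. 1B can be written down as a small data structure. The following Python sketch is illustrative only; the class name `LayerGroups` and the function `fig_1b_groups` are hypothetical, and the layer indices are the 1-based indices given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerGroups:
    bottom: List[int]      # bottom-layer network 10
    middle: List[int]      # middle-level network 110
    high: List[int]        # high-level network 111
    prediction: List[int]  # prediction layer network 12

    @property
    def middle_high(self) -> List[int]:
        """Middle and high-level network 11 (middle followed by high)."""
        return self.middle + self.high


def fig_1b_groups() -> LayerGroups:
    """Layer indices (1-based) as described for the 12-layer example."""
    return LayerGroups(
        bottom=list(range(1, 4)),    # layers 1 to 3
        middle=list(range(4, 8)),    # layers 4 to 7
        high=list(range(8, 12)),     # layers 8 to 11
        prediction=[12],             # layer 12
    )
```

Keeping the grouping explicit makes it straightforward to state, for each later step, which groups are trained or selected.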

Optionally, the complex features abstracted by the middle and high-level networks of a pre-trained model usually have low relevance to the prediction task itself; completing the prediction task accurately depends mainly on the bottom-layer network. This operation can therefore perform precipitation training on the bottom-layer network of the pre-trained model multiple times, continuously adjusting the training objects (that is, the network layers of the pre-trained model that need to be trained) along the way. The training objects of each precipitation training include at least the bottom-layer network and the prediction layer network, and include a successively decreasing number of middle and high-level networks. In other words, although this embodiment focuses the precipitation training on the bottom-layer network, to ensure the accuracy of the training result the training objects of every round must include at least the bottom-layer network and the prediction layer network, while the number of middle and high-level network layers among the training objects decreases as the number of precipitation training rounds increases. For example, suppose five rounds of precipitation training are performed on the bottom-layer network 10 of the pre-trained model 1 shown in FIG. 1B. The training objects of all five rounds include the bottom-layer network 10 and the prediction layer network 12. As for the middle and high-level networks, the training objects of the first round may include all network layers; those of the second round may be reduced to network layers 4 to 9; those of the third round may be further reduced to network layers 4 to 7; and so on, until by the fifth round the training objects may no longer include the middle and high-level network 11.
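The five-round example can be sketched as a schedule of trainable layers. This is a minimal illustration, not the procedure of the application: the function name `precipitation_schedule` is hypothetical, and the count of 2 middle and high-level layers in the fourth round is an assumption, since the text only states the first, second, third and fifth rounds explicitly.

```python
from typing import List

BOTTOM = [1, 2, 3]                       # bottom-layer network 10
MIDDLE_HIGH = [4, 5, 6, 7, 8, 9, 10, 11]  # middle and high-level network 11
PREDICTION = [12]                        # prediction layer network 12


def precipitation_schedule(keep_counts: List[int]) -> List[List[int]]:
    """Return the layers trained in each round.

    keep_counts[i] is how many middle/high-level layers (counted from the
    lowest one upward) are still trained in round i; it must strictly
    decrease from round to round.
    """
    if len(keep_counts) < 2:
        raise ValueError("at least two precipitation rounds are required")
    if any(c < 0 or c > len(MIDDLE_HIGH) for c in keep_counts):
        raise ValueError("keep count out of range")
    if any(b >= a for a, b in zip(keep_counts, keep_counts[1:])):
        raise ValueError("middle/high-level layers must decrease each round")
    # Every round trains the bottom and prediction layers.
    return [BOTTOM + MIDDLE_HIGH[:c] + PREDICTION for c in keep_counts]


# Rounds 1, 2, 3 and 5 follow the text; the value for round 4 is assumed.
schedule = precipitation_schedule([8, 6, 4, 2, 0])
```

With these counts, the second round trains layers 1 to 9 plus layer 12, and the fifth round trains only layers 1 to 3 plus layer 12.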

Specifically, when performing precipitation training at least twice on the bottom-layer network of the pre-trained model according to the training task data set, this step may input one part of the training task data set into the pre-trained model each time and use that part to perform one round of precipitation training on the current training objects; the pre-trained model after multiple rounds of precipitation training is then taken as the enhanced model. Because the middle and high-level networks among the training objects decrease round by round, the bottom-layer network receives increasingly precise training updates as the number of rounds grows, so its parameters become more and more accurate. In other words, the network structure of the enhanced model in this embodiment is unchanged from that of the pre-trained model: if the pre-trained model is a BERT model, the enhanced model after precipitation training is also a BERT model, only with more accurate parameters in the bottom-layer network.
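The round structure described above, one part of the data set per round and a shrinking set of trainable layers, can be sketched without a deep learning framework. In the Python sketch below the actual parameter update is replaced by a counter, and the helper names `split_into_parts` and `run_precipitation`, the three-round schedule, and the ten-item data set are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def split_into_parts(dataset: Sequence, n_parts: int) -> List[list]:
    """Split the data set into n_parts nearly equal consecutive parts."""
    size, extra = divmod(len(dataset), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(dataset[start:end]))
        start = end
    return parts


def run_precipitation(layers: List[int], schedule: List[List[int]],
                      dataset: Sequence) -> Dict[int, int]:
    """Count how many rounds update each layer.

    A real implementation would run optimizer steps on the trainable
    layers; here the update is replaced by a counter so the sketch stays
    runnable without a deep learning framework.
    """
    updates = {layer: 0 for layer in layers}
    parts = split_into_parts(dataset, len(schedule))
    for trainable, part in zip(schedule, parts):
        if not part:
            continue  # nothing to train on in this round
        for layer in trainable:
            updates[layer] += 1  # placeholder for a gradient update
    return updates


layers = list(range(1, 13))
schedule = [layers, [1, 2, 3, 4, 5, 6, 7, 12], [1, 2, 3, 12]]
updates = run_precipitation(layers, schedule, dataset=list(range(10)))
```

The resulting counts show the bottom and prediction layers updated in every round while higher layers are updated in fewer rounds, and the set of layers itself is unchanged, which mirrors the statement that the enhanced model keeps the structure of the pre-trained model.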

S102: take at least two networks in the enhanced model as target networks, and construct a distillation model according to the target networks.

The target networks may be the networks, selected from those contained in the enhanced model, that are required to complete the current prediction task. The target networks include a feature recognition network and the prediction layer network. The feature recognition network is the network used for feature recognition; in the embodiments of the present application it includes at least the bottom-layer network. Optionally, in addition to the bottom-layer network, the feature recognition network may also include some or all of the middle and high-level networks; this embodiment does not limit this.

Optionally, this embodiment may select at least two networks from the enhanced model as the target networks. If two networks are selected, they are the bottom-layer network and the prediction layer network of the enhanced model, and the feature recognition network among the target networks includes only the bottom-layer network. If three or more networks are selected, the remaining networks may be chosen from the middle and high-level networks in addition to the bottom-layer network and the prediction layer network, in which case the feature recognition network includes at least one middle and high-level network as well as the bottom-layer network. Note that whether the middle and high-level networks of the enhanced model are used as part of the feature recognition network may depend on factors such as the actual prediction task and the type of target knowledge to be extracted later; this embodiment does not limit this.
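The selection rule can be sketched as follows: the bottom-layer network and the prediction layer network are always selected, and middle or high-level networks are optional additions. The Python sketch is illustrative; the dictionary `ENHANCED_NETWORKS`, the function `select_target_networks`, and the network names are hypothetical and reuse the 12-layer example of FIG. 1B.

```python
from typing import Dict, List, Sequence

# Named networks of the enhanced model, using the 12-layer example.
ENHANCED_NETWORKS: Dict[str, List[int]] = {
    "bottom": [1, 2, 3],
    "middle": [4, 5, 6, 7],
    "high": [8, 9, 10, 11],
    "prediction": [12],
}


def select_target_networks(
    extra: Sequence[str] = (),
) -> Dict[str, Dict[str, List[int]]]:
    """Pick the bottom and prediction networks plus optional middle/high ones."""
    allowed = {"middle", "high"}
    unknown = set(extra) - allowed
    if unknown:
        raise ValueError(f"not a middle/high-level network: {sorted(unknown)}")
    # The feature recognition network always contains the bottom-layer network.
    feature_names = ["bottom"] + [n for n in ("middle", "high") if n in extra]
    return {
        "feature_recognition": {n: ENHANCED_NETWORKS[n] for n in feature_names},
        "prediction": {"prediction": ENHANCED_NETWORKS["prediction"]},
    }
```

Calling `select_target_networks()` yields the two-network case, and `select_target_networks(["middle"])` yields a three-network case in which the feature recognition network also covers the middle-level network.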

Optionally, when constructing the distillation model according to the target networks, this embodiment may construct a distillation model that likewise contains the target networks. Note that the network types of the target networks in the distillation model constructed in this step must be the same as those of the target networks in the enhanced model. Specifically, the distillation model must also include a prediction layer network and a feature recognition network. If the feature recognition network selected from the enhanced model includes only the bottom-layer network, the feature recognition network of the distillation model also includes only the bottom-layer network; if the feature recognition network selected from the enhanced model includes both the bottom-layer network and the middle-level network of the middle and high-level networks, the feature recognition network of the distillation model likewise includes the bottom-layer network and the middle-level network.

Optionally, when constructing the distillation model according to the target networks, this operation may use the network layer structure of the target networks of the enhanced model to construct a distillation model with the same structure as the enhanced model, that is, a homogeneous model of the enhanced model. For example, if the enhanced model is a BERT model, the constructed distillation model is a BERT model that contains only the target network structure of the enhanced model. The constructed distillation model may also differ in structure from the enhanced model while still containing the target network types of the enhanced model, that is, a heterogeneous model of the enhanced model. For example, the enhanced model is a BERT model and the constructed distillation model is a CNN model, but the CNN model likewise includes target networks of the same types as the enhanced model. How to construct homogeneous or heterogeneous distillation models is described in detail in subsequent embodiments.

Note that the distillation model constructed in this operation may be a machine learning model or a small neural-network-based model; it is characterized by few parameters, fast inference, and good portability.

S103: extract the target knowledge of the training task data set through the target networks of the enhanced model.

The target knowledge may be the result obtained when the target networks of the enhanced model process the training task data set. The target knowledge is subsequently injected into the distillation model as the supervision signal for training the distillation model.

Optionally, when extracting the target knowledge of the training task data set, this step may take the training task data set as the input of the enhanced model, obtain a first data feature representation output by the feature recognition network of the enhanced model and a first prediction probability representation output by the prediction layer network of the enhanced model, and take the first data feature representation and the first prediction probability representation as the target knowledge of the training task data set. Specifically, the training task data set may be divided into multiple parts of a preset size, such as batch_size. Each part of the training task data is then input into the enhanced model, the enhanced model is run, and the feature representation output by its feature recognition network is obtained as the first data feature representation. If the feature recognition network consists of the bottom-layer network only, the first data feature representation is just the feature representation output by the bottom-layer network; if the feature recognition network includes the bottom-layer network and some of the middle and high-level networks, the first data feature representation includes both the feature representation output by the bottom-layer network and the feature representations output by those middle and high-level networks. The output of the prediction layer network of the enhanced model, such as the predicted probability values, is obtained as the first prediction probability representation, and the first data feature representation and the first prediction probability representation are then taken as the target knowledge corresponding to the training task data input this time.
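The extraction procedure, splitting the data set by batch_size, running the enhanced model on each part, and keeping both the feature representation and the predicted probabilities, can be sketched in Python. The `Teacher` interface, the function names, and the toy teacher are hypothetical; a real enhanced model would return hidden states of the selected networks and the output of the prediction layer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical teacher interface: given a batch, return
# (feature representations per target feature network, predicted probabilities).
Teacher = Callable[[list], Tuple[Dict[str, list], List[List[float]]]]


def batches(dataset: Sequence, batch_size: int) -> List[list]:
    """Divide the data set into consecutive parts of at most batch_size items."""
    return [list(dataset[i:i + batch_size])
            for i in range(0, len(dataset), batch_size)]


def extract_target_knowledge(dataset: Sequence, batch_size: int,
                             teacher: Teacher) -> List[dict]:
    """Collect the teacher's features and probabilities for every batch."""
    knowledge = []
    for batch in batches(dataset, batch_size):
        features, probs = teacher(batch)
        knowledge.append({"batch": batch, "features": features, "probs": probs})
    return knowledge


def toy_teacher(batch: list) -> Tuple[Dict[str, list], List[List[float]]]:
    """Toy enhanced model: the 'feature' is the text length, two classes."""
    features = {"bottom": [[float(len(x))] for x in batch]}
    probs = [[0.9, 0.1] if len(x) % 2 == 0 else [0.2, 0.8] for x in batch]
    return features, probs
```

Storing the knowledge per batch keeps each feature representation aligned with the training data it was computed from, which is what the later distillation step needs.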

S104: train the distillation model according to the target knowledge and the training task data set to obtain the target learning model.

Optionally, this operation uses the target knowledge obtained in S103 as the supervision signal for training the distillation model and, based on the training task data set, guides the training of the distillation model, so that the target knowledge is transferred to the distillation model during training and the distillation model learns the prediction task of the enhanced model. Specifically, this step may calculate a soft supervision label from the data feature representation and prediction probability representation in the target knowledge together with the data feature representation and prediction probability representation obtained when the distillation model processes the training task data, and calculate a hard supervision label from the result of the distillation model processing the training task data. The soft supervision label is then combined with the hard supervision label to perform distillation training on the distillation model with less training task data and higher learning efficiency. How the hard and soft supervision labels are calculated, and how distillation training is performed with them, is described in detail in subsequent embodiments.
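One common way to combine soft and hard supervision in knowledge distillation is a weighted sum of loss terms. The Python sketch below shows that general pattern only; it is not the calculation defined in the subsequent embodiments. The choice of mean squared error for the feature term, cross-entropy for the probability terms, the weight `alpha`, and the assumption that the enhanced model and the distillation model produce features of equal dimension are all illustrative assumptions.

```python
import math
from typing import List


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross-entropy of predicted probabilities against a target distribution."""
    eps = 1e-12  # avoids log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def distillation_loss(teacher_feat: List[float], student_feat: List[float],
                      teacher_probs: List[float], student_probs: List[float],
                      true_label: int, alpha: float = 0.5) -> float:
    """Weighted sum of a soft term (teacher vs. student) and a hard term."""
    # Soft supervision: match the teacher's features and probabilities.
    soft = (mse(teacher_feat, student_feat)
            + cross_entropy(teacher_probs, student_probs))
    # Hard supervision: match the ground-truth label of the training data.
    one_hot = [1.0 if i == true_label else 0.0
               for i in range(len(student_probs))]
    hard = cross_entropy(one_hot, student_probs)
    return alpha * soft + (1.0 - alpha) * hard


same = distillation_loss([1.0, 2.0], [1.0, 2.0], [0.9, 0.1], [0.9, 0.1], 0)
diff = distillation_loss([1.0, 2.0], [0.0, 0.0], [0.9, 0.1], [0.5, 0.5], 0)
```

In this sketch a student whose features and probabilities match the teacher's (`same`) receives a smaller loss than one that deviates from them (`diff`), which is the behavior a distillation objective is meant to have.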

Optionally, this step trains the distillation model, and the trained distillation model is the target learning model. Because the target learning model is obtained by knowledge distillation of the pre-trained model, it can accurately perform the prediction task of the pre-trained model; and because its structure is simpler than that of the pre-trained model, it takes less time and responds faster when performing the prediction task.

Optionally, in this embodiment, after the target learning model is obtained through training, it can be deployed in an actual human-computer interaction scenario to perform online task prediction. Preferably, if the pre-trained model and the target learning model are models used for intent recognition, then after the distillation model is trained according to the target knowledge and the training task data set to obtain the target learning model, the embodiment of the present application may further deploy the target learning model in a human-computer interaction device to perform intent recognition on user voice data acquired by the device in real time. Specifically, once the target learning model is deployed in the human-computer interaction device, the device transmits acquired user voice data to the target learning model; the target learning model performs intent recognition on the input user voice data and feeds the intent recognition result back to the device; and the device generates a response result corresponding to the user voice data according to that intent recognition result and feeds it back to the user. In the solution of this embodiment, the target learning model is obtained by knowledge distillation: its network structure is simpler than that of the pre-trained model, while its prediction performance can approach that of the complex pre-trained model, so intent recognition can be performed quickly and accurately to meet the real-time response requirements of human-computer interaction devices.

In the technical solution of this embodiment, according to the training task data set, precipitation training is performed on the bottom-layer network of the pre-trained model at least twice, with the bottom-layer network, the prediction layer network, and successively fewer middle- and high-level networks as the training objects, to obtain a strengthened model; a distillation model is constructed according to the target network determined from the strengthened model. The target knowledge of the training task data set is then extracted through the target network of the strengthened model, and the distillation model is trained based on the extracted target knowledge and the training task data set to obtain the target learning model. By performing precipitation training on the bottom-layer network multiple times while successively reducing the middle- and high-level networks, this embodiment makes the parameters of the bottom-layer network of the pre-trained model more accurate. The distillation model is subsequently built from at least the accurate post-precipitation bottom-layer network and the prediction layer network, and is distillation-trained based on the extracted target knowledge, so that the target learning model distilled from the pre-trained model has a simplified network structure while retaining the prediction accuracy of the pre-trained model and improving the generalization ability of the model. Moreover, the whole distillation process is not affected by human factors. Deploying the target learning model in a human-computer interaction device allows tasks to be executed quickly and accurately, meeting the real-time response requirements of such devices.

Optionally, the pre-trained model in the embodiments of the present application is an already trained model capable of performing a certain prediction task. When the prediction task covers a wide range of domains, the pre-trained model may be able to perform task prediction in multiple domains, yet its prediction performance in a particular one of those domains may not be very good. For example, if the pre-trained model is a model for intent recognition, it may be able to recognize the intent of user speech in many domains such as shopping, business handling, and smart home control, but its predictions may not be very accurate for some specific domains. To address this, before precipitation training is performed at least twice on the bottom-layer network of the pre-trained model according to the training task data set, this embodiment may perform domain training on the pre-trained model according to a training domain data set and update the pre-trained model.

Specifically, the training domain data set may consist of sample data acquired specifically for the working domain in which the pre-trained model is to be deployed. For example, if the pre-trained model needs to perform intent recognition on user speech in the shopping domain, the training domain data set for that domain may be obtained by processing all the voice data of the various shopping platforms (for example, labeling and deleting invalid data). The training domain data set is input into the pre-trained model to update-train the pre-trained model for that domain and fine-tune its parameters, so that the updated pre-trained model can perform prediction tasks in that domain more accurately. In this embodiment, after domain training is performed on the pre-trained model according to the training domain data set and the pre-trained model is updated, the precipitation training operation of S101 is performed on the updated pre-trained model. The advantage of this arrangement is that it greatly improves the prediction accuracy of the pre-trained model in the domain to which its prediction task belongs, which provides a guarantee for subsequently distilling an accurate target learning model from the pre-trained model.

FIG. 2 is a flowchart of another knowledge distillation-based model training method provided according to an embodiment of the present application. This embodiment is further optimized on the basis of the above embodiments and describes in detail how precipitation training is performed at least twice on the bottom-layer network of the pre-trained model according to the training task data set. As shown in FIG. 2, the method includes:

S201, divide the training task data set to determine multiple training data subsets.

Optionally, this operation may divide the training task data set into multiple training data subsets according to a preset precipitation strategy, such as the number of network layers from which knowledge is extracted each time. For example, if the pre-trained model is the model shown in FIG. 1B and the precipitation strategy is to extract the knowledge of one network layer at a time, the training task data set may be divided into 12 parts. The number of training data subsets is less than or equal to the total number of layers of the pre-trained model. For example, when the total number of layers of the pre-trained model is N, the number K of training data subsets divided in this step may be equal to half of N. Optionally, the divided training data subsets may contain the same or different amounts of training data, which is not limited in this embodiment.
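The division step can be sketched as follows. This is a minimal illustration that splits the data into K contiguous subsets of near-equal size with K = N / 2; the function name `split_dataset` and the contiguous, near-equal split are assumptions, since the text allows subsets of equal or different sizes.

```python
def split_dataset(samples, num_subsets):
    """Split samples into num_subsets contiguous parts of near-equal size."""
    if not 1 <= num_subsets <= len(samples):
        raise ValueError("num_subsets must be between 1 and len(samples)")
    base, extra = divmod(len(samples), num_subsets)
    subsets, start = [], 0
    for i in range(num_subsets):
        # The first `extra` subsets take one additional sample each.
        size = base + (1 if i < extra else 0)
        subsets.append(samples[start:start + size])
        start += size
    return subsets


total_layers = 12                  # N, as in the FIG. 1B example
num_subsets = total_layers // 2    # K = N / 2, one of the options named in the text
subsets = split_dataset(list(range(100)), num_subsets)
```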

S202, determine the training objects corresponding to each training data subset according to the set number of precipitation training rounds.

Here, the training objects may be the network layers of the pre-trained model that need to be trained in each round of precipitation training. In the embodiments of the present application, each training data subset corresponds to different training objects. Specifically, the training objects corresponding to each training data subset include the bottom-layer network, middle- and high-level networks, and the prediction layer network of the pre-trained model, and the number of middle- and high-level network layers included is inversely related to the position of the subset in the precipitation training order. The middle- and high-level networks included in each set of training objects are network layers that are adjacent to the bottom-layer network and continuous upward from it. That is, across the training objects of the training data subsets, the bottom-layer network and the prediction layer network remain unchanged, while the middle- and high-level network layers are removed one after another from the top down as the precipitation training order of the corresponding subset moves later. Optionally, as the number of precipitation training rounds increases, the number of middle- and high-level network layers included in the training objects decreases to zero, so that in the end only the bottom-layer network is update-trained.

Optionally, when this embodiment determines the training objects corresponding to each training data subset, the bottom-layer network and the prediction layer network are fixed, and the number of middle- and high-level network layers in the training objects of each training data subset can be determined from the total number of layers of the pre-trained model and the precipitation training order of that subset. For example, if the total number of layers of the pre-trained model is N and a given training data subset is used in the k-th round of precipitation training, then the highest layer of the middle- and high-level networks included in its training objects is S = N - 2*k; that is, the middle- and high-level network layers from layer S downward are the training objects corresponding to that training data subset.
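The rule S = N - 2*k can be made concrete with a short sketch. It assumes the layer layout of the FIG. 1B example (layers 1 to 3 form the bottom-layer network and layer N is the prediction layer network) and assumes that layer S itself is included; the function name `training_objects` is hypothetical.

```python
def training_objects(total_layers, k, bottom_layers):
    """Return the 1-based layer indices trained in precipitation round k."""
    prediction_layer = total_layers            # layer N is the prediction layer
    highest = total_layers - 2 * k             # S = N - 2*k
    first_mid = max(bottom_layers) + 1         # first layer above the bottom network
    # Middle/high layers run from just above the bottom network up to layer S,
    # never reaching the prediction layer; the range is empty once S drops
    # below the first middle layer.
    upper = min(highest, total_layers - 1)
    mid_high = list(range(first_mid, upper + 1))
    return sorted(bottom_layers) + mid_high + [prediction_layer]


for k in range(1, 7):
    print(k, training_objects(12, k, [1, 2, 3]))
```

With N = 12, round 1 trains layers 1 to 10 plus layer 12, and from round 5 onward no middle- or high-level layer remains among the training objects.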

S203, according to each training data subset, perform one round of precipitation training on the training objects corresponding to that subset in the pre-trained model, to obtain the strengthened model of the pre-trained model.

Optionally, after the training objects corresponding to each training data subset have been determined, the training data subsets may be input into the pre-trained model one by one in their precipitation training order. Each input training data subset is used to train the network layers of the pre-trained model that belong to the current training objects and to update the parameters of those layers. Because the number of middle- and high-level network layers in the training objects decreases from round to round, fewer and fewer middle- and high-level network parameters are updated over the course of the precipitation training, and the training gradually concentrates on the bottom-layer network. After multiple rounds of precipitation training, the bottom-layer network of the pre-trained model becomes increasingly accurate, and the pre-trained model obtained after these rounds can be used as the strengthened model.
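The round-by-round procedure can be sketched as a loop that updates only the layers listed for each round and leaves the others frozen. The toy five-layer model, the schedule, and the stand-in `update_fn` (which simply adds the subset size to a scalar "parameter") are hypothetical; a real implementation would run gradient updates on the selected layers instead.

```python
def precipitation_train(params, schedule, subsets, update_fn):
    """Run one precipitation round per subset, updating only that round's layers."""
    for trainable_layers, subset in zip(schedule, subsets):
        for layer in trainable_layers:
            params[layer] = update_fn(params[layer], subset)
        # Layers not listed in trainable_layers stay frozen for this round.
    return params


# Toy model: layers 1-2 are the bottom network, 3-4 middle/high, 5 prediction.
params = {layer: 0 for layer in range(1, 6)}
schedule = [[1, 2, 3, 5], [1, 2, 5], [1, 5]]   # fewer upper layers each round
subsets = [["a", "b"], ["c"], ["d", "e", "f"]]
params = precipitation_train(params, schedule, subsets,
                             update_fn=lambda weight, batch: weight + len(batch))
```

In this toy run the lowest layer and the prediction layer receive every update, while layer 4 is never touched, which mirrors how training concentrates on the bottom of the network.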

S204, take at least two networks in the strengthened model as the target network, and construct a distillation model according to the target network.

The target network includes a feature recognition network and the prediction layer network; the feature recognition network includes at least the bottom-layer network.

S205, extract the target knowledge of the training task data set through the target network of the strengthened model.

S206, train the distillation model according to the target knowledge and the training task data set to obtain the target learning model.

In the technical solution of this embodiment, the training task data set is divided into multiple training data subsets, and the training objects of each subset are determined on the principle that the number of middle- and high-level network layers in the training objects is inversely related to the precipitation training order. One round of precipitation training is performed on the training objects of each subset using that subset, yielding the strengthened model. A distillation model is constructed and target knowledge is extracted according to the strengthened model; the distillation model is then trained based on the extracted target knowledge and the training task data set to obtain the target learning model. Determining the training objects of each round on this principle makes the parameters of the bottom-layer network of the pre-trained model more accurate after multiple rounds of precipitation training. This provides a new approach to the knowledge precipitation operation in the knowledge distillation process and a guarantee for the subsequent operations of training the target learning model by knowledge distillation.

FIG. 3 is a flowchart of another knowledge distillation-based model training method provided according to an embodiment of the present application. This embodiment is further optimized on the basis of the above embodiments and describes in detail when, in the course of performing multiple rounds of precipitation training on the bottom-layer network of the pre-trained model, the strengthened model of the pre-trained model is obtained. As shown in FIG. 3, the method includes:

S301, perform precipitation training on the bottom-layer network of the pre-trained model round by round according to the training task data set.

It should be noted that the specific implementation of performing precipitation training on the bottom-layer network of the pre-trained model round by round has been described in detail in the above embodiments and is not repeated here.

S302, test the pre-trained model after precipitation training according to the test task data set.

Here, the test task data set may be test data used to check whether the pre-trained model after precipitation training can accurately complete the prediction task. Optionally, sample data related to the prediction task that the pre-trained model needs to perform may be acquired and divided into two parts, one used as the training task data set of the embodiments of the present application and the other as the test task data set.

Optionally, this embodiment may input the test task data set into the pre-trained model that has undergone multiple rounds of precipitation training in S301 to obtain the prediction results that the model outputs for the test task data. The prediction results are then analyzed against the true labels in the test task data to compute an evaluation index value that indicates whether the output of the pre-trained model after precipitation training is accurate, and this evaluation index value is used as the test result. Optionally, the evaluation index depends on the prediction task; for example, it may be the accuracy, precision, or recall of the output of the pre-trained model after multiple rounds of precipitation training.
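The evaluation indices named above can be computed from predictions and true labels as in the sketch below. For brevity it treats the task as binary with a single positive class, which is a simplifying assumption; intent recognition is typically multi-class and would compute these values per class. The function name `evaluate` is hypothetical.

```python
def evaluate(predictions, labels, positive=1):
    """Compute accuracy, precision and recall for one positive class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    correct = sum(1 for p, y in pairs if p == y)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = evaluate(predictions=[1, 0, 1, 1], labels=[1, 0, 0, 1])
```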

Optionally, to ensure the accuracy of the test result, this embodiment may test the pre-trained model after precipitation training multiple times using multiple test task data sets, and determine the final test result from the results of these tests.

S303, if the test result satisfies the precipitation end condition, take the pre-trained model after precipitation training as the strengthened model.

Here, the precipitation end condition may be a criterion for judging whether the pre-trained model after multiple rounds of precipitation training qualifies as the strengthened model. Specifically, it may be an index threshold corresponding to the evaluation index value in the test result.

Optionally, this embodiment may compare the test result obtained in S302 by testing the pre-trained model after precipitation training (that is, the evaluation index value) with the index threshold in the precipitation end condition. If the evaluation index value satisfies the index threshold, the test result satisfies the precipitation end condition, and the pre-trained model after precipitation training can be used as the strengthened model. If the test result does not satisfy the precipitation end condition, the method returns to S301 and continues to perform precipitation training on the bottom-layer network of the pre-trained model round by round according to the training task data set, until the test result satisfies the precipitation end condition.
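The train, test, and compare cycle of S301 to S303 can be sketched as a simple loop. The callables `train_round` and `evaluate_model`, the reading of "satisfies the threshold" as greater than or equal to it, and the `max_rounds` safety cap are assumptions introduced for illustration; the text itself only says to repeat until the end condition is met.

```python
def precipitate_until_ready(model, train_round, evaluate_model, threshold,
                            max_rounds=100):
    """Repeat precipitation training until the evaluation index meets the threshold."""
    for round_index in range(1, max_rounds + 1):
        model = train_round(model)        # S301: one more round of precipitation training
        score = evaluate_model(model)     # S302: test on the test task data set
        if score >= threshold:            # S303: precipitation end condition
            return model, round_index
    raise RuntimeError("precipitation end condition not met within max_rounds")


# Toy stand-ins: the "model" is just its own score, improved by 10 per round.
strengthened, rounds_used = precipitate_until_ready(
    model=50,
    train_round=lambda m: m + 10,
    evaluate_model=lambda m: m,
    threshold=80,
)
```

The same loop shape would apply to deciding when training of the distillation model is finished, with the training end condition in place of the precipitation end condition.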

S304, take at least two networks in the strengthened model as the target network, and construct a distillation model according to the target network.

S305, extract the target knowledge of the training task data set through the target network of the strengthened model.

S306, train the distillation model according to the target knowledge and the training task data set to obtain the target learning model.

In the technical solution of this embodiment, after multiple rounds of precipitation training are performed on the bottom-layer network of the pre-trained model according to the training task data set, the pre-trained model after precipitation training is tested according to the test task data set, and only if the test is passed is it used as the strengthened model. A distillation model is then constructed and target knowledge is extracted according to the strengthened model, and the distillation model is trained based on the extracted target knowledge and the training task data set to obtain the target learning model. By testing the pre-trained model after knowledge precipitation, this embodiment determines whether the knowledge precipitation has achieved the expected effect of precipitation training; only a model that achieves the expected effect is used as the strengthened model, which ensures the accuracy of the bottom-layer network parameters of the resulting strengthened model and provides a guarantee for the subsequent operations of training the target learning model by knowledge distillation.

Optionally, the above embodiment describes how to determine when the strengthened model of the pre-trained model is obtained in the course of performing multiple rounds of precipitation training on the bottom-layer network of the pre-trained model. Similarly, when training the distillation model according to the target knowledge and the training task data set, this embodiment may use a similar method to judge whether training of the distillation model is complete and the target learning model can be obtained. Specifically, when training the distillation model according to the target knowledge and the training task data set to obtain the target learning model, the embodiment of the present application may: train the distillation model according to the target knowledge and the training task data set; test the trained distillation model according to the test task data set; and, if the test result satisfies the training end condition, take the trained distillation model as the target learning model. It should be noted that the process of testing the trained distillation model according to the test task data set is similar to the process, described in the above embodiment, of testing the pre-trained model after precipitation training according to the test task data set. For example, the test task data set may be input into the trained distillation model, and an evaluation index value computed from the prediction results output by the trained distillation model and the true labels of the test task data set; if the evaluation index value satisfies the index threshold in the training end condition, the test result of the trained distillation model satisfies the training end condition, and the distillation model trained this time can be used as the target learning model. The advantage of this arrangement is that testing the trained distillation model determines whether its task prediction accuracy achieves the expected effect; only when it does is the model used as the final target learning model, which improves the accuracy of the target learning model obtained by knowledge distillation.

FIG. 4 and FIG. 5 are flowcharts of two knowledge distillation-based model training methods provided according to embodiments of the present application. These embodiments are further optimized on the basis of the above embodiments and describe two specific implementations of constructing the distillation model according to the target network.

Optionally, FIG. 4 shows an implementation in which a distillation model with the same structure as the strengthened model is constructed according to the target network. Specifically:

S401, perform precipitation training on the bottom-layer network of the pre-trained model at least twice according to the training task data set to obtain the strengthened model of the pre-trained model.

The training objects of each round of precipitation training include at least the bottom-layer network and the prediction layer network, and include successively fewer middle- and high-level networks; from bottom to top, the pre-trained model includes the bottom-layer network, at least one middle- or high-level network, and the prediction layer network.

S402, take at least two networks in the strengthened model as the target network, and acquire the network structure blocks of the target network.

The target network includes the feature recognition network and the prediction layer network of the strengthened model; the feature recognition network includes at least the bottom-layer network of the strengthened model and, optionally, may also include some or all of the middle- and high-level networks of the strengthened model. Because the network structure of the distillation model constructed in this embodiment is to be simpler than that of the strengthened model, the feature recognition network of the target network in the embodiments of the present application usually contains no middle- and high-level networks, or only a small number of them. A network structure block may be obtained by encapsulating the network structure of one or more network layers in the strengthened model. For example, assuming that the strengthened model of this embodiment is obtained by precipitation training of the pre-trained model shown in FIG. 1B, the network structure of the strengthened model is also as shown in FIG. 1B. In this case, the network structure of the 1st to 3rd network layers in FIG. 1B may be encapsulated as the network structure block of the bottom-layer network 10; the network structure of the 4th to 7th network layers as the network structure block of the middle-level network 110; the network structure of the 8th to 11th network layers as the network structure block of the high-level network 111; and the network structure of the 12th network layer as the network structure block of the prediction layer network 12.

Optionally, to construct a distillation model with the same structure as the strengthened model, after the target network is selected from the strengthened model, the network structure blocks corresponding to the target network in the strengthened model may be acquired. For example, if the bottom-layer network and the prediction layer network of the strengthened model are taken as the target network, the network structure block of the bottom-layer network and the network structure block of the prediction layer network are used as the network structure blocks of the target network.

S403, construct a distillation model with the same structure as the strengthened model according to the acquired network structure blocks.

Optionally, because the target network corresponds to at least two networks in the strengthened model, the acquired network structure blocks are also those of at least two networks. In this step, the at least two network structure blocks may be arranged in their bottom-up order in the strengthened model, and the output of each lower network structure block used as the input of the adjacent upper network structure block, thereby forming a new model composed of the target network. This new model is the constructed distillation model.

For example, suppose S402 acquires the network structure blocks of the underlying network 10 and the prediction layer network 12 in FIG. 1B. Since the underlying network 10 is located below the prediction layer network 12, the network structure block of the underlying network 10 can be placed below that of the prediction layer network 12, and its output connected to the input of the network structure block of the prediction layer network 12, thereby generating a distillation model composed of the network structure blocks of the underlying network 10 and the prediction layer network 12. Similarly, if S402 acquires the network structure blocks of the underlying network 10, the middle-level network 110 and the prediction layer network 12, the block of the underlying network 10 can be placed at the bottom, the block of the middle-level network 110 in the middle, and the block of the prediction layer network 12 at the top. The output of the block of the underlying network 10 is connected to the input of the block of the middle-level network 110, and the output of the block of the middle-level network 110 is connected to the input of the block of the prediction layer network 12, thereby generating a distillation model composed of the network structure blocks of the underlying network 10, the middle-level network 110 and the prediction layer network 12.
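The stacking described for S403 can be illustrated with a minimal Python sketch. It assumes each network structure block can be treated as a callable on a list of floats; the block names, the `BLOCK_ORDER` list and the toy lambdas are hypothetical stand-ins, not part of the application.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]

# Hypothetical bottom-to-top order of the blocks in the reinforcement model
# (numerals follow FIG. 1B; the string names are illustrative only).
BLOCK_ORDER = ["underlying_10", "middle_110", "high_111", "prediction_12"]


def build_distillation_model(blocks: Dict[str, Block], targets: Sequence[str]) -> Block:
    """Chain the selected blocks bottom-up: each output feeds the next block's input."""
    if len(targets) < 2:
        raise ValueError("the target network must contain at least two networks")
    # Restore the order the blocks had in the reinforcement model.
    ordered = sorted(targets, key=BLOCK_ORDER.index)

    def model(x: Vector) -> Vector:
        for name in ordered:
            x = blocks[name](x)
        return x

    return model


# Toy stand-ins for the real network structure blocks.
toy_blocks: Dict[str, Block] = {
    "underlying_10": lambda x: [v + 1.0 for v in x],
    "middle_110": lambda x: [v * 2.0 for v in x],
    "high_111": lambda x: [v - 3.0 for v in x],
    "prediction_12": lambda x: [sum(x)],
}

two_block = build_distillation_model(toy_blocks, ["prediction_12", "underlying_10"])
three_block = build_distillation_model(
    toy_blocks, ["underlying_10", "middle_110", "prediction_12"]
)
```

Passing the targets in any order gives the same model, because the sketch sorts them back into the bottom-to-top order before chaining.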

S404, extract the target knowledge of the training task data set through the target network of the reinforcement model.

S405, train the distillation model according to the target knowledge and the training task data set to obtain the target learning model.

Optionally, FIG. 5 shows an implementation in which a distillation model with a structure different from that of the reinforcement model is constructed according to the target network. Specifically:

S501, according to the training task data set, perform precipitation training on the underlying network of the pre-training model at least twice to obtain the reinforcement model of the pre-training model.

The training objects of each round of precipitation training include at least the underlying network and the prediction layer network, and also include middle and high-level networks whose number decreases from round to round. From bottom to top, the pre-training model includes the underlying network, at least one middle or high-level network, and the prediction layer network.
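One way to read "successively decreasing" is as a schedule of training objects per round. The sketch below assumes that one middle or high-level network is dropped per round, starting from the top; that order, the function name and the string labels are assumptions made for illustration.

```python
from typing import List, Sequence


def precipitation_schedule(mid_high_networks: Sequence[str]) -> List[List[str]]:
    """List the networks trained in each precipitation round.

    `mid_high_networks` is ordered bottom to top. Every round keeps the
    underlying network and the prediction layer network; the middle and
    high-level networks shrink by one per round (assumed: top one first).
    """
    remaining = list(mid_high_networks)
    rounds: List[List[str]] = []
    while True:
        rounds.append(["underlying"] + remaining + ["prediction"])
        if not remaining:
            break
        remaining = remaining[:-1]
    return rounds


schedule = precipitation_schedule(["middle", "high"])
```

With at least one middle or high-level network, this schedule always has at least two rounds, which matches the "at least twice" requirement of S501.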

S502, take at least two networks in the reinforcement model as the target network.

Optionally, the process of selecting the target network from the reinforcement model has been described in the foregoing embodiments and is not repeated here.

S503, according to the target network, select a neural network model with a structure different from that of the reinforcement model as the distillation model.

The output layer network of the neural network model is of the same type as the prediction layer network in the target network, and the non-output layer networks of the neural network model are of the same type as the feature recognition network in the target network. The type of the prediction layer network means that the network is of the prediction type, that is, it performs task prediction. The types of feature recognition network include the underlying network, the middle-level network, the high-level network, and so on.

Optionally, since the distillation model constructed in this implementation differs in structure from the reinforcement model, a neural network model that has a simple structure and can perform the prediction task can be selected as the distillation model according to requirements. A neural network model suitable for use as the distillation model usually has a relatively simple structure and few layers, but its output layer must be of the same type as the prediction layer network in the target network, and its non-output layer networks must be of the same type as the feature recognition network in the target network. In other words, the output layer of the neural network model must be a network capable of task prediction, and its non-output layers must match the type of the feature recognition network of the target network. For example, if the feature recognition network of the target network is of the underlying network type, the non-output layers of the neural network model should also be of the underlying network type; if the feature recognition network of the target network is of the underlying network and middle-level network types, the non-output layers of the neural network model should also be of the underlying network and middle-level network types.

Because the distillation model constructed in this step has a simple structure and few layers, it is usually heterogeneous with respect to the structurally complex reinforcement model. For example, if the reinforcement model is a BERT model, a CNN model can be selected as the distillation model.
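The type constraint of S503 can be expressed as a small check. This sketch assumes each layer of a candidate model carries a type label such as "underlying", "middle", "high" or "prediction"; the labels and the function name are hypothetical.

```python
from typing import Sequence


def is_valid_distillation_candidate(
    candidate_layer_types: Sequence[str],
    target_feature_types: Sequence[str],
) -> bool:
    """Check the S503 type constraint on a candidate model.

    `candidate_layer_types` lists the layer types bottom to top; the last
    entry is the output layer. The output layer must be of the prediction
    type, and the non-output layers must cover exactly the types of the
    target network's feature recognition network.
    """
    if len(candidate_layer_types) < 2:
        return False
    *non_output, output = candidate_layer_types
    return output == "prediction" and set(non_output) == set(target_feature_types)


cnn_like = ["underlying", "prediction"]
deeper = ["underlying", "middle", "prediction"]
```

A candidate with several layers of the same type still passes, since only the set of types is compared.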

S504, extract the target knowledge of the training task data set through the target network of the reinforcement model.

S505, train the distillation model according to the target knowledge and the training task data set to obtain the target learning model.

The technical solutions of the embodiments of the present application provide two specific ways of constructing a distillation model, with either the same structure as the reinforcement model or a different one, from the target network of the reinforcement model obtained by precipitation training, in the process of training the target learning model of a pre-training model based on knowledge distillation. If a distillation model with the same structure as the reinforcement model is constructed, the distillation model retains the network structure blocks of the reinforcement model, so it is easier for the homogeneous distillation model to reach the prediction performance of the reinforcement model through distillation training. If a distillation model with a structure different from that of the reinforcement model is constructed, the heterogeneous distillation model can learn features different from those of the reinforcement model, which improves the generalization ability of the model. The embodiments of the present application allow a choice according to actual needs and are therefore highly flexible.

FIG. 6A is a flowchart of another model training method based on knowledge distillation according to an embodiment of the present application; FIG. 6B is a schematic diagram of the principle of training the distillation model according to an embodiment of the present application. This embodiment further refines the foregoing embodiments and describes in detail how the distillation model is trained according to the target knowledge and the training task data set. As shown in FIGS. 6A-6B, the method includes:

S601, according to the training task data set, perform precipitation training on the underlying network of the pre-training model at least twice to obtain the reinforcement model of the pre-training model.

S602, take at least two networks in the reinforcement model as the target network, and construct a distillation model according to the target network.

The target network includes a feature recognition network and the prediction layer network; the feature recognition network includes at least the underlying network.

Exemplarily, suppose that the target network selected from the reinforcement model shown in FIG. 6B consists of the underlying network, the middle-level network and the prediction layer network, and that the distillation model shown in FIG. 6B is constructed from these three networks.

S603, extract the target knowledge of the training task data set through the target network of the reinforcement model.

Exemplarily, this operation may input training data of a preset size (for example, batch_size) from the training task data set into the reinforcement model shown in FIG. 6B, obtain the feature representation output by the underlying network of the reinforcement model (knowledge_seq_l) and the feature representation output by the middle-level network (knowledge_seq_m) as the first data feature representation (knowledge_seq), and obtain the feature representation output by the prediction layer network of the reinforcement model (knowledge_predict) as the first prediction probability representation. The first data feature representation and the first prediction probability representation obtained in this step are the extracted target knowledge.
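The extraction in S603 can be sketched as a forward pass that records the outputs of the target networks. The sketch assumes the reinforcement model is given as a bottom-to-top list of (name, block) pairs whose last entry is the prediction layer network; the names and toy blocks are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def extract_target_knowledge(
    reinforcement_model: Sequence[Tuple[str, Block]],
    batch: Vector,
    feature_targets: Sequence[str],
) -> Tuple[Dict[str, Vector], Vector]:
    """Run one batch bottom-up and collect the outputs of the target networks.

    Returns (knowledge_seq, knowledge_predict): the per-network feature
    representations and the prediction layer output.
    """
    *feature_blocks, (_, prediction_block) = reinforcement_model
    knowledge_seq: Dict[str, Vector] = {}
    x = batch
    for name, block in feature_blocks:
        x = block(x)
        if name in feature_targets:
            knowledge_seq[name] = x
    knowledge_predict = prediction_block(x)
    return knowledge_seq, knowledge_predict


# Toy reinforcement model standing in for the one in FIG. 6B.
toy_reinforcement_model = [
    ("underlying", lambda x: [v + 1.0 for v in x]),
    ("middle", lambda x: [v * 2.0 for v in x]),
    ("high", lambda x: [v - 1.0 for v in x]),
    ("prediction", lambda x: [sum(x)]),
]

knowledge_seq, knowledge_predict = extract_target_knowledge(
    toy_reinforcement_model, [1.0, 2.0], ["underlying", "middle"]
)
```

Note that the full reinforcement model is still evaluated, including networks that are not part of the target network; only the recorded outputs are restricted to the target network.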

S604, input the training task data set into the distillation model, and determine a soft supervision label and a hard supervision label according to the target knowledge and the result of the distillation model processing the training task data set.

The soft supervision label and the hard supervision label are two kinds of supervision signal used while training the distillation model. The soft supervision label is calculated from the extracted target knowledge, and the hard supervision label is calculated from the actual labels in the training task data set.

Optionally, in this embodiment the training task data set is input into the distillation model, which processes it to produce the output of each of its network layers. These outputs are used, on the one hand, together with the target knowledge to determine the soft supervision label and, on the other hand, together with information about the training task data set to calculate the hard supervision label. The determination process includes the following three sub-steps:

S6041, input the training task data set into the distillation model to obtain the second data feature representation output by the feature recognition network of the distillation model and the second prediction probability representation output by the prediction layer network of the distillation model.

Specifically, after training data of a preset size (for example, batch_size) from the training task data set is input into the distillation model, the prediction result (that is, the feature representation) output by the prediction layer network of the distillation model is obtained as the second prediction probability representation. If the feature recognition network of the distillation model contains only the underlying network, the feature representation output by the underlying network is obtained as the second data feature representation; if the feature recognition network also includes some middle and high-level networks, the feature representations output by the underlying network and by those middle and high-level networks are obtained together as the second data feature representation. Exemplarily, as shown in FIG. 6B, the training task data set is input into the distillation model. Since the feature recognition network of the target network in FIG. 6B includes the underlying network and the middle-level network, the feature representation output by the underlying network (small_seq_l) and the feature representation output by the middle-level network (small_seq_m) after the distillation model processes the training task data set are taken as the second data feature representation (small_seq), and the feature representation output by the prediction layer network of the distillation model (small_predict) is taken as the second prediction probability representation.

S6042, determine the soft supervision label according to the target knowledge, the second data feature representation and the second prediction probability representation.

Optionally, since the target knowledge consists of the first data feature representation and the first prediction probability representation, this embodiment may apply a preset algorithm to the first data feature representation, the first prediction probability representation, the second data feature representation and the second prediction probability representation to obtain the soft supervision label. The specific algorithm is not limited in this embodiment. For example, the mean squared error between the first data feature representation in the target knowledge and the second data feature representation may be used as the data feature label; the mean squared error between the first prediction probability representation in the target knowledge and the second prediction probability representation may be used as the probability prediction label; and the data feature label and the probability prediction label are then fused according to the weight value of the feature recognition network of the reinforcement model to obtain the soft supervision label. Because the soft supervision label is determined from the feature representations that the reinforcement model and the distillation model output for the same training task data set, it is more accurate, which in turn improves the accuracy of the target learning model trained subsequently.

Specifically, the data feature label may be calculated according to formula (1), the probability prediction label according to formula (2), and finally the soft supervision label according to formula (3).

loss_i = MSE(knowledge_seq, small_seq) (1)

loss_p = MSE(knowledge_predict, small_predict) (2)

loss_soft = W_i * loss_i + loss_p (3)

Here, loss_i is the data feature label; MSE() is the mean squared error function; knowledge_seq is the first data feature representation; small_seq is the second data feature representation; loss_p is the probability prediction label; knowledge_predict is the first prediction probability representation; small_predict is the second prediction probability representation; loss_soft is the soft supervision label; and W_i is the weight value of the feature recognition network.

Optionally, when the feature recognition network includes multiple networks (for example, the underlying network and the middle-level network), the first data feature representation and the second data feature representation each consist of the feature representations output by multiple networks. In this case, a data feature label can be calculated according to formula (1) for the feature representation output by each network. For example, as shown in FIG. 6B, the first data feature representation includes knowledge_seq_l and knowledge_seq_m, and the second data feature representation includes small_seq_l and small_seq_m. The data feature label loss_i_l of the underlying network is then calculated from knowledge_seq_l and small_seq_l, and the data feature label loss_i_m of the middle-level network from knowledge_seq_m and small_seq_m. Correspondingly, the soft supervision label is obtained by summing the product of each network's weight value and its data feature label together with the probability prediction label. For the scenario shown in FIG. 6B, the soft supervision label may be calculated as loss_soft = W_l * loss_i_l + W_m * loss_i_m + loss_p.
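Formulas (1) to (3) and their multi-network extension can be sketched in plain Python. The sketch assumes each feature representation has been flattened into a list of floats and that the per-network representations are keyed by a short network name ("l" for the underlying network, "m" for the middle-level network); those keys and the sample values are illustrative only.

```python
from typing import Dict, List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally long vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def soft_supervision_label(
    knowledge_seq: Dict[str, Vector],
    small_seq: Dict[str, Vector],
    knowledge_predict: Vector,
    small_predict: Vector,
    weights: Dict[str, float],
) -> float:
    """loss_soft = sum over networks of W * loss_i, plus loss_p."""
    loss_p = mse(knowledge_predict, small_predict)  # formula (2)
    loss_soft = loss_p
    for name, weight in weights.items():
        loss_i = mse(knowledge_seq[name], small_seq[name])  # formula (1), per network
        loss_soft += weight * loss_i  # formula (3), extended to several networks
    return loss_soft


example_loss_soft = soft_supervision_label(
    knowledge_seq={"l": [1.0, 2.0], "m": [0.0, 0.0]},
    small_seq={"l": [1.0, 4.0], "m": [1.0, 1.0]},
    knowledge_predict=[0.5, 0.5],
    small_predict=[0.0, 1.0],
    weights={"l": 0.5, "m": 2.0},
)
```

In the example the per-network errors are 2.0 and 1.0 and the prediction error is 0.25, so the result is 0.5 * 2.0 + 2.0 * 1.0 + 0.25 = 3.25.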

S6043, determine the hard supervision label according to the second prediction probability representation and the training task data set information.

The training task data set information includes the number of training samples, the number of training labels and the actual label values in the training task data set.

Optionally, this sub-step may calculate the hard supervision label according to formula (4).

Here, loss_hart is the hard supervision label; N is the number of training samples in the training task data set; M is the number of training labels; i denotes the i-th training sample; c denotes the c-th training label; y_ic is the actual label value indicating whether the i-th sample belongs to the c-th training label; and small_predict_ic is the probability, output by the prediction layer network of the distillation model, that the i-th training sample belongs to the c-th training label. Optionally, y_ic takes the value 0 or 1.
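Formula (4) is not reproduced in this text. The symbol definitions are consistent with a multi-class cross-entropy, so the sketch below assumes the form loss_hart = -(1/N) * sum over i and c of y_ic * log(small_predict_ic). The averaging over N, the natural logarithm and the `eps` clamp are assumptions, not taken from the application.

```python
import math
from typing import List, Sequence


def hard_supervision_label(
    y: Sequence[Sequence[int]],
    small_predict: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Assumed cross-entropy form of loss_hart.

    y[i][c] is the actual label value (0 or 1) of sample i for label c;
    small_predict[i][c] is the predicted probability for the same pair.
    """
    n = len(y)
    if n == 0 or n != len(small_predict):
        raise ValueError("y and small_predict must be non-empty and aligned")
    total = 0.0
    for y_i, p_i in zip(y, small_predict):
        for y_ic, p_ic in zip(y_i, p_i):
            # Clamp to eps so that log(0) cannot occur (an added safeguard).
            total += y_ic * math.log(max(p_ic, eps))
    return -total / n


labels: List[List[int]] = [[1, 0], [0, 1]]
probabilities: List[List[float]] = [[0.5, 0.5], [0.25, 0.75]]
example_loss_hart = hard_supervision_label(labels, probabilities)
```

Under this assumed form, only the probability assigned to each sample's actual label contributes, because y_ic is 0 for every other label.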

In S6041-S6043 of this embodiment, the soft supervision label is determined from the feature representations that the reinforcement model and the distillation model output for the same training task data set, and the hard supervision label is determined from the actual label values of the training task data and the prediction probabilities of the distillation model. This provides a new way of determining the soft and hard supervision labels and improves their accuracy.

S605, determine the target label according to the soft supervision label and the hard supervision label.

The target label is the label value that is finally used to supervise the training of the distillation model, determined by combining the characteristics of the soft supervision label and the hard supervision label. Optionally, this step determines the target label according to formula (5):

loss = alpha * loss_soft + (1 - alpha) * loss_hart (5)

Here, loss is the target label, alpha is a parameter variable, loss_soft is the soft supervision label, and loss_hart is the hard supervision label.

Optionally, the parameter variable in formula (5) may be a constant set according to a preset rule, or a variable that is trained together with the distillation model. This embodiment does not limit this.
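Formula (5) is a weighted combination of the two supervision signals. The sketch below treats alpha as a preset constant and restricts it to the range 0 to 1; that range check is an assumption, since the text does not state bounds for alpha.

```python
def target_label(loss_soft: float, loss_hart: float, alpha: float = 0.5) -> float:
    """loss = alpha * loss_soft + (1 - alpha) * loss_hart, as in formula (5)."""
    # Assumed bound: keeps the result a convex combination of the two labels.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * loss_soft + (1.0 - alpha) * loss_hart


example_loss = target_label(loss_soft=2.0, loss_hart=4.0, alpha=0.25)
```

With alpha = 0.25 the hard supervision label dominates: 0.25 * 2.0 + 0.75 * 4.0 = 3.5. Setting alpha to 1 or 0 reduces training to purely soft or purely hard supervision.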

S606, iteratively update the parameters of the distillation model according to the target label to obtain the target learning model.

Optionally, in this embodiment the parameters of the distillation model are updated according to the target label determined in S605 and a preset rule, such as the back-propagation (BP) algorithm, which completes one iterative update of the parameters of the distillation model. The next group of training data of the preset size (for example, batch_size) is then taken from the training task data set and input into the distillation model, and S603-S606 are executed again to perform the next iterative update, thereby training the distillation model. After the distillation model has been trained for multiple iterations, it can be tested on the test task data set. If the training end condition is met, the distillation model is fully trained and can be used as the target learning model.

In the technical solution of this embodiment, a reinforcement model is obtained by precipitation training of the underlying network of the pre-training model, and is used to construct the distillation model and to extract the target knowledge. The soft supervision label and the hard supervision label are determined according to the extracted target knowledge and the result of the distillation model processing the training task data, and a target label determined from both is used to iteratively update the parameters of the distillation model to obtain the target learning model. By combining the soft supervision label and the hard supervision label to train the distillation model, this embodiment enables the trained distillation model to approach the prediction performance of the pre-training model while also improving its generalization ability, so as to better meet the real-time response requirements of human-computer interaction devices.

FIG. 7 is a flowchart of a model training method based on knowledge distillation according to an embodiment of the present application. This embodiment provides a preferred example on the basis of the foregoing embodiments. Specifically, as shown in FIG. 7, the method includes:

S701, obtain a pre-training model.

Optionally, the pre-training model obtained in this step has already been trained on a massive number of training samples and can complete online prediction tasks well.

S702, perform domain training on the pre-training model according to the training domain data set, and update the pre-training model.

S703, perform successive precipitation training on the underlying network of the pre-training model according to the training task data set.

S704, test the pre-training model after precipitation training according to the test task data set.

S705, determine whether the test result satisfies the precipitation end condition; if so, execute S706; if not, return to S702.

Optionally, if the test result satisfies the precipitation end condition, the precipitation training has achieved the expected effect, and S706 can be executed to take the model as the reinforcement model. Otherwise, the precipitation training is insufficient, and the method returns to S702 to update the parameters of the pre-training model based on the training domain data set.

S706, if the test result satisfies the precipitation end condition, take the pre-training model after precipitation training as the reinforcement model.

S707, take at least two networks in the reinforcement model as the target network, and construct a distillation model according to the target network.

S708, extract the target knowledge of the training task data set through the target network of the reinforcement model.

S709, train the distillation model according to the target knowledge and the training task data set.

S710, test the trained distillation model according to the test task data set.

S711, determine whether the test result satisfies the training end condition; if so, execute S712; if not, return to S709.

S712, if the test result satisfies the training end condition, take the trained distillation model as the target learning model.

The technical solution of this embodiment of the present application provides a specific implementation for distilling a target learning model from a pre-training model based on knowledge distillation. The distilled target learning model retains the accurate prediction ability of the pre-training model while simplifying the branches of the network structure and improving the generalization ability of the model. Deploying the target learning model in a human-computer interaction device allows tasks to be executed quickly and accurately, meeting the device's real-time response requirements.

FIG. 8 is a flowchart of an intent recognition method according to an embodiment of the present application. This embodiment is applicable to performing intent recognition with a target learning model trained according to the foregoing embodiments. It may be executed by an intent recognition apparatus configured in an electronic device, and the apparatus may be implemented in software and/or hardware. Optionally, the electronic device may be a human-computer interaction device or a server that communicates with the human-computer interaction device. The human-computer interaction device may be a smart robot, a smart speaker, a smartphone, or the like. As shown in FIG. 8, the method includes:

S801, obtain user voice data collected by the human-computer interaction device.

Optionally, the human-computer interaction device of this embodiment may collect user voice data from the environment in real time through a built-in voice collection apparatus (such as a microphone). If this embodiment is executed by the human-computer interaction device, the device can proceed directly to S802 after collecting the user voice data. If this embodiment is executed by a server that communicates with the human-computer interaction device, the device transmits the collected user voice data to that server, and the server performs S802 after obtaining the user voice data.

S802, input the user voice data into the target learning model to obtain the user intent recognition result output by the target learning model.

The target learning model in this embodiment is determined through training with the model training method based on knowledge distillation described in any of the foregoing embodiments, and is a model for performing intent recognition.

Optionally, after obtaining the user voice data, the human-computer interaction device or the server that communicates with it inputs the data into the target learning model. The target learning model analyzes the input user voice data online using the algorithm learned during training and outputs a user intent recognition result, which the human-computer interaction device or the server then obtains.

S803: determine a response result of the human-computer interaction device according to the user intent recognition result.

Optionally, based on the obtained user intent recognition result, the human-computer interaction device or the server communicating with it determines the target human-computer interaction response rule corresponding to that result, determines the current response result according to that rule, and feeds the response result back to the user, thereby realizing human-computer interaction based on user voice data.
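The flow of S801 to S803 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the rule table `RESPONSE_RULES`, the fallback text, and the stand-in `toy_model` are hypothetical names introduced here, and a real deployment would call the distilled target learning model instead.

```python
from typing import Callable, Dict

# Hypothetical response rules: intent label -> response text.
RESPONSE_RULES: Dict[str, str] = {
    "play_music": "Playing music.",
    "query_weather": "Here is today's weather.",
}
FALLBACK_RESPONSE = "Sorry, I did not understand that."


def recognize_and_respond(
    voice_data: bytes,
    target_model: Callable[[bytes], str],
    rules: Dict[str, str] = RESPONSE_RULES,
) -> str:
    """Run S802 (intent recognition) and S803 (response rule lookup)."""
    intent = target_model(voice_data)             # S802: intent recognition result
    return rules.get(intent, FALLBACK_RESPONSE)   # S803: response result


def toy_model(voice_data: bytes) -> str:
    """Stand-in for the target learning model; not a real recognizer."""
    return "play_music" if b"music" in voice_data else "unknown"


if __name__ == "__main__":
    # S801 is represented by the bytes passed in as collected voice data.
    print(recognize_and_respond(b"please play music", toy_model))
```

Passing the model in as a callable keeps the sketch the same whether it runs on the device or on the server that communicates with it.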

According to the technical solution of this embodiment, the target learning model for intent recognition, trained with the knowledge-distillation-based model training method described in any of the above embodiments, is deployed in the human-computer interaction device or in a server that communicates with it. The device or server can acquire user voice data, input it into the target learning model, and determine the current response result based on the user intent recognition result that the model outputs. Because the deployed target learning model is trained by knowledge distillation, its network structure is simpler than that of the pre-training model, while its prediction performance can approach that of the complex pre-training model. Intent recognition can therefore be performed quickly and accurately, meeting the real-time response requirements of human-computer interaction devices.

FIG. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application. This embodiment applies to the case where a pre-training model with a complex network structure is compressed, by knowledge distillation, into a target learning model with a simple network structure. The apparatus can implement the knowledge-distillation-based model training method described in any embodiment of the present application. The apparatus 900 specifically includes:

a precipitation training module 901, configured to perform precipitation training on the bottom-layer network of a pre-training model at least twice according to a training task data set, to obtain an enhanced model of the pre-training model; wherein the training objects of each precipitation training include at least the bottom-layer network and a prediction-layer network, and include a successively decreasing number of middle and high-level networks, and the pre-training model includes, from bottom to top, the bottom-layer network, at least one middle and high-level network, and the prediction-layer network;

a distillation model building module 902, configured to take at least two networks in the enhanced model as target networks and build a distillation model according to the target networks, wherein the target networks include a feature recognition network and the prediction-layer network, and the feature recognition network includes at least the bottom-layer network;

a target knowledge extraction module 903, configured to extract target knowledge of the training task data set through the target networks of the enhanced model;

a distillation model training module 904, configured to train the distillation model according to the target knowledge and the training task data set, to obtain a target learning model.

Further, the bottom-layer network and the middle and high-level networks are used for feature recognition, and the prediction-layer network is used for task prediction according to the recognized features.

Further, the precipitation training module 901 includes:

a data subset dividing unit, configured to divide the training task data set into multiple training data subsets;

a training object determination unit, configured to determine, according to a set number of precipitation training rounds, the training objects corresponding to each training data subset; wherein the training objects corresponding to each training data subset include the bottom-layer network, middle and high-level networks, and the prediction-layer network of the pre-training model, and the number of middle and high-level network layers included is inversely related to the position of the round in the precipitation training order;

a precipitation training unit, configured to perform, for each training data subset, one round of precipitation training on the training objects in the pre-training model that correspond to that subset;

wherein the number of training data subsets is less than or equal to the total number of layers of the pre-training model.

Further, the middle and high-level networks included in each training object are network layers adjacent to the bottom-layer network and contiguous upward from it; and as the number of precipitation training rounds increases, the number of middle and high-level network layers included in the training objects decreases to zero.
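The units above describe a schedule in which every round trains the bottom-layer network and the prediction-layer network, while the contiguous block of middle and high-level layers shrinks to zero. A minimal sketch of one such schedule follows. The round-robin split, the linear shrink rule, and the layer names (`bottom`, `mid_high_i`, `prediction`) are assumptions made for illustration; the text only requires that the count decreases and that the number of subsets does not exceed the total number of layers.

```python
from typing import List, Sequence


def split_dataset(samples: Sequence, num_subsets: int) -> List[list]:
    """Round-robin split into num_subsets parts (the split rule is an assumption)."""
    return [list(samples[i::num_subsets]) for i in range(num_subsets)]


def precipitation_schedule(num_mid_high_layers: int, num_rounds: int) -> List[List[str]]:
    """Return, for each round, the names of the networks that are trained."""
    if num_rounds < 2:
        raise ValueError("at least two precipitation rounds are required")
    total_layers = num_mid_high_layers + 2  # bottom + middle/high + prediction
    if num_rounds > total_layers:
        raise ValueError("number of rounds (subsets) must not exceed total layers")
    schedule = []
    for r in range(num_rounds):
        # Assumed linear shrink: the first round keeps every middle/high layer,
        # the last round keeps none. Integer division avoids rounding surprises.
        keep = num_mid_high_layers * (num_rounds - 1 - r) // (num_rounds - 1)
        # Kept layers are the ones adjacent to the bottom layer, going upward.
        mids = [f"mid_high_{i}" for i in range(1, keep + 1)]
        schedule.append(["bottom"] + mids + ["prediction"])
    return schedule


if __name__ == "__main__":
    for round_index, objects in enumerate(precipitation_schedule(2, 3)):
        print(round_index, objects)
```

With this rule the count is strictly decreasing only when the number of rounds is at most the number of middle and high-level layers plus one; otherwise consecutive rounds may train the same set.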

Further, the precipitation training module 901 is specifically configured to:

perform precipitation training on the bottom-layer network of the pre-training model round by round according to the training task data set;

test the pre-training model after precipitation training according to a test task data set;

if the test result satisfies a precipitation end condition, take the pre-training model after precipitation training as the enhanced model.
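The train, test, and stop sequence above can be sketched as a small loop. This is an illustrative sketch under stated assumptions: the end condition is modeled as a score threshold, the check is only applied once two rounds have run (to respect the "at least twice" requirement), and `train_round` and `evaluate` are hypothetical callables supplied by the caller.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Subset = TypeVar("Subset")


def run_precipitation(
    model: Model,
    subsets: Sequence[Subset],
    train_round: Callable[[Model, Subset, int], Model],
    evaluate: Callable[[Model], float],
    end_threshold: float,
) -> Tuple[Model, bool]:
    """Train round by round; return the model and whether the end condition held."""
    satisfied = False
    for round_index, subset in enumerate(subsets):
        model = train_round(model, subset, round_index)
        # Assumption: stopping is only allowed after at least two rounds,
        # and the end condition is "test score >= end_threshold".
        if round_index >= 1 and evaluate(model) >= end_threshold:
            satisfied = True
            break
    return model, satisfied


if __name__ == "__main__":
    # Toy model: an integer "score" that grows by 20 per round.
    result = run_precipitation(20, [None] * 4, lambda m, s, i: m + 20, float, 50)
    print(result)
```

Returning the flag alongside the model leaves the caller free to decide what to do when the condition is never met, which the passage does not specify.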

Further, the apparatus also includes:

a domain training module, configured to perform domain training on the pre-training model according to a training domain data set, and update the pre-training model, before the bottom-layer network of the pre-training model is subjected to precipitation training at least twice according to the training task data set.

Further, the distillation model building module 902 is specifically configured to:

take at least two networks in the enhanced model as target networks, and acquire the network structure blocks of the target networks;

build, according to the acquired network structure blocks, a distillation model with the same structure as the enhanced model.

Further, the distillation model building module 902 is also specifically configured to:

take at least two networks in the enhanced model as target networks;

select, according to the target networks, a neural network model whose structure differs from that of the enhanced model as the distillation model, wherein the output-layer network of the neural network model is of the same type as the prediction-layer network in the target networks, and the non-output-layer network of the neural network model is of the same type as the feature recognition network in the target networks.

Further, the target knowledge extraction module 903 is specifically configured to:

take the training task data set as the input of the enhanced model, and obtain a first data feature representation output by the feature recognition network of the enhanced model and a first predicted probability representation output by the prediction-layer network of the enhanced model;

take the obtained first data feature representation and first predicted probability representation as the target knowledge of the training task data set.
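The extraction step can be sketched as a forward pass through the enhanced model that records both outputs. This is a simplified sketch: the two networks are passed in as plain callables, the prediction layer is assumed to return logits that are normalized with a softmax, and the dictionary keys are names chosen here for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def extract_target_knowledge(
    samples: Sequence[Vector],
    feature_network: Callable[[Vector], Vector],
    prediction_layer: Callable[[Vector], Vector],
) -> Dict[str, List[Vector]]:
    """Collect the first feature and first probability representations."""
    features = [feature_network(x) for x in samples]
    # Assumption: the prediction layer emits logits, turned into probabilities here.
    probabilities = [softmax(prediction_layer(f)) for f in features]
    return {"features": features, "probabilities": probabilities}


if __name__ == "__main__":
    knowledge = extract_target_knowledge(
        [[0.0, 0.0], [1.0, 1.0]],
        lambda x: [2 * v for v in x],   # toy feature recognition network
        lambda f: [sum(f), 0.0],        # toy prediction layer (two classes)
    )
    print(knowledge["probabilities"])
```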

Further, the distillation model training module 904 includes:

a supervision label determination unit, configured to input the training task data set into the distillation model, and determine a soft supervision label and a hard supervision label according to the target knowledge and the result of the distillation model's processing of the training task data set;

a target label determination unit, configured to determine a target label according to the soft supervision label and the hard supervision label;

a model parameter updating unit, configured to iteratively update the parameters of the distillation model according to the target label.

Further, the supervision label determination unit specifically includes:

an output acquisition subunit, configured to input the training task data set into the distillation model, to obtain a second data feature representation output by the feature recognition network of the distillation model and a second predicted probability representation output by the prediction-layer network of the distillation model;

a soft label determination subunit, configured to determine the soft supervision label according to the target knowledge, the second data feature representation, and the second predicted probability representation;

a hard label determination subunit, configured to determine the hard supervision label according to the second predicted probability representation and training task data set information.

Further, the training task data set information includes: the number of training samples, the number of training labels, and the actual label values in the training task data set.
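The passage names the inputs of the hard supervision label (the second predicted probabilities, the sample count, the label count, and the actual label values) but not the formula. One common reading, assumed here and not confirmed by the text, is a cross-entropy averaged over the samples. The function name and the `eps` clamp are likewise illustrative.

```python
import math
from typing import Sequence


def hard_supervision_label(
    student_probs: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    num_labels: int,
    eps: float = 1e-12,
) -> float:
    """Average cross-entropy between student probabilities and actual labels."""
    num_samples = len(true_labels)
    if num_samples == 0 or len(student_probs) != num_samples:
        raise ValueError("need one probability vector per labeled sample")
    total = 0.0
    for probs, label in zip(student_probs, true_labels):
        if len(probs) != num_labels or not 0 <= label < num_labels:
            raise ValueError("probability vector or label out of range")
        # Clamp to eps so a zero probability does not produce log(0).
        total -= math.log(max(probs[label], eps))
    return total / num_samples


if __name__ == "__main__":
    print(hard_supervision_label([[0.5, 0.5]], [0], num_labels=2))
```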

Further, the soft label determination subunit is specifically configured to:

take the mean variance between the first data feature representation in the target knowledge and the second data feature representation as a data feature label;

take the mean variance between the first predicted probability representation in the target knowledge and the second predicted probability representation as a probability prediction label;

fuse the data feature label and the probability prediction label according to the weight value of the feature recognition network of the enhanced model, to obtain the soft supervision label.
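A sketch of the soft supervision label follows, under two assumptions that the text does not confirm: the "mean variance" is interpreted as a mean squared error between the teacher (first) and student (second) representations, and the fusion is a weighted sum in which a single scalar `feature_weight` scales the data feature label. The function names are illustrative.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def mean_squared_error(a: Matrix, b: Matrix) -> float:
    """Mean of squared element-wise differences over two equally shaped matrices."""
    if len(a) != len(b):
        raise ValueError("row counts differ")
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            raise ValueError("row lengths differ")
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


def soft_supervision_label(
    teacher_features: Matrix,
    student_features: Matrix,
    teacher_probs: Matrix,
    student_probs: Matrix,
    feature_weight: float,
) -> float:
    """Fuse the data feature label and the probability prediction label."""
    feature_label = mean_squared_error(teacher_features, student_features)
    prob_label = mean_squared_error(teacher_probs, student_probs)
    # Assumed fusion rule: weighted sum with a scalar weight on the feature term.
    return feature_weight * feature_label + prob_label


if __name__ == "__main__":
    print(soft_supervision_label([[1.0, 2.0]], [[1.0, 4.0]],
                                 [[0.5, 0.5]], [[1.0, 0.0]], 0.5))
```

If the student's feature dimension differs from the teacher's, a projection would be needed before the comparison; that case is outside this sketch.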

Further, the distillation model training module 904 is specifically configured to:

train the distillation model according to the target knowledge and the training task data set;

test the trained distillation model according to the test task data set;

if the test result satisfies a training end condition, take the trained distillation model as the target learning model.

Further, the pre-training model is a BERT model.

Further, the pre-training model and the target learning model are models for performing intent recognition;

correspondingly, the apparatus further includes:

a model deployment module, configured to deploy the target learning model in a human-computer interaction device, so as to perform intent recognition on user voice data acquired in real time by the human-computer interaction device.

FIG. 10 is a schematic structural diagram of an intent recognition apparatus according to an embodiment of the present application. This embodiment applies to performing intent recognition with the target learning model trained according to the above embodiments. The apparatus can implement the intent recognition method described in any embodiment of the present application. The apparatus 1000 specifically includes:

a voice data acquisition module 1001, configured to acquire user voice data collected by a human-computer interaction device;

an intent recognition module 1002, configured to input the user voice data into a target learning model to obtain a user intent recognition result output by the target learning model, wherein the target learning model is obtained by training with the knowledge-distillation-based model training method described in any of the above embodiments;

a response result determination module 1003, configured to determine a response result of the human-computer interaction device according to the user intent recognition result.

Further, the apparatus is configured in the human-computer interaction device, or in a server that communicates with the human-computer interaction device.

According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.

FIG. 11 is a block diagram of an electronic device for the knowledge-distillation-based model training method or the intent recognition method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only, and are not intended to limit the implementations of the application described and/or claimed herein.

As shown in FIG. 11, the electronic device includes one or more processors 1101, a memory 1102, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). FIG. 11 takes one processor 1101 as an example.

The memory 1102 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the knowledge-distillation-based model training method or the intent recognition method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions that cause a computer to execute the knowledge-distillation-based model training method or the intent recognition method provided by the present application.

As a non-transitory computer-readable storage medium, the memory 1102 can store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the knowledge-distillation-based model training method or the intent recognition method in the embodiments of the present application (for example, the precipitation training module 901, the distillation model building module 902, the target knowledge extraction module 903, and the distillation model training module 904 shown in FIG. 9; or the voice data acquisition module 1001, the intent recognition module 1002, and the response result determination module 1003 shown in FIG. 10). By running the non-transitory software programs, instructions, and modules stored in the memory 1102, the processor 1101 executes the various functional applications and data processing of the server, that is, implements the knowledge-distillation-based model training method or the intent recognition method of the above method embodiments.

The memory 1102 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the knowledge-distillation-based model training method or the intent recognition method. In addition, the memory 1102 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, and such remote memory may be connected through a network to the electronic device for the knowledge-distillation-based model training method or the intent recognition method. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The electronic device for the knowledge-distillation-based model training method or the intent recognition method may further include an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103, and the output device 1104 may be connected by a bus or in other ways; FIG. 11 takes connection by a bus as an example.

The input device 1103 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the knowledge-distillation-based model training method or the intent recognition method. Examples of input devices include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 1104 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

A computer system can include clients and servers. A client and a server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The above specific embodiments do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims

1. A model training method based on knowledge distillation, the method comprising:

performing precipitation training on the bottom-layer network of a pre-training model at least twice according to a training task data set, to obtain an enhanced model of the pre-training model; wherein the training objects of each precipitation training include at least the bottom-layer network and a prediction-layer network, and include a successively decreasing number of middle and high-level networks, and the pre-training model includes, from bottom to top, the bottom-layer network, at least one middle and high-level network, and the prediction-layer network;

taking at least two networks in the enhanced model as target networks, and constructing a distillation model according to the target networks, wherein the target networks include a feature recognition network and the prediction-layer network, and the feature recognition network includes at least the bottom-layer network;

extracting target knowledge of the training task data set through the target networks of the enhanced model; and

training the distillation model according to the target knowledge and the training task data set, to obtain a target learning model.

2. The method according to claim 1, wherein the bottom-layer network and the middle and high-level networks are used for feature recognition, and the prediction-layer network is used for task prediction according to the recognized features.

3. The method according to claim 1, wherein performing precipitation training on the bottom-layer network of the pre-training model at least twice according to the training task data set comprises:

dividing the training task data set into multiple training data subsets;

determining, according to a set number of precipitation training rounds, the training objects corresponding to each training data subset; wherein the training objects corresponding to each training data subset include the bottom-layer network, middle and high-level networks, and the prediction-layer network of the pre-training model, and the number of middle and high-level network layers included is inversely related to the position of the round in the precipitation training order;

performing, for each training data subset, one round of precipitation training on the training objects in the pre-training model that correspond to that subset;

wherein the number of training data subsets is less than or equal to the total number of layers of the pre-training model.

4. The method according to claim 3, wherein the middle and high-level networks included in each training object are network layers adjacent to the bottom-layer network and contiguous upward from it; and as the number of precipitation training rounds increases, the number of middle and high-level network layers included in the training objects decreases to zero.

5. The method according to claim 1, wherein, according to the training task data set, the bottom network of the pre-training model is subjected to precipitation training at least twice to obtain the reinforcement model of the pre-training model, comprising:

Perform precipitation training on the underlying network of the pre-training model one by one according to the training task data set;

According to the test task data set, test the pre-trained model after precipitation training;

If the test result satisfies the precipitation end condition, the pre-trained model after the precipitation training is used as the reinforcement model.
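A minimal sketch of the train-then-test loop of claim 5, assuming the precipitation end condition is a score threshold on the test task data set. The claims do not define the condition; `train_round`, `evaluate`, `target_score` and `min_rounds` are hypothetical names supplied by the caller.

```python
def precipitate_until_done(model, rounds, train_round, evaluate,
                           target_score, min_rounds=2):
    """Run precipitation rounds one by one and stop once the end condition holds.

    rounds      -- iterable of (data_subset, trainable_layers) pairs
    train_round -- callable(model, data_subset, trainable_layers) -> updated model
    evaluate    -- callable(model) -> score on the test task data set
    """
    rounds_run = 0
    satisfied = False
    for subset, layers in rounds:
        model = train_round(model, subset, layers)
        rounds_run += 1
        # Claim 1 requires at least two rounds, so the test only counts from min_rounds on.
        if rounds_run >= min_rounds and evaluate(model) >= target_score:
            satisfied = True
            break
    return model, rounds_run, satisfied
```

Returning the `satisfied` flag lets the caller decide what to do when the rounds run out before the end condition is met, a case the claims leave open.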

6. The method according to claim 1, wherein, according to the training task data set, before the bottom network of the pre-training model is subjected to precipitation training at least twice, the method further comprises:

Domain training is performed on the pre-trained model according to the training domain data set, and the pre-trained model is updated.

7. The method according to claim 1, wherein at least two networks in the reinforcement model are used as target networks, and a distillation model is constructed according to the target networks, comprising:

Using at least two networks in the enhanced model as target networks, and acquiring network building blocks of the target network;

According to the obtained network structure block, a distillation model with the same structure as the reinforcement model is constructed.

8. The method according to claim 1, wherein at least two networks in the reinforcement model are used as target networks, and a distillation model is constructed according to the target networks, comprising:

using at least two networks in the enhanced model as target networks;

According to the target network, a neural network model with a different structure from the reinforcement model is selected as the distillation model, wherein the output layer network of the neural network model is consistent with the type of the prediction layer network in the target network, and the non-output layer network of the neural network model is consistent with the type of the feature recognition network in the target network.

9. The method according to claim 1, wherein, extracting the target knowledge of the training task data set through the target network of the reinforcement model, comprising:

Using the training task data set as the input of the reinforcement model, obtain the first data feature representation of the feature recognition network output of the reinforcement model, and the first prediction probability representation of the prediction layer network output of the reinforcement model;

The acquired first data feature representation and the first predicted probability representation are used as target knowledge of the training task data set.
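The extraction step of claim 9 can be illustrated with a small sketch. It assumes the reinforcement model is exposed as a callable that returns a feature vector and raw prediction scores for one sample; `toy_teacher` is a made-up stand-in for such a model, and applying a softmax to obtain the prediction probability representation is likewise an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def extract_target_knowledge(teacher, samples):
    """Collect the first feature and probability representations for every sample."""
    knowledge = []
    for sample in samples:
        features, logits = teacher(sample)
        knowledge.append({
            "sample": sample,
            "features": list(features),   # output of the feature recognition network
            "probs": softmax(logits),     # output of the prediction layer network
        })
    return knowledge


def toy_teacher(text):
    """Hypothetical stand-in for the reinforcement model."""
    features = [float(len(text)), float(text.count(" "))]
    logits = [features[0] * 0.1, features[1]]
    return features, logits


knowledge = extract_target_knowledge(toy_teacher, ["play music", "stop"])
```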

10. The method of claim 1, wherein the distillation model is trained according to the target knowledge and the training task dataset, comprising:

Inputting the training task data set into the distillation model, and determining a soft-supervised label and a hard-supervised label according to the processing result of the distillation model on the training task data set and the target knowledge;

determining a target label according to the soft supervision label and the hard supervision label;

According to the target label, the parameters of the distillation model are iteratively updated.
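Claim 10 does not state how the soft-supervised and hard-supervised labels are merged into the target label, nor which update rule is used. The sketch below assumes a weighted sum and a plain gradient-descent step; `soft_weight`, `learning_rate` and both function names are hypothetical.

```python
def combine_target_label(soft_label, hard_label, soft_weight=0.5):
    """Merge the two supervision signals into one target value (assumed weighted sum)."""
    if not 0.0 <= soft_weight <= 1.0:
        raise ValueError("soft_weight must lie in [0, 1]")
    return soft_weight * soft_label + (1.0 - soft_weight) * hard_label


def sgd_step(params, grads, learning_rate=0.01):
    """One iterative update of the distillation model parameters (assumed plain SGD)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

In a real training loop the gradients would be taken with respect to the combined target value, and the step would be repeated until the training end condition is met.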

11. The method according to claim 10, wherein inputting the training task data set into the distillation model, and determining a soft-supervised label and a hard-supervised label according to the processing result of the distillation model on the training task data set and the target knowledge, comprises:

Inputting the training task data set into the distillation model, to obtain the second data feature representation of the feature recognition network output of the distillation model, and the second prediction probability representation of the prediction layer network output of the distillation model;

determining a soft-supervised label according to the target knowledge, the second data feature representation and the second predicted probability representation;

Hard-supervised labels are determined based on the second predicted probability representation and the training task dataset information.

12. The method according to claim 11, wherein the training task data set information comprises: the number of training samples, the number of training labels and the actual label value in the training task data set.
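One reading consistent with the three quantities listed in claim 12 is an averaged cross-entropy between the second predicted probability representation and the actual labels. The claims do not name the formula, so the sketch below is an assumption, as are the function name and the clipping constant.

```python
import math


def hard_supervised_label(student_probs, true_labels, num_labels):
    """Averaged cross-entropy over the training samples.

    student_probs -- one probability list per training sample
    true_labels   -- actual label index for each sample
    num_labels    -- number of distinct training labels
    """
    num_samples = len(student_probs)
    total = 0.0
    for probs, actual in zip(student_probs, true_labels):
        for label in range(num_labels):
            target = 1.0 if label == actual else 0.0
            # Clip to avoid log(0) when the student assigns zero probability.
            total -= target * math.log(max(probs[label], 1e-12))
    return total / num_samples
```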

13. The method of claim 11, wherein determining a soft-supervised label based on the target knowledge, the second data feature representation, and the second predicted probability representation comprises:

Taking the mean squared error between the first data feature representation in the target knowledge and the second data feature representation as a data feature label;

Taking the mean squared error between the first prediction probability representation in the target knowledge and the second prediction probability representation as a probability prediction label;

According to the weight value of the feature recognition network of the enhanced model, label fusion is performed on the data feature label and the probability prediction label to obtain a soft-supervised label.
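The soft-supervised label of claim 13 can be sketched by taking the averaged squared difference between the corresponding representations of the two models and fusing the two results. The fusion rule is not given in the claims; the version below assumes a convex combination controlled by a hypothetical `feature_weight`.

```python
def mean_squared_error(first, second):
    """Average of the squared element-wise differences between two vectors."""
    if len(first) != len(second):
        raise ValueError("representations must have the same length")
    return sum((a - b) ** 2 for a, b in zip(first, second)) / len(first)


def soft_supervised_label(first_features, second_features,
                          first_probs, second_probs, feature_weight):
    """Fuse the data feature label and the probability prediction label."""
    feature_label = mean_squared_error(first_features, second_features)
    probability_label = mean_squared_error(first_probs, second_probs)
    # Assumed fusion: feature_weight on the feature term, the remainder on the probability term.
    return feature_weight * feature_label + (1.0 - feature_weight) * probability_label
```

For example, with feature vectors `[1, 2]` and `[1, 4]`, identical probability vectors, and `feature_weight=0.25`, the feature label is 2.0 and the fused value is 0.5.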

14. The method according to claim 1, wherein, according to the target knowledge and the training task data set, the distillation model is trained to obtain a target learning model, comprising:

According to the target knowledge and the training task data set, the distillation model is trained;

Test the trained distillation model according to the test task dataset;

If the test result satisfies the training end condition, the trained distillation model is used as the target learning model.

15. The method of any one of claims 1-14, wherein the pretrained model is a BERT model.

16. The method of any one of claims 1-14, wherein the pretrained model and the target learning model are models for intent recognition;

Correspondingly, after the distillation model is trained according to the target knowledge and the training task data set to obtain a target learning model, the method further includes:

The target learning model is deployed in a human-computer interaction device to perform intention recognition on the user's voice data acquired by the human-computer interaction device in real time.

17. An intent recognition method, the method comprising:

Obtain user voice data collected by human-computer interaction equipment;

Input the user voice data into the target learning model to obtain the user intention recognition result output by the target learning model; wherein the target learning model is obtained by training with the knowledge distillation-based model training method of any one of claims 1-16;

A response result of the human-computer interaction device is determined according to the user intention recognition result.
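The three steps of claim 17 can be sketched as a single function. It assumes the target learning model is a callable that maps the user's voice data (here already transcribed to text) to a score per intent; `toy_model`, the intent names and the fallback reply are hypothetical.

```python
def respond_to_user(voice_data, model, responses,
                    fallback="Sorry, I did not catch that."):
    """Recognize the user's intent and pick the device's response."""
    scores = model(voice_data)               # intent name -> score
    intent = max(scores, key=scores.get)     # highest-scoring intent
    return intent, responses.get(intent, fallback)


def toy_model(utterance):
    """Hypothetical stand-in for the trained target learning model."""
    text = utterance.lower()
    return {
        "play_music": 0.9 if "play" in text else 0.1,
        "stop": 0.9 if "stop" in text else 0.1,
    }


intent, reply = respond_to_user("Play some jazz", toy_model,
                                {"play_music": "Playing music."})
```

The same function could run on the human-computer interaction device or on a server that communicates with it, since it depends only on the model callable and the response table.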

18. The method according to claim 17, wherein the execution body of the method is the human-computer interaction device or a server that communicates and interacts with the human-computer interaction device.

19. A model training device based on knowledge distillation, the device comprising:

a precipitation training module, used to perform precipitation training on the underlying network of the pre-training model at least twice according to the training task data set to obtain the reinforcement model of the pre-training model; wherein the training objects of each precipitation training include at least the underlying network and the prediction layer network, and include successively decreasing middle and high-level networks, and the pre-training model includes, from bottom to top, the underlying network, at least one of the middle and high-level networks, and the prediction layer network;

a distillation model building module, configured to use at least two networks in the reinforcement model as target networks, and build a distillation model according to the target networks, wherein the target network includes a feature recognition network and the prediction layer network, and the feature recognition network includes at least the underlying network;

a target knowledge extraction module, used for extracting the target knowledge of the training task data set through the target network of the reinforcement model;

The distillation model training module is used for training the distillation model according to the target knowledge and the training task data set to obtain a target learning model.

20. The apparatus according to claim 19, wherein the bottom layer network and the middle and high-level networks are used for feature identification; and the prediction layer network is used for task prediction according to the identified features.

21. The apparatus of claim 19, wherein the precipitation training module comprises:

a data subset dividing unit for dividing the training task data set to determine multiple training data subsets;

a training object determination unit, used to determine the training objects corresponding to each training data subset according to the set number of precipitation training rounds; wherein the training objects corresponding to each training data subset include the underlying network, the middle and high-level networks and the prediction layer network of the pre-training model, and the number of layers of the middle and high-level networks included is inversely related to the order of precipitation training;

a precipitation training unit, configured to perform a precipitation training on the training objects corresponding to the training data subset in the pre-training model according to each training data subset;

22. The device according to claim 21, wherein the middle and high-level networks included in each of the training objects are network layers adjacent to the underlying network and continuous upwards; and as the number of times of precipitation training increases, the number of layers of the middle and high-level networks included in the training objects decrements to zero.

23. The device according to claim 19, wherein the precipitation training module is specifically used for:

24. The apparatus of claim 19, further comprising:

a domain training module, used to perform domain training on the pre-training model according to the training domain data set, and update the pre-training model, before precipitation training is performed on the underlying network of the pre-training model at least twice according to the training task data set.

25. The apparatus of claim 19, wherein the distillation model building block is specifically used to:

26. The apparatus of claim 19, wherein the distillation model building block is further specifically used for:

using at least two networks in the enhanced model as target networks;

27. The apparatus according to claim 19, wherein the target knowledge extraction module is specifically used for:

28. The apparatus of claim 19, wherein the distillation model training module comprises:

a supervised label determination unit, used to input the training task data set into the distillation model, and determine a soft-supervised label and a hard-supervised label according to the processing result of the distillation model on the training task data set and the target knowledge;

a target label determination unit, configured to determine a target label according to the soft supervision label and the hard supervision label;

A model parameter updating unit, configured to iteratively update the parameters of the distillation model according to the target label.

29. The apparatus according to claim 28, wherein the supervisory label determination unit specifically comprises:

an output acquisition subunit, used to input the training task data set into the distillation model to obtain the second data feature representation of the feature recognition network output of the distillation model, and the second predicted probability representation of the prediction layer network output of the distillation model;

a soft label determination subunit, configured to determine a soft supervision label according to the target knowledge, the second data feature representation and the second predicted probability representation;

A hard label determination subunit, configured to determine a hard supervised label according to the second prediction probability representation and the training task dataset information.

30. The apparatus according to claim 29, wherein the training task data set information comprises: the number of training samples, the number of training labels and the actual label value in the training task data set.

31. The apparatus according to claim 29, wherein the soft label determination subunit is specifically used for:

32. The apparatus of claim 19, wherein the distillation model training module is further used to:

Test the trained distillation model according to the test task dataset;

33. The apparatus of any one of claims 19-32, wherein the pretrained model is a BERT model.

34. The apparatus of any one of claims 19-32, wherein the pretrained model and the target learning model are models for intent recognition;

Correspondingly, the apparatus further includes:

The model deployment module is used for deploying the target learning model to the human-computer interaction device, so as to perform intention recognition on the user voice data obtained by the human-computer interaction device in real time.

35. An intent recognition device, the device comprising:

A voice data acquisition module, used to acquire user voice data collected by human-computer interaction equipment;

an intention recognition module, used for inputting the user voice data into the target learning model to obtain the user intention recognition result output by the target learning model; wherein the target learning model is obtained by training with the knowledge distillation-based model training method of any one of claims 1-16;

A response result determination module, configured to determine a response result of the human-computer interaction device according to the user intention identification result.

36. The apparatus according to claim 35, wherein the apparatus is configured in the human-computer interaction device, or a server that communicates and interacts with the human-computer interaction device.

37. An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the knowledge distillation-based model training method of any one of claims 1-16, or to perform the intent recognition method of any one of claims 17-18.

38. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the knowledge distillation-based model training method of any one of claims 1-16, or to perform the intent recognition method of any one of claims 17-18.