CN111666763A - Network structure construction method and device for multitask scene

Info

Publication number: CN111666763A
Application number: CN202010468557.3A
Authority: CN
Inventors: 朱威; 李恬静; 何义龙
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-05-28
Filing date: 2020-05-28
Publication date: 2020-09-15
Anticipated expiration: 2040-05-28
Also published as: CN111666763B; WO2021114625A1

Abstract

The present application relates to artificial intelligence and provides a network structure construction method, apparatus, computer device and storage medium for multi-task scenarios. The method includes: acquiring a training set; inputting the training sub-text data corresponding to each target semantic task, step by step, into a multi-task network model whose network structure is to be determined, to obtain a sub-prediction result for each target semantic task; adjusting the network parameters of the multi-task network model until the current target network parameters corresponding to the current network structure are obtained; obtaining the search space corresponding to the multi-task network model; acquiring a validation set; and adjusting the structural parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space. During the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden state vectors. The search continues until the output of the multi-task network model on the validation set satisfies a convergence condition, at which point the target structural parameters are obtained and the trained multi-task network model is produced.

Description

Network structure construction method and device for multi-task scene

Technical Field

The present application relates to the technical field of artificial intelligence, and in particular to a network structure construction method, apparatus, computer device and storage medium for multi-task scenarios.

Background

Machine learning (ML) is a branch of artificial intelligence. Its purpose is to let machines learn from prior knowledge so that they acquire the logical ability to classify and judge. Machine learning models, represented by neural networks, continue to develop and are increasingly applied across industries.

Multi-task learning is widely used in modern artificial intelligence products. Multi-tasking means that, for a given input, a corresponding recognition result must be obtained for each of several different tasks. The original solution is to train one model per subtask: after deployment, each model has to be trained once, training is time-consuming and prediction is slow, and engineers manually try different neural network architectures and then settle on the target architecture according to performance on the validation set. Because learning a network architecture for multi-task scenarios is complex, it is difficult to design a very good neural network structure by hand. Traditional automatic model structure search methods mainly target classification problems and cannot be applied directly to automatic model structure search in multi-task scenarios. Building the model structure for a multi-task scenario through repeated manual trials is complex, inefficient, and consumes a large share of system resources.

Summary of the Invention

In view of this, it is necessary to address the above technical problems by providing a network structure construction method, apparatus, computer device and storage medium for multi-task scenarios that automatically discover the network architecture best suited to an existing multi-task scenario dataset, effectively reduce resource consumption during differentiable search through partial linking, make the search converge faster and more stably, improve efficiency, and reduce system resource usage.

A network structure construction method for multi-task scenarios, the method comprising:

acquiring a training set, where the training set includes training sub-samples corresponding to multiple different target semantic tasks, and each training sub-sample includes training sub-text data and training sub-label data;

inputting the training sub-text data corresponding to each target semantic task, step by step, into a multi-task network model whose network structure is to be determined, to obtain a sub-prediction result for each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained;

obtaining the search space corresponding to the multi-task network model to form a differentiable network search space; acquiring a validation set; adjusting, according to the validation set, the structural parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, where during the search the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden state vectors, the sub-hidden state vector for the current search is taken in a preset order, and that sub-hidden state vector is input into the corresponding network layer for training to obtain an updated multi-task network model; returning to the step of inputting the training sub-text data corresponding to each target semantic task, step by step, into the multi-task network model whose network structure is to be determined, until the output of the multi-task network model on the validation set satisfies a convergence condition and the target structural parameters are obtained; and obtaining the network parameters that match the target structural parameters, and obtaining the trained multi-task network model from the target structural parameters and the matching network parameters.

In one embodiment, the search of the differentiable network search space uses at least one of the following sharing schemes:

sharing the matrix parameters of multi-head attention in the differentiable network search space;

when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among the multiple capsule-network-based operators;

obtaining the connection relationships between the nodes of the multi-task network model, grouping nodes that have the same starting node into a node set, and sharing parameters among the operators corresponding to nodes in different node sets.

In one embodiment, inputting the training sub-text data corresponding to each target semantic task, step by step, into the multi-task network model whose network structure is to be determined, to obtain the sub-prediction result for each target semantic task, includes:

segmenting the current training sub-text data corresponding to the current target semantic task into words, and mapping each word to its corresponding vector to form a vector set;

extracting semantic features from the vector set with an encoder, and obtaining the sub-prediction result corresponding to the current target semantic task according to the semantic features, where the current target semantic task is one of the target semantic tasks.

calculating the similarity between the current training sub-text data corresponding to the current target semantic task and the candidate texts in a database, to obtain similar sub-text data that matches the current training sub-text data;

inputting the first vector set corresponding to the current training sub-text data into a first encoder to extract semantic features and obtain a first semantic feature, and inputting the second vector set corresponding to the similar sub-text data into a second encoder to extract semantic features and obtain a second semantic feature;

obtaining the sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.

In one embodiment, the first encoder and the second encoder share weights.
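The similarity lookup and the weight-sharing encoders described above can be illustrated with a minimal sketch. It is not the method of the application: the bag-of-words cosine similarity, the toy scalar encoder, and the names `bag_of_words`, `most_similar` and `SharedEncoder` are all assumptions made for illustration. The only point it shows is that "shared weights" can be realized by passing both texts through one and the same set of parameters.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Hypothetical stand-in for segmentation and vector mapping: word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, candidates):
    """Return the candidate text with the highest similarity to the query."""
    q = bag_of_words(query)
    return max(candidates, key=lambda c: cosine(q, bag_of_words(c)))


class SharedEncoder:
    """Toy encoder: one weight table used for both inputs (shared weights)."""

    def __init__(self, weights):
        self.weights = weights  # token -> scalar weight, illustrative only

    def encode(self, text):
        counts = bag_of_words(text)
        return sum(self.weights.get(tok, 0.0) * n for tok, n in counts.items())


candidates = ["how to take metformin", "side effects of aspirin", "clinic opening hours"]
query = "metformin how to take it"

similar = most_similar(query, candidates)
encoder = SharedEncoder({"metformin": 1.0, "take": 0.5})
first_feature = encoder.encode(query)      # role of the "first encoder"
second_feature = encoder.encode(similar)   # role of the "second encoder", same weights
```

In a real model the two features would be combined by a task-specific head to produce the sub-prediction result; that step is omitted here.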

In one embodiment, adjusting the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data includes:

obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task, to obtain the sub-difference corresponding to each target semantic task;

obtaining the task weight corresponding to each target semantic task, and weighting the sub-differences by the task weights to obtain a statistical sub-difference;

adjusting the network parameters of the multi-task network model according to the statistical sub-difference.
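As a minimal sketch of the weighting step above, the statistical sub-difference can be computed as a weighted sum of the per-task sub-differences. The text only says the sub-differences are "weighted", so the weighted sum, the function name `weighted_total_loss`, the task names and the numeric values are all assumptions for illustration.

```python
def weighted_total_loss(sub_differences, task_weights):
    """Combine per-task sub-differences into one value using task weights.

    Both arguments map a task name to a float. A weighted sum is assumed;
    the source does not fix the exact combination rule.
    """
    if sub_differences.keys() != task_weights.keys():
        raise ValueError("every task needs both a sub-difference and a weight")
    return sum(task_weights[task] * sub_differences[task] for task in sub_differences)


# Hypothetical per-task losses and weights.
sub_differences = {"entity_recognition": 0.5, "intent_recognition": 0.25}
task_weights = {"entity_recognition": 1.0, "intent_recognition": 2.0}

total = weighted_total_loss(sub_differences, task_weights)
```

The single combined value is what would then be minimized when adjusting the network parameters.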

A network structure construction apparatus for multi-task scenarios, the apparatus comprising:

an acquisition module, configured to acquire a training set, where the training set includes training sub-samples corresponding to multiple different target semantic tasks, and each training sub-sample includes training sub-text data and training sub-label data;

a network parameter adjustment module, configured to input the training sub-text data corresponding to each target semantic task, step by step, into a multi-task network model whose network structure is to be determined, to obtain a sub-prediction result for each target semantic task, and to adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained;

a network structure construction module, configured to obtain the search space corresponding to the multi-task network model to form a differentiable network search space; acquire a validation set; adjust, according to the validation set, the structural parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, where during the search the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden state vectors, the sub-hidden state vector for the current search is taken in a preset order, and that sub-hidden state vector is input into the corresponding network layer for training to obtain an updated multi-task network model; return to the network parameter adjustment module, until the output of the multi-task network model on the validation set satisfies a convergence condition and the target structural parameters are obtained; and obtain the network parameters that match the target structural parameters, and obtain the trained multi-task network model from the target structural parameters and the matching network parameters.

In one embodiment, the network structure construction module is further configured to search the differentiable network search space using at least one of the following sharing schemes:

A computer device, comprising a memory and a processor, the memory storing a computer program, where the processor implements the following steps when executing the computer program:

A computer-readable storage medium storing a computer program, where the computer program implements the following steps when executed by a processor:

The above network structure construction method, apparatus, computer device and storage medium for multi-task scenarios automatically discover the network architecture best suited to an existing multi-task scenario dataset, so the accuracy of the multi-task system can be improved without manually trying many different models. Partial linking effectively reduces resource consumption during differentiable search and makes the search converge faster and more stably. This improves system accuracy while lowering the labor and computing costs of system development, improving efficiency and reducing system resource usage.

Brief Description of the Drawings

FIG. 1 is a diagram of the application environment of the network structure construction method for multi-task scenarios in one embodiment;

FIG. 2 is a schematic flowchart of the network structure construction method for multi-task scenarios in one embodiment;

FIG. 3 is a structural block diagram of the network structure construction apparatus for multi-task scenarios in one embodiment;

FIG. 4 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.

The network structure construction method for multi-task scenarios provided by this application can be applied in the application environment shown in FIG. 1, which is a diagram of the application environment in which the method runs in one embodiment. As shown in FIG. 1, the application environment includes a terminal 110 and a server 120. The terminal and the server communicate over a network, which may be a wireless or wired communication network such as an IP network or a cellular mobile communication network; the number of terminals and servers is not limited.

The terminal 110 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device. The server may be implemented as a standalone server or as a server cluster made up of multiple servers. On the terminal 110 or the server 120, a training set can be acquired, where the training set includes training sub-samples corresponding to multiple different target semantic tasks and each training sub-sample includes training sub-text data and training sub-label data. The training sub-text data corresponding to each target semantic task is input, step by step, into a multi-task network model whose network structure is to be determined, to obtain a sub-prediction result for each target semantic task, and the network parameters of the multi-task network model are adjusted according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained. The search space corresponding to the multi-task network model is obtained to form a differentiable network search space, a validation set is acquired, and the structural parameters of the multi-task network model corresponding to the current target network parameters are adjusted according to the validation set by searching the differentiable network search space. During the search, the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden state vectors, the sub-hidden state vector for the current search is taken in a preset order, and that sub-hidden state vector is input into the corresponding network layer for training to obtain an updated multi-task network model. The process then returns to the step of inputting the training sub-text data corresponding to each target semantic task, step by step, into the multi-task network model whose network structure is to be determined, until the output of the multi-task network model on the validation set satisfies a convergence condition and the target structural parameters are obtained. The network parameters that match the target structural parameters are then obtained, and the trained multi-task network model is obtained from the target structural parameters and the matching network parameters.

In one embodiment, as shown in FIG. 2, a network structure construction method for multi-task scenarios is provided. The method is described using its application to the terminal 110 or the server 120 in FIG. 1 as an example, and includes the following steps:

Step 210: acquire a training set, where the training set includes training sub-samples corresponding to multiple different target semantic tasks, and each training sub-sample includes training sub-text data and training sub-label data.

Here, the training sub-samples corresponding to multiple different target semantic tasks make up the training set. The target semantic tasks are the multiple different types of task in a multi-task scenario; for semantic analysis, for example, the tasks include entity recognition, sentence classification, intent recognition and sentence-pair similarity. The number of target semantic tasks corresponds to the target recognition results of the multi-task network model whose network structure is to be determined, and the multi-task network model may be a semantic analysis network. If the target recognition results of the semantic analysis network include entity recognition and purpose recognition for the input text, the target semantic tasks include an entity recognition task and a purpose recognition task. For example, on receiving the user question "How should metformin be taken?", the system must both recognize the entity "metformin" and recognize the intent of the sentence, namely that the user is asking about usage and dosage.

Specifically, different target semantic tasks have their own training sub-samples, to suit the multi-task scenario: the first target semantic task corresponds to a first training sub-sample, the second target semantic task corresponds to a second training sub-sample, and so on. Each training sub-sample includes training sub-text data and training sub-label data, where the training sub-label data is the task recognition result of training text data whose result for the corresponding task has already been determined; this task recognition result serves as the training sub-label data for the target semantic task.

Step 220: input the training sub-text data corresponding to each target semantic task, step by step, into the multi-task network model whose network structure is to be determined, to obtain a sub-prediction result for each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between each sub-prediction result and the corresponding training sub-label data, until the current target network parameters corresponding to the current network structure are obtained.

Specifically, the training sub-text data in the training samples is input into the multi-task network model whose network structure is to be determined step by step, one target semantic task at a time. "Step by step" means that the training sub-text data of the first target semantic task is input first to obtain the first sub-prediction result, then the training sub-text data of the second target semantic task is input to obtain the second sub-prediction result, and so on until the training sub-text data of every target semantic task has been input in turn and the corresponding sub-prediction results obtained. Each sub-prediction result has corresponding training sub-label data, from which the sub-difference for each target semantic task is calculated. A loss function is constructed from the sub-differences, gradients are backpropagated in the direction that minimizes the loss function, and the network parameters of the multi-task network model are adjusted and training continues until the training end condition is met. Minimizing the training loss yields the current target network parameters associated with the structure of the multi-task network model, that is, the current optimal weights w.

Step 230: obtain the search space corresponding to the multi-task network model to form a differentiable network search space; acquire a validation set; adjust, according to the validation set, the structural parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, where during the search the hidden state vector of the multi-task network model is divided into multiple ordered sub-hidden state vectors, the sub-hidden state vector for the current search is taken in a preset order, and that sub-hidden state vector is input into the corresponding network layer for training to obtain an updated multi-task network model; return to the step of inputting the training sub-text data corresponding to each target semantic task, step by step, into the multi-task network model whose network structure is to be determined, until the output of the multi-task network model on the validation set satisfies a convergence condition and the target structural parameters are obtained; obtain the network parameters that match the target structural parameters, and obtain the trained multi-task network model from the target structural parameters and the matching network parameters.

Here, a search space is defined that contains various operators, including LSTM (Long Short-Term Memory network), the gated recurrent unit (GRU), one-dimensional convolution and multi-head attention. The kernel size of an operator may be 1, 3, 5, and so on, and the number of attention heads may be 1, 2, 4, 8, and so on.
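One way to picture such a search space is as a flat list of candidate operator names. The sketch below is only an illustration: the naming scheme, the function `build_search_space`, and the decision to stop at exactly the kernel sizes and head counts listed in the text (which are given there as open-ended examples) are assumptions.

```python
# Example values taken from the text; the text allows further values ("and so on").
KERNEL_SIZES = (1, 3, 5)
ATTENTION_HEADS = (1, 2, 4, 8)


def build_search_space(kernel_sizes=KERNEL_SIZES, attention_heads=ATTENTION_HEADS):
    """Enumerate candidate operators as descriptive names (hypothetical scheme)."""
    operators = ["lstm", "gru"]
    operators += [f"conv1d_k{k}" for k in kernel_sizes]
    operators += [f"multi_head_attention_h{h}" for h in attention_heads]
    return operators


SEARCH_SPACE = build_search_space()
```

Each directed edge of a cell would then carry one structural weight per entry of `SEARCH_SPACE`.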

Specifically, the multi-task network model is regarded as a stack of multiple cells. A cell is a directed graph made up of N ordered nodes connected by directed edges. The search space is made continuous by relaxation: each directed edge (i, j) represents an operator that is treated as a mixture of all sub-operations, which can be implemented as a softmax-weighted sum. The softmax formula is as follows:
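The formula itself is not reproduced in this text. A reconstruction consistent with the definitions that follow (mixing weights α^(i,j) of dimension |O| on edge (i, j), with o(·) a sub-operation), assuming the standard softmax relaxation used in differentiable architecture search, would be the following. The symbol \bar{o} for the mixed operation and the per-operation subscript on α are notation introduced here, not taken from the source.

```latex
\bar{o}^{(i,j)}(x) \;=\; \sum_{o \in O}
  \frac{\exp\!\left(\alpha_{o}^{(i,j)}\right)}
       {\sum_{o' \in O} \exp\!\left(\alpha_{o'}^{(i,j)}\right)} \; o(x)
```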

Here, the weight of each sub-operation is randomly initialized and, for training purposes, cannot be restricted to a number between 0 and 1. The formula converts these weights into numbers between 0 and 1, so that the weights of all sub-operations sum to 1. The sub-operation mixing weights of directed edge (i, j) are α^(i,j), whose dimension is |O|, the total number of sub-operations on directed edge (i, j); o() denotes the current sub-operation. The structural parameters and network parameters are updated to learn the optimal weight parameters. The optimization objective is a bi-level optimization problem, namely:
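The objective is likewise missing from this text. Assuming the usual bi-level form for differentiable architecture search, in which the structural parameters α are evaluated on the validation set (as step 230 describes) and the network parameters w are fitted on the training set, it could be written as below. The symbols L_val and w*(α) are assumptions; the surrounding text names only L_train.

```latex
\min_{\alpha} \; L_{val}\!\left(w^{*}(\alpha),\, \alpha\right)
\quad \text{s.t.} \quad
w^{*}(\alpha) \;=\; \arg\min_{w} \; L_{train}(w,\, \alpha)
```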

The optimization method is alternating gradient descent: the network parameters w are updated once along the gradient of L_train(w_{k-1}, α_{k-1}) with respect to w_{k-1}, and the structural parameters α of the multi-task network model are then updated once along the gradient of L_train(w_k, α_{k-1}) with respect to α_{k-1}.
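The alternating update can be sketched on a toy problem: a single edge whose output is a softmax-weighted mix of two scalar sub-operations `w[k] * x`, trained with a squared-error loss and hand-written gradients. Everything here is an illustrative assumption (the toy operators, the loss, the learning rate, the function names). The text writes both updates in terms of L_train; the sketch takes the sample used for the α step as a separate argument, and the usage passes a held-out sample on the assumption that this matches the validation-set adjustment of step 230.

```python
import math


def softmax(alpha):
    """Numerically stable softmax over a list of floats."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_output(w, alpha, x):
    """Softmax-weighted mix of the toy sub-operations w[k] * x."""
    return sum(p * wk * x for p, wk in zip(softmax(alpha), w))


def loss(w, alpha, sample):
    x, target = sample
    return (mixed_output(w, alpha, x) - target) ** 2


def grad_w(w, alpha, sample):
    """Gradient of the squared error with respect to each w[k]."""
    x, target = sample
    err = 2.0 * (mixed_output(w, alpha, x) - target)
    return [err * p * x for p in softmax(alpha)]


def grad_alpha(w, alpha, sample):
    """Gradient of the squared error with respect to each alpha[k]."""
    x, target = sample
    y = mixed_output(w, alpha, x)
    err = 2.0 * (y - target)
    return [err * p * (wk * x - y) for p, wk in zip(softmax(alpha), w)]


def alternating_descent(w, alpha, train_sample, alpha_sample, steps=200, lr=0.05):
    """Alternate one w step and one alpha step per iteration."""
    for _ in range(steps):
        gw = grad_w(w, alpha, train_sample)        # evaluated at (w_{k-1}, alpha_{k-1})
        w = [wk - lr * g for wk, g in zip(w, gw)]
        ga = grad_alpha(w, alpha, alpha_sample)    # evaluated at (w_k, alpha_{k-1})
        alpha = [ak - lr * g for ak, g in zip(alpha, ga)]
    return w, alpha


w0, alpha0 = [0.5, 2.0], [0.0, 0.0]
train_sample = (1.0, 3.0)   # (input, target), hypothetical
val_sample = (2.0, 6.0)     # held-out sample for the alpha step, hypothetical

w, alpha = alternating_descent(w0, alpha0, train_sample, val_sample)
```

After training, the structural weight of the second sub-operation ends up larger than that of the first, which is the signal later used to keep one operator per edge.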

During the search, the hidden state of the multi-task network model is partially linked. For example, if the hidden state vector has 300 dimensions, the 300 dimensions are divided into 6 ordered sub-hidden state vectors of 50 dimensions each. Each time the parameters are adjusted, one sub-hidden state vector (that is, 50 dimensions) is selected and one step of differentiable search is performed; in the next step, another sub-hidden state vector is selected in order. The sub-hidden state vectors are selected in turn and input into the corresponding network layer for training to obtain an updated multi-task network model, and the process returns to the step of inputting the training sub-text data corresponding to each target semantic task, step by step, into the multi-task network model whose network structure is to be determined, until the output of the multi-task network model on the validation set satisfies the convergence condition and the target structural parameters are obtained. The network parameters that match the target structural parameters are then obtained, and the trained multi-task network model is obtained from the target structural parameters and the matching network parameters. After optimization, the operator corresponding to the largest weight, that is, to the target structural parameters, is activated and the other operators are removed; the result is the trained multi-task network model.
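The 300-dimension example can be sketched as follows. The helper names `split_hidden_state` and `chunk_for_step` are hypothetical, and the "preset order" is assumed here to be a simple round-robin that wraps around after the last chunk; the text does not say what happens after the sixth step.

```python
def split_hidden_state(hidden, num_chunks):
    """Split a hidden state vector into equally sized, ordered sub-vectors."""
    if len(hidden) % num_chunks != 0:
        raise ValueError("hidden size must be divisible by the number of chunks")
    size = len(hidden) // num_chunks
    return [hidden[i * size:(i + 1) * size] for i in range(num_chunks)]


def chunk_for_step(chunks, step):
    """Pick the sub-vector for a given search step (round-robin order assumed)."""
    return chunks[step % len(chunks)]


hidden = list(range(300))               # stand-in for a 300-dimensional hidden state
chunks = split_hidden_state(hidden, 6)  # 6 ordered sub-vectors of 50 dimensions

# First element of the sub-vector used at each of seven consecutive search steps.
first_dims = [chunk_for_step(chunks, step)[0] for step in range(7)]
```

Only the selected 50 dimensions pass through the candidate operators in a given step, which is where the saving in memory and compute during the search comes from.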

上述用于多任务场景的网络结构构建方法，具有自动发现最适合已有的多任务场景数据集的网络架构，不需要人工尝试很多不同模型就能提高多任务系统的精度，通过部分链接有效降低可微分搜索时候的资源消耗，而且使得搜索收敛更快更稳定，在提升系统精度的同时降低了系统开发所需的人力和计算资源成本，提高效率和降低系统资源占用率。The above-mentioned network structure construction method for multi-task scenarios has the ability to automatically discover the network architecture that is most suitable for the existing multi-task scenario datasets. It does not need to manually try many different models to improve the accuracy of the multi-task system. Differentiable resource consumption during search, and make search convergence faster and more stable, while improving system accuracy, it reduces the cost of manpower and computing resources required for system development, improves efficiency and reduces system resource occupancy.

在一个实施例中，搜索可微网络搜索空间通过以下共享方式中的至少一种：可微网络搜索空间中多头注意力的矩阵参数共享；多任务网络模型的池化层的搜索时，基于胶囊网络的多个操作符，共享映射网络的参数；获取多任务网络模型的节点间的连接关系，将具有同一个起始节点的节点组成节点集合，不同节点集合中的节点对应的操作符进行参数共享。In one embodiment, searching the differentiable network search space is performed by at least one of the following sharing methods: matrix parameter sharing of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, capsule-based Multiple operators of the network share the parameters of the mapping network; obtain the connection relationship between the nodes of the multi-task network model, group the nodes with the same starting node into a node set, and parameterize the operators corresponding to the nodes in different node sets shared.

Specifically, the three matrices (W_Q, W_K, W_V) of multi-head attention can share parameters, and the four capsule-network-based operators can share the parameters of one mapping network. For example, the operator on 1->2 can be shared with 3->4. The sharing rule is that operators can be shared when their nodes do not have the same starting point: nodes with the same starting node are grouped into a node set, and the operators corresponding to nodes in different node sets share parameters. Parameter sharing effectively reduces resource consumption during the differentiable search and makes the search converge faster and more stably.
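
By way of illustration only, the sharing rule for operators on different edges can be sketched as follows. Representing each edge as a (start, end) tuple, and the helper names used here, are assumptions made for this sketch.

```python
from collections import defaultdict


def group_edges_by_start(edges):
    """Group edges (start, end) into node sets keyed by their starting node."""
    groups = defaultdict(list)
    for start, end in edges:
        groups[start].append((start, end))
    return dict(groups)


def can_share(edge_a, edge_b):
    """Two edges may share operator parameters only if their start nodes differ."""
    return edge_a[0] != edge_b[0]


edges = [(1, 2), (1, 3), (3, 4)]
node_sets = group_edges_by_start(edges)

share_12_34 = can_share((1, 2), (3, 4))  # different start nodes
share_12_13 = can_share((1, 2), (1, 3))  # same start node
```

Here the operator on 1->2 may be shared with 3->4 because the two edges belong to different node sets, whereas 1->2 and 1->3 start at the same node and keep separate parameters.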

In one embodiment, in step 220, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, to obtain the sub-prediction result corresponding to each target semantic task, includes: segmenting the current training sub-text data corresponding to the current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and extracting semantic features from the vector set with an encoder and obtaining the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.

Specifically, a custom word segmentation algorithm may be used to segment the current training sub-text data corresponding to the current target semantic task, and different target semantic tasks may use different word segmentation algorithms. A custom mapping algorithm may be used to map each word to its corresponding vector. For different target semantic tasks, the corresponding encoders may be different or the same, so that different semantic features can be extracted for different target semantic tasks, and the sub-prediction result corresponding to the current target semantic task is obtained from the semantic features.

In this embodiment, the current training sub-text data corresponding to the current target semantic task is first segmented into words, which are mapped to corresponding vectors to form a vector set; the encoder then extracts semantic features from the vector set to obtain the sub-prediction result corresponding to the current target semantic task. Allowing varied word segmentation algorithms and encoders makes it more convenient to obtain the sub-prediction result for each target semantic task, since different word segmentation algorithms and encoders can be flexibly configured for different target semantic tasks.
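
By way of illustration only, the segment, map and encode pipeline of this embodiment can be sketched as follows. The two tokenizers, the lookup-table mapping, the mean-pooling encoder and the task names are placeholders assumed for this sketch; the present application leaves the word segmentation algorithm, the mapping algorithm and the encoder open.

```python
def whitespace_tokenize(text):
    """Placeholder word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def char_tokenize(text):
    """Placeholder word segmentation: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def map_to_vectors(tokens, table, dim):
    """Map each token to its vector; unknown tokens map to a zero vector."""
    zero = [0.0] * dim
    return [table.get(tok, zero) for tok in tokens]


def mean_pool_encoder(vectors):
    """Placeholder encoder: average the vector set into one feature vector."""
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical per-task configuration: each task may use its own tokenizer.
TASK_TOKENIZERS = {"entity_recognition": whitespace_tokenize,
                   "intent": char_tokenize}

table = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}
tokens = TASK_TOKENIZERS["entity_recognition"]("Hello world")
vector_set = map_to_vectors(tokens, table, dim=2)
features = mean_pool_encoder(vector_set)
```

Keeping the tokenizer in a per-task table reflects the point that different target semantic tasks may be configured with different word segmentation algorithms while the rest of the pipeline stays the same.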

In one embodiment, in step 220, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, to obtain the sub-prediction result corresponding to each target semantic task, includes: calculating the similarity between the current training sub-text data corresponding to the current target semantic task and the candidate texts in a database to obtain similar sub-text data matching the current training sub-text data; inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task from the first semantic feature and the second semantic feature.

Specifically, the candidate texts in the database may be texts with relatively standard wording. The similar sub-text data corresponding to the training sub-text data is found by similarity search; because its wording is relatively standard, the semantic features subsequently extracted from it are more effective. Combining the different semantic features obtained by the two encoders to obtain the sub-prediction result corresponding to the current target semantic task improves the accuracy of the sub-prediction result. The first encoder may be referred to as the premise encoder, and the second encoder may be referred to as the hypothesis encoder.

For different target semantic tasks, the first encoder and the second encoder can be shared; sharing the encoders improves resource utilization and training efficiency. Because one input text gives rise to two input texts, the corresponding target semantic tasks may also include semantic tasks based on text pairs, such as question-answering tasks, sentence similarity calculation tasks, and tasks that compute the probability of one sentence conditioned on another.

In this embodiment, similar sub-text data matching the training sub-text data is obtained, and the different semantic features obtained by the two encoders are combined to obtain the sub-prediction result corresponding to the current target semantic task. This improves the accuracy of the sub-prediction result and also broadens the forms of target semantic task that can be handled.
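
By way of illustration only, the retrieval of similar sub-text data and the combination of the two encoders' features can be sketched as follows. The Jaccard token-overlap similarity, the toy encoder, and the concatenation used to combine the first and second semantic features are all assumptions of this sketch; the present application does not specify the similarity measure or the combination rule.

```python
def jaccard_similarity(text_a, text_b):
    """Token-overlap similarity in [0, 1]; a stand-in for the unspecified measure."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def retrieve_similar(text, candidates):
    """Return the candidate text most similar to the input text."""
    return max(candidates, key=lambda cand: jaccard_similarity(text, cand))


def toy_encoder(text):
    """Placeholder encoder: token count and mean token length."""
    tokens = text.split()
    if not tokens:
        return [0.0, 0.0]
    return [float(len(tokens)), sum(len(t) for t in tokens) / len(tokens)]


def combine_features(first_features, second_features):
    """Concatenate the two semantic feature vectors (assumed combination rule)."""
    return list(first_features) + list(second_features)


# Hypothetical database of candidate texts with relatively standard wording.
candidates = ["how to reset a password", "opening hours of the branch"]
query = "how do i reset my password"

similar = retrieve_similar(query, candidates)
features = combine_features(toy_encoder(query), toy_encoder(similar))
```

The combined feature vector would then be passed to a task-specific output layer to produce the sub-prediction result; that layer is omitted here.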

In one embodiment, the first encoder and the second encoder share weights.

Specifically, weight sharing refers to sharing the convolution kernel parameters, that is, the convolution kernel parameters of the first encoder are the same as those of the second encoder. Weight sharing reduces the number of parameters. The multi-task mechanism, together with weight sharing between the premise encoder and the hypothesis encoder, reduces GPU memory usage when the multi-task system is deployed and thereby lowers cost.
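
By way of illustration only, weight sharing between the two encoders can be sketched as follows. The one-dimensional convolution and the `ConvEncoder` class are assumptions made for this sketch; the point shown is only that both encoders reference the same kernel parameters rather than holding separate copies.

```python
class ConvEncoder:
    """Minimal 1-D convolution encoder that holds a reference to its kernel."""

    def __init__(self, kernel):
        # The list is stored by reference, not copied, so encoders built
        # from the same list share one set of kernel parameters.
        self.kernel = kernel

    def __call__(self, sequence):
        k = len(self.kernel)
        return [
            sum(self.kernel[j] * sequence[i + j] for j in range(k))
            for i in range(len(sequence) - k + 1)
        ]


shared_kernel = [0.5, 0.5]
premise_encoder = ConvEncoder(shared_kernel)
hypothesis_encoder = ConvEncoder(shared_kernel)

before = premise_encoder([1.0, 3.0, 5.0])

# An update to the shared parameters is visible through both encoders.
shared_kernel[0] = 1.0
after = hypothesis_encoder([1.0, 3.0])
```

Because only one kernel is stored for both encoders, the parameter count for this layer is half of what two independent encoders would require.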

In one embodiment, adjusting the network parameters of the multi-task network model according to the difference between the sub-prediction results and the corresponding training sub-label data includes: obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task to obtain a sub-difference corresponding to each target semantic task; obtaining the task weight corresponding to each target semantic task, and weighting the sub-differences by the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.

Specifically, the task weight corresponding to a target semantic task represents the importance of that task: the larger the task weight, the more important the task. For example, if the primary task for a text is to recognize the entities in the text and the secondary task is to recognize how those entities are used, the task weight of the entity recognition task is greater than that of the entity usage recognition task. Weighting the sub-differences by the task weights gives important tasks larger weighting coefficients, so that when the network parameters of the multi-task network model are adjusted according to the statistical sub-difference, the important tasks have a greater influence on the adjustment.

In this embodiment, configuring a task weight for each target semantic task makes it possible to flexibly control how strongly each target semantic task influences the adjustment of the network parameters of the multi-task network model.
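
By way of illustration only, the weighting of the sub-differences can be sketched as follows. Treating each sub-difference as a scalar loss value, combining them as a weighted sum, and the task names and numeric values are assumptions made for this sketch.

```python
def statistical_sub_difference(sub_differences, task_weights):
    """Weighted sum of per-task sub-differences (assumed aggregation rule)."""
    missing = set(sub_differences) - set(task_weights)
    if missing:
        raise KeyError(f"no task weight configured for: {sorted(missing)}")
    return sum(task_weights[task] * diff for task, diff in sub_differences.items())


# Hypothetical sub-differences (losses) for a primary and a secondary task.
sub_differences = {"entity_recognition": 0.8, "entity_usage": 0.4}

# The primary task is given the larger weight.
task_weights = {"entity_recognition": 1.0, "entity_usage": 0.5}

total = statistical_sub_difference(sub_differences, task_weights)
```

With these values the entity recognition task contributes 0.8 to the total and the entity usage task contributes 0.2, so a gradient step on the total is dominated by the more important task.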

It should be understood that although the steps in the flowchart of FIG. 2 are shown in sequence as indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and may be executed at different times; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 3, a network structure construction apparatus for multi-task scenarios is provided, including an acquisition module 310, a network parameter adjustment module 320, and a network structure construction module 330, wherein:

The acquisition module 310 is configured to acquire a training set, where the training set includes training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data.

The network parameter adjustment module 320 is configured to input the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, obtain the sub-prediction result corresponding to each target semantic task, and adjust the network parameters of the multi-task network model according to the difference between the sub-prediction results and the corresponding training sub-label data until the current target network parameters corresponding to the current network structure are obtained.

The network structure construction module 330 is configured to: obtain the search space corresponding to the multi-task network model and form a differentiable network search space; obtain a validation set; adjust, according to the validation set and by searching the differentiable network search space, the structure parameters of the multi-task network model corresponding to the current target network parameters; during the search, divide the hidden state vector of the multi-task network model into multiple ordered sub-hidden state vectors, obtain the sub-hidden state vector corresponding to the current search in a preset order, and input the sub-hidden state vector into the corresponding network layer for training to obtain an updated multi-task network model; return to the network parameter adjustment module until the output result of the multi-task network model on the validation set satisfies the convergence condition, obtaining the target structure parameters; obtain the network parameters matching the target structure parameters; and obtain the trained multi-task network model from the target structure parameters and the matched network parameters.

In one embodiment, the network structure construction module 330 is further configured to search the differentiable network search space using at least one of the following sharing methods: sharing the matrix parameters of multi-head attention in the differentiable network search space; when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among the multiple capsule-network-based operators; and obtaining the connection relationships between the nodes of the multi-task network model, grouping the nodes that have the same starting node into a node set, and sharing parameters between the operators corresponding to nodes in different node sets.

In one embodiment, the network parameter adjustment module 320 is further configured to segment the current training sub-text data corresponding to the current target semantic task into words and map each word to a corresponding vector to form a vector set; and to extract semantic features from the vector set with an encoder and obtain the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.

In one embodiment, the network parameter adjustment module 320 is further configured to calculate the similarity between the current training sub-text data corresponding to the current target semantic task and the candidate texts in a database to obtain similar sub-text data matching the current training sub-text data; input a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and input a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtain the sub-prediction result corresponding to the current target semantic task from the first semantic feature and the second semantic feature.

In one embodiment, the network parameter adjustment module 320 is further configured to obtain the sub-prediction result and the training sub-label data corresponding to each target semantic task to obtain a sub-difference corresponding to each target semantic task; obtain the task weight corresponding to each target semantic task, and weight the sub-differences by the task weights to obtain a statistical sub-difference; and adjust the network parameters of the multi-task network model according to the statistical sub-difference.

For specific limitations on the network structure construction apparatus for multi-task scenarios, reference may be made to the limitations on the network structure construction method for multi-task scenarios above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the training set. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a network structure construction method for multi-task scenarios.

Those skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components. In some embodiments, the computer device may be a terminal.

In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program: acquiring a training set, where the training set includes training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data; inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between the sub-prediction results and the corresponding training sub-label data until the current target network parameters corresponding to the current network structure are obtained; obtaining the search space corresponding to the multi-task network model and forming a differentiable network search space; obtaining a validation set; adjusting, according to the validation set and by searching the differentiable network search space, the structure parameters of the multi-task network model corresponding to the current target network parameters; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden state vectors, obtaining the sub-hidden state vector corresponding to the current search in a preset order, and inputting the sub-hidden state vector into the corresponding network layer for training to obtain an updated multi-task network model; returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, until the output result of the multi-task network model on the validation set satisfies the convergence condition, obtaining the target structure parameters; obtaining the network parameters matching the target structure parameters; and obtaining the trained multi-task network model from the target structure parameters and the matched network parameters.

In one embodiment, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, to obtain the sub-prediction result corresponding to each target semantic task, includes: segmenting the current training sub-text data corresponding to the current target semantic task into words, and mapping each word to a corresponding vector to form a vector set; and extracting semantic features from the vector set with an encoder and obtaining the sub-prediction result corresponding to the current target semantic task from the semantic features, where the current target semantic task is one of the target semantic tasks.

In one embodiment, inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, to obtain the sub-prediction result corresponding to each target semantic task, includes: calculating the similarity between the current training sub-text data corresponding to the current target semantic task and the candidate texts in a database to obtain similar sub-text data matching the current training sub-text data; inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract a first semantic feature, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract a second semantic feature; and obtaining the sub-prediction result corresponding to the current target semantic task from the first semantic feature and the second semantic feature.

In one embodiment, adjusting the network parameters of the multi-task network model according to the difference between the sub-prediction results and the corresponding training sub-label data includes: obtaining the sub-prediction result and the training sub-label data corresponding to each target semantic task to obtain a sub-difference corresponding to each target semantic task; obtaining the task weight corresponding to each target semantic task, and weighting the sub-differences by the task weights to obtain a statistical sub-difference; and adjusting the network parameters of the multi-task network model according to the statistical sub-difference.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: acquiring a training set, where the training set includes training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples include training sub-text data and training sub-label data; inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, obtaining the sub-prediction result corresponding to each target semantic task, and adjusting the network parameters of the multi-task network model according to the difference between the sub-prediction results and the corresponding training sub-label data until the current target network parameters corresponding to the current network structure are obtained; obtaining the search space corresponding to the multi-task network model and forming a differentiable network search space; obtaining a validation set; adjusting, according to the validation set and by searching the differentiable network search space, the structure parameters of the multi-task network model corresponding to the current target network parameters; during the search, dividing the hidden state vector of the multi-task network model into multiple ordered sub-hidden state vectors, obtaining the sub-hidden state vector corresponding to the current search in a preset order, and inputting the sub-hidden state vector into the corresponding network layer for training to obtain an updated multi-task network model; returning to the step of inputting the training sub-text data corresponding to each target semantic task step by step into the multi-task network model whose network structure is to be determined, until the output result of the multi-task network model on the validation set satisfies the convergence condition, obtaining the target structure parameters; obtaining the network parameters matching the target structure parameters; and obtaining the trained multi-task network model from the target structure parameters and the matched network parameters.

The present application can be applied to smart government affairs and smart security, thereby promoting the construction of smart cities.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments represent only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. A network structure construction method for a multi-task scenario, the method comprising:

acquiring a training set, wherein the training set comprises training sub-samples corresponding to a plurality of different target semantic tasks, and the training sub-samples comprise training sub-text data and training sub-label data;

inputting training sub-text data corresponding to each target semantic task into a multi-task network model of a network structure to be determined step by step to obtain a sub-prediction result corresponding to each target semantic task, and adjusting network parameters of the multi-task network model according to the difference between the sub-prediction result and corresponding training sub-label data until current target network parameters corresponding to the current network structure are obtained;

obtaining a search space corresponding to the multi-task network model to form a differentiable network search space; obtaining a validation set; adjusting, according to the validation set, the structural parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, wherein during the search the hidden state vector of the multi-task network model is divided into a plurality of ordered sub-hidden state vectors, and the sub-hidden state vector corresponding to the current search is obtained according to a preset order and input into the corresponding network layer for training to obtain an updated multi-task network model; returning to the step of inputting, step by step, the training sub-text data corresponding to each target semantic task into the multi-task network model whose network structure is to be determined, until the output of the multi-task network model on the validation set satisfies a convergence condition, to obtain target structural parameters; and obtaining the network parameters matched with the target structural parameters, and obtaining a trained multi-task network model according to the target structural parameters and the matched network parameters.
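
The alternation recited in claim 1 (network parameters fitted on the training set with the structure held fixed, then structural parameters adjusted on the validation set with the network parameters held fixed) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the three scalar candidate operations in `OPS`, the softmax relaxation over `alpha`, the squared-error loss, the finite-difference gradients, the learning rate and the loop counts.

```python
import math

# Hypothetical candidate operations on a scalar input (assumed for illustration).
OPS = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.0]

def softmax(alpha):
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, w, alpha):
    """Mixed operation: softmax(alpha)-weighted sum of candidate ops, scaled by w."""
    probs = softmax(alpha)
    return w * sum(p * op(x) for p, op in zip(probs, OPS))

def loss(data, w, alpha):
    """Mean squared error over (input, target) pairs."""
    return sum((predict(x, w, alpha) - y) ** 2 for x, y in data) / len(data)

def grad(f, params, eps=1e-5):
    """Central finite-difference gradient of f at params (a list of floats)."""
    g = []
    for i in range(len(params)):
        hi = params[:i] + [params[i] + eps] + params[i + 1:]
        lo = params[:i] + [params[i] - eps] + params[i + 1:]
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

train_set = [(1.0, 2.0), (2.0, 4.0)]  # toy data for y = 2x
valid_set = [(3.0, 6.0)]

w, alpha = [1.0], [0.0, 0.0, 0.0]     # network parameter, structural parameters
initial_valid_loss = loss(valid_set, w[0], alpha)

for _ in range(50):
    # Step 1: adjust the network parameter on the training set, structure fixed.
    for _ in range(5):
        gw = grad(lambda p: loss(train_set, p[0], alpha), w)
        w = [w[0] - 0.05 * gw[0]]
    # Step 2: adjust the structural parameters on the validation set, weights fixed.
    ga = grad(lambda p: loss(valid_set, w[0], p), alpha)
    alpha = [a - 0.05 * g for a, g in zip(alpha, ga)]

final_valid_loss = loss(valid_set, w[0], alpha)
```

In this sketch the fixed number of outer rounds stands in for the convergence condition on the validation set; the claim itself does not say how convergence is tested.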

2. The method of claim 1, wherein parameters are shared during the search of the differentiable network search space in at least one of the following ways:

sharing matrix parameters of multi-head attention in the differentiable network search space;

when searching the pooling layer of the multi-task network model, sharing the parameters of the mapping network among a plurality of capsule-network-based operators;

and acquiring the connection relation among the nodes of the multitask network model, forming a node set by the nodes with the same initial node, and sharing parameters of operators corresponding to the nodes in different node sets.
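
As one possible reading of the first sharing mode in claim 2, the sketch below lets several candidate attention operators reuse a single projection matrix instead of each owning its own copy. The class names `SharedProjection` and `AttentionCandidate`, the choice to vary only the number of heads, and the identity matrix are hypothetical; the claim does not state which matrices are shared or how the candidates differ.

```python
class SharedProjection:
    """One projection matrix reused by every candidate attention operator."""

    def __init__(self, matrix):
        self.matrix = matrix  # d_model x d_model, stored as a list of rows

    def project(self, x):
        cols = len(self.matrix[0])
        return [sum(x[i] * self.matrix[i][j] for i in range(len(x)))
                for j in range(cols)]


class AttentionCandidate:
    """Hypothetical candidate operator that differs only in its head count."""

    def __init__(self, shared, num_heads):
        self.shared = shared          # reference to the shared matrix, not a copy
        self.num_heads = num_heads

    def query_heads(self, x):
        q = self.shared.project(x)
        size = len(q) // self.num_heads
        return [q[h * size:(h + 1) * size] for h in range(self.num_heads)]


identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
shared = SharedProjection(identity)
two_heads = AttentionCandidate(shared, num_heads=2)
four_heads = AttentionCandidate(shared, num_heads=4)

x = [1.0, 2.0, 3.0, 4.0]
heads_2 = two_heads.query_heads(x)
heads_4 = four_heads.query_heads(x)

# Updating the shared matrix once changes the output of every candidate.
shared.matrix[0][0] = 2.0
updated_2 = two_heads.query_heads(x)
updated_4 = four_heads.query_heads(x)
```

Because both candidates hold a reference to the same object, the search keeps one set of projection parameters regardless of how many candidates are in the search space.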

3. The method according to claim 1, wherein the step-by-step inputting of the training sub-text data corresponding to each target semantic task into the multi-task network model of the network structure to be determined to obtain the sub-prediction result corresponding to each target semantic task comprises:

performing word segmentation on current training sub-text data corresponding to a current target semantic task, and mapping each segmented word to a corresponding vector to form a vector set;

extracting semantic features from the vector set through an encoder, and obtaining a sub-prediction result corresponding to the current target semantic task according to the semantic features, wherein the current target semantic task is one of the target semantic tasks.
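
The pipeline of claim 3 (segment, map to vectors, encode, predict) can be sketched as follows. The whitespace splitter stands in for a real word segmenter, and the embedding table, mean-pooling encoder, linear classifier and label names are all hypothetical values chosen only so the example runs; none of them comes from the application.

```python
# Hypothetical embedding table and classifier weights (illustrative values only).
EMBEDDINGS = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
UNKNOWN = [0.0, 0.0]
CLASSIFIER = {"positive": [1.0, -1.0], "negative": [-1.0, 1.0]}

def segment(text):
    """Word segmentation; whitespace splitting is an assumption for this sketch."""
    return text.lower().split()

def embed(tokens):
    """Map each segmented word to its vector, forming the vector set."""
    return [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]

def encode(vectors):
    """Toy encoder: mean-pool the vector set into one semantic feature vector."""
    if not vectors:
        return list(UNKNOWN)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def predict(features):
    """Score each label with a dot product and return the best-scoring label."""
    scores = {label: sum(w * f for w, f in zip(ws, features))
              for label, ws in CLASSIFIER.items()}
    return max(scores, key=scores.get)

tokens = segment("Good good bad")
prediction = predict(encode(embed(tokens)))
```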

4. The method according to claim 1, wherein the step-by-step inputting of the training sub-text data corresponding to each target semantic task into the multi-task network model of the network structure to be determined to obtain the sub-prediction result corresponding to each target semantic task comprises:

calculating the similarity between the current training sub-text data corresponding to the current target semantic task and candidate texts in a database to obtain similar sub-text data matched with the current training sub-text data;

inputting a first vector set corresponding to the current training sub-text data into a first encoder to extract semantic features to obtain first semantic features, and inputting a second vector set corresponding to the similar sub-text data into a second encoder to extract semantic features to obtain second semantic features;

and obtaining a sub-prediction result corresponding to the current target semantic task according to the first semantic feature and the second semantic feature.

5. The method of claim 4, wherein the weights of the first encoder and the second encoder are shared.
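
Claims 4 and 5 together describe retrieving a similar text and encoding both texts with encoders whose weights are shared. A minimal sketch, assuming Jaccard token overlap as the similarity measure, a mean-pooled lookup table as the encoder, and concatenation as the way the two feature vectors are combined (none of these choices is specified in the claims):

```python
def tokens(text):
    return text.lower().split()

def jaccard(a, b):
    """Token-overlap similarity, used here only as a stand-in similarity measure."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def retrieve_similar(query, candidates):
    """Return the candidate text most similar to the query."""
    return max(candidates, key=lambda c: jaccard(query, c))

class Encoder:
    """Toy encoder: mean of per-token vectors looked up in a weight table."""

    def __init__(self, table, dim):
        self.table = table
        self.dim = dim

    def encode(self, text):
        vecs = [self.table.get(t, [0.0] * self.dim) for t in tokens(text)]
        if not vecs:
            return [0.0] * self.dim
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(self.dim)]

# Hypothetical weights and data.
table = {"reset": [1.0, 0.0], "password": [0.0, 1.0]}
shared_encoder = Encoder(table, dim=2)

query = "how to reset password"
database = ["reset my password", "weather today"]

similar = retrieve_similar(query, database)
# Weight sharing: the first and second encoder are the same object.
first_features = shared_encoder.encode(query)
second_features = shared_encoder.encode(similar)
combined = first_features + second_features  # concatenation feeds the prediction head
```

Using one encoder instance for both inputs is the simplest way to realize shared weights; two instances pointing at the same table would behave identically.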

6. The method of claim 1, wherein adjusting the network parameters of the multitask network model according to the difference between the sub-prediction results and the corresponding training sub-label data comprises:

acquiring sub-prediction results and training sub-label data corresponding to each target semantic task to obtain sub-differences corresponding to each target semantic task;

acquiring task weights corresponding to the target semantic tasks, and weighting the sub-differences according to the task weights to obtain statistical sub-differences;

and adjusting the network parameters of the multitask network model according to the statistical sub-difference.
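
The weighting step of claim 6 amounts to a weighted sum of per-task sub-differences. A small sketch, in which the task names, loss values and weights are hypothetical:

```python
def weighted_multitask_loss(sub_losses, task_weights):
    """Combine per-task sub-differences into one statistical sub-difference."""
    missing = set(sub_losses) - set(task_weights)
    if missing:
        raise KeyError(f"no task weight for: {sorted(missing)}")
    return sum(task_weights[task] * value for task, value in sub_losses.items())

# Hypothetical sub-differences (losses) and task weights.
sub_losses = {"ner": 0.5, "classification": 0.25}
task_weights = {"ner": 2.0, "classification": 4.0}

total = weighted_multitask_loss(sub_losses, task_weights)  # 0.5*2.0 + 0.25*4.0
```

The resulting scalar is what a gradient-based optimizer would minimize when adjusting the shared network parameters; how the task weights are chosen is not stated in the claim.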

7. A network structure construction apparatus for a multi-task scenario, the apparatus comprising:

an acquisition module, configured to acquire a training set, wherein the training set comprises training subsamples corresponding to a plurality of different target semantic tasks, and each training subsample comprises training sub-text data and training sub-label data;

a network parameter adjustment module, configured to input, step by step, the training sub-text data corresponding to each target semantic task into a multi-task network model whose network structure is to be determined, to obtain a sub-prediction result corresponding to each target semantic task, and to adjust the network parameters of the multi-task network model according to the difference between the sub-prediction result and the corresponding training sub-label data until current target network parameters corresponding to the current network structure are obtained;

a network structure construction module, configured to: obtain a search space corresponding to the multi-task network model to form a differentiable network search space; obtain a validation set; adjust, according to the validation set, the structural parameters of the multi-task network model corresponding to the current target network parameters by searching the differentiable network search space, wherein during the search the hidden state vector of the multi-task network model is divided into a plurality of ordered sub-hidden state vectors, and the sub-hidden state vector corresponding to the current search is obtained according to a preset order and input into the corresponding network layer for training to obtain an updated multi-task network model; return to the network parameter adjustment module until the output of the multi-task network model on the validation set satisfies a convergence condition, to obtain target structural parameters; and obtain the network parameters matched with the target structural parameters, and obtain a trained multi-task network model according to the target structural parameters and the matched network parameters.
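
The division of the hidden state vector into ordered sub-hidden state vectors, and the selection of one of them per search step, can be sketched as below. Equal-sized contiguous chunks and a cyclic preset order are assumptions made for this illustration; the claim only requires that the sub-vectors be ordered and picked according to a preset sequence.

```python
from typing import List, Sequence

def split_hidden_state(hidden: Sequence[float], num_parts: int) -> List[List[float]]:
    """Split a hidden state vector into num_parts contiguous, ordered chunks."""
    if num_parts <= 0 or len(hidden) % num_parts != 0:
        raise ValueError("hidden size must be a positive multiple of num_parts")
    size = len(hidden) // num_parts
    return [list(hidden[i * size:(i + 1) * size]) for i in range(num_parts)]

def select_sub_state(parts: List[List[float]], search_step: int) -> List[float]:
    """Pick the chunk for the current search step, cycling through the preset order."""
    return parts[search_step % len(parts)]

hidden = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
parts = split_hidden_state(hidden, num_parts=4)

# Sub-hidden state fed to the corresponding layer at search steps 0..4.
schedule = [select_sub_state(parts, step) for step in range(5)]
```

Only the selected chunk passes through the layer under search at each step, which is one way the memory cost of a search step could be kept below that of processing the full hidden state.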

8. The apparatus of claim 7, wherein the network structure construction module is further configured to search the differentiable network search space by at least one of:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.