
CN114722827B - Model training method, device and equipment for task processing model and storage medium - Google Patents

Info

Publication number
CN114722827B
Authority
CN
China
Prior art keywords
model
subtask
training
task
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210373086.7A
Other languages
Chinese (zh)
Other versions
CN114722827A (en)
Inventor
谢亚雄
温珂伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co., Ltd.
Original Assignee
Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mingsheng Pinzhi Artificial Intelligence Technology Co., Ltd.
Priority to CN202411193999.6A (published as CN119106135A)
Priority to CN202210373086.7A (published as CN114722827B)
Publication of CN114722827A
Application granted
Publication of CN114722827B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F40/295: Handling natural language data; natural language analysis; named entity recognition
    • G06N3/0464: Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present application provides a model training method, apparatus, device and storage medium for a task processing model. The method includes: extracting multiple categories of shared feature information from a training corpus through a shared feature extraction model; inputting, according to a preset input mode, the shared feature information and training text information annotated on the training corpus into multiple subtask models, and training the multiple subtask models in parallel; and adjusting the weight coefficient of each subtask model according to the gradient change of its task training loss, so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models meets a training cutoff condition. In this way, while ensuring that each subtask model can be trained independently, the present application provides different subtask models with multiple categories of shared feature information related to the subtasks they perform, thereby improving the overall training effect of the task processing model.

Description

Model training method, apparatus, device and storage medium for a task processing model

Technical Field

The present application relates to the technical field of natural language processing, and in particular to a model training method, apparatus, device and storage medium for a task processing model.

Background

In the field of natural language processing, when text information is used as model training data, a model can be trained to perform several different types of training tasks on the text, such as a text recognition task that identifies named entities in a specific business scenario, or a sentiment classification task that identifies the sentiment expressed by different sentences. Taking these two tasks as an example: although they differ in training difficulty and training objective, the recognition of named entities (essentially semantic recognition at the word-segmentation level) forms the basis of sentence-level sentiment recognition. In the related art, models whose training tasks are correlated in this way are therefore usually trained as a whole.

At present, the related art often uses a "pipeline learning mode" to train task models of the above type as a whole. Still taking the text recognition task and the sentiment classification task as an example, under the pipeline learning mode a first-level subtask model is first trained to perform named entity recognition; then, based on the first-level model's recognition results for the named entities, a second-level subtask model is further trained to recognize and classify the semantic sentiment, sentiment topics and the like of the different named entities in the text. Under this pipeline mode, recognition errors tend to accumulate in cascade during overall training: the errors produced by each level of subtask model are passed on to the next level, so the overall training effect of the pipeline learning mode is unsatisfactory.

Summary of the Invention

In view of this, the purpose of the present application is to provide a model training method, apparatus, device and storage medium for a task processing model, so that, through a constructed multi-task learning model framework, each subtask model can be trained independently while different subtask models are provided with multiple categories of shared feature information related to the subtasks they perform, which helps to improve the overall training effect of the task processing model.

In a first aspect, an embodiment of the present application provides a model training method for a task processing model, applied to a multi-task learning model framework, wherein the multi-task learning model framework includes a task processing model and a pre-trained shared feature extraction model, and the task processing model includes multiple subtask models. The model training method includes:

acquiring a training corpus, inputting the training corpus into the shared feature extraction model, and extracting multiple categories of shared feature information from the training corpus through the shared feature extraction model;

inputting, according to a preset input mode, the multiple categories of shared feature information and training text information annotated on the training corpus into the multiple subtask models, and training the multiple subtask models in parallel so that the overall loss function of the multiple subtask models meets a training cutoff condition;

during the independent training of the multiple subtask models, obtaining the task training loss of each subtask model, and adjusting the weight coefficient of each subtask model according to the gradient change of its task training loss, so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models meets the training cutoff condition, and taking the trained subtask models as the trained task processing model.

In an optional implementation, the multiple categories of shared feature information include: character feature vectors obtained after the training corpus is segmented into character sequences; word feature vectors representing syntactic dependencies between words in the training corpus; and sentence feature vectors of the training corpus.

In an optional implementation, the multiple categories of shared feature information to be extracted by the shared feature extraction model are determined as follows:

according to the target task dependency between the multiple subtasks to be executed by the multiple subtask models, determining, from a preset task dependency table, the multiple information categories corresponding to the target task dependency as the multiple categories of shared feature information to be extracted; wherein the task dependency table pre-stores the information categories corresponding to multiple task dependencies.
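This dependency-to-category mapping is essentially a table lookup. Below is a minimal Python sketch of such a lookup; the dependency keys and category names are illustrative assumptions, since the patent does not fix them.

```python
# Minimal sketch of the preset task dependency table and its lookup.
# Dependency keys and category names are hypothetical, not from the patent.
DEPENDENCY_TABLE = {
    # task dependency -> information categories of shared features to extract
    "sentence_level_depends_on_char_word_level": ["char_vector", "word_vector"],
    "document_level_depends_on_sentence_level": ["sentence_vector"],
}

def categories_for(dependency: str) -> list[str]:
    """Return the shared-feature categories stored for a given task dependency."""
    return DEPENDENCY_TABLE.get(dependency, [])

print(categories_for("sentence_level_depends_on_char_word_level"))
# ['char_vector', 'word_vector']
```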

In an optional implementation, the multiple categories of shared feature information to be extracted by the shared feature extraction model are determined as follows:

according to the target task to be executed by the task processing model, taking the multiple subtask models included in the task processing model as a first search space and the ability to execute the target task as a first search strategy, performing a neural architecture search over the ways of combining different subtask models within the first search space, to obtain the optimal subtask model combination that satisfies the first search strategy;

taking each subtask model included in the optimal subtask model combination as a first subtask model;

according to the first task dependency between the subtasks to be executed by each first subtask model, determining, from the preset task dependency table, the multiple information categories corresponding to the first task dependency as the multiple categories of shared feature information to be extracted.

In an optional implementation, the multiple categories of shared feature information to be extracted by the shared feature extraction model are determined as follows:

according to the target task to be executed by the task processing model, acquiring multiple kinds of text feature information related to completing the target task;

taking the multiple kinds of text feature information as a second search space and the ability of the multiple subtask models to complete the target task based on combinations of different text feature information as a second search strategy, performing a neural architecture search over the ways of combining different text feature information within the second search space, to obtain the optimal information combination that satisfies the second search strategy;

taking the information category of each kind of text feature information included in the optimal information combination as the multiple categories of shared feature information to be extracted.
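The patent does not commit to a particular search algorithm for either search space. As one simple instantiation, the second search (over combinations of text feature information) can be sketched as exhaustive enumeration scored against the second search strategy; the same pattern applies to the first search space over subtask model combinations. The scoring function below is a stub standing in for "train the subtask models with this combination and evaluate them on the target task".

```python
from itertools import combinations

# Brute-force sketch of the architecture search over the second search space.
# Feature names and the scoring function are illustrative assumptions.
FEATURES = ["char_vector", "word_vector", "sentence_vector"]

def score(combo: tuple[str, ...]) -> float:
    """Hypothetical second search strategy: a validation metric obtained by
    training the subtask models with this feature combination (stubbed here)."""
    return len(combo) * 0.1  # placeholder score

def search_best_combination(features: list[str]) -> tuple[str, ...]:
    """Enumerate all non-empty feature combinations and keep the best-scoring one."""
    best, best_score = (), float("-inf")
    for r in range(1, len(features) + 1):
        for combo in combinations(features, r):
            s = score(combo)
            if s > best_score:
                best, best_score = combo, s
    return best

print(search_best_combination(FEATURES))
```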

In an optional implementation, inputting the multiple categories of shared feature information and the training text information annotated on the training corpus into the multiple subtask models according to the preset input mode includes:

at the first-layer model input node of each subtask model, inputting the training text information into that subtask model;

inputting the multiple categories of shared feature information, in a first input mode of hierarchical input, to different training nodes in each subtask model level by level, according to the correspondence between information categories and training nodes; wherein the different training nodes in each subtask model are ordered from shallow to deep layers of the neural network in that subtask model.

In an optional implementation, inputting the multiple categories of shared feature information and the training text information annotated on the training corpus into the multiple subtask models according to the preset input mode further includes:

at the first-layer model input node of each subtask model, synchronously inputting the multiple categories of shared feature information and the training text information into that subtask model in a second input mode of first-layer input.

In an optional implementation, synchronously inputting the multiple categories of shared feature information and the training text information into each subtask model in the second input mode of first-layer input includes:

when the task types of the subtasks to be executed by different subtask models are not distinguished, synchronously inputting the multiple categories of shared feature information and the training text information into each subtask model in the second input mode;

or,

when the task types of the subtasks to be executed by different subtask models are distinguished, for each subtask model, determining, according to the subtask to be executed by that subtask model, the target shared feature information that matches that subtask among the multiple categories of shared feature information;

synchronously inputting the multiple categories of shared feature information, the training text information and the target shared feature information into that subtask model in the second input mode.

In an optional implementation, the overall loss function of the multiple subtask models is determined from the product of the gradient of the task training loss of each subtask model and the weight coefficient of that subtask model in the multi-task learning model framework; adjusting the weight coefficient of each subtask model according to the gradient change of its task training loss includes:

for each subtask model, taking the gradient of that subtask model's task training loss as a target gradient, and obtaining, within a gradient detection period, the periodic variation amplitude of the target gradient within that period;

when it is detected that the periodic variation amplitude of the target gradient is greater than or equal to a reference gradient variation, dynamically adjusting the weight coefficient of that subtask model downward according to a gradient-decrease adjustment coefficient;

when it is detected that the periodic variation amplitude of the target gradient is smaller than the reference gradient variation, dynamically adjusting the weight coefficient of that subtask model upward according to a gradient-increase adjustment coefficient.
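This adjustment rule can be stated compactly in code. The sketch below is a rough illustration only; the reference gradient variation and the two adjustment coefficients are hypothetical values, since the patent leaves them unspecified.

```python
def adjust_weight(weight: float, grad_delta: float,
                  reference_delta: float = 0.1,
                  decrease_coef: float = 0.9,
                  increase_coef: float = 1.1) -> float:
    """Adjust a subtask model's weight coefficient from the periodic variation
    amplitude of its task-training-loss gradient. All constants are assumed."""
    if grad_delta >= reference_delta:
        return weight * decrease_coef  # gradient swung sharply: damp this task
    return weight * increase_coef      # gradient swing was small: boost this task
```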

In an optional implementation, each subtask model in the task processing model is used to execute a corresponding subtask, and the different subtask models cooperate to process the target task to be executed by the task processing model; when the target task relates to identifying the semantic sentiment expressed by text information, the task processing model includes at least one named entity recognition model and one sentiment classification model; wherein the named entity recognition model is used to execute a text recognition task for the named entities included in the training text information, and the sentiment classification model is used to execute a sentiment classification task for the sentence sentiment represented by each sentence in the training text information.

In a second aspect, an embodiment of the present application provides a model training apparatus for a task processing model, applied to a multi-task learning model framework, wherein the multi-task learning model framework includes a task processing model and a pre-trained shared feature extraction model, and the task processing model includes multiple subtask models. The model training apparatus includes:

an extraction module, configured to acquire a training corpus, input the training corpus into the shared feature extraction model, and extract multiple categories of shared feature information from the training corpus through the shared feature extraction model;

an input module, configured to input, according to a preset input mode, the multiple categories of shared feature information and training text information annotated on the training corpus into the multiple subtask models, and train the multiple subtask models in parallel so that the overall loss function of the multiple subtask models meets a training cutoff condition;

a training module, configured to obtain, during the independent training of the multiple subtask models, the task training loss of each subtask model, and adjust the weight coefficient of each subtask model according to the gradient change of its task training loss, so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models meets the training cutoff condition, and take the trained subtask models as the trained task processing model.

In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above model training method when executing the computer program.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when run by a processor, executes the steps of the above model training method.

The technical solutions provided by the embodiments of the present application may have the following beneficial effects:

The model training method, apparatus, device and storage medium for a task processing model provided by the embodiments of the present application first acquire a training corpus, input it into the shared feature extraction model, and extract multiple categories of shared feature information from the training corpus through the shared feature extraction model; then, according to a preset input mode, the multiple categories of shared feature information and the training text information annotated on the training corpus are input into the multiple subtask models, and the multiple subtask models are trained in parallel so that their overall loss function meets a training cutoff condition; during the independent training of the multiple subtask models, the task training loss of each subtask model is obtained, and the weight coefficient of each subtask model is adjusted according to the gradient change of its task training loss, so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models meets the training cutoff condition, and the trained subtask models are taken as the trained task processing model.

In this way, through the constructed multi-task learning model framework, the present application can provide different subtask models with multiple categories of shared feature information related to the subtasks they perform while ensuring that each subtask model can be trained independently, which helps to improve the overall training effect of the task processing model.

In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a model training method for a task processing model provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for dynamically adjusting the weight coefficient of each subtask model in a multi-task learning model framework provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a first neural architecture search method provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a second neural architecture search method provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a method for inputting shared feature information in the first input mode provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a model training apparatus for a task processing model provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a computer device 700 provided by an embodiment of the present application.

Detailed Description

To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It should be understood that the drawings in the present application serve only the purpose of illustration and description and are not used to limit the scope of protection of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in the present application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of a flowchart may be implemented out of order, and steps without a logical dependency may be reversed in order or implemented simultaneously. In addition, under the guidance of the content of the present application, a person skilled in the art may add one or more other operations to a flowchart, or remove one or more operations from it.

In addition, the described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the scope of the application as claimed, but merely represents selected embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person skilled in the art without creative effort fall within the scope of protection of the present application.

It should be noted that the term "comprising" is used in the embodiments of the present application to indicate the presence of the features stated thereafter, without excluding the addition of other features.

At present, the related art often uses a "pipeline learning mode" to train task models of the above type as a whole. Still taking the text recognition task and the sentiment classification task as an example, under the pipeline learning mode a first-level subtask model is first trained to perform named entity recognition; then, based on the first-level model's recognition results for the named entities, a second-level subtask model is further trained to recognize and classify the semantic sentiment, sentiment topics and the like of the different named entities in the text. Under this pipeline mode, recognition errors tend to accumulate in cascade during overall training: the errors produced by each level of subtask model are passed on to the next level, so the overall training effect of the pipeline learning mode is unsatisfactory.

Based on this, the embodiments of the present application provide a model training method, apparatus, device and storage medium for a task processing model: first, a training corpus is acquired and input into a shared feature extraction model, and multiple categories of shared feature information are extracted from the training corpus through the shared feature extraction model; then, according to a preset input mode, the multiple categories of shared feature information and the training text information annotated on the training corpus are input into multiple subtask models, and the multiple subtask models are trained in parallel so that their overall loss function meets a training cutoff condition; during the independent training of the multiple subtask models, the task training loss of each subtask model is obtained, and the weight coefficient of each subtask model is adjusted according to the gradient change of its task training loss, so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models meets the training cutoff condition, and the trained subtask models are taken as the trained task processing model.

In this way, through the constructed multi-task learning model framework, the present application can provide different subtask models with multiple categories of shared feature information related to the subtasks they perform while ensuring that each subtask model can be trained independently, which helps to improve the overall training effect of the task processing model.

The model training method, apparatus, device and storage medium for a task processing model provided by the embodiments of the present application are described in detail below.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a model training method for a task processing model provided by an embodiment of the present application. The model training method is applied to a multi-task learning model framework, wherein the multi-task learning model framework includes a task processing model and a pre-trained shared feature extraction model, and the task processing model includes multiple subtask models. The model training method includes steps S101-S103; specifically:

S101: acquire a training corpus, input the training corpus into the shared feature extraction model, and extract multiple categories of shared feature information from the training corpus through the shared feature extraction model.

Here, the training processes of the different subtask models in the multi-task learning model framework are independent of one another; that is, the subtasks to be executed by different subtask models may be the same or different, and the embodiments of the present application impose no limitation on this.

In the embodiments of the present application, the training corpus acquired in step S101 represents corpus information of the same text type used by the multiple subtask models during training; the information categories of the shared feature information are determined according to the task dependencies that exist between the subtasks to be executed by the different subtask models.

Specifically, in the embodiments of the present application, the multiple categories of shared feature information may include: character feature vectors obtained after the training corpus is segmented into character sequences; word feature vectors representing syntactic dependencies between words in the training corpus; and sentence feature vectors of the training corpus.

Regarding the multiple subtask models, it should be noted that, in the embodiments of the present application, the subtask models in the multi-task learning model framework are not an arbitrary, unconstrained collection of models. That is, the specific application scenario to which the embodiments of the present application apply (i.e., the model scope of the multiple subtask models) is: multiple subtask models under one multi-task learning model framework that "execute tasks related to processing text information" and "are trained with corpus information of the same text type" (equivalent to the above "multiple subtask models are used, during training, to execute different subtasks based on corpus information of the same text type").

Regarding the character feature vectors, word feature vectors and sentence feature vectors in the shared feature information, it should be noted that:

the character feature vectors may be obtained by segmenting the training corpus with any fixed stride; for example, the training corpus may be segmented with a fixed stride of 2 characters, or with a fixed stride of 3 characters, to obtain the character feature vectors; the embodiments of the present application impose no limitation on the specific segmentation method;

the word feature vectors may be either high-order word feature vectors obtained by performing dependency parsing on the training corpus, or simple word feature vectors obtained by common word segmentation; the embodiments of the present application impose no limitation on the specific vector form of the word feature vectors;

the sentence feature vectors may represent high-dimensional feature vectors into which the different sentences of the training corpus are mapped under multiple dimensional features; the embodiments of the present application impose no limitation on the specific dimensionality of the sentence feature vectors.
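As a concrete illustration of the first category, fixed-stride character segmentation can be sketched as follows; the stride of 2 is only an example matching the discussion above, and mapping the resulting pieces into vectors would be done by the shared feature extraction model.

```python
def char_pieces(text: str, stride: int = 2) -> list[str]:
    """Segment a corpus string into fixed-stride character pieces.
    The patent leaves the stride open; 2 is used here as an example."""
    return [text[i:i + stride] for i in range(0, len(text), stride)]

print(char_pieces("任务处理模型训练"))  # ['任务', '处理', '模型', '训练']
```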

Specifically, regarding the specific information categories of the shared feature information, it should also be noted that, in the special case where the subtasks to be executed by two different subtask models are the same:

in a first optional implementation, the text feature information required to execute that subtask may be determined to be the shared feature information corresponding to the two different subtask models;

in a second optional implementation, it may instead be determined that no task dependency exists between the subtasks to be executed by the two different subtask models, that is, that there is no shared feature information to be extracted between the two different subtask models.

S102: according to a preset input mode, input the multiple categories of shared feature information and the training text information annotated on the training corpus into the multiple subtask models, and train the multiple subtask models in parallel so that the overall loss function of the multiple subtask models meets a training cutoff condition.

Here, the specific annotation method for the training corpus is determined by the subtask executed by the subtask model into which it is to be input. For example, if subtask model A is used to execute a text recognition task for the named entities included in the training corpus, the named entities included in the training corpus are annotated, and the entity-annotated training corpus is input into subtask model A as the training text information.
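As an illustration of such task-specific annotation, a named-entity subtask might use BIO-style tags; the scheme, tokens and labels below are a common convention assumed for the example, not something the patent prescribes.

```python
# Hypothetical entity-annotated training text for subtask model A.
# BIO tagging is assumed here as one common annotation scheme.
tokens = ["张", "三", "在", "北", "京", "工", "作"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
training_text_info = list(zip(tokens, labels))  # [('张', 'B-PER'), ...]
```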

Here, the preset input mode includes at least a first input mode of hierarchical input and a second input mode of first-layer input.

Specifically, in the embodiments of the present application, when the preset input mode is the first input mode, step S102 may be executed in the following manner 1:

Manner 1: at the first-layer model input node of each subtask model, input the training text information into that subtask model;

input the multiple categories of shared feature information, in the first input mode of hierarchical input, to different training nodes in each subtask model level by level, according to the correspondence between information categories and training nodes.

Here, the different training nodes in each subtask model are ordered from shallow to deep layers of the neural network in the model. As an optional embodiment, for shared feature information of different information categories, it may be set that the lower the information content of a category of shared feature information, the shallower the neural network layer of the training node corresponding to that category.

As an illustrative example, suppose the shared feature information corresponding to subtask model A and subtask model B consists of character feature vectors and word feature vectors, and suppose subtask model A is a 3-layer neural network model composed of neural network a, neural network b and neural network c, with depth order: neural network a < neural network b < neural network c. Then, with neural network a being the first layer of subtask model A, the character feature vectors (the lower-information category of shared feature information) and the training text information are input into subtask model A at the input node of neural network a, and the word feature vectors (the higher-information category) are input into subtask model A at the input node of neural network b.
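A minimal PyTorch sketch of this hierarchical (first) input mode for subtask model A is shown below. The layer dimensions, the fusion by concatenation and the output head are assumptions made for illustration; the patent only fixes where each category of shared feature enters the network.

```python
import torch
import torch.nn as nn

class SubtaskModelA(nn.Module):
    """Sketch of subtask model A: char features enter at the shallowest layer,
    word features at the next layer, per the hierarchical input mode."""
    def __init__(self, text_dim=64, char_dim=32, word_dim=32, hidden=128):
        super().__init__()
        # neural network a (shallowest): receives training text + char vectors
        self.layer_a = nn.Linear(text_dim + char_dim, hidden)
        # neural network b: additionally receives the word feature vectors
        self.layer_b = nn.Linear(hidden + word_dim, hidden)
        # neural network c (deepest): no shared features injected here
        self.layer_c = nn.Linear(hidden, 2)  # e.g. an assumed 2-class head

    def forward(self, text_emb, char_vec, word_vec):
        h = torch.relu(self.layer_a(torch.cat([text_emb, char_vec], dim=-1)))
        h = torch.relu(self.layer_b(torch.cat([h, word_vec], dim=-1)))
        return self.layer_c(h)

model = SubtaskModelA()
out = model(torch.randn(8, 64), torch.randn(8, 32), torch.randn(8, 32))
print(out.shape)  # torch.Size([8, 2])
```

Under the second input mode described next, all three inputs would instead be concatenated once at the input node of neural network a.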

Specifically, in the embodiments of the present application, when the preset input mode is the second input mode, step S102 may be executed in the following manner 2:

Manner 2: at the first-layer model input node of each subtask model, synchronously input the multiple categories of shared feature information and the training text information into that subtask model in the second input mode of first-layer input.

As an illustrative example, again supposing that the shared feature information corresponding to subtask model A and subtask model B consists of character feature vectors and word feature vectors, under the second input mode the character feature vectors, word feature vectors and training text information can be input into subtask model A directly at its first-layer model input node (e.g., the input node of the lowest-level neural network), without the specific model structure of subtask model A (i.e., the number of neural network layers in the model) having to be known.

S103: during the independent training of the multiple subtask models, obtain the task training loss of each subtask model, and adjust the weight coefficient of each subtask model according to the gradient change of its task training loss, so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models meets the training cutoff condition, and take the trained subtask models as the trained task processing model.

It should be noted that the training process of each subtask model under the multi-task learning model framework is independent of the others; therefore, the task training loss functions used by different subtask models may be the same or different, and the embodiments of the present application impose no limitation on this.

In the embodiments of the present application, the overall loss function of the multiple subtask models is determined from the product of the gradient of the task training loss of each subtask model and the weight coefficient of that subtask model in the multi-task learning model framework.
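Written out, one reading of this is a weighted sum over the N subtask models, with w_i the weight coefficient of subtask model i and L_i its task training loss; the summation and the gradient-norm form below are assumptions, since the patent only states that the overall loss is determined from the per-model products.

```latex
% Assumed form of the overall loss; \theta denotes the trainable parameters.
L_{\mathrm{overall}} = \sum_{i=1}^{N} w_i \,\bigl\lVert \nabla_{\theta} L_i \bigr\rVert
```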

Here, considering that the subtasks to be executed by different subtask models may differ, and that a subtask model whose task is harder to complete has a correspondingly more complex model structure, it is difficult to balance, across subtask models with different structures, the rate at which the gradient of the task training loss changes (equivalent to the gradient magnitude back-propagated for the training task); that is, it is hard to guarantee that the models converge at similar or identical training speeds, and therefore hard to keep the training rates of multiple subtask models synchronized (i.e., identical, or within the same numerical range) under one multi-task learning model framework.

Based on this, in order to ensure that subtask models with different structural complexities can be trained at similar training speeds (i.e., with training rates within the same numerical range) under the multi-task learning model framework, and to improve the learning adequacy of each subtask model, in an optional implementation the weight coefficient of each subtask model in the multi-task learning model framework can also be dynamically adjusted according to the dynamic adjustment method shown in FIG. 2; specifically:

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for dynamically adjusting the weight coefficient of each subtask model in a multi-task learning model framework provided by an embodiment of the present application; the method includes steps S201-S203; specifically:

S201: for each subtask model, take the gradient of that subtask model's task training loss as the target gradient, and obtain, within a gradient detection period, the periodic variation amplitude of the target gradient within that period.

Here, under the multi-task learning model framework, the gradient detection periods of the different subtask models are the same; that is, within one gradient detection period, gradient detection is performed on the gradient of the task training loss of every subtask model. The embodiments of the present application impose no limitation on the specific length of the gradient detection period.

S202: when it is detected that the periodic variation amplitude of the target gradient is greater than or equal to the reference gradient variation, dynamically adjust the weight coefficient of that subtask model downward according to the gradient-decrease adjustment coefficient.

Here, when it is detected that the periodic variation amplitude of the target gradient is greater than or equal to the reference gradient variation, it can be determined that, within the current gradient detection period, the gradient of that subtask model's task training loss has changed substantially (i.e., increased sharply). By dynamically adjusting the weight coefficient of that subtask model downward, the proportion of that subtask model's task training loss in the overall loss of the multi-task learning model framework (or task processing model), which is equivalent to the product of the gradient of that subtask model's task training loss and its weight coefficient, is kept in dynamic balance. This ensures that subtask models with different structural complexities can be trained at similar training speeds under the multi-task learning model framework and improves the learning adequacy of each subtask model.

S203: when it is detected that the periodic variation amplitude of the target gradient is smaller than the reference gradient variation, dynamically adjust the weight coefficient of that subtask model upward according to the gradient-increase adjustment coefficient.

Here, corresponding to step S202, when it is detected that the periodic variation amplitude of the target gradient is smaller than the reference gradient variation, it can be determined that, within the current gradient detection period, the gradient of that subtask model's task training loss has likewise changed substantially (i.e., decreased sharply). By dynamically adjusting the weight coefficient of that subtask model upward, the proportion of that subtask model's task training loss in the overall loss of the multi-task learning model framework, which is equivalent to the product of the gradient of that subtask model's task training loss and its weight coefficient, likewise reaches a dynamic balance. This ensures that subtask models with different structural complexities can be trained at similar training speeds under the multi-task learning model framework and improves the learning adequacy of each subtask model.
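Putting S201-S203 together, a per-period monitoring loop might look like the following PyTorch sketch, which applies the same adjustment rule sketched earlier in the claims section. The gradient-norm measurement, the comparison against the previous period and all numeric constants are assumptions; the patent fixes neither the detection period length nor the adjustment coefficients.

```python
import torch

def gradient_norm(loss, shared_params):
    """S201: measure the gradient magnitude of one subtask's training loss
    with respect to the (shared) parameters, without disturbing training."""
    grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
    return torch.sqrt(sum((g ** 2).sum() for g in grads)).item()

def update_weights(weights, prev_norms, curr_norms,
                   reference_delta=0.1, dec=0.9, inc=1.1):
    """S202/S203: compare each task's per-period gradient variation with the
    reference variation and adjust its weight coefficient accordingly.
    reference_delta, dec and inc are illustrative values."""
    for i, (prev, curr) in enumerate(zip(prev_norms, curr_norms)):
        if abs(curr - prev) >= reference_delta:
            weights[i] *= dec   # S202: large swing, lower the weight
        else:
            weights[i] *= inc   # S203: small swing, raise the weight
    return weights
```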

In order to present the implementation details of steps S101-S103 more clearly, these details are introduced below, taking the "semantic sentiment analysis task", which occurs frequently in text information processing, as the overall learning task of the multi-task learning model framework:

First, it should be noted that, in the embodiments of the present application, each subtask model in the task processing model is used to execute a corresponding subtask, and the different subtask models cooperate to process the target task to be executed by the task processing model. The target task may be any learning task related to processing text information and is not limited to the "semantic sentiment analysis task"; a relatively complex text information processing task (i.e., "semantic sentiment analysis") is chosen here merely as an example, so as to highlight more clearly the specific implementation details of the embodiments of the present application in the process of "training multiple subtask models". The embodiments of the present application impose no limitation on which kind of "text information processing task" the target task of the task processing model belongs to.

Regarding the implementation of step S101, when the target task to be executed by the task processing model relates to identifying the semantic sentiment expressed by text information, the task processing model (i.e., the multiple subtask models) includes at least one named entity recognition model and one sentiment classification model; wherein the named entity recognition model is used to execute a text recognition task for the named entities included in the training text information, and the sentiment classification model is used to execute a sentiment classification task for the sentence sentiment represented by each sentence in the training text information.

基于此,在本申请实施例中,当针对共享特征提取模型预先进行的训练方式不同时,根据不同的模型训练方式,在步骤S101中,至少可以按照以下3种不同的可选方式,来确定共享特征提取模型待提取的共享特征信息的具体信息类别,具体的:Based on this, in the embodiment of the present application, when the pre-training methods for the shared feature extraction model are different, according to different model training methods, in step S101, at least the following three different optional methods can be used to determine the specific information category of the shared feature information to be extracted by the shared feature extraction model, specifically:

可选方式(1)、根据所述多个子任务模型待执行的多个子任务之间的目标任务依赖关系,从预设的任务依赖关系表中确定所述目标任务依赖关系对应的多种信息类别作为所述待提取的共享特征信息的多种类别。Optional method (1) is to determine, based on the target task dependency relationship between the multiple subtasks to be executed by the multiple subtask models, multiple information categories corresponding to the target task dependency relationship from a preset task dependency table as the multiple categories of shared feature information to be extracted.

这里,任务依赖关系表预先存储有多种任务依赖关系对应的多种信息类别,如任务依赖关系表中预先存储有任务依赖关系a对应的多种信息类别为:信息类别x1、x2和x3。Here, the task dependency table pre-stores a plurality of information categories corresponding to a plurality of task dependencies. For example, the task dependency table pre-stores a plurality of information categories corresponding to task dependency a: information categories x1, x2 and x3.

For the implementation of optional way (1), take the multiple subtask models in the task processing model to be a named entity recognition model M and a sentiment classification model Q. Model M performs a text recognition task for the named entities included in the training text information, and model Q performs a sentiment classification task for the sentence-level emotion represented by each sentence. The sentiment classification task performed by Q processes text with the "sentence" as its minimum analysis unit, while the text recognition task performed by M processes text in "character-word" recognition order. On this basis, the target task dependency relationship between M and Q can be determined as: the subtask to be executed by Q (a sentence-level text analysis task) depends on the character/word recognition results of M (a character/word-level text recognition task). From the task dependency table, the information categories corresponding to this target task dependency relationship, namely the character feature vector and the word feature vector, are determined as the information categories of the shared feature information to be extracted.
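
A lookup of this kind is straightforward to express in code; in the following sketch every key, relation name and category name is invented for illustration and does not come from this application.

```python
# Hypothetical illustration of the preset task dependency table.
TASK_DEPENDENCY_TABLE = {
    # (upstream subtask, downstream subtask) -> shared feature categories
    ("char_word_level_ner", "sentence_level_sentiment"): [
        "character_feature_vector",
        "word_feature_vector",
    ],
}

def categories_for(dependency):
    """Return the shared feature categories stored for a task dependency."""
    return TASK_DEPENDENCY_TABLE[dependency]

print(categories_for(("char_word_level_ner", "sentence_level_sentiment")))
# ['character_feature_vector', 'word_feature_vector']
```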

Here, combining the above analysis, in the embodiments of the present application, as an optional embodiment, when the target task of the task processing model relates to identifying the semantic emotion expressed by text information, the multiple categories of shared feature information may include at least:

1. Character-level shared feature information, i.e., the character feature vectors in step S101 above.

Here, the character-level shared feature information represents the character arrangement features from which target word segments can be composed.

As an illustrative example, take adjacent-character bigram vectors: the training corpus can be cut into bigram character sequences. For example, the sentence "北京今天北风劲吹蓝天霸屏" in the training text information would be cut into the sequence "北京/京今/今天/天北/北风/风劲/劲吹/吹蓝/蓝天/天霸/霸屏"; training with the word2vec method (a family of related models for producing word vectors) then yields the 50-dimensional adjacent-character bigram vectors.

It should be noted that, for the character-level shared feature information, besides the adjacent-character bigram vectors in the above example, tri-gram feature vectors (segmenting the sentence into groups of 3 characters), 4-gram feature vectors (groups of 4 characters) and, in general, n-gram feature vectors (groups of n characters) can also be used. The embodiments of the present application place no restriction on how the character-level shared feature information is specifically obtained.
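
As one possible realization of the character-level pipeline above, the sketch below cuts a sentence into character n-grams and trains 50-dimensional bigram vectors; gensim's Word2Vec is an assumed choice of word2vec implementation, since the application names the method but no specific library.

```python
from gensim.models import Word2Vec

def char_ngrams(sentence, n=2):
    """Cut a sentence into overlapping character n-grams (n=2 gives bigrams)."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]

corpus = ["北京今天北风劲吹蓝天霸屏"]
bigram_corpus = [char_ngrams(s) for s in corpus]
# [['北京', '京今', '今天', '天北', '北风', '风劲', '劲吹', '吹蓝', '蓝天', '天霸', '霸屏']]

# train 50-dimensional adjacent-character bigram vectors, as in the example above
model = Word2Vec(bigram_corpus, vector_size=50, window=5, min_count=1)
vec = model.wv["北京"]  # a 50-dimensional vector
```

Changing `n` in `char_ngrams` yields the tri-gram, 4-gram and general n-gram variants mentioned above without any other change to the pipeline.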

2. Word-segmentation-level shared feature information, i.e., the word feature vectors in step S101 above.

Here, the word-segmentation-level shared feature information represents part-of-speech features of the subordination relationships between different word segments within the same context.

Here, as an optional embodiment, StanfordNLP (a natural language processing toolkit) can be used to perform dependency syntactic analysis on the input training text information, obtaining the syntactic dependency relationships between words in the training text information and thereby the above word-segmentation-level shared feature information.

Here, as another optional embodiment, the adjacency matrix of a directed graph can also be used to store and retrieve the syntactic dependency relationships between words.

As an illustrative example: first, ignore the specific type of each word-to-word dependency (whether subject-predicate or verb-object, it is treated simply as a "subordination relationship"); then, following a preset direction convention for dependencies (for example, i pointing to j in the matrix indicates that i is a subordinate word of j), take the word segmentation result of each sentence as both the rows and the columns of a matrix to create an adjacency matrix. If the "subordination relationship" exists between word i and word j, the corresponding adjacency matrix element a_ij takes the value 1; otherwise a_ij takes the value 0. Finally, taking each row of the adjacency matrix as the word feature vector of the corresponding word yields the above word-segmentation-level shared feature information.
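
The adjacency-matrix construction can be sketched as follows; the `parse` input, a list of (head index, dependent index) pairs, is an assumed representation of a dependency parser's output.

```python
import numpy as np

def dependency_adjacency(tokens, parse):
    """Build a 0/1 adjacency matrix over the tokens of one sentence;
    a[i][j] = 1 means token i is a subordinate word of token j, with the
    concrete relation type (subject-predicate, verb-object, ...) ignored."""
    n = len(tokens)
    adj = np.zeros((n, n), dtype=int)
    for head, dependent in parse:
        adj[dependent][head] = 1  # i points to j: i is subordinate to j
    return adj

tokens = ["北京", "今天", "北风", "劲吹"]
parse = [(3, 0), (3, 1), (3, 2)]  # assumed parse: the first three tokens depend on "劲吹"
adj = dependency_adjacency(tokens, parse)
word_feature_vectors = adj  # each row serves as the corresponding word's feature vector
```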

3. Sentence-level high-dimensional shared feature information, i.e., the sentence feature vectors in step S101 above.

Here, the sentence-level high-dimensional shared feature information represents the high-dimensional feature vectors onto which the different sentences of the text information are mapped across multiple feature dimensions; the target word segments mentioned above are word segments that carry semantic meaning.

As an illustrative example and optional embodiment, a BERT pre-trained language model can be used to perform text embedding on the training text information, taking the sentence vector of each sentence as the above sentence-level high-dimensional shared feature information.
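
A minimal sketch of this embedding step with the Hugging Face transformers library follows; the `bert-base-chinese` checkpoint and the use of the [CLS] position as the sentence vector are assumptions, since the application names only a BERT pre-trained language model.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def sentence_vectors(sentences):
    """Return one high-dimensional vector per sentence (the [CLS] embedding)."""
    inputs = tokenizer(sentences, padding=True, truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # shape: (batch, 768)

vecs = sentence_vectors(["北京今天北风劲吹蓝天霸屏"])
```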

Optional way (2): when the target task to be executed by the task processing model can be split, the shared feature extraction model is trained by NAS (neural architecture search), and the information categories of the shared feature information are determined and extracted by searching for the optimal combination of subtask models needed to complete the target task.

Referring to FIG. 3, which shows a schematic flow chart of the first neural architecture search method provided in an embodiment of the present application, the method includes steps S301-S303; specifically:

S301: according to the target task to be executed by the task processing model, take the multiple subtask models included in the task processing model as a first search space and the ability to execute the target task as a first search strategy, and perform a neural architecture search over the ways of combining different subtask models within the first search space, obtaining the optimal subtask model combination that satisfies the first search strategy.

Here, the target task in the first search strategy may represent either the highest-level learning task the task processing model can execute, or a secondary learning task under that highest-level learning task.

As an illustrative example, if the highest-level learning task the task processing model can execute is the "semantic sentiment analysis" task above (named entity recognition task + sentence sentiment classification task), then in step S301 the target task in the first search strategy can be the "semantic sentiment analysis" task itself, or one of its secondary learning tasks: the named entity recognition task or the sentence sentiment classification task.

Here, the definition of "optimal" in the optimal subtask model combination can be set according to the user's actual model training needs. For example, when the user's training needs lean toward faster training, the optimal combination is the one that satisfies the first search strategy while giving the multiple subtask models the fastest overall training speed; when the needs lean toward higher training accuracy, the optimal combination is the one that satisfies the first search strategy while giving the most accurate overall training results.

S302: take each subtask model included in the optimal subtask model combination as a first subtask model.

S303: according to the first task dependency relationships between the subtasks to be executed by each first subtask model, determine from the preset task dependency table the multiple information categories corresponding to the first task dependency relationships as the multiple categories of shared feature information to be extracted.

Here, after the optimal subtask model combination is determined, the specific implementation of steps S302-S303 is the same as optional way (1) above; repeated details are not restated here.

Here, regarding the implementation of steps S301-S303, it should also be noted that, within the concrete model structure of each subtask model, every subtask model can itself be viewed as a combination of multiple neural networks of different types and levels. For a single subtask model, taking as the first search strategy that a structural combination of different neural networks can complete the subtask that model is to execute (i.e., the model training task of the subtask model), the optimal model structure of each subtask model can be determined in the same way; repeated details are not restated.
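
Since the application fixes no concrete search algorithm, the deliberately simplified sketch below stands in for a real NAS controller (reinforcement learning, evolutionary search or gradient-based methods) with exhaustive enumeration over the first search space; `can_execute_target` and `score` are assumed callables encoding the first search strategy and the user's chosen notion of "optimal".

```python
from itertools import combinations

def search_best_combination(subtask_models, can_execute_target, score):
    """subtask_models        -- candidate models (the first search space)
    can_execute_target(c) -- first search strategy: True if combination c
                             can complete the target task
    score(c)              -- "optimal" criterion, e.g. overall training
                             speed or overall training accuracy
    """
    best, best_score = None, float("-inf")
    for r in range(1, len(subtask_models) + 1):
        for combo in combinations(subtask_models, r):
            if can_execute_target(combo):
                s = score(combo)
                if s > best_score:
                    best, best_score = combo, s
    return best  # each member becomes a "first subtask model" (step S302)
```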

Optional way (3): when the framework structure of the multi-task learning model framework is fixed (i.e., the target task to be executed by the task processing model cannot be split for execution), the shared feature extraction model is trained by NAS (neural architecture search) to determine and extract the information categories of the shared feature information.

Referring to FIG. 4, which shows a schematic flow chart of the second neural architecture search method provided in an embodiment of the present application, the method includes steps S401-S403; specifically:

S401: according to the target task to be executed by the task processing model, obtain multiple kinds of text feature information related to completing the target task.

As an illustrative example, still taking the target task to be the "semantic sentiment analysis" task above (named entity recognition task + sentence sentiment classification task), then without splitting the target task, "character-level shared feature information a, b, c" (text feature information whose information category is the character feature vector), "word-segmentation-level shared feature information d, e, f" (category: the word feature vector) and "sentence-level shared feature information g, h" (category: the sentence feature vector) can be obtained as the multiple kinds of text feature information related to completing the target task.

S402: take the multiple kinds of text feature information as a second search space and, as a second search strategy, that the multiple subtask models can complete the target task based on combinations of different text feature information; perform a neural architecture search over the ways of combining different text feature information within the second search space, obtaining the optimal information combination that satisfies the second search strategy.

Here, similar to step S301 above, the definition of "optimal" in the optimal information combination can likewise be set according to the user's actual model training needs. For example, when the user's training needs lean toward faster training, the optimal information combination is the one that satisfies the second search strategy while requiring the fewest information categories of shared feature information to be extracted; when the needs lean toward higher training accuracy, it is the one that satisfies the second search strategy while giving the multiple subtask models the most accurate overall training results.

S403: take the information category of each kind of text feature information included in the optimal information combination as the multiple categories of shared feature information to be extracted.

As an illustrative example, continuing the example in S401 above: if the text feature information included in the optimal information combination is character-level shared feature information a and word-segmentation-level shared feature information d, the multiple categories of shared feature information to be extracted are the character feature vector and the word feature vector.
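
Under the "fastest training" preference described for S402, the search reduces to finding a feasible combination with the fewest shared-feature categories; the sketch below enumerates the smallest combinations first, with the feature names taken from the example above and `completes_target` an assumed feasibility check.

```python
from itertools import combinations

# second search space: feature name -> information category (as in S401)
FEATURES = {"a": "character", "b": "character", "c": "character",
            "d": "word", "e": "word", "f": "word",
            "g": "sentence", "h": "sentence"}

def search_fewest_categories(completes_target):
    """Return the categories of the smallest feasible feature combination."""
    for r in range(1, len(FEATURES) + 1):  # try the smallest combinations first
        for combo in combinations(FEATURES, r):
            if completes_target(combo):
                # the categories of the winning combination become the
                # categories of shared feature information to extract (S403)
                return sorted({FEATURES[f] for f in combo})
    return None

# e.g. if completes_target(("a", "d")) is True, this returns
# ['character', 'word'], matching the example above
```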

Regarding the implementation of step S102 above, when the shared feature information is input in the first input mode of hierarchical input, besides mode 1 in step S102, refer to FIG. 5, which shows a schematic flow chart of a method provided in an embodiment of the present application for inputting shared feature information in the first input mode; the method includes steps S501-S503; specifically:

S501: for each subtask model, at the first training node of the subtask model, input the first shared feature information, whose information category is the character feature vector, into the subtask model.

Here, the first training node represents the input node of the shallow neural network layers in the subtask model.

S502: at the second training node of the subtask model, input the second shared feature information, whose information category is the word feature vector, into the subtask model.

Here, the second training node represents the input node of the intermediate neural network layers in the subtask model.

S503: at the third training node of the subtask model, input the third shared feature information, whose information category is the sentence feature vector, into the subtask model.

Here, the third training node represents the input node of the deep neural network layers in the subtask model.

Regarding the implementation of steps S501-S503, it should be noted that when the neural network in the subtask model has fewer than 3 layers, taking a 2-layer neural network model structure as an example, the character-level shared feature information (the first shared feature information, whose category is the character feature vector) and the word-segmentation-level shared feature information (the second shared feature information, whose category is the word feature vector) can be input into the first (shallowest) layer, and the sentence-level shared feature information (the third shared feature information, whose category is the sentence feature vector) into the second (deepest) layer;

alternatively, the character-level shared feature information can be input into the first layer, with the word-segmentation-level and sentence-level shared feature information input into the second layer. The embodiments of the present application place no restriction on the specific number of neural network layers in the subtask model.
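
The hierarchical input mode of S501-S503 can be sketched as a PyTorch module with three injection points; every dimension and layer choice below is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class LayeredSubtaskModel(nn.Module):
    """Character features enter the shallow layer, word features the middle
    layer, and sentence features the deep layer."""

    def __init__(self, text_dim=128, char_dim=50, word_dim=64,
                 sent_dim=768, hidden=256, n_labels=3):
        super().__init__()
        self.shallow = nn.Linear(text_dim + char_dim, hidden)  # first training node
        self.middle = nn.Linear(hidden + word_dim, hidden)     # second training node
        self.deep = nn.Linear(hidden + sent_dim, hidden)       # third training node
        self.head = nn.Linear(hidden, n_labels)

    def forward(self, text, char_feat, word_feat, sent_feat):
        h = torch.relu(self.shallow(torch.cat([text, char_feat], dim=-1)))
        h = torch.relu(self.middle(torch.cat([h, word_feat], dim=-1)))
        h = torch.relu(self.deep(torch.cat([h, sent_feat], dim=-1)))
        return self.head(h)
```

For the 2-layer variant discussed above, the same pattern applies with two injection points instead of three.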

Regarding the implementation of step S102 above, when the shared feature information is input in the second input mode of first-layer input, the specific implementation of mode 2 in step S102 can further be divided into the following two optional schemes, according to whether the task types of the subtasks to be executed by different subtask models are distinguished; specifically:

Optional scheme 1: when the task types of the subtasks to be executed by different subtask models are not distinguished, synchronously input the multiple categories of shared feature information and the training text information into every subtask model in the second input mode.

Optional scheme 2 consists of step a and step b; specifically:

Step a: when the task types of the subtasks to be executed by different subtask models are distinguished, for each subtask model, determine, according to the subtask that model is to execute, the target shared feature information within the multiple categories of shared feature information that matches that subtask.

Step b: in the second input mode, synchronously input the multiple categories of shared feature information, the training text information and the target shared feature information into that subtask model.
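
A small sketch of steps a and b follows; the mapping from subtasks to matching feature categories is an assumed configuration rather than anything fixed by this application.

```python
# Assumed subtask-to-category matching for illustration.
SUBTASK_FEATURE_MATCH = {
    "named_entity_recognition": ["character_feature_vector", "word_feature_vector"],
    "sentiment_classification": ["sentence_feature_vector"],
}

def build_model_inputs(subtask, shared_features, training_text):
    """shared_features: dict mapping category name -> feature tensor."""
    # step a: pick the target shared feature information matching the subtask
    target = {c: shared_features[c] for c in SUBTASK_FEATURE_MATCH[subtask]}
    # step b: everything enters at the first-layer model input node together
    return {"text": training_text, "shared": shared_features, "target": target}
```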

In the model training method for the task processing model provided by the embodiments of the present application, a training corpus is first obtained and input into the shared feature extraction model, which extracts multiple categories of shared feature information from the training corpus. Then, in a preset input mode, the multiple categories of shared feature information and the training text information annotated on the basis of the training corpus are input into the multiple subtask models, which are trained in parallel so that their overall loss function satisfies the training cut-off condition. During the independent training of the multiple subtask models, the task training loss of each subtask model is obtained, and the weight coefficient of each subtask model is adjusted according to the gradient change of its task training loss so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models satisfies the training cut-off condition; the trained subtask models are then taken as the trained task processing model.
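
Putting the pieces together, one parallel training step under the framework might look like the following sketch; the `task_loss` interface on each subtask model and the batch layout are assumptions, and the per-cycle weight rebalancing would reuse a rule like `adjust_task_weights` from the earlier sketch.

```python
def train_step(shared_extractor, subtask_models, weights, batch, optimizer):
    """One parallel training step over all subtask models (PyTorch-style)."""
    shared = shared_extractor(batch["corpus"])  # multi-category shared features
    losses = {name: model.task_loss(shared, batch["text"], batch["labels"][name])
              for name, model in subtask_models.items()}
    # overall loss: weighted sum of every subtask model's task training loss
    overall = sum(weights[name] * loss for name, loss in losses.items())
    optimizer.zero_grad()
    overall.backward()  # each subtask model still learns from its own loss term
    optimizer.step()
    return {name: loss.item() for name, loss in losses.items()}
```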

In this way, through the constructed multi-task learning model framework, the present application can guarantee that every subtask model trains independently while supplying different subtask models with multiple kinds of shared feature information relevant to the subtasks they execute, which helps improve the overall model training effect of the task processing model.

Based on the same inventive concept, the embodiments of the present application further provide a model training apparatus corresponding to the model training method of the task processing model in the above embodiments. Since the principle by which the model training apparatus solves the problem is similar to that of the model training method in the above embodiments of the present application, the implementation of the apparatus may refer to the implementation of the method; repeated details are not restated here.

Referring to FIG. 6, which shows a schematic structural diagram of a model training apparatus for a task processing model provided in an embodiment of the present application: the model training apparatus is applied to a multi-task learning model framework, the multi-task learning model framework includes a task processing model and a pre-trained shared feature extraction model, and the task processing model includes multiple subtask models; the model training apparatus includes:

an extraction module 601, configured to obtain a training corpus, input the training corpus into the shared feature extraction model, and extract multiple categories of shared feature information from the training corpus through the shared feature extraction model;

an input module 602, configured to input, in a preset input mode, the multiple categories of shared feature information and the training text information annotated on the basis of the training corpus into the multiple subtask models, and train the multiple subtask models in parallel so that the overall loss function of the multiple subtask models satisfies the training cut-off condition;

a training module 603, configured to obtain, during the independent training of the multiple subtask models, the task training loss of each subtask model, and adjust the weight coefficient of each subtask model according to the gradient change of its task training loss so that the training rates of the multiple subtask models fall within the same numerical range, until the overall loss function of the multiple subtask models satisfies the training cut-off condition, taking the trained subtask models as the trained task processing model.

In an optional implementation, the multiple categories of shared feature information include: character feature vectors obtained after the training corpus is segmented into character sequences; word feature vectors representing the syntactic dependency relationships between words in the training corpus; and sentence feature vectors of the training corpus.

In an optional implementation, the extraction module 601 is configured to determine the multiple categories of shared feature information to be extracted by the shared feature extraction model as follows:

according to the target task dependency relationship between the multiple subtasks to be executed by the multiple subtask models, determining from a preset task dependency table the multiple information categories corresponding to the target task dependency relationship as the multiple categories of shared feature information to be extracted, where the task dependency table pre-stores the information categories corresponding to various task dependency relationships.

In an optional implementation, the extraction module 601 is configured to determine the multiple categories of shared feature information to be extracted by the shared feature extraction model as follows:

according to the target task to be executed by the task processing model, taking the multiple subtask models included in the task processing model as a first search space and the ability to execute the target task as a first search strategy, performing a neural architecture search over the ways of combining different subtask models within the first search space, and obtaining the optimal subtask model combination that satisfies the first search strategy;

taking each subtask model included in the optimal subtask model combination as a first subtask model;

according to the first task dependency relationships between the subtasks to be executed by each first subtask model, determining from the preset task dependency table the multiple information categories corresponding to the first task dependency relationships as the multiple categories of shared feature information to be extracted.

In an optional implementation, the extraction module 601 is configured to determine the multiple categories of shared feature information to be extracted by the shared feature extraction model as follows:

according to the target task to be executed by the task processing model, obtaining multiple kinds of text feature information related to completing the target task;

taking the multiple kinds of text feature information as a second search space and, as a second search strategy, the ability of the multiple subtask models to complete the target task based on combinations of different text feature information, performing a neural architecture search over the ways of combining different text feature information within the second search space, and obtaining the optimal information combination that satisfies the second search strategy;

taking the information category of each kind of text feature information included in the optimal information combination as the multiple categories of shared feature information to be extracted.

In an optional implementation, when inputting, in the preset input mode, the multiple categories of shared feature information and the training text information annotated on the basis of the training corpus into the multiple subtask models, the input module 602 is specifically configured to:

input the training text information into each subtask model at the first-layer model input node of each subtask model;

input the multiple categories of shared feature information level by level to different training nodes in each subtask model, in the first input mode of hierarchical input, according to the correspondence between information categories and training nodes, where the different training nodes in each subtask model are ordered from shallow to deep neural network levels within the subtask model.

In an optional implementation, when inputting, in the preset input mode, the multiple categories of shared feature information and the training text information annotated on the basis of the training corpus into the multiple subtask models, the input module 602 is further configured to:

synchronously input, at the first-layer model input node of each subtask model and in the second input mode of first-layer input, the multiple categories of shared feature information and the training text information into each subtask model.

In an optional implementation, when synchronously inputting, in the second input mode of first-layer input, the multiple categories of shared feature information and the training text information into each subtask model, the input module 602 is specifically configured to:

when the task types of the subtasks to be executed by different subtask models are not distinguished, synchronously input, in the second input mode, the multiple categories of shared feature information and the training text information into each subtask model;

or,

when the task types of the subtasks to be executed by different subtask models are distinguished, determine, for each subtask model and according to the subtask that model is to execute, the target shared feature information within the multiple categories of shared feature information that matches that subtask;

and synchronously input, in the second input mode, the multiple categories of shared feature information, the training text information and the target shared feature information into that subtask model.

In an optional implementation, the overall loss function of the multiple subtask models is determined from the product of the gradient of each subtask model's task training loss and that subtask model's weight coefficient in the multi-task learning model framework; when adjusting the weight coefficient of each subtask model according to the gradient change of its task training loss, the training module 603 is specifically configured to:

for each subtask model, taking the gradient of that subtask model's task training loss as a target gradient, obtain the periodic change amplitude of the target gradient within a gradient detection cycle;

when the periodic change amplitude of the target gradient is detected to be greater than or equal to the reference gradient change amount, dynamically adjust the subtask model's weight coefficient downward according to a gradient-decrease adjustment coefficient;

when the periodic change amplitude of the target gradient is detected to be smaller than the reference gradient change amount, dynamically adjust the subtask model's weight coefficient upward according to a gradient-increase adjustment coefficient.

In an optional implementation, each subtask model in the task processing model executes a corresponding subtask, and the different subtask models cooperate to process the target task to be executed by the task processing model; when the target task relates to identifying the semantic emotion expressed by text information, the task processing model includes at least one named entity recognition model and one sentiment classification model, where the named entity recognition model performs a text recognition task for the named entities included in the training text information, and the sentiment classification model performs a sentiment classification task for the sentence-level emotion represented by each sentence in the training text information.

As shown in FIG. 7, an embodiment of the present application provides a computer device 700 for executing the model training method of the task processing model in the present application. The device includes a memory 701, a processor 702, and a computer program stored in the memory 701 and executable on the processor 702, where the processor 702, when executing the computer program, implements the steps of the model training method of the task processing model.

Specifically, the memory 701 and the processor 702 can be a general-purpose memory and processor, which are not specifically limited here; when the processor 702 runs the computer program stored in the memory 701, the model training method of the task processing model can be executed.

Corresponding to the model training method of the task processing model in the present application, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; the computer program, when run by a processor, executes the steps of the model training method of the task processing model.

Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on the storage medium is run, the model training method of the task processing model can be executed.

In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The system embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation. As another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, systems or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. In addition, the terms "first", "second", "third" and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.

Finally, it should be noted that the embodiments described above are only specific implementations of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A model training method for a task processing model, applied to a multi-task learning model framework, wherein the multi-task learning model framework comprises a task processing model and a pre-trained shared feature extraction model, and the task processing model comprises a plurality of subtask models; the model training method comprises the following steps:
acquiring a training corpus, inputting the training corpus into the shared feature extraction model, and extracting shared feature information of a plurality of categories from the training corpus through the shared feature extraction model;
inputting, in a preset input mode, the shared feature information of the plurality of categories and training text information annotated based on the training corpus into the plurality of subtask models, and training the plurality of subtask models in parallel, so that an overall loss function of the plurality of subtask models meets a training cut-off condition;
acquiring a task training loss of each subtask model during the independent training of the plurality of subtask models, adjusting the weight coefficient of each subtask model in the multi-task learning model framework according to the gradient change of the task training loss of that subtask model, so that the training rates of the plurality of subtask models fall within the same numerical range, until the overall loss function of the plurality of subtask models meets the training cut-off condition, and taking the trained plurality of subtask models as the trained task processing model;
wherein the inputting, in the preset input mode, of the shared feature information of the plurality of categories and the training text information annotated based on the training corpus into the plurality of subtask models comprises the following steps:
inputting the training text information into each subtask model at a first-layer model input node of each subtask model;
inputting the shared feature information of the plurality of categories level by level to different training nodes in each subtask model, in a first input mode of hierarchical input, according to the correspondence between information categories and training nodes; wherein the different training nodes in each subtask model are ordered from shallow to deep levels of the neural network in the subtask model.
2. The model training method of claim 1, wherein the shared feature information of the plurality of categories comprises: character feature vectors obtained after the training corpus is segmented into character sequences; word feature vectors representing the syntactic dependency relationships between words in the training corpus; and sentence feature vectors of the training corpus.
3. The model training method according to claim 1, wherein the plurality of categories of the shared feature information to be extracted by the shared feature extraction model are determined by:
determining, according to target task dependency relationships among a plurality of subtasks to be executed by the plurality of subtask models, a plurality of information categories corresponding to the target task dependency relationships from a preset task dependency table as the plurality of categories of the shared feature information to be extracted; wherein the task dependency table pre-stores a plurality of information categories corresponding to a plurality of task dependency relationships.
4. The model training method according to claim 1, wherein the plurality of categories of the shared feature information to be extracted by the shared feature extraction model are determined by:
performing, according to a target task to be executed by the task processing model, with the plurality of subtask models included in the task processing model as a first search space and the ability to execute the target task as a first search strategy, a neural architecture search over the ways of combining different subtask models within the first search space, to obtain an optimal subtask model combination conforming to the first search strategy;
taking each subtask model included in the optimal subtask model combination as a first subtask model;
and determining, according to first task dependency relationships among the subtasks to be executed by each first subtask model, a plurality of information categories corresponding to the first task dependency relationships from the preset task dependency table as the plurality of categories of the shared feature information to be extracted.
5. The model training method according to claim 1, wherein the plurality of categories of the shared feature information to be extracted by the shared feature extraction model are determined by:
acquiring, according to the target task to be executed by the task processing model, a plurality of kinds of text feature information related to completing the target task;
performing, with the plurality of kinds of text feature information as a second search space and, as a second search strategy, the ability of the plurality of subtask models to complete the target task based on combinations of different text feature information, a neural architecture search over the ways of combining different text feature information within the second search space, to obtain an optimal information combination conforming to the second search strategy;
and taking the information category of each kind of text feature information included in the optimal information combination as the plurality of categories of the shared feature information to be extracted.
6. The model training method according to claim 1, wherein the inputting, in the preset input mode, of the shared feature information of the plurality of categories and the training text information annotated based on the training corpus into the plurality of subtask models further comprises:
synchronously inputting, at the first-layer model input node of each subtask model and in a second input mode of first-layer input, the shared feature information of the plurality of categories and the training text information into each subtask model.
7. The model training method according to claim 6, wherein the synchronously inputting, in the second input mode of first-layer input, of the shared feature information of the plurality of categories and the training text information into each subtask model comprises:
when the task types of the subtasks to be executed by different subtask models are not distinguished, synchronously inputting, in the second input mode, the shared feature information of the plurality of categories and the training text information into each subtask model;
or,
when the task types of the subtasks to be executed by different subtask models are distinguished, determining, for each subtask model and according to the subtask to be executed by that subtask model, target shared feature information, among the shared feature information of the plurality of categories, that matches the subtask to be executed by that subtask model;
and synchronously inputting, in the second input mode, the shared feature information of the plurality of categories, the training text information and the target shared feature information into that subtask model.
8. The model training method of claim 1, wherein the overall loss function of the plurality of subtask models is determined from the product of the gradient of the task training loss of each subtask model and the weight coefficient of that subtask model in the multi-task learning model framework; and the adjusting of the weight coefficient of each subtask model according to the gradient change of its task training loss comprises the following steps:
for each subtask model, taking the gradient of the task training loss of that subtask model as a target gradient, and acquiring the periodic change amplitude of the target gradient within a gradient detection cycle;
when the periodic change amplitude of the target gradient is detected to be greater than or equal to a reference gradient change amount, dynamically adjusting the weight coefficient of the subtask model downward according to a gradient-decrease adjustment coefficient;
and when the periodic change amplitude of the target gradient is detected to be smaller than the reference gradient change amount, dynamically adjusting the weight coefficient of the subtask model upward according to a gradient-increase adjustment coefficient.
9. The model training method according to claim 1, wherein each subtask model in the task processing model is used for executing a corresponding subtask, and the different subtask models cooperate to process the target task to be executed by the task processing model; when the target task relates to identifying the semantic emotion expressed by text information, the task processing model comprises at least one named entity recognition model and one sentiment classification model; wherein the named entity recognition model is used for executing a text recognition task for the named entities included in the training text information, and the sentiment classification model is used for executing a sentiment classification task for the sentence-level emotion represented by each sentence in the training text information.
10. A model training apparatus for a task processing model, applied to a multi-task learning model framework, wherein the multi-task learning model framework comprises a task processing model and a pre-trained shared feature extraction model, and the task processing model comprises a plurality of subtask models; the model training apparatus comprises:
an extraction module, configured to acquire a training corpus, input the training corpus into the shared feature extraction model, and extract shared feature information of multiple categories from the training corpus through the shared feature extraction model;
an input module, configured to input the shared feature information of the multiple categories and training text information labeled based on the training corpus into the plurality of subtask models according to a preset input mode, and train the plurality of subtask models in parallel until the overall loss function of the plurality of subtask models meets a training cut-off condition; and
a training module, configured to acquire the task training loss of each subtask model during its independent training, and adjust the weight coefficient of each subtask model according to the gradient change of its task training loss, so that the training rates of the plurality of subtask models fall within the same numerical range, until the overall loss function of the plurality of subtask models meets the training cut-off condition, whereupon the plurality of trained subtask models are taken as the trained task processing model;
wherein, when inputting the shared feature information of the multiple categories and the training text information labeled based on the training corpus into the plurality of subtask models according to the preset input mode, the input module is configured to:
input the training text information into each subtask model at the first-layer model input node of that subtask model; and
input the shared feature information of the multiple categories hierarchically to different training nodes in each subtask model in a hierarchical input mode, according to the correspondence between information categories and training nodes; wherein the different training nodes in each subtask model are ordered from shallow to deep according to the hierarchy of the neural network in the subtask model.
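The hierarchical input mode of claim 10 can be pictured with the following hypothetical sketch, in which the training text enters at the first-layer input node while each category of shared feature information is injected at one training node, ordered shallow to deep. The category-to-node mapping, layer widths, and concatenation-based injection are assumptions of the sketch.

```python
# Hypothetical sketch of claim 10's hierarchical input mode; the mapping of
# feature categories to nodes and all layer widths are assumptions.
import torch
import torch.nn as nn

class HierarchicalSubtaskModel(nn.Module):
    def __init__(self, hidden=256, num_categories=3, num_labels=5):
        super().__init__()
        self.embed = nn.Linear(hidden, hidden)   # first-layer model input node
        # one training node per shared-feature category, shallow to deep
        self.nodes = nn.ModuleList(
            nn.Linear(hidden * 2, hidden) for _ in range(num_categories)
        )
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, text_features, shared_by_category):
        # text_features: encoded training text information, (batch, hidden)
        # shared_by_category: one (batch, hidden) tensor per feature category,
        # listed in the order of the nodes (shallow to deep)
        h = torch.relu(self.embed(text_features))
        for node, shared in zip(self.nodes, shared_by_category):
            # inject this category's shared features at its corresponding node
            h = torch.relu(node(torch.cat([h, shared], dim=-1)))
        return self.classifier(h)
```

Injecting each feature category at a depth that matches it, rather than concatenating everything at the input, is the design choice the claim's shallow-to-deep node ordering expresses.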
11. An electronic device, comprising: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the model training method according to any one of claims 1 to 9.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps of the model training method according to any one of claims 1 to 9.
CN202210373086.7A 2022-04-11 2022-04-11 Model training method, device and equipment for task processing model and storage medium Active CN114722827B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202411193999.6A CN119106135A (en) 2022-04-11 2022-04-11 Model training method, device, equipment and storage medium for task processing model
CN202210373086.7A CN114722827B (en) 2022-04-11 2022-04-11 Model training method, device and equipment for task processing model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210373086.7A CN114722827B (en) 2022-04-11 2022-04-11 Model training method, device and equipment for task processing model and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202411193999.6A Division CN119106135A (en) 2022-04-11 2022-04-11 Model training method, device, equipment and storage medium for task processing model

Publications (2)

Publication Number Publication Date
CN114722827A CN114722827A (en) 2022-07-08
CN114722827B true CN114722827B (en) 2024-08-02

Family

ID=82240937

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202411193999.6A Pending CN119106135A (en) 2022-04-11 2022-04-11 Model training method, device, equipment and storage medium for task processing model
CN202210373086.7A Active CN114722827B (en) 2022-04-11 2022-04-11 Model training method, device and equipment for task processing model and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202411193999.6A Pending CN119106135A (en) 2022-04-11 2022-04-11 Model training method, device, equipment and storage medium for task processing model

Country Status (1)

Country Link
CN (2) CN119106135A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115409038A (en) * 2022-08-26 2022-11-29 湖北星纪时代科技有限公司 Natural language processing method and device, electronic equipment and storage medium
CN117251730A (en) * 2023-09-12 2023-12-19 支付宝(杭州)信息技术有限公司 A pre-training method, device, storage medium and electronic equipment for a risk control model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324696A (en) * 2020-02-19 2020-06-23 腾讯科技(深圳)有限公司 Entity extraction method, entity extraction model training method, device and equipment
CN112069811A (en) * 2020-08-24 2020-12-11 武汉大学 Multi-task interaction enhanced electronic text event extraction method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347839B (en) * 2019-07-18 2021-07-16 湖南数定智能科技有限公司 A text classification method based on generative multi-task learning model
US11593666B2 (en) * 2020-01-10 2023-02-28 Accenture Global Solutions Limited System for multi-task distribution learning with numeric-aware knowledge graphs
CN112364653A (en) * 2020-11-09 2021-02-12 北京有竹居网络技术有限公司 Text analysis method, apparatus, server and medium for speech synthesis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324696A (en) * 2020-02-19 2020-06-23 腾讯科技(深圳)有限公司 Entity extraction method, entity extraction model training method, device and equipment
CN112069811A (en) * 2020-08-24 2020-12-11 武汉大学 Multi-task interaction enhanced electronic text event extraction method

Also Published As

Publication number Publication date
CN114722827A (en) 2022-07-08
CN119106135A (en) 2024-12-10

Similar Documents

Publication Publication Date Title
US12299393B2 (en) Systems and methods for colearning custom syntactic expression types for suggesting next best correspondence in a communication environment
US10558757B2 (en) Symbol management
US9542477B2 (en) Method of automated discovery of topics relatedness
RU2571373C2 (en) Method of analysing text data tonality
US20240320251A1 (en) Systems and methods for generating query responses
US9373075B2 (en) Applying a genetic algorithm to compositional semantics sentiment analysis to improve performance and accelerate domain adaptation
US20140351228A1 (en) Dialog system, redundant message removal method and redundant message removal program
KR20210023452A (en) Apparatus and method for review analysis per attribute
US10719668B2 (en) System for machine translation
RU2679988C1 (en) Extracting information objects with the help of a classifier combination
JP2018190188A (en) Abstract generating device, abstract generating method, and computer program
WO2016199160A2 (en) Language processing and knowledge building system
WO2023045184A1 (en) Text category recognition method and apparatus, computer device, and medium
CN109165386A (en) A kind of Chinese empty anaphora resolution method and system
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
Krishna et al. A dataset for Sanskrit word segmentation
US20240338659A1 (en) Machine learning systems and methods for automated generation of technical requirements documents
CN113282762A (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN114722827B (en) Model training method, device and equipment for task processing model and storage medium
CN113515630A (en) Triple generating and checking method and device, electronic equipment and storage medium
CN114328902A (en) Text labeling model construction method and device
CN113342179B (en) Input text processing method, device, electronic device and storage medium
CN113505889B (en) Processing method and device of mapping knowledge base, computer equipment and storage medium
CN113536790A (en) Model training method and device based on natural language processing
CN114676699A (en) Entity sentiment analysis method, apparatus, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant