CN116052649A

CN116052649A - Loss weight self-adaptive meta-learning method in low-resource speech recognition

Info

Publication number: CN116052649A
Application number: CN202310031464.8A
Authority: CN
Inventors: 洪青阳; 王秋林; 李琳
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2023-01-10
Filing date: 2023-01-10
Publication date: 2023-05-02
Anticipated expiration: 2043-01-10
Also published as: CN116052649B

Abstract

A loss-weight adaptive meta-learning method for low-resource speech recognition, relating to the field of speech recognition. The prior-art MAML algorithm is unstable, and its training loss weights are difficult to adjust accurately. To address this, the method fine-tunes the loss weights using homoscedastic uncertainty, which mitigates the instability of MAML to a certain extent, and applies the result to speech recognition. On this basis, a VGG-CNN network and an Adapter module are introduced to improve overall recognition performance. Compared with other meta-learning methods, the method fluctuates less and is more stable. It requires neither manual tuning of the loss weights nor costly precise weight search, and it adjusts the training loss weights automatically and efficiently to appropriate values. It can be applied to any model, which makes it highly flexible, and the added VGG-CNN and Adapter modules incur only a small additional computational cost.

Description

A loss weight adaptive meta-learning approach for low-resource speech recognition

Technical Field

The present invention relates to the field of speech recognition, and in particular to a loss weight adaptive meta-learning method in low-resource speech recognition.

Background Art

With the rapid development of machine learning, end-to-end models have achieved excellent results in speech recognition, but this performance usually depends on large amounts of labeled speech data. In most scenarios, however, many languages lack abundant, correctly labeled data; high annotation costs and scarce data further limit model performance and generalization. Data augmentation (such as speed perturbation, noise addition, and speech synthesis) and special training strategies (such as transfer learning, multilingual joint training, and adversarial learning) can alleviate these problems to a certain extent, but they still suffer from overfitting and limited gains.

To solve these problems, meta-learning has been introduced into low-resource speech recognition. Meta-learning, also called learning to learn, effectively absorbs prior knowledge from other tasks and uses that knowledge to learn new tasks efficiently. Model-Agnostic Meta-Learning (MAML) is among the most flexible and effective gradient-based meta-learning algorithms, and it introduces no extra parameters (Chelsea Finn, Pieter Abbeel, and Sergey Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in International Conference on Machine Learning, PMLR, 2017, pp. 1126–1135). By training on different tasks, it learns good initial parameters for the model, so that in the fine-tuning stage the model can learn a new task from only a small amount of sample data.

Although MAML outperforms other low-resource methods, its bi-level loss back-propagation during training makes the losses of different tasks unstable in the training stage, which in turn causes large fluctuations in the model's overall recognition performance. In addition, different training tasks affect the target task differently, and these effects are controlled by the corresponding loss weights. Appropriate loss weights can further improve the target model's performance, whereas inappropriate ones degrade it. However, searching for the best weights as hyperparameters is very costly and relies on elaborate manual tuning.

Based on this, homoscedastic uncertainty has been used to adjust loss weights in the MAML algorithm (A. Boiarov, K. Khabarlak, and I. Yastrebov, Multi-task meta learning modification with stochastic approximation, 2021). Because homoscedastic uncertainty captures the relative confidence between tasks, it can serve as the basis for loss weights in multi-task learning problems. Borrowing this idea, MAML can treat different languages as different tasks and adjust the weights between tasks based on homoscedastic uncertainty to stabilize the loss, thereby improving the model's overall performance. An improved MAML algorithm along these lines has been applied to image recognition (Wenfeng Shen, Lin Ding, Peng Liu and Shengbo Chen, Gradient-based meta-learning using uncertainty to weigh loss for few-shot learning, arXiv:2208.0813, 2022); it represents each weight as a single hyperparameter. Although these hyperparameters are updated during training, a single scalar parameter cannot capture the useful information in a task well, and that information cannot be exploited further.

Two questions are therefore well worth studying: how to use homoscedastic uncertainty more effectively in speech recognition to adjust the loss weights between tasks in MAML, and how to make this mechanism learn more effective, generalizable feature knowledge that assists model training and improves recognition of the target language.

Chinese Patent 113178190A performs speech recognition with meta-learning methods such as matching networks and Siamese networks, but these methods are tightly coupled to the model and are less flexible than the MAML algorithm used in the present invention. Further, although Chinese Patent 112560904A applies the MAML algorithm to machine learning, it does not adjust the loss weights, so the instability of MAML remains.

Summary of the invention

The purpose of the present invention is to provide a loss-weight adaptive meta-learning method for low-resource speech recognition that addresses two technical problems of the prior art: the instability of the MAML algorithm and the difficulty of adjusting the training loss weights accurately. Fine-tuning the weights by homoscedastic uncertainty mitigates the instability of MAML to a certain extent, and the method is applied to speech recognition. On this basis, a VGG-CNN network and an Adapter module are further introduced to improve overall recognition performance.

The loss weight adaptive meta-learning method for low-resource speech recognition according to the present invention comprises the following steps:

1) Group the speech data of the languages used for training by language, and divide each language's data into a training set and a validation set;

2) Split the speech data of each language's training set and validation set into two parts: a support set and a query set;

3) Compile statistics over the transcripts of all speech data and build the dictionary used for training;

4) Write code to build a speech recognition model consisting of a Transformer model, a VGG-CNN network, and an Adapter module;

5) Carry out the model pre-training stage:

5.1) In each training iteration, for the training-set speech data of each language, first draw k samples from the support set of the training set, extract features with the VGG-CNN network, feed the features into the Transformer model for training, and update the model parameters for the current iteration;

5.2) Then draw k samples from the query set of the same language and extract features with the VGG-CNN network. Feed the features into the Transformer model to evaluate its recognition performance and obtain the corresponding query-set loss; at the same time, feed the features into the Adapter module and, after averaging and taking the logarithm, obtain the variance needed to adjust the loss weight;

5.3) Use the obtained variance to adjust, via the weight-adjustment formula, the query-set loss of that language in the current iteration;

5.4) After completing the above training for a single language, reset the model parameters to their values at the beginning of the current iteration;

6) Repeat steps 5.1) to 5.4) for the training and validation sets of the other languages to obtain the adjusted query-set losses of all training languages in the current iteration, then sum and average them;

7) Using the mean adjusted query-set loss from step 6), perform a gradient-descent update of all model parameters as they were at the beginning of the current iteration;

8) After the update, the current model parameters serve as the initialization for the next iteration; repeat steps 5) to 7);

9) After every given number of epochs, test the current model on the validation-set speech data and save the parameters of the best-performing model. When the saved best parameters remain unchanged over several consecutive tests, end the training process;

10) Prepare the training, validation, and test sets of the language to be recognized;

11) Train the model saved in step 9) on the training set of the language to be recognized:

11.1) In each training iteration, draw a batch of speech data from the training set and extract features with the VGG-CNN network. Feed the features into the Transformer model to evaluate its recognition performance and obtain the corresponding loss; at the same time, feed the features into the Adapter module and, after averaging and taking the logarithm, obtain the variance needed to adjust the loss weight;

11.2) Use the obtained variance to adjust, via the weight-adjustment formula, the training loss of the language to be recognized in the current iteration;

11.3) Update the model parameters with the adjusted loss obtained in step 11.2);

12) After each training epoch, test the current model on the speech data of the target-language validation set and save the best-performing model;

13) After training for multiple epochs, end the training process for the target language to obtain a speech recognition model with good recognition performance.
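The pre-training loop of steps 5) to 8) can be sketched as follows. This is a minimal Python sketch under loud assumptions, not the patented implementation: each language is replaced by a hypothetical `ToyLanguageTask` with quadratic support and query losses on one scalar parameter; σ_i is a learnable per-task log-variance s_i = log σ_i² rather than the VGG-CNN/Adapter output; the weighting 1/(2σ_i²)·L_i + log σ_i is the standard homoscedastic-uncertainty form, assumed here; and the outer gradient uses the first-order MAML approximation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyLanguageTask:
    """Hypothetical stand-in for one language: quadratic support and query losses."""
    curvature: float
    support_target: float
    query_target: float

    def support_grad(self, theta: float) -> float:
        return self.curvature * (theta - self.support_target)

    def query_loss(self, theta: float) -> float:
        return 0.5 * self.curvature * (theta - self.query_target) ** 2

    def query_grad(self, theta: float) -> float:
        return self.curvature * (theta - self.query_target)


def pretrain(tasks: List[ToyLanguageTask], theta: float = 0.0, alpha: float = 0.1,
             beta: float = 0.05, iterations: int = 2000) -> Tuple[float, List[float]]:
    """First-order MAML with one learnable log-variance s_i = log(sigma_i^2) per task."""
    n = len(tasks)
    log_vars = [0.0] * n
    for _ in range(iterations):
        grad_theta = 0.0
        grad_log_vars = []
        for task, s in zip(tasks, log_vars):
            # Step 5.1: inner update on the support set. theta itself is never
            # overwritten here, so the reset of step 5.4 is implicit.
            adapted = theta - alpha * task.support_grad(theta)
            # Step 5.2: query-set loss of the adapted parameters.
            loss = task.query_loss(adapted)
            # Step 5.3: weight 1/(2*sigma_i^2) = 0.5*exp(-s_i); 0.5*s_i equals log(sigma_i).
            weight = 0.5 * math.exp(-s)
            # Step 6: average over tasks. First-order approximation: use the gradient
            # at the adapted parameters and ignore d(adapted)/d(theta).
            grad_theta += weight * task.query_grad(adapted) / n
            grad_log_vars.append((0.5 - weight * loss) / n)
        # Step 7: one gradient-descent step on the initialization and the log-variances.
        theta -= beta * grad_theta
        log_vars = [s - beta * grad for s, grad in zip(log_vars, grad_log_vars)]
    return theta, log_vars
```

With two mirror-image tasks, for example `ToyLanguageTask(1.0, 1.0, 1.0)` and `ToyLanguageTask(1.0, -1.0, -1.0)`, the initialization θ = 0 stays fixed and each s_i settles at the logarithm of its task's query loss, so the effective weight 1/(2σ_i²) becomes 1/(2L_i).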

Compared with the prior art, the present invention has the following technical effects:

The present invention effectively improves speech recognition performance for the target language under low-resource conditions. Verification has also shown that the algorithm fluctuates less and is more stable than other meta-learning methods. Neither manual tuning of the loss weights nor costly precise weight search is needed: the present invention adjusts the training loss weights automatically and efficiently to appropriate values. Moreover, the present invention makes no changes to the underlying model and can be used with any model, which makes it highly flexible, and the computational cost added by the VGG-CNN and Adapter modules is almost negligible.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the structure of the Adapter module.

FIG. 2 is a schematic diagram of the structure of the VGG-CNN module.

FIG. 3 is a schematic diagram of the training process of the present invention.

DETAILED DESCRIPTION

In order to make the purpose, technical solution and advantages of the present invention more clearly understood, the following embodiments will further illustrate the present invention in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not used to limit the present invention.

Because the model-agnostic meta-learning algorithm MAML introduces no extra parameters, it can be applied to any model trained by gradient descent. The overall MAML workflow resembles pre-training followed by fine-tuning: learning from the pre-training data lets the model learn the target data more effectively during fine-tuning. Unlike ordinary model training, however, besides the source speech data D_source used for training and the target speech data D_target used for subsequent fine-tuning, MAML divides the speech data into tasks according to language. Each task i is further split into a support set, used to simulate model training during the training process, and a validation set (the query set), used to evaluate the performance of that training.
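Steps 1) to 3) of the summary prepare this task structure. The following is a minimal Python sketch under assumptions the description does not fix: records are hypothetical `(audio_path, transcript)` pairs, the support/query split is 50/50, and the dictionary is character-level with hypothetical `<pad>` and `<unk>` entries. In the described method the split would be applied separately to each language's training set and validation set.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record format: (audio_path, transcript).
Utterance = Tuple[str, str]


def split_support_query(utterances: List[Utterance], support_fraction: float = 0.5,
                        seed: int = 0) -> Tuple[List[Utterance], List[Utterance]]:
    """Shuffle one language's utterances and split them into support and query parts."""
    rng = random.Random(seed)
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * support_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_tasks(data_by_language: Dict[str, List[Utterance]], support_fraction: float = 0.5,
                seed: int = 0) -> Dict[str, Dict[str, List[Utterance]]]:
    """Treat every language as one MAML task with its own support and query sets."""
    tasks = {}
    for language, utterances in data_by_language.items():
        support, query = split_support_query(utterances, support_fraction, seed)
        tasks[language] = {"support": support, "query": query}
    return tasks


def build_dictionary(data_by_language: Dict[str, List[Utterance]]) -> Dict[str, int]:
    """Character-level dictionary over all transcripts, after hypothetical special tokens."""
    symbols = sorted({ch for utts in data_by_language.values() for _, text in utts for ch in text})
    return {token: index for index, token in enumerate(["<pad>", "<unk>", *symbols])}
```

During pre-training, the k samples of steps 5.1) and 5.2) could then be drawn with `random.Random.sample` from each task's support and query lists.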

In the training phase, MAML alternates between two stages: meta-training and meta-testing. In the meta-training stage, for task i, the algorithm randomly selects k samples from the task's support set and uses them to perform a gradient update of the current model. With θ denoting the model parameters, the meta-training step is given by Formula 1.
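Formula 1 itself did not survive extraction. The standard MAML inner update that the text describes can be written as follows, where D_i^sup denotes the k samples drawn from the support set of task i and L_CE is the cross-entropy loss (this notation is assumed here, not taken from the patent):

```latex
\theta'_i = \theta - \alpha \, \nabla_{\theta} \, \mathcal{L}_{\mathrm{CE}}\left(f_{\theta};\, D_i^{\mathrm{sup}}\right)
```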

Here θ′_i denotes the model parameters after the update for task i, the loss term is the corresponding cross-entropy loss, and α is the inner learning rate. Starting from the same model parameters, this training is carried out for all tasks in parallel. Then, in the meta-test stage, k samples are drawn from each task's validation set to evaluate the model with the updated parameters θ′_i. The model parameters are then reset to the value θ held at the start of the current round, and the sum of the validation-set losses over all tasks is used to update the initialization parameters for the current round, as given by Formula 2.
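Formula 2 is likewise missing from the text. In the standard MAML form, with D_i^val the k samples drawn from the validation set of task i, N the number of tasks, and L_CE the cross-entropy loss (notation assumed here), the outer update reads:

```latex
\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i=1}^{N} \mathcal{L}_{\mathrm{CE}}\left(f_{\theta'_i};\, D_i^{\mathrm{val}}\right)
```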

Here β is the outer learning rate. The model parameters obtained from the update in Formula 2 serve as the initialization for the next round, and the update procedure is repeated. The whole procedure can be summarized as Formula 3.
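Substituting the inner update into the outer update gives the combined form. As a reconstruction in assumed notation, with D_i^sup and D_i^val the k samples drawn from task i's support and validation sets and L_CE the cross-entropy loss:

```latex
\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i=1}^{N}
  \mathcal{L}_{\mathrm{CE}}\left(f_{\theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathrm{CE}}\left(f_{\theta};\, D_i^{\mathrm{sup}}\right)};\, D_i^{\mathrm{val}}\right)
```

The outer gradient differentiates through the inner gradient step, which is where second-order derivatives of the loss with respect to θ appear.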

As Formula 3 shows, the update requires second-order derivatives with respect to the model parameters θ, which greatly increases the computational complexity of the algorithm. The first-order approximation proposed later (Formula 4) drops these second-order terms from the outer-layer derivative, which greatly reduces the computational cost with almost no effect on performance. The MAML algorithm used in the present invention builds on this first-order version.
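The difference between the exact second-order update and the first-order approximation can be seen on a toy scalar task. In this Python sketch (an illustration under assumptions, using a hypothetical `QuadraticTask` whose support and validation losses are 0.5·a·(θ − c)²), the exact meta-gradient multiplies the gradient at θ′_i by dθ′_i/dθ = 1 − α·a_sup, which is where the second derivative enters; the standard first-order approximation (FOMAML) drops that factor and uses the gradient evaluated at θ′_i directly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuadraticTask:
    """Toy task with support loss 0.5*a_sup*(θ - c_sup)^2 and validation loss 0.5*a_val*(θ - c_val)^2."""
    a_sup: float
    c_sup: float
    a_val: float
    c_val: float


def inner_update(theta: float, task: QuadraticTask, alpha: float) -> float:
    """Formula 1: one gradient step on the support loss."""
    return theta - alpha * task.a_sup * (theta - task.c_sup)


def meta_gradient(theta: float, task: QuadraticTask, alpha: float, first_order: bool) -> float:
    """Gradient of the validation loss of one task with respect to the initialization."""
    theta_i = inner_update(theta, task, alpha)
    grad_at_adapted = task.a_val * (theta_i - task.c_val)  # dL_val/dθ'_i
    if first_order:
        # First-order approximation: ignore dθ'_i/dθ.
        return grad_at_adapted
    # Exact MAML: chain rule through dθ'_i/dθ = 1 - α*a_sup (the second-derivative term).
    return grad_at_adapted * (1.0 - alpha * task.a_sup)


def meta_step(theta: float, tasks: List[QuadraticTask], alpha: float, beta: float,
              first_order: bool = True) -> float:
    """Formula 2, or its first-order variant: one outer update of the initialization."""
    total = sum(meta_gradient(theta, task, alpha, first_order) for task in tasks)
    return theta - beta * total
```

For example, with θ = 0, α = 0.1 and `QuadraticTask(a_sup=2.0, c_sup=1.0, a_val=1.0, c_val=0.0)`, θ′_i = 0.2 and the meta-gradients are 0.16 (exact) and 0.2 (first-order); the gap shrinks as α·a_sup becomes small.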

After repeated iterations of this process, very good initialization parameters are obtained. Starting from them, the model can reach very good performance after only a few gradient updates, even when the target data D_target is low-resource.

In the meta-test phase, the MAML algorithm sums the evaluation losses of all tasks to update the model's initialization parameters. However, different tasks differ in their sensitivity to the same loss function, and a plain sum amplifies the negative effect of these differences. Moreover, extensive experiments have shown that MAML's bi-level gradient update and the small amount of data used per update make it unstable. For these two reasons, MAML's updates of the model parameters occasionally fluctuate strongly and can even hurt the model's performance in the fine-tuning phase.

The present invention uses variance-based (homoscedastic) uncertainty to make the loss weights of the meta-test phase adaptive, so that they adjust themselves automatically toward the best effect. Through this adaptive adjustment, the corresponding adaptive module also learns more useful, generalizable deep representations, which are then applied in the fine-tuning phase to assist training on the target data and further improve the model's performance.

Homoscedastic uncertainty, also called task-dependent uncertainty, is a quantity that stays constant within a task but varies between tasks. It captures the relative confidence between tasks in multi-task learning and thus reflects each task's uncertainty; it was first used in multi-task learning to weight the losses of different tasks and improve the stability of the loss. Analogously, meta-learning can use it as the basis for weighting the losses of different tasks.

Speech recognition can generally be regarded as a regression problem under a Gaussian distribution.

Here f_θ(X) ∈ ℝ^L is the model output given the model parameters θ and the model input X ∈ ℝ^(T×F), where L is the length of the output vector, T is the time-domain dimension of the input audio, and F is its frequency-domain dimension; y is the ground-truth label of the input, and σ² is a variance scalar related to homoscedastic uncertainty. Because this scalar indirectly reflects the relative distribution of the different tasks, the present invention uses it as the basis for weighting the losses of different tasks. Maximum-likelihood estimation combined with the cross-entropy loss then gives the per-task loss.
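The Gaussian model and the resulting loss did not survive extraction. Under the standard homoscedastic-uncertainty derivation, which the surrounding text appears to follow (assumed here; y is the ground-truth label, f_θ(X) the model output, and L the output length), they read as below. The last line replaces the squared error with the cross-entropy loss and, as is usual in uncertainty weighting, drops the constant and the factor L on log σ.

```latex
\begin{aligned}
p\left(\mathbf{y} \mid f_{\theta}(X)\right) &= \mathcal{N}\left(f_{\theta}(X),\, \sigma^{2} I_{L}\right) \\
-\log p\left(\mathbf{y} \mid f_{\theta}(X)\right) &= \frac{1}{2\sigma^{2}} \left\lVert \mathbf{y} - f_{\theta}(X) \right\rVert^{2} + L \log \sigma + \frac{L}{2} \log 2\pi \\
\mathcal{L}(\theta, \sigma) &\approx \frac{1}{2\sigma^{2}} \, \mathcal{L}_{\mathrm{CE}}\left(f_{\theta};\, X, \mathbf{y}\right) + \log \sigma
\end{aligned}
```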

MAML's performance fluctuations are concentrated mainly in the validation-set loss of the meta-test phase, so the present invention uses homoscedastic uncertainty to adjust the loss weights in that phase. Combining the formula above with MAML's meta-test formula gives the validation-set loss of the N tasks in the meta-test phase.
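The weighted validation-set loss itself is not reproduced in this text. Applying the uncertainty-weighted form of a single-task cross-entropy loss to the N tasks of the meta-test phase gives the following reconstruction, an assumption consistent with the definitions in this section (X_i and Y_i denote the B validation utterances of task i and their labels):

```latex
\mathcal{L}_{\mathrm{meta}}(\theta) = \sum_{i=1}^{N} \left[ \frac{1}{2\sigma_i^{2}} \, \mathcal{L}_{\mathrm{CE}}\left(f_{\theta'_i};\, X_i, Y_i\right) + \log \sigma_i \right]
```

Step 6) of the summary averages rather than sums these terms, which only rescales the outer learning rate.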

Here θ′_i is the model parameter obtained for task i after the meta-training phase; the loss term of task i and σ_i are, respectively, its validation-set loss in the meta-test phase and the variance scalar used to fine-tune its loss weight. The model input consists of B input audio samples x, together with their corresponding ground-truth labels.

Compared with the standard MAML algorithm, the meta-test loss of the present invention refines the original plain sum with a weighting based on the homoscedastic-uncertainty term σ_i. This emphasizes the differences between tasks and stabilizes the loss, so that the model learns more generalizable speech features during training and its overall recognition performance improves.
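A minimal Python sketch of this weighted meta-test loss follows. It assumes the standard homoscedastic-uncertainty form 1/(2σ_i²)·L_i + log σ_i and parameterizes each weight by s_i = log σ_i², a common choice that keeps σ_i² positive. The per-task losses are plain floats here; in the described method they would be the validation-set losses of the adapted Transformer. The optional averaging corresponds to step 6).

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float], log_vars: Sequence[float],
                              average: bool = False) -> float:
    """Sum over tasks of 0.5*exp(-s_i)*L_i + 0.5*s_i, where s_i = log(sigma_i^2).

    0.5*exp(-s_i) equals 1/(2*sigma_i^2), and 0.5*s_i equals log(sigma_i).
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task")
    total = sum(0.5 * math.exp(-s) * loss + 0.5 * s for loss, s in zip(task_losses, log_vars))
    return total / len(task_losses) if average else total
```

When every s_i is 0, each weight is 1/2 and the result is half the plain MAML sum; the 0.5·s_i term penalizes large variances, so the weights cannot simply shrink to zero.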

On this basis, the present invention goes a step further. The hyperparameter σ_i, which is updated during training and serves both as a weight-tuning parameter and as a regularization term, is replaced by a value computed from the validation set by a shared Adapter module (Pfeiffer, J., Vulić, I., Gurevych, I., & Ruder, S. (2020). MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer (arXiv:2005.00052). arXiv). The structure of the Adapter module is shown in FIG. 1: the module first averages the input data, then down-projects (downsamples) it, applies a ReLU activation, and up-projects (upsamples) it back to its original dimension; finally, a residual connection adds the result to the module's input to produce the output.
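The Adapter structure described above can be sketched in plain Python. This is an illustration only: it assumes that "averaging" means mean pooling over time frames, omits bias terms, and adds the residual to the pooled vector so that the dimensions match. The real module operates on VGG-CNN feature tensors with learned weights.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(weights: Matrix, vector: Sequence[float]) -> List[float]:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def adapter_forward(frames: Sequence[Sequence[float]], w_down: Matrix, w_up: Matrix) -> List[float]:
    """Mean-pool -> down-project -> ReLU -> up-project -> residual (biases omitted)."""
    # Average over the T time frames: (T, D) -> (D,).
    num_frames = len(frames)
    pooled = [sum(column) / num_frames for column in zip(*frames)]
    # Down-project D -> r and apply ReLU.
    hidden = [max(0.0, h) for h in matvec(w_down, pooled)]
    # Up-project r -> D to restore the original dimension.
    restored = matvec(w_up, hidden)
    # Residual connection with the pooled input.
    return [p + u for p, u in zip(pooled, restored)]
```

The bottleneck width r (the number of rows of `w_down`) is a free design choice; a small r keeps the added computation low.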

The structure of the VGG-CNN module is shown in FIG. 2. A logarithm is taken of the resulting value to keep it from being negative. Writing g(X) for the whole pipeline of VGG-CNN, Adapter, averaging, and logarithm yields the formula for σ_i and the weight-adjustment formula.
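How σ_i could be obtained from g(X) is sketched below. The description states that a logarithm is taken so that the value cannot be negative, but the exact operation is not recoverable from this text. This sketch therefore assumes the common log-variance parameterization: the averaged Adapter output is read as s_i = log σ_i², and σ_i² = exp(s_i) is positive by construction. `feature_extractor` and `adapter` are hypothetical callables standing in for the VGG-CNN network and the Adapter module.

```python
import math
from typing import Any, Callable, Sequence


def g(x: Any, feature_extractor: Callable[[Any], Any],
      adapter: Callable[[Any], Sequence[float]]) -> float:
    """Hypothetical g(X): features -> Adapter -> mean, read here as s = log(sigma^2)."""
    adapted = adapter(feature_extractor(x))
    return sum(adapted) / len(adapted)


def task_variance(x: Any, feature_extractor: Callable[[Any], Any],
                  adapter: Callable[[Any], Sequence[float]]) -> float:
    """sigma_i^2 = exp(g(X)), which is positive for any real g(X)."""
    return math.exp(g(x, feature_extractor, adapter))


def weight_from_variance(variance: float) -> float:
    """Loss weight 1 / (2 * sigma_i^2) applied to the task loss."""
    return 1.0 / (2.0 * variance)
```

Because σ_i² is produced from the input batch rather than stored as a free scalar, the same trained `feature_extractor` and `adapter` can also supply the weight for the single target-language task during fine-tuning.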

Compared with adjusting the weights through hyperparameters, the present invention uses the CNN network and the Adapter module to acquire more useful generalizable knowledge and to learn finer deep representations during training. In addition, the target language is treated as a single task, and the same method is used to evaluate and adjust the loss in the fine-tuning stage, improving recognition of the target language. In the overall formula, the Conv and Adapter components of g(X) are both trained in the pre-training stage.

Based on Formula 10, the present invention automatically and precisely adjusts the weights of the training losses during MAML training and updates the model parameters effectively with the adjusted weights, so that its MAML algorithm is more stable during training and performs better than other meta-learning methods. Although the VGG-CNN network and the Adapter module are introduced into the model, the computational cost barely increases, and training runs at essentially the same speed as the original MAML algorithm.
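Why the weighting settles at a sensible value can be checked with a short worked example. Assuming the standard form 0.5·exp(−s)·L + 0.5·s with s = log σ², its derivative with respect to s is 0.5·(1 − L·exp(−s)), which vanishes at s = log L. The Python sketch below (a toy illustration, not the patented training code) runs plain gradient descent on s for a fixed loss L and confirms this.

```python
import math


def optimal_log_variance(task_loss: float, lr: float = 0.5, steps: int = 500,
                         s0: float = 0.0) -> float:
    """Gradient descent on s for the weighted loss 0.5*exp(-s)*L + 0.5*s with L held fixed."""
    s = s0
    for _ in range(steps):
        grad = 0.5 * (1.0 - task_loss * math.exp(-s))  # d/ds of the weighted loss
        s -= lr * grad
    return s
```

At the optimum the effective weight 1/(2σ²) equals 1/(2L), so each weighted loss term contributes about 1/2: tasks with large losses are down-weighted and tasks with small losses are up-weighted without any manual tuning.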

The technical solution of the present invention and its alternatives:

1. Basic MAML algorithm

As the MAML update formulas show, the update involves second-order derivatives with respect to the model parameters θ, which greatly increases the computational complexity of the algorithm. The first-order approximation proposed later (Formula 4) drops these second-order terms from the outer-layer derivative, greatly reducing the computational cost with almost no effect on performance. The MAML algorithm of the present invention innovates further on this basis.

After repeated iterations of this process, very good initialization parameters are obtained. Starting from them, the model can reach very good performance after only a few gradient updates, even when the target data D_target is low-resource.

2. The weight-adaptive MAML algorithm of the present invention

In the meta-test phase, the MAML algorithm sums the evaluation losses of all tasks to update the model's initialization parameters. However, different tasks differ in their sensitivity to the same loss function, and a plain sum amplifies the negative effect of these differences. Moreover, extensive experiments have shown that MAML's bi-level gradient update and the small amount of data used per update make it unstable. For these two reasons, MAML's updates of the model parameters occasionally fluctuate strongly and can even hurt the model's performance in the fine-tuning phase.

The present invention uses variance-based (homoscedastic) uncertainty to make the loss weights of the meta-test phase adaptive, so that they adjust themselves automatically toward the best effect. Through this adaptive adjustment, the corresponding adaptive module also learns more useful, generalizable deep representations, which are then applied in the fine-tuning phase to assist training on the target data and further improve the model's performance.

MAML's performance fluctuations are concentrated mainly in the validation-set loss of the meta-test phase, so the present invention mainly uses homoscedastic uncertainty to adjust the loss weights in that phase. Combining the Gaussian-likelihood loss with MAML's meta-test formula gives the validation-set loss of the N tasks in the meta-test phase. In it, θ′_i is the model parameter obtained for task i after the meta-training phase; the loss term of task i and σ_i are, respectively, its validation-set loss in the meta-test phase and the variance scalar used to fine-tune its loss weight; and the model input consists of B input audio samples x with their corresponding ground-truth labels.

Compared with the standard MAML algorithm, the meta-test loss of the present invention refines the original plain sum with a weighting based on the homoscedastic-uncertainty term σ_i. This emphasizes the differences between tasks and stabilizes the loss, so that the model learns more generalizable speech features during training and its overall recognition performance improves.

On this basis, the present invention goes a step further. The hyperparameter σ_i, which is updated during training and serves both as a weight-tuning parameter and as a regularization term, is replaced by a value computed from the validation set by a shared Adapter module (Pfeiffer, J., Vulić, I., Gurevych, I., & Ruder, S. (2020). MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer (arXiv:2005.00052). arXiv). The module first averages the input data, then down-projects (downsamples) it, applies a ReLU activation, and up-projects (upsamples) it back to its original dimension; finally, a residual connection adds the result to the module's input to produce the output.

A logarithm is applied to the resulting value so that the variance term cannot become negative. Let g(X) denote the processing of the input by the VGG-CNN network and the Adapter, followed by taking the mean and the logarithm. The formula for σ_i and the weight-adjustment formula are then as follows:

Using formula 10, the present invention automatically and accurately adjusts the weights of the training losses during MAML training and updates the model parameters effectively with the adjusted weights. As a result, the MAML algorithm of the present invention is more stable during training and performs better than other meta-learning methods. A schematic diagram of the training process is shown in Figure 3.
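The text does not fully specify how the mean and the logarithm combine into σ_i. A common, numerically safe reading, assumed here following Kendall et al. (2018), is that g(X) yields a log-variance s_i = log σ_i², so σ_i² = exp(s_i) is positive for any real s_i, and the adjusted task loss is exp(-s_i)·L_i/2 + s_i/2. The names `log_variance`, `adjusted_loss`, and `toy_encoder` are hypothetical; `toy_encoder` merely stands in for the VGG-CNN + Adapter stack.

```python
from typing import Callable

import numpy as np


def log_variance(features: np.ndarray,
                 encoder: Callable[[np.ndarray], np.ndarray]) -> float:
    """Sketch of g(X): encode the batch, then average to one scalar per task.

    `encoder` stands in for the VGG-CNN + Adapter stack (hypothetical here).
    Assumption: the averaged value is read as s = log(sigma**2).
    """
    return float(np.mean(encoder(features)))


def adjusted_loss(task_loss: float, s: float) -> float:
    """Uncertainty-weighted task loss: exp(-s) / 2 * L + s / 2.

    With s = log(sigma**2) this equals L / (2 * sigma**2) + log(sigma),
    and sigma**2 = exp(s) is positive for any real s.
    """
    return float(0.5 * np.exp(-s) * task_loss + 0.5 * s)


def toy_encoder(x: np.ndarray) -> np.ndarray:
    """Placeholder for VGG-CNN + Adapter: a fixed linear scaling."""
    return 0.1 * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(8, 100, 40))  # B = 8 utterances, 100 frames, 40-dim features
    s_i = log_variance(batch, toy_encoder)
    print(s_i, adjusted_loss(3.0, s_i))
```

Predicting the log-variance rather than σ_i itself avoids both division by zero and the need to constrain the network output to be positive.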

Claims

1. A loss-weight-adaptive meta-learning method for low-resource speech recognition, characterized by comprising the following steps:

1) Grouping the speech data of the multiple training languages by language, and dividing each language's data into a training set and a validation set;

2) Dividing the speech data of the training set and the validation set of each language into a support set and a query set;

3) Collecting the transcripts of all the speech data and building a training dictionary from them;

4) Writing code to construct a speech recognition model consisting of a transducer model, a VGG-CNN network, and an Adapter module;

5) Carrying out the model pre-training stage:

5.1) In each training iteration, for the training-set speech data of each language, first drawing k samples from the support set, extracting features with the VGG-CNN network, feeding the features into the transducer model for training, and updating the model parameters for the current iteration;

5.2) Drawing k samples from the query set of the same language, extracting features with the VGG-CNN network, and feeding them into the transducer model to test its recognition performance and obtain the corresponding query-set loss; also feeding the features into the Adapter module and taking the mean and the logarithm of its output to obtain the variance needed to adjust the loss weight;

5.3) Using the obtained variance to adjust, via the formula, the query-set loss of that language in the current round;

5.4) After the training operation for the single language is completed, resetting the model parameters to their values at the beginning of the current training round;

6) Repeating steps 5.1) to 5.4) for the training and validation sets of the other languages to obtain the adjusted query-set losses of all training languages in the current iteration, and averaging these losses;

7) Using the average query-set loss from step 6) to perform a gradient-descent update of all model parameters as they stood at the beginning of the current iteration;

8) After the update, taking the current model parameters as the initialization for the next iteration, and repeating steps 5) to 7);

9) After every certain number of epochs, testing the currently trained model with the speech data of the validation set and saving the model parameters that give the best performance; ending the training process once the saved best parameters remain unchanged over several consecutive tests;

10) Preparing the training, validation, and test sets of the language to be recognized;

11) Training the model saved in step 9) with the training set of the language to be recognized:

11.1) In each iteration, drawing one batch of speech data from the training set, extracting features with the VGG-CNN network, and feeding them into the transducer model to test its recognition performance and obtain the corresponding loss; also feeding the features into the Adapter module and taking the mean and the logarithm of its output to obtain the variance needed to adjust the loss weight;

11.2) Using the obtained variance to adjust, via the formula, the training loss of the language to be recognized in the current round;

11.3) Updating the model parameters with the adjusted loss obtained in step 11.2);

12) After each epoch, testing the currently trained model with the speech data of the target-language validation set, and saving the model with the best performance;

13) After training for several epochs, ending the training process for the target language, thereby obtaining a speech recognition model with excellent recognition performance.
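Steps 5) to 8) amount to a MAML-style loop with an uncertainty-weighted outer loss. The sketch below illustrates that control flow on a toy problem and makes several substitutions that are assumptions, not the patented method. Each "language" is a synthetic linear-regression task, and the transducer/VGG-CNN model is replaced by a linear model. The Adapter-predicted variance is replaced by a learnable per-task log-variance s_i (the original Kendall et al. parameterization), and first-order MAML (no second derivatives) stands in for full MAML. All function names are hypothetical.

```python
import numpy as np


def mse_and_grad(w, X, y):
    """Mean-squared error of a linear model X @ w and its gradient w.r.t. w."""
    err = X @ w - y
    return float(np.mean(err ** 2)), 2.0 * X.T @ err / len(y)


def sample_task(rng, w_true, k):
    """Hypothetical 'language': k support and k query examples from one linear task."""
    X = rng.normal(size=(2 * k, w_true.size))
    y = X @ w_true + 0.1 * rng.normal(size=2 * k)
    return (X[:k], y[:k]), (X[k:], y[k:])


def meta_train(task_weights, iters=300, k=10, inner_lr=0.05, outer_lr=0.05, seed=0):
    """First-order MAML with uncertainty-weighted query losses (toy stand-in)."""
    rng = np.random.default_rng(seed)
    n, dim = len(task_weights), task_weights[0].size
    w = np.zeros(dim)   # shared initial parameters theta
    s = np.zeros(n)     # per-task log-variance s_i = log(sigma_i**2)
    losses = []
    for _ in range(iters):
        grad_w, grad_s, losses = np.zeros(dim), np.zeros(n), []
        for i, w_true in enumerate(task_weights):
            (Xs, ys), (Xq, yq) = sample_task(rng, w_true, k)
            # Step 5.1: one inner gradient step on the support set.
            _, g_support = mse_and_grad(w, Xs, ys)
            w_i = w - inner_lr * g_support
            # Step 5.2: query-set loss with the adapted parameters theta'_i.
            loss_q, g_query = mse_and_grad(w_i, Xq, yq)
            # Step 5.3: weighted loss 0.5 * exp(-s_i) * L_i + 0.5 * s_i.
            weight = 0.5 * np.exp(-s[i])
            losses.append(weight * loss_q + 0.5 * s[i])
            # First-order approximation: reuse the query gradient at theta'_i.
            grad_w += weight * g_query
            grad_s[i] = 0.5 - weight * loss_q
            # Step 5.4: w is never overwritten here, so each language starts from theta.
        # Steps 6-7: average over languages and update the initial parameters.
        w = w - outer_lr * grad_w / n
        s = s - outer_lr * grad_s / n
        # Step 8: the updated w becomes the initialization for the next iteration.
    avg = float(np.mean(losses)) if losses else float("nan")
    return w, s, avg


if __name__ == "__main__":
    languages = [np.eye(3)[j] for j in range(3)]  # three toy "languages"
    w, s, avg_loss = meta_train(languages)
    print("init:", np.round(w, 2), "log-variances:", np.round(s, 2), "loss:", round(avg_loss, 3))
```

Fine-tuning in steps 10) to 13) would reuse the same weighted loss on batches of the target language only, with an ordinary gradient step in place of the inner/outer split.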