
CN110379433A - Identity verification method, apparatus, computer device, and storage medium - Google Patents

Identity verification method, apparatus, computer device, and storage medium

Info

Publication number
CN110379433A
CN110379433A (application CN201910711306.0A)
Authority
CN
China
Prior art keywords
sample
speech frame
target
user
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910711306.0A
Other languages
Chinese (zh)
Other versions
CN110379433B (en)
Inventor
刘加
刘艺
何亮
张卫强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huacong Zhijia Technology Co Ltd
Tsinghua University
Original Assignee
Beijing Huacong Zhijia Technology Co Ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huacong Zhijia Technology Co Ltd and Tsinghua University
Priority to CN201910711306.0A
Publication of CN110379433A
Application granted
Publication of CN110379433B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present application relates to an identity verification method, apparatus, computer device, and storage medium. The method includes: acquiring voice data input by a target user according to a target dynamic verification code; dividing the voice data into at least one speech frame according to a preset segmentation algorithm; for each speech frame, extracting the acoustic feature vector corresponding to that frame according to a preset acoustic feature extraction algorithm; inputting the acoustic feature vector into a pre-trained multi-task identity verification model, which outputs the intermediate user feature vector and the first posterior probability set corresponding to the frame; determining the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the frames and a preset pooling algorithm; and verifying the identity of the target user according to the first user feature vector and the first posterior probability sets of the frames. By adopting the present application, the computational complexity of the server can be reduced and its processing efficiency improved.

Description

Identity verification method, apparatus, computer device, and storage medium

Technical Field

The present application relates to the field of security technology, and in particular to an identity verification method, apparatus, computer device, and storage medium.

Background

At present, identity verification based on biometrics (such as fingerprints, faces, and voices) and identity verification based on dynamic verification codes are two commonly used techniques. To further improve the security of identity verification, voice-based identity verification can be combined with a dynamic verification code to verify a user's identity.

In the traditional combined approach, the user terminal collects, through a voice capture device, the voice data the user inputs according to the dynamic verification code, and sends that voice data to a server on which a voiceprint recognition model and a speech recognition model are deployed. After receiving the voice data, the server inputs it into the voiceprint recognition model to obtain the target user feature vector for that user. In parallel, the server inputs the same voice data into the speech recognition model to obtain the target text. If the target user feature vector is close to the user's pre-stored feature vector and the target text matches the dynamic verification code, the server determines that the user is legitimate; otherwise, the server determines that the user is illegitimate.

With this traditional approach, the server must deploy two separate models, a voiceprint recognition model and a speech recognition model, with different structures and parameters. Each model must process the user's voice data independently to obtain the user feature vector and the target text, which makes the server's computation complex and its processing inefficient.

Summary of the Invention

In view of the above, it is necessary to provide an identity verification method, apparatus, computer device, and storage medium that address these technical problems.

In a first aspect, an identity verification method is provided, the method comprising:

acquiring voice data input by a target user according to a target dynamic verification code;

dividing the voice data into at least one speech frame according to a preset segmentation algorithm;

for each speech frame, extracting the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm;

inputting the acoustic feature vector corresponding to the speech frame into a pre-trained multi-task identity verification model, and outputting the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, where the first posterior probability set includes the posterior probability of each preset pronunciation unit;

determining the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the speech frames and a preset pooling algorithm;

verifying the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames.
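The front of this pipeline (segment the waveform into frames, extract a per-frame acoustic feature vector, then pool per-frame vectors into one utterance-level vector) can be sketched as below. This is an illustrative sketch only, not the patented implementation: the 25 ms/10 ms framing, the log-band-energy features (standing in for whatever preset acoustic features the method uses, e.g. MFCC), and mean pooling are all assumptions.

```python
import numpy as np

def split_into_frames(signal, frame_len=400, hop=160):
    """Preset-segmentation stand-in: 25 ms windows, 10 ms hop at 16 kHz."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop:i * hop + frame_len]
                     for i in range(n_frames)])

def acoustic_features(frame, n_bins=40):
    """Stand-in acoustic feature extractor: log power spectrum averaged
    into n_bins spectral bands (a real system might use MFCC or FBANK)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    bands = np.array_split(spectrum, n_bins)
    return np.log(np.array([b.mean() for b in bands]) + 1e-10)

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)            # 1 s of fake 16 kHz audio
frames = split_into_frames(speech)
feats = np.stack([acoustic_features(f) for f in frames])
# Pooling stand-in: the method pools *intermediate user* vectors produced
# by the model; here we mean-pool the acoustic features just to show the
# shape reduction from per-frame vectors to one utterance-level vector.
pooled = feats.mean(axis=0)
print(frames.shape, feats.shape, pooled.shape)
```

With one second of 16 kHz audio this yields 98 frames of 400 samples each, a 98 x 40 feature matrix, and a single 40-dimensional pooled vector.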

In an optional embodiment, the multi-task identity verification model includes a multi-task shared hidden layer, a voiceprint recognition network, and a speech recognition network;

inputting the acoustic feature vector corresponding to the speech frame into the pre-trained multi-task identity verification model and outputting the intermediate user feature vector and the first posterior probability set corresponding to the speech frame includes:

inputting the acoustic feature vector corresponding to the speech frame into the multi-task shared hidden layer, and outputting the intermediate feature vector corresponding to the speech frame;

inputting the intermediate feature vector corresponding to the speech frame into the speech recognition network, and outputting the pronunciation feature vector and the first posterior probability set corresponding to the speech frame;

inputting the intermediate feature vector and the pronunciation feature vector corresponding to the speech frame into the voiceprint recognition network, and outputting the intermediate user feature vector corresponding to the speech frame.
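The data flow of this three-part model (shared hidden layer, then the speech recognition branch, then the voiceprint branch that also consumes the pronunciation feature) can be sketched with single dense layers per component. Everything here is an assumption: layer sizes, activations, and the randomly initialised weights merely stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
D_ACOUSTIC, D_HIDDEN, D_PHONE, N_UNITS, D_USER = 40, 64, 32, 50, 128

# Random weights stand in for the trained model parameters.
W_shared = rng.standard_normal((D_ACOUSTIC, D_HIDDEN)) * 0.1
W_phone  = rng.standard_normal((D_HIDDEN, D_PHONE)) * 0.1
W_post   = rng.standard_normal((D_PHONE, N_UNITS)) * 0.1
W_spk    = rng.standard_normal((D_HIDDEN + D_PHONE, D_USER)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(acoustic_vec):
    """One frame through the multi-task model: shared hidden layer,
    then the speech recognition branch (pronunciation feature vector
    plus per-unit posteriors), then the voiceprint branch, which sees
    both the shared intermediate feature and the pronunciation feature."""
    hidden = np.tanh(acoustic_vec @ W_shared)     # intermediate feature vector
    phone_feat = np.tanh(hidden @ W_phone)        # pronunciation feature vector
    posteriors = softmax(phone_feat @ W_post)     # first posterior probability set
    user_feat = np.tanh(np.concatenate([hidden, phone_feat]) @ W_spk)
    return user_feat, posteriors                  # intermediate user feature vector

user_feat, posteriors = forward(rng.standard_normal(D_ACOUSTIC))
print(user_feat.shape, posteriors.shape, posteriors.sum())
```

Feeding the pronunciation feature into the voiceprint branch is what lets both tasks share one forward pass instead of two independent models.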

In an optional embodiment, verifying the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames includes:

determining the target dynamic verification code score corresponding to the target user according to the first posterior probability sets corresponding to the speech frames;

if the similarity between the first user feature vector and a pre-stored second user feature vector corresponding to the target user is greater than or equal to a preset similarity threshold, and the target dynamic verification code score is greater than or equal to a preset dynamic verification code score threshold, determining that the target user is a legitimate user;

if the similarity between the first user feature vector and the second user feature vector is less than the preset similarity threshold, or the target dynamic verification code score is less than the preset dynamic verification code score threshold, determining that the target user is an illegitimate user.
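The two-condition decision rule above can be sketched as follows. The choice of cosine similarity and the concrete threshold values are assumptions for illustration; the patent only specifies "a similarity" and preset thresholds.

```python
import numpy as np

def cosine(a, b):
    """Assumed similarity measure between user feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(user_vec, enrolled_vec, code_score,
           sim_threshold=0.7, code_threshold=-1.0):
    """Accept only if BOTH conditions hold: the first user feature
    vector matches the stored second user feature vector, AND the
    target dynamic verification code score passes its threshold.
    Threshold values here are illustrative placeholders."""
    return (cosine(user_vec, enrolled_vec) >= sim_threshold
            and code_score >= code_threshold)

enrolled = np.array([1.0, 0.0, 1.0])
print(verify(np.array([0.9, 0.1, 1.1]), enrolled, code_score=-0.2))   # similar voice
print(verify(np.array([-1.0, 1.0, 0.0]), enrolled, code_score=-0.2))  # different voice
```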

In an optional embodiment, determining the target dynamic verification code score corresponding to the target user according to the first posterior probability sets corresponding to the speech frames includes:

acquiring the pronunciation unit sequence corresponding to the target dynamic verification code;

determining the target pronunciation unit corresponding to each speech frame according to the first posterior probability sets corresponding to the speech frames, the pronunciation unit sequence, and a preset forced alignment algorithm;

for each speech frame, determining, within the first posterior probability set corresponding to that frame, the posterior probability of the frame's target pronunciation unit, and taking the product of that posterior probability and the pre-stored prior probability of the target pronunciation unit as the likelihood value of the target pronunciation unit;

determining the target dynamic verification code score corresponding to the target user according to the likelihood values of the target pronunciation units of the speech frames.

In an optional embodiment, acquiring the pronunciation unit sequence corresponding to the target dynamic verification code includes:

determining the word set corresponding to the target dynamic verification code according to the target dynamic verification code and a preset word segmentation algorithm;

for each word in the word set, determining the pronunciation unit sequence corresponding to the word according to a pre-stored correspondence between words and pronunciation unit sequences;

sorting the pronunciation unit sequences of the words according to the order in which the words appear in the target dynamic verification code, to obtain the pronunciation unit sequence corresponding to the target dynamic verification code.
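The word-to-units lookup above can be sketched with a toy lexicon. The lexicon contents, the phone labels, and the word segmentation (assumed already done, yielding a list of words) are all hypothetical illustrations of the "pre-stored correspondence between words and pronunciation unit sequences".

```python
# Hypothetical lexicon: word -> pronunciation unit sequence.
LEXICON = {
    "three": ["TH", "R", "IY"],
    "five":  ["F", "AY", "V"],
    "nine":  ["N", "AY", "N"],
}

def code_to_units(code_words):
    """Concatenate each word's unit sequence in the order the words
    appear in the dynamic verification code."""
    units = []
    for word in code_words:
        units.extend(LEXICON[word])
    return units

print(code_to_units(["three", "five", "nine"]))
```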

In an optional embodiment, determining the target dynamic verification code score corresponding to the target user according to the likelihood values of the target pronunciation units of the speech frames includes:

for each speech frame, taking the difference between the likelihood value of the frame's target pronunciation unit and the maximum of the likelihood values of all preset pronunciation units for that frame as the dynamic verification code score of the frame;

taking the average of the dynamic verification code scores of the speech frames as the target dynamic verification code score corresponding to the target user.
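The scoring rule above (per-frame score = likelihood of the aligned target unit minus the best likelihood over all preset units, averaged over frames) can be sketched as below, assuming forced alignment has already produced the per-frame target units. Following the text, likelihood is taken here as posterior times stored prior; the toy posteriors and uniform priors are illustrative only.

```python
import numpy as np

def code_score(posteriors, priors, target_units):
    """Per-frame score is the target unit's likelihood minus the frame's
    maximum likelihood (so each frame score is <= 0, and 0 means the
    aligned unit was the most likely one); the utterance-level target
    dynamic verification code score is the mean over frames."""
    likelihoods = posteriors * priors                       # (n_frames, n_units)
    target_ll = likelihoods[np.arange(len(target_units)), target_units]
    frame_scores = target_ll - likelihoods.max(axis=1)
    return frame_scores.mean()

posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])   # two frames, three preset units
priors = np.full(3, 1 / 3)                 # assumed uniform stored priors
good = code_score(posteriors, priors, target_units=[0, 1])  # aligned units match
bad = code_score(posteriors, priors, target_units=[2, 2])   # aligned units do not
print(good, bad)
```

A score near zero means the spoken content matches the expected pronunciation unit sequence; a strongly negative score indicates a mismatch with the dynamic verification code.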

In an optional embodiment, the method further includes:

acquiring a pre-stored first training sample set, the first training sample set including a plurality of sample user identifiers and first sample voice data corresponding to each sample user identifier;

for each piece of first sample voice data in the first training sample set, dividing the first sample voice data into at least one first sample speech frame according to the preset segmentation algorithm;

for each first sample speech frame of the first sample voice data, extracting the acoustic feature vector corresponding to the frame according to the preset acoustic feature extraction algorithm;

inputting the acoustic feature vectors of the first sample speech frames of the first sample voice data into the multi-task identity verification model to be trained, and outputting a second posterior probability set corresponding to the first sample voice data, the second posterior probability set including the posterior probability of each sample user identifier;

determining a first cost function corresponding to the first training sample set according to the posterior probabilities of the sample user identifiers of the pieces of first sample voice data;

updating, according to the first cost function and a preset first parameter update algorithm, the parameters of the multi-task shared hidden layer, the voiceprint recognition network, and the speech recognition network in the multi-task identity verification model to be trained.
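One plausible form for this first cost function, sketched below, is the mean cross-entropy of the true sample-user label under the model's second posterior probability set. This is an assumption: the patent does not name the cost function, and a preset parameter update algorithm (e.g. gradient descent on this cost) would then adjust the shared hidden layer and both branch networks.

```python
import numpy as np

def cross_entropy_cost(posteriors, labels):
    """Assumed first cost function: mean negative log posterior of the
    true sample user identifier over the training utterances."""
    return float(-np.mean(np.log(posteriors[np.arange(len(labels)), labels])))

# Toy posteriors over 3 sample users for 2 training utterances.
posteriors = np.array([[0.70, 0.20, 0.10],
                       [0.25, 0.50, 0.25]])
labels = [0, 1]                      # true sample user identifier per utterance
cost = cross_entropy_cost(posteriors, labels)
print(round(cost, 4))
```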

In an optional embodiment, the method further includes:

acquiring a pre-stored second training sample set, the second training sample set including a plurality of second sample speech frames and a sample pronunciation unit corresponding to each second sample speech frame;

for each second sample speech frame in the second training sample set, extracting the acoustic feature vector corresponding to the frame according to the preset acoustic feature extraction algorithm;

inputting the acoustic feature vector corresponding to the second sample speech frame into the multi-task identity verification model to be trained, and outputting a third posterior probability set corresponding to the frame, the third posterior probability set including the posterior probability of each sample pronunciation unit;

determining a second cost function corresponding to the second training sample set according to the posterior probabilities of the sample pronunciation units of the second sample speech frames;

updating, according to the second cost function and a preset second parameter update algorithm, the parameters of the multi-task shared hidden layer and the speech recognition network in the multi-task identity verification model to be trained.

In an optional embodiment, the method further includes:

acquiring a plurality of pre-stored verification sample sets, each verification sample set including a plurality of the sample user identifiers and second sample voice data corresponding to each sample user identifier;

for each piece of second sample voice data in each verification sample set, dividing the second sample voice data into at least one third sample speech frame according to the preset segmentation algorithm;

for each third sample speech frame of the second sample voice data, extracting the acoustic feature vector corresponding to the frame according to the preset acoustic feature extraction algorithm;

inputting the acoustic feature vectors of the third sample speech frames of the second sample voice data into the multi-task identity verification model to be verified, and outputting a fourth posterior probability set corresponding to the second sample voice data, the fourth posterior probability set including the posterior probability of each sample user identifier;

if the posterior probability of the sample user identifier corresponding to the second sample voice data is the maximum value in the fourth posterior probability set corresponding to that voice data, determining that the second sample voice data is target sample voice data;

taking the ratio of the number of pieces of target sample voice data in the verification sample set to the total number of pieces of second sample voice data in the set as the accuracy of the verification sample set;

determining the rate of change of the accuracy across the verification sample sets; if the rates of change for a preset number of consecutive verification sample sets are less than or equal to a preset change rate threshold, determining that training of the multi-task identity verification model to be verified is complete.
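The stopping criterion above can be sketched as a simple early-stopping check over the per-set accuracies. The change-rate measure (absolute difference between consecutive accuracies), the patience of 3 consecutive sets, and the threshold value are illustrative assumptions.

```python
def training_converged(accuracies, patience=3, rate_threshold=0.001):
    """Declare training complete when the accuracy change rate between
    consecutive verification sample sets stays at or below the threshold
    for `patience` consecutive sets (parameter values are illustrative)."""
    rates = [abs(b - a) for a, b in zip(accuracies, accuracies[1:])]
    run = 0
    for r in rates:
        run = run + 1 if r <= rate_threshold else 0
        if run >= patience:
            return True
    return False

print(training_converged([0.60, 0.75, 0.82, 0.8205, 0.8204, 0.8206]))  # plateau
print(training_converged([0.60, 0.70, 0.80, 0.85, 0.88, 0.90]))        # still improving
```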

In a second aspect, an identity verification apparatus is provided, the apparatus including:

a first acquisition module, configured to acquire voice data input by a target user according to a target dynamic verification code;

a first division module, configured to divide the voice data into at least one speech frame according to a preset segmentation algorithm;

a first extraction module, configured to extract, for each speech frame, the acoustic feature vector corresponding to the frame according to a preset acoustic feature extraction algorithm;

a first output module, configured to input the acoustic feature vector corresponding to the speech frame into a pre-trained multi-task identity verification model and output the intermediate user feature vector and the first posterior probability set corresponding to the frame, the first posterior probability set including the posterior probability of each preset pronunciation unit;

a first determination module, configured to determine the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the speech frames and a preset pooling algorithm;

a verification module, configured to verify the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames.

In a third aspect, a computer device is provided, including a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing the following steps when executing the computer program:

acquiring voice data input by a target user according to a target dynamic verification code;

dividing the voice data into at least one speech frame according to a preset segmentation algorithm;

for each speech frame, extracting the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm;

inputting the acoustic feature vector corresponding to the speech frame into a pre-trained multi-task identity verification model, and outputting the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, where the first posterior probability set includes the posterior probability of each preset pronunciation unit;

determining the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the speech frames and a preset pooling algorithm;

verifying the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames.

In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the following steps when executed by a processor:

acquiring voice data input by a target user according to a target dynamic verification code;

dividing the voice data into at least one speech frame according to a preset segmentation algorithm;

for each speech frame, extracting the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm;

inputting the acoustic feature vector corresponding to the speech frame into a pre-trained multi-task identity verification model, and outputting the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, where the first posterior probability set includes the posterior probability of each preset pronunciation unit;

determining the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the speech frames and a preset pooling algorithm;

verifying the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames.

Embodiments of the present application provide an identity verification method, apparatus, computer device, and storage medium. The server acquires the voice data input by the target user according to the target dynamic verification code and divides it into at least one speech frame according to a preset segmentation algorithm. For each speech frame, the server extracts the corresponding acoustic feature vector according to a preset acoustic feature extraction algorithm, inputs it into the pre-trained multi-task identity verification model, and obtains the intermediate user feature vector and the first posterior probability set for the frame, where the first posterior probability set includes the posterior probability of each preset pronunciation unit. Finally, the server determines the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the frames and a preset pooling algorithm, and verifies the target user's identity according to the first user feature vector and the first posterior probability sets. In this way, the server does not need to deploy two models, a voiceprint recognition model and a speech recognition model, with different structures and parameters; a single multi-task identity verification model suffices to process the user's voice data, which reduces the server's computational complexity and improves its processing efficiency.

Brief Description of the Drawings

FIG. 1 is an architecture diagram of an identity verification system provided by an embodiment of the present application;

FIG. 2 is a flowchart of an identity verification method provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a multi-task identity verification model provided by an embodiment of the present application;

FIG. 4 is a flowchart of an identity verification method provided by an embodiment of the present application;

FIG. 5 is a flowchart of a method for determining a target dynamic verification code score provided by an embodiment of the present application;

FIG. 6 is a flowchart of an account registration method provided by an embodiment of the present application;

FIG. 7 is a flowchart of a training method for a multi-task identity verification model provided by an embodiment of the present application;

FIG. 8 is a flowchart of a verification method for a multi-task identity verification model provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an identity verification apparatus provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.

Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.

An embodiment of the present application provides an identity verification method that can be applied to an identity verification system. FIG. 1 is an architecture diagram of an identity verification system provided by an embodiment of the present application. As shown in FIG. 1, the identity verification system includes a user terminal and a server connected through a communication network. The communication network may be a wired network, a wireless network, or another type of communication network, which is not limited in this embodiment; the user terminal may be a portable electronic device with recording and computing capabilities, such as a mobile phone, tablet, or laptop.

用户终端，用于接收用户的账号注册请求和身份验证请求，生成该用户对应的动态验证码，采集用户根据该动态验证码输入的语音数据，并将该用户对应的账号、动态验证码和语音数据发送至服务器；服务器，用于对待训练的身份验证多任务模型进行训练和验证；在账号注册过程中，服务器，还用于根据预设的分段算法，将语音数据划分为至少一个语音帧，将各语音帧对应的声学特征向量输入至身份验证多任务模型，输出各语音帧对应的中间用户特征向量，根据各语音帧对应的中间用户特征向量和预设的池化算法，得到该用户对应的目标用户特征向量，并存储该用户对应的账号和目标用户特征向量；在身份验证过程中，服务器，还用于根据预设的分段算法，将语音数据划分为至少一个语音帧，将各语音帧对应的声学特征向量输入至身份验证多任务模型，输出各语音帧对应的中间用户特征向量和第一后验概率集合，根据各语音帧对应的中间用户特征向量和预设的池化算法，得到该用户对应的目标用户特征向量，根据各语音帧对应的第一后验概率集合，得到目标动态验证码分数，并根据该用户对应的目标用户特征向量和目标动态验证码分数对该用户进行身份验证。The user terminal is configured to receive the user's account registration request and identity verification request, generate a dynamic verification code corresponding to the user, collect the voice data input by the user according to the dynamic verification code, and send the account, dynamic verification code and voice data corresponding to the user to the server. The server is configured to train and verify the identity verification multi-task model to be trained. During account registration, the server is further configured to divide the voice data into at least one speech frame according to a preset segmentation algorithm, input the acoustic feature vector corresponding to each speech frame into the identity verification multi-task model, output the intermediate user feature vector corresponding to each speech frame, obtain the target user feature vector corresponding to the user according to the intermediate user feature vectors corresponding to the speech frames and a preset pooling algorithm, and store the account and target user feature vector corresponding to the user. During identity verification, the server is further configured to divide the voice data into at least one speech frame according to the preset segmentation algorithm, input the acoustic feature vector corresponding to each speech frame into the identity verification multi-task model, output the intermediate user feature vector and the first posterior probability set corresponding to each speech frame, obtain the target user feature vector corresponding to the user according to the intermediate user feature vectors corresponding to the speech frames and the preset pooling algorithm, obtain the target dynamic verification code score according to the first posterior probability sets corresponding to the speech frames, and verify the identity of the user according to the target user feature vector and target dynamic verification code score corresponding to the user.

作为一种可选的实施方式，服务器还可以将身份验证多任务模型发送至用户终端，由用户终端基于该身份验证多任务模型对用户输入的语音数据进行处理。本申请实施例以服务器基于该身份验证多任务模型对用户输入的语音数据进行处理为例进行介绍，其他情况与之类似。As an optional implementation, the server may also send the identity verification multi-task model to the user terminal, and the user terminal processes the voice data input by the user based on the identity verification multi-task model. The embodiment of the present application is described taking the case where the server processes the voice data input by the user based on the identity verification multi-task model as an example; other cases are similar.

下面将结合具体实施方式,对本申请实施例提供的一种用户验证的方法进行详细的说明。如图2所示,具体步骤如下:A user authentication method provided in the embodiment of the present application will be described in detail below in combination with specific implementation manners. As shown in Figure 2, the specific steps are as follows:

步骤201,获取目标用户根据目标动态验证码输入的语音数据。Step 201, acquire voice data input by the target user according to the target dynamic verification code.

在实施中，当某一用户（即目标用户）使用该目标用户对应的目标账号登录用户终端时，用户终端可以生成该目标账号对应的目标动态验证码。其中，该用户终端可以在预先存储的候选动态验证码集合中随机选取一个候选动态验证码，作为该目标账号对应的目标动态验证码；该用户终端也可以在预先存储的候选单词集合中，随机选取预设数目个候选单词，并将选取出的预设数目个候选单词进行随机组合，作为该目标账号对应的目标动态验证码；该用户终端还可以采用其他方式生成该目标账号对应的目标动态验证码，本申请实施例不作限定。用户终端生成该目标账号对应的目标动态验证码后，可以在显示界面中显示该目标动态验证码。作为一种可选的实施方式，该用户终端还可以在该显示界面中显示用于提示目标用户朗读该目标动态验证码的提示信息。然后，该用户终端可以启动语音采集装置（比如麦克风），采集该目标用户朗读该目标动态验证码时的语音数据。该用户终端得到该目标用户根据目标动态验证码输入的语音数据后，可以向服务器发送身份验证请求。其中，该身份验证请求中携带有目标账号、目标动态验证码和语音数据。服务器接收到该身份验证请求后，可以对该身份验证请求进行解析，得到该身份验证请求中携带的目标账号、目标动态验证码和语音数据。In implementation, when a certain user (i.e. the target user) logs in to the user terminal with the target account corresponding to the target user, the user terminal may generate a target dynamic verification code corresponding to the target account. The user terminal may randomly select one candidate dynamic verification code from a pre-stored set of candidate dynamic verification codes as the target dynamic verification code corresponding to the target account; the user terminal may also randomly select a preset number of candidate words from a pre-stored set of candidate words and combine the selected words at random as the target dynamic verification code corresponding to the target account; the user terminal may also generate the target dynamic verification code corresponding to the target account in other ways, which is not limited in this embodiment of the present application. After generating the target dynamic verification code corresponding to the target account, the user terminal may display the target dynamic verification code on its display interface. As an optional implementation, the user terminal may also display, on the display interface, prompt information for prompting the target user to read the target dynamic verification code aloud. The user terminal may then activate a voice collection device (such as a microphone) to collect the voice data of the target user reading the target dynamic verification code aloud. After obtaining the voice data input by the target user according to the target dynamic verification code, the user terminal may send an identity verification request to the server, where the identity verification request carries the target account, the target dynamic verification code and the voice data. After receiving the identity verification request, the server may parse it to obtain the target account, target dynamic verification code and voice data carried in the request.

步骤202,根据预设的分段算法,将语音数据划分为至少一个语音帧。Step 202: Divide the voice data into at least one voice frame according to a preset segmentation algorithm.

服务器中可以预先存储有预设的分段算法。其中,该分段算法可以为分帧加窗算法,也可以为其他类型的分段算法,本申请实施例不作限定。服务器得到语音数据后,可以根据预先存储的分段算法,将该语音数据划分为至少一个语音帧。其中,语音帧为时长约为25毫秒的短时语音片段。A preset segmentation algorithm may be pre-stored in the server. Wherein, the segmentation algorithm may be a frame segmentation and windowing algorithm, or may be another type of segmentation algorithm, which is not limited in this embodiment of the present application. After the server obtains the voice data, it can divide the voice data into at least one voice frame according to a pre-stored segmentation algorithm. Wherein, the speech frame is a short-term speech segment with a duration of about 25 milliseconds.
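The framing-and-windowing segmentation described above can be sketched as follows. The 25 ms frame length comes from the text; the 10 ms hop, the 16 kHz sample rate, and the Hamming window are common assumptions not specified here.

```python
import numpy as np

def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a 1-D waveform into overlapping, Hamming-windowed speech frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len] * window)
    return np.array(frames)

# One second of audio at 16 kHz -> 25 ms frames taken every 10 ms
audio = np.random.randn(16000)
frames = split_into_frames(audio, 16000)
print(frames.shape)  # (98, 400)
```

Each row of `frames` would then be passed to the acoustic feature extraction step of the next section.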

步骤203,针对每个语音帧,根据预设的声学特征提取算法,提取该语音帧对应的声学特征向量。Step 203, for each speech frame, according to a preset acoustic feature extraction algorithm, extract the corresponding acoustic feature vector of the speech frame.

在实施中,服务器中可以预先存储有声学特征提取算法。针对每个语音帧,服务器可以根据预设的声学特征提取算法,提取该语音帧对应的声学特征向量。其中,声学特征向量可以为梅尔倒谱系数等。In implementation, an acoustic feature extraction algorithm may be pre-stored in the server. For each speech frame, the server may extract the corresponding acoustic feature vector of the speech frame according to a preset acoustic feature extraction algorithm. Wherein, the acoustic feature vector may be a Mel cepstral coefficient or the like.

步骤204，将该语音帧对应的声学特征向量输入至预先训练的身份验证多任务模型，输出该语音帧对应的中间用户特征向量和第一后验概率集合。Step 204: Input the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model, and output the intermediate user feature vector and the first posterior probability set corresponding to the speech frame.

其中,第一后验概率集合包括各预设发音单元对应的后验概率。Wherein, the first posterior probability set includes the posterior probability corresponding to each preset pronunciation unit.

在实施中,服务器中可以预先存储有预先训练的身份验证多任务模型。针对语音帧集合中的每个语音帧,服务器得到该语音帧对应的声学特征向量后,可以将该语音帧对应的声学特征向量输入至该身份验证多任务模型。该身份验证多任务模型则可以输出该语音帧对应的中间用户特征向量和第一后验概率集合。其中,该第一后验概率集合包括该身份验证多任务模型中各预设发音单元对应的后验概率。In implementation, a pre-trained identity verification multi-task model may be pre-stored in the server. For each speech frame in the speech frame set, after the server obtains the acoustic feature vector corresponding to the speech frame, the acoustic feature vector corresponding to the speech frame may be input into the identity verification multi-task model. The identity verification multi-task model can output the intermediate user feature vector and the first posterior probability set corresponding to the voice frame. Wherein, the first posterior probability set includes the posterior probability corresponding to each preset pronunciation unit in the identity verification multi-task model.

图3为本申请实施例提供的一种身份验证多任务模型的结构示意图。如图3所示，该身份验证多任务模型包括多任务共享隐含层、声纹识别网络和语音识别网络。其中，针对语音帧集合中的每个语音帧，服务器将该语音帧对应的声学特征向量输入至预先训练的身份验证多任务模型，输出该语音帧对应的中间用户特征向量和第一后验概率集合的处理过程如下：FIG. 3 is a schematic structural diagram of an identity verification multi-task model provided by an embodiment of the present application. As shown in FIG. 3, the identity verification multi-task model includes a multi-task shared hidden layer, a voiceprint recognition network and a speech recognition network. For each speech frame in the speech frame set, the server inputs the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model and outputs the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, as follows:

步骤一,将该语音帧对应的声学特征向量输入至多任务共享隐含层,输出该语音帧对应的中间特征向量。Step 1: Input the acoustic feature vector corresponding to the speech frame to the multi-task sharing hidden layer, and output the intermediate feature vector corresponding to the speech frame.

在实施中,针对语音帧集合中的每个语音帧,服务器可以将该语音帧对应的声学特征向量输入至该身份验证多任务模型中的多任务共享隐含层。该多任务共享隐含层则可以输出该语音帧对应的中间特征向量。In an implementation, for each speech frame in the speech frame set, the server may input the acoustic feature vector corresponding to the speech frame into the multi-task sharing hidden layer in the identity verification multi-task model. The multi-task shared hidden layer can output the intermediate feature vector corresponding to the speech frame.

步骤二,将该语音帧对应的中间特征向量输入至语音识别网络,输出该语音帧对应的发音特征向量和第一后验概率集合。Step 2: Input the intermediate feature vector corresponding to the speech frame to the speech recognition network, and output the pronunciation feature vector and the first posterior probability set corresponding to the speech frame.

在实施中,服务器得到该语音帧对应的中间特征向量后,可以将该语音帧对应的中间特征向量输入至语音识别网络。该语音识别网络则可以输出该语音帧对应的发音特征向量和第一后验概率集合。In implementation, after the server obtains the intermediate feature vector corresponding to the speech frame, it may input the intermediate feature vector corresponding to the speech frame to the speech recognition network. The speech recognition network can output the pronunciation feature vector and the first posterior probability set corresponding to the speech frame.

步骤三,将该语音帧对应的中间特征向量和发音特征向量输入至声纹识别网络,输出该语音帧对应的中间用户特征向量。Step 3: Input the intermediate feature vector and pronunciation feature vector corresponding to the speech frame to the voiceprint recognition network, and output the intermediate user feature vector corresponding to the speech frame.

在实施中,服务器得到该语音帧对应的中间特征向量和发音特征向量后,可以将该语音帧对应的中间特征向量和发音特征向量输入至声纹识别网络。该声纹识别网络则可以输出该语音帧对应的中间用户特征向量。In implementation, after the server obtains the intermediate feature vector and the pronunciation feature vector corresponding to the speech frame, the server may input the intermediate feature vector and the pronunciation feature vector corresponding to the speech frame into the voiceprint recognition network. The voiceprint recognition network can output the intermediate user feature vector corresponding to the voice frame.
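Steps 1 to 3 above can be illustrated with a minimal numpy forward pass. The layer sizes, the random weights, and the single-hidden-layer branches are illustrative assumptions; the patent does not specify the network dimensions or layer counts.

```python
import numpy as np

rng = np.random.default_rng(0)

D_ACOUSTIC, D_SHARED, D_PRON, D_USER, N_UNITS = 40, 64, 32, 128, 100

# Randomly initialised weights stand in for the trained model parameters.
W_shared = rng.standard_normal((D_ACOUSTIC, D_SHARED)) * 0.1
W_pron   = rng.standard_normal((D_SHARED, D_PRON)) * 0.1
W_post   = rng.standard_normal((D_PRON, N_UNITS)) * 0.1
W_user   = rng.standard_normal((D_SHARED + D_PRON, D_USER)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(acoustic_vec):
    # Step 1: multi-task shared hidden layer -> intermediate feature vector
    intermediate = np.tanh(acoustic_vec @ W_shared)
    # Step 2: speech recognition branch -> pronunciation features + posteriors
    pron_feat = np.tanh(intermediate @ W_pron)
    posteriors = softmax(pron_feat @ W_post)   # first posterior probability set
    # Step 3: the voiceprint branch takes BOTH the intermediate feature vector
    # and the pronunciation feature vector, and outputs the intermediate user vector
    user_vec = np.tanh(np.concatenate([intermediate, pron_feat]) @ W_user)
    return user_vec, posteriors

user_vec, posteriors = forward(rng.standard_normal(D_ACOUSTIC))
print(user_vec.shape, posteriors.shape)  # (128,) (100,)
```

The concatenation in step 3 is the structural point of the model: the voiceprint branch sees the pronunciation evidence as well as the shared representation.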

在本申请实施例中，相比于传统声纹识别模型，服务器将多任务共享隐含层输出的语音帧对应的中间特征向量和语音识别网络输出的语音帧对应的发音特征向量共同输入至声纹识别网络，从而提高了声纹识别网络输出语音帧对应的用户特征向量的准确率。In the embodiment of the present application, compared with a traditional voiceprint recognition model, the server jointly inputs the intermediate feature vector corresponding to the speech frame output by the multi-task shared hidden layer and the pronunciation feature vector corresponding to the speech frame output by the speech recognition network into the voiceprint recognition network, thereby improving the accuracy of the user feature vector corresponding to the speech frame output by the voiceprint recognition network.

步骤205,根据各语音帧对应的中间用户特征向量和预设的池化算法,确定目标用户对应的第一用户特征向量。Step 205: Determine the first user feature vector corresponding to the target user according to the intermediate user feature vector corresponding to each speech frame and the preset pooling algorithm.

在实施中,服务器中可以预先存储有池化算法。服务器得到各语音帧对应的中间用户特征向量后,可以根据预设的池化算法对各语音帧对应的中间用户特征向量进行池化处理,得到目标用户对应的第一用户特征向量。In implementation, a pooling algorithm may be pre-stored in the server. After obtaining the intermediate user feature vectors corresponding to each speech frame, the server may perform pooling processing on the intermediate user feature vectors corresponding to each speech frame according to a preset pooling algorithm to obtain the first user feature vector corresponding to the target user.
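The pooling step can be as simple as averaging the frame-level vectors. Mean pooling is an assumption here, since the patent leaves the pooling algorithm unspecified; statistics pooling (concatenating mean and standard deviation) is another common choice.

```python
import numpy as np

def mean_pool(frame_vectors):
    """Collapse per-frame intermediate user vectors into one utterance-level vector."""
    return np.mean(frame_vectors, axis=0)

# 98 frames, each with a 128-dimensional intermediate user feature vector
frame_vectors = np.random.randn(98, 128)
target_user_vector = mean_pool(frame_vectors)
print(target_user_vector.shape)  # (128,)
```

The resulting utterance-level vector is what gets compared against the enrolled second user feature vector during verification.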

步骤206,根据目标用户对应的第一用户特征向量和各语音帧对应的第一后验概率集合,对目标用户进行身份验证。Step 206: Perform identity verification on the target user according to the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame.

在实施中，服务器得到目标用户对应的第一用户特征向量和各语音帧对应的第一后验概率集合后，可以根据目标用户对应的第一用户特征向量和各语音帧对应的第一后验概率集合，对目标用户进行身份验证。这样，服务器无需部署两套结构、参数均不相同的声纹识别模型和语音识别模型，仅需要部署一套身份验证多任务模型，即可对用户的语音数据处理，从而降低服务器的计算复杂度，提高服务器的处理效率。In implementation, after obtaining the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame, the server may verify the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames. In this way, the server does not need to deploy both a voiceprint recognition model and a speech recognition model with different structures and parameters; deploying a single identity verification multi-task model is enough to process the user's voice data, thereby reducing the computational complexity of the server and improving its processing efficiency.

其中,如图4所示,服务器根据目标用户对应的第一用户特征向量和各语音帧对应的第一后验概率集合,对目标用户进行身份验证的处理过程如下:Wherein, as shown in Figure 4, the server performs identity verification on the target user according to the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame as follows:

步骤401,根据各语音帧对应的第一后验概率集合,确定目标用户对应的目标动态验证码分数。Step 401: Determine the target dynamic verification code score corresponding to the target user according to the first posterior probability set corresponding to each speech frame.

在实施中，服务器得到各语音帧对应的第一后验概率集合后，可以进一步根据各语音帧对应的第一后验概率集合，确定目标用户对应的目标动态验证码分数。服务器得到目标用户对应的第一用户特征向量和目标动态验证码分数后，可以进一步判断第一用户特征向量与预先存储的该目标用户（也即身份验证请求中携带的目标账号）对应的第二用户特征向量的相似度是否大于或等于预设的相似度阈值，且目标动态验证码分数是否大于或等于预设的动态验证码分数阈值。如果第一用户特征向量与预先存储的目标用户对应的第二用户特征向量的相似度大于或等于预设的相似度阈值，且目标动态验证码分数大于或等于预设的动态验证码分数阈值，则执行步骤402。如果第一用户特征向量与第二用户特征向量的相似度小于预设的相似度阈值，或者目标动态验证码分数小于预设的动态验证码分数阈值，则执行步骤403。其中，该相似度可以为欧式相似度，也可以为余弦相似度，还可以为其他类型的相似度，本申请实施例不作限定。In implementation, after obtaining the first posterior probability set corresponding to each speech frame, the server may further determine the target dynamic verification code score corresponding to the target user according to the first posterior probability sets corresponding to the speech frames. After obtaining the first user feature vector and the target dynamic verification code score corresponding to the target user, the server may further judge whether the similarity between the first user feature vector and the pre-stored second user feature vector corresponding to the target user (that is, to the target account carried in the identity verification request) is greater than or equal to a preset similarity threshold, and whether the target dynamic verification code score is greater than or equal to a preset dynamic verification code score threshold. If the similarity between the first user feature vector and the pre-stored second user feature vector corresponding to the target user is greater than or equal to the preset similarity threshold, and the target dynamic verification code score is greater than or equal to the preset dynamic verification code score threshold, step 402 is executed. If the similarity between the first user feature vector and the second user feature vector is smaller than the preset similarity threshold, or the target dynamic verification code score is smaller than the preset dynamic verification code score threshold, step 403 is executed. The similarity may be Euclidean similarity, cosine similarity, or another type of similarity, which is not limited in this embodiment of the present application.

图5为本申请实施例提供的一种确定目标动态验证码分数的方法流程图，如图5所示，服务器根据各语音帧对应的第一后验概率集合，确定目标用户对应的目标动态验证码分数的处理过程如下：FIG. 5 is a flow chart of a method for determining the target dynamic verification code score provided by an embodiment of the present application. As shown in FIG. 5, the server determines the target dynamic verification code score corresponding to the target user according to the first posterior probability sets corresponding to the speech frames as follows:

步骤501,获取目标动态验证码对应的发音单元序列。Step 501, acquire the pronunciation unit sequence corresponding to the target dynamic verification code.

在实施中,服务器接收到该身份验证请求后,可以对该身份验证请求进行解析,得到该身份验证请求中携带的目标动态验证码。然后,服务器可以进一步获取目标动态验证码对应的发音单元序列。其中,服务器获取目标动态验证码对应的发音单元序列的处理过程如下:In implementation, after receiving the identity verification request, the server may parse the identity verification request to obtain the target dynamic verification code carried in the identity verification request. Then, the server may further acquire the pronunciation unit sequence corresponding to the target dynamic verification code. Wherein, the process of the server obtaining the pronunciation unit sequence corresponding to the target dynamic verification code is as follows:

步骤一,根据目标动态验证码和预设的分词算法,确定目标动态验证码对应的单词集合。Step 1: Determine the word set corresponding to the target dynamic verification code according to the target dynamic verification code and the preset word segmentation algorithm.

在实施中,服务器中可以预先存储有分词算法。服务器得到目标动态验证码后,可以根据预设的分词算法,对该目标动态验证码进行分词处理,得到该目标动态验证码对应的单词集合。例如,目标动态验证码为“清华大学”,服务器对该目标动态验证码进行分词处理后,得到该目标动态验证码对应的单词集合{“清华”,“大学”}。In implementation, word segmentation algorithms may be pre-stored in the server. After obtaining the target dynamic verification code, the server may perform word segmentation processing on the target dynamic verification code according to a preset word segmentation algorithm to obtain a word set corresponding to the target dynamic verification code. For example, the target dynamic verification code is "Tsinghua University", and the server performs word segmentation processing on the target dynamic verification code to obtain the word set {"Tsinghua", "University"} corresponding to the target dynamic verification code.

步骤二,针对单词集合中的每个单词,根据预先存储的单词和发音单元序列的对应关系,确定该单词对应的发音单元序列。Step 2, for each word in the word set, according to the pre-stored correspondence between the word and the pronunciation unit sequence, determine the pronunciation unit sequence corresponding to the word.

在实施中,服务器中可以预先存储有单词和发音单元序列的对应关系(也可称为发音字典)。表一为服务器中预先存储的单词和发音单元序列的对应关系,如表一所示。In an implementation, the correspondence relationship between words and pronunciation unit sequences (also called a pronunciation dictionary) may be pre-stored in the server. Table 1 shows the correspondence between words and pronunciation unit sequences pre-stored in the server, as shown in Table 1.

表一Table 1

序号 No. | 单词 Word | 发音单元序列 Pronunciation unit sequence
1 | 北京 Beijing | b ei3 j ing1
2 | 清华 Tsinghua | q ing1 h ua2
3 | 大学 University | d a4 x ve2
4 | 航天 Aerospace | h ang2 t ian1

服务器得到目标动态验证码对应的单词集合后,针对单词集合中的每个单词,服务器可以根据预先存储的单词和发音单元序列的对应关系,确定该单词对应的发音单元序列。After the server obtains the word set corresponding to the target dynamic verification code, for each word in the word set, the server can determine the pronunciation unit sequence corresponding to the word according to the pre-stored correspondence between the word and the pronunciation unit sequence.

步骤三,将各单词对应的发音单元序列按照各单词在目标动态验证码中的顺序进行排序,得到目标动态验证码对应的发音单元序列。Step 3: sort the pronunciation unit sequence corresponding to each word according to the order of each word in the target dynamic verification code, and obtain the pronunciation unit sequence corresponding to the target dynamic verification code.

在实施中,服务器得到各单词对应的发音单元序列后,可以按照各单词在目标动态验证码中的顺序将各单词对应的发音单元序列进行排序,得到目标动态验证码对应的发音单元序列。例如,目标动态验证码为“清华大学”,目标动态验证码对应的发音单元序列为“q ing1 h ua2 d a4 x ve2”。In implementation, after the server obtains the pronunciation unit sequence corresponding to each word, it can sort the pronunciation unit sequence corresponding to each word according to the order of each word in the target dynamic verification code, and obtain the pronunciation unit sequence corresponding to the target dynamic verification code. For example, the target dynamic verification code is "Tsinghua University", and the pronunciation unit sequence corresponding to the target dynamic verification code is "q ing1 h ua2 d a4 x ve2".
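Steps 1 to 3 of building the pronunciation unit sequence amount to a dictionary lookup followed by in-order concatenation, using the entries of Table 1. The word segmentation itself is assumed to be done already.

```python
# Pronunciation dictionary from Table 1 (word -> pronunciation unit sequence)
PRON_DICT = {
    "北京": "b ei3 j ing1",
    "清华": "q ing1 h ua2",
    "大学": "d a4 x ve2",
    "航天": "h ang2 t ian1",
}

def code_to_units(words):
    """Concatenate each word's unit sequence in the order the words appear."""
    return " ".join(PRON_DICT[w] for w in words)

# Target dynamic verification code "清华大学", already segmented into words
print(code_to_units(["清华", "大学"]))  # q ing1 h ua2 d a4 x ve2
```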

步骤502,根据各语音帧对应的第一后验概率集合、发音单元序列和预设的强制对齐算法,确定各语音帧对应的目标发音单元。Step 502: Determine the target pronunciation unit corresponding to each speech frame according to the first posterior probability set corresponding to each speech frame, the pronunciation unit sequence and the preset forced alignment algorithm.

在实施中，服务器中可以预先存储有强制对齐算法。该强制对齐算法可以为维特比算法，也可以为其他类型的强制对齐算法，本申请实施例不作限定。服务器得到各语音帧对应的第一后验概率集合和目标动态验证码对应的发音单元序列后，可以将各语音帧与发音单元序列进行强制对齐，也即得到各语音帧在该发音单元序列中对应的起始时间和结束时间，从而得到各语音帧对应的目标发音单元。In implementation, a forced alignment algorithm may be pre-stored in the server. The forced alignment algorithm may be the Viterbi algorithm or another type of forced alignment algorithm, which is not limited in this embodiment of the present application. After obtaining the first posterior probability set corresponding to each speech frame and the pronunciation unit sequence corresponding to the target dynamic verification code, the server may forcibly align the speech frames with the pronunciation unit sequence, that is, obtain the start time and end time corresponding to each speech frame in the pronunciation unit sequence, thereby obtaining the target pronunciation unit corresponding to each speech frame.
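A minimal Viterbi-style forced alignment can be sketched as below: each frame is assigned to one unit of the sequence, units must be visited left to right, and each unit covers at least one frame. The uniform transition model is an assumption for illustration; production systems typically align against HMM states with trained transition probabilities.

```python
import numpy as np

def force_align(log_probs, unit_ids):
    """log_probs: (T, N_units) frame log-posteriors; unit_ids: unit sequence.
    Returns, per frame, the index into unit_ids of its aligned target unit."""
    T, S = log_probs.shape[0], len(unit_ids)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_probs[0, unit_ids[0]]
    for t in range(1, T):
        for s in range(min(t + 1, S)):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= move else s - 1
            score[t, s] = max(stay, move) + log_probs[t, unit_ids[s]]
    # Backtrace from the last frame, which must end in the last unit
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

# 6 frames, 4 units; e.g. the sequence "q ing1 h ua2" mapped to unit ids [0, 1, 2, 3]
rng = np.random.default_rng(1)
alignment = force_align(np.log(rng.dirichlet(np.ones(4), size=6)), [0, 1, 2, 3])
print(alignment)  # a monotone path, e.g. something like [0, 0, 1, 2, 3, 3]
```

The alignment gives each frame its target pronunciation unit, which is exactly what step 503 below consumes.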

步骤503，针对每个语音帧，在该语音帧对应的第一后验概率集合中，确定该语音帧对应的目标发音单元的后验概率，并确定该目标发音单元的后验概率与预先存储的该目标发音单元的先验概率的乘积，作为该目标发音单元的似然值。Step 503: for each speech frame, determine, in the first posterior probability set corresponding to the speech frame, the posterior probability of the target pronunciation unit corresponding to the speech frame, and determine the product of that posterior probability and the pre-stored prior probability of the target pronunciation unit as the likelihood value of the target pronunciation unit.

在实施中,服务器中可以预先存储有各预设的发音单元的先验概率。服务器得到各语音帧对应的目标发音单元后,针对每个语音帧,服务器可以在该语音帧对应的第一后验概率集合中,确定该语音帧对应的目标发音单元的后验概率。然后,服务器可以在预先存储的各预设的发音单元的先验概率中确定该目标发音单元的先验概率。之后,服务器可以计算该目标发音单元的后验概率与该目标发音单元的先验概率的乘积,得到该目标发音单元的似然值。In an implementation, prior probabilities of each preset pronunciation unit may be pre-stored in the server. After the server obtains the target pronunciation unit corresponding to each speech frame, for each speech frame, the server may determine the posterior probability of the target pronunciation unit corresponding to the speech frame in the first posterior probability set corresponding to the speech frame. Then, the server may determine the prior probability of the target pronunciation unit from the pre-stored prior probability of each preset pronunciation unit. Afterwards, the server may calculate the product of the posterior probability of the target pronunciation unit and the prior probability of the target pronunciation unit to obtain the likelihood value of the target pronunciation unit.

步骤504,根据各语音帧对应的目标发音单元的似然值,确定目标用户对应的目标动态验证码分数。Step 504: Determine the target dynamic verification code score corresponding to the target user according to the likelihood value of the target pronunciation unit corresponding to each speech frame.

在实施中,服务器得到各语音帧对应的目标发音单元的似然值后,可以根据各语音帧对应的目标发音单元的似然值,确定目标用户对应的目标动态验证码分数。其中,服务器根据各语音帧对应的目标发音单元的似然值,确定目标用户对应的目标动态验证码分数的处理过程如下:In implementation, after obtaining the likelihood value of the target pronunciation unit corresponding to each speech frame, the server may determine the target dynamic verification code score corresponding to the target user according to the likelihood value of the target pronunciation unit corresponding to each speech frame. Wherein, the server determines the target dynamic verification code score corresponding to the target user according to the likelihood value of the target pronunciation unit corresponding to each voice frame as follows:

步骤一，针对每个语音帧，确定该语音帧对应的目标发音单元的似然值与该语音帧对应的各预设发音单元的似然值中的最大似然值的差值，作为该语音帧对应的动态验证码分数。Step 1: for each speech frame, determine the difference between the likelihood value of the target pronunciation unit corresponding to the speech frame and the maximum likelihood value among the likelihood values of the preset pronunciation units corresponding to the speech frame, as the dynamic verification code score corresponding to the speech frame.

在实施中,服务器得到该语音帧对应的目标发音单元的似然值后,服务器可以在该语音帧对应的各预设发音单元的似然值中确定最大似然值。然后,针对每个语音帧,服务器可以计算该语音帧对应的目标发音单元的似然值与最大似然值的差值,得到该语音帧对应的动态验证码分数。In implementation, after the server obtains the likelihood value of the target pronunciation unit corresponding to the speech frame, the server may determine the maximum likelihood value among the likelihood values of the preset pronunciation units corresponding to the speech frame. Then, for each speech frame, the server may calculate the difference between the likelihood value of the target pronunciation unit corresponding to the speech frame and the maximum likelihood value, to obtain the dynamic verification code score corresponding to the speech frame.

步骤二,确定各语音帧对应的动态验证码分数的平均值,作为目标用户对应的目标动态验证码分数。Step 2: Determine the average value of the dynamic verification code scores corresponding to each voice frame, and use it as the target dynamic verification code score corresponding to the target user.

在实施中,服务器得到各语音帧对应的动态验证码分数后,可以计算各语音帧对应的动态验证码分数的平均值,作为目标用户对应的目标动态验证码分数。In implementation, after obtaining the dynamic verification code scores corresponding to each voice frame, the server may calculate an average value of the dynamic verification code scores corresponding to each voice frame, as the target dynamic verification code score corresponding to the target user.
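Putting steps 503 and 504 together: likelihoods are posterior-times-prior products as the text defines them, the per-frame score is the target unit's likelihood minus the frame's maximum likelihood over all units, and the final score is the average over frames. The tiny arrays below are illustrative, not real model outputs.

```python
import numpy as np

def verification_code_score(posteriors, priors, target_units):
    """posteriors: (T, N) per-frame posterior sets; priors: (N,) unit priors;
    target_units: length-T array of each frame's aligned target unit id."""
    likelihoods = posteriors * priors          # likelihood = posterior * prior
    T = len(target_units)
    target_like = likelihoods[np.arange(T), target_units]
    max_like = likelihoods.max(axis=1)
    # Per-frame score: target likelihood minus the frame's maximum likelihood
    frame_scores = target_like - max_like      # always <= 0
    return frame_scores.mean()                 # average over all frames

posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])
priors = np.array([0.3, 0.3, 0.4])
print(verification_code_score(posteriors, priors, np.array([0, 1])))  # 0.0
```

A score of 0 means every frame's most likely unit was exactly the expected one; the further below 0, the worse the spoken audio matches the verification code.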

传统的语音识别模型采用解码方式输出语音数据对应的文本内容，由于采用解码方式需要考虑语音数据对应的声学特征向量与发音单元之间、发音单元与单词之间、单词与单词之间的相互联系，并需要在较大的候选词表中进行解码操作，才能得到语音数据对应的最有可能的文本内容，导致传统的语音识别模型的计算复杂度较高。本申请实施例中，服务器得到各语音帧对应的各预设发音单元的后验概率后，根据动态验证码对应的发音单元序列和强制对齐算法，确定各语音帧对应的目标发音单元，并确定各语音帧对应的目标发音单元的似然值。然后，服务器可以根据各语音帧对应的目标发音单元的似然值确定目标动态验证码分数，从而降低计算复杂度。A traditional speech recognition model outputs the text content corresponding to the voice data by decoding. Decoding must take into account the relationships between the acoustic feature vectors of the voice data and the pronunciation units, between pronunciation units and words, and between the words themselves, and the decoding operation must search a large candidate vocabulary to obtain the most likely text content, so the computational complexity of a traditional speech recognition model is high. In the embodiment of the present application, after obtaining the posterior probability of each preset pronunciation unit corresponding to each speech frame, the server determines the target pronunciation unit corresponding to each speech frame according to the pronunciation unit sequence corresponding to the dynamic verification code and the forced alignment algorithm, and determines the likelihood value of the target pronunciation unit corresponding to each speech frame. The server can then determine the target dynamic verification code score according to the likelihood values of the target pronunciation units corresponding to the speech frames, thereby reducing the computational complexity.

步骤402,确定目标用户为合法用户。Step 402, determining that the target user is a legitimate user.

在实施中,如果第一用户特征向量与预先存储的目标用户对应的第二用户特征向量的相似度大于或等于预设的相似度阈值,且目标动态验证码分数大于或等于预设的动态验证码分数阈值,则服务器可以确定该目标用户为合法用户。然后,服务器可以向用户终端发送身份验证成功响应。该用户终端接收到该身份验证成功响应后,允许目标用户使用该目标账号登录该用户终端。In implementation, if the similarity between the first user feature vector and the second user feature vector corresponding to the pre-stored target user is greater than or equal to the preset similarity threshold, and the target dynamic verification code score is greater than or equal to the preset dynamic verification code score threshold, the server can determine that the target user is a legitimate user. Then, the server may send an authentication success response to the user terminal. After receiving the identity verification success response, the user terminal allows the target user to use the target account to log in to the user terminal.

步骤403,确定目标用户为非法用户。Step 403, determining that the target user is an illegal user.

在实施中，如果第一用户特征向量与第二用户特征向量的相似度小于预设的相似度阈值，或者目标动态验证码分数小于预设的动态验证码分数阈值，则服务器确定该目标用户为非法用户。然后，服务器可以向用户终端发送身份验证失败响应。该用户终端接收到该身份验证失败响应后，拒绝目标用户使用该目标账号登录该用户终端。In implementation, if the similarity between the first user feature vector and the second user feature vector is smaller than the preset similarity threshold, or the target dynamic verification code score is smaller than the preset dynamic verification code score threshold, the server determines that the target user is an illegal user. Then, the server may send an identity verification failure response to the user terminal. After receiving the identity verification failure response, the user terminal refuses to let the target user log in to the user terminal with the target account.
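The two-branch decision in steps 402 and 403 reduces to a conjunction of a similarity test and a score test. Cosine similarity is used here because the text names it as one permitted choice; the threshold values are placeholders, not values from the patent.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8   # placeholder preset similarity threshold
SCORE_THRESHOLD = -0.5       # placeholder preset verification code score threshold

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_legitimate(first_vec, enrolled_vec, code_score):
    """Legitimate only if BOTH the voiceprint and the spoken code checks pass."""
    return (cosine_similarity(first_vec, enrolled_vec) >= SIMILARITY_THRESHOLD
            and code_score >= SCORE_THRESHOLD)

enrolled = np.array([1.0, 0.0, 1.0])
print(is_legitimate(np.array([0.9, 0.1, 1.1]), enrolled, -0.1))   # True
print(is_legitimate(np.array([-1.0, 0.5, 0.0]), enrolled, -0.1))  # False
```

Requiring both checks to pass is what lets the dynamic verification code defend against replayed recordings of the legitimate speaker.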

本申请实施例提供了一种身份验证的方法。服务器获取目标用户根据目标动态验证码输入的语音数据，根据预设的分段算法，将语音数据划分为至少一个语音帧。然后，针对每个语音帧，服务器根据预设的声学特征提取算法，确定该语音帧对应的声学特征向量。之后，服务器将该语音帧对应的声学特征向量输入至预先训练的身份验证多任务模型，输出该语音帧对应的中间用户特征向量和第一后验概率集合。其中，第一后验概率集合包括各预设发音单元对应的后验概率。最后，服务器根据各语音帧对应的中间用户特征向量和预设的池化算法，确定目标用户对应的第一用户特征向量，并根据目标用户对应的第一用户特征向量和各语音帧对应的第一后验概率集合，对目标用户进行身份验证。这样，服务器无需部署两套结构、参数均不相同的声纹识别模型和语音识别模型，仅需要部署一套身份验证多任务模型，即可对用户的语音数据处理，从而降低服务器的计算复杂度，提高服务器的处理效率。The embodiment of the present application provides an identity verification method. The server acquires the voice data input by the target user according to the target dynamic verification code, and divides the voice data into at least one speech frame according to a preset segmentation algorithm. Then, for each speech frame, the server determines the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm. Afterwards, the server inputs the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model, and outputs the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, where the first posterior probability set includes the posterior probability corresponding to each preset pronunciation unit. Finally, the server determines the first user feature vector corresponding to the target user according to the intermediate user feature vectors corresponding to the speech frames and a preset pooling algorithm, and verifies the identity of the target user according to the first user feature vector corresponding to the target user and the first posterior probability sets corresponding to the speech frames. In this way, the server does not need to deploy both a voiceprint recognition model and a speech recognition model with different structures and parameters; deploying a single identity verification multi-task model is enough to process the user's voice data, thereby reducing the computational complexity of the server and improving its processing efficiency.

本申请实施例还提供了一种账号注册的方法,如图6所示,具体处理过程如下:The embodiment of the present application also provides a method for account registration, as shown in Figure 6, the specific processing process is as follows:

步骤601,获取目标用户根据目标动态验证码输入的语音数据。Step 601, acquire voice data input by the target user according to the target dynamic verification code.

在实施中,当某一用户(即目标用户)创建该目标用户登录用户终端对应的目标账号时,该目标用户可以在该用户终端的账号注册界面中的账号输入框中输入该目标用户的目标账号。然后,用户终端可以生成该目标账号对应的目标动态验证码。其中,用户终端生成该目标账号对应的目标动态验证码的处理过程与步骤201中用户终端生成该目标账号对应的目标动态验证码的处理过程类似,此处不再赘述。用户终端生成该目标账号对应的目标动态验证码后,可以在显示界面中显示该目标动态验证码。作为一种可选的实施方式,该用户终端还可以在该显示界面中显示用于提示目标用户朗读该目标动态验证码的提示信息。然后,该用户终端可以启动语音采集装置(比如麦克风),采集该目标用户朗读该目标动态验证码时的语音数据。该用户终端得到该目标用户根据目标动态验证码输入的语音数据后,可以向服务器发送账号注册请求。其中,该账号注册请求中携带有目标账号、目标动态验证码和语音数据。服务器接收到该账号注册请求后,可以对该账号注册请求进行解析,得到该账号注册请求中携带的目标账号、目标动态验证码和语音数据。In implementation, when a user (i.e., the target user) creates the target account used to log in to the user terminal, the target user can enter the target account in the account input box on the account registration interface of the user terminal. Then, the user terminal can generate the target dynamic verification code corresponding to the target account. The process by which the user terminal generates this code is similar to the corresponding process in step 201 and is not repeated here. After generating the target dynamic verification code, the user terminal can display it on the display interface. As an optional implementation, the user terminal may also display prompt information on the display interface prompting the target user to read the target dynamic verification code aloud. Then, the user terminal can activate a voice collection device (such as a microphone) to collect the voice data produced when the target user reads the target dynamic verification code aloud. After obtaining the voice data input by the target user according to the target dynamic verification code, the user terminal can send an account registration request to the server, where the account registration request carries the target account, the target dynamic verification code, and the voice data. After receiving the account registration request, the server can parse it to obtain the target account, target dynamic verification code, and voice data it carries.

步骤602,根据预设的分段算法,将语音数据划分为至少一个语音帧。Step 602: Divide the voice data into at least one voice frame according to a preset segmentation algorithm.

步骤603,针对每个语音帧,根据预设的声学特征提取算法,提取该语音帧对应的声学特征向量。Step 603, for each speech frame, extract an acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm.

步骤604,将该语音帧对应的声学特征向量输入至预先训练的身份验证多任务模型,输出该语音帧对应的中间用户特征向量。Step 604: Input the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model, and output the intermediate user feature vector corresponding to the speech frame.

步骤605,根据各语音帧对应的中间用户特征向量和预设的池化算法,确定目标用户对应的目标用户特征向量。Step 605: Determine the target user feature vector corresponding to the target user according to the intermediate user feature vector corresponding to each speech frame and the preset pooling algorithm.

在实施中,步骤602至步骤605的处理过程与上述身份验证的方法步骤中服务器确定目标用户对应的第一用户特征向量的处理过程类似,此处不再赘述。In practice, the processing from step 602 to step 605 is similar to the processing in which the server determines the first user feature vector corresponding to the target user in the steps of the identity verification method described above, and will not be repeated here.
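The preset pooling algorithm (池化算法) of step 605 is likewise left open here. One common choice, shown as a hypothetical sketch, is mean pooling of the frame-level intermediate user feature vectors into a single utterance-level target user feature vector (statistics pooling, mean plus standard deviation, is another frequent option):

```python
def mean_pool(frame_vectors):
    """Average per-frame intermediate user feature vectors into one user feature vector."""
    n = len(frame_vectors)
    dim = len(frame_vectors[0])
    return [sum(v[d] for v in frame_vectors) / n for d in range(dim)]

# three 2-dimensional frame-level vectors pooled into one utterance-level vector
user_vec = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```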

步骤606,存储目标账号和目标用户特征向量的对应关系。Step 606, storing the corresponding relationship between the target account and the feature vector of the target user.

在实施中,服务器得到目标用户对应的目标用户特征向量后,可以将该目标用户对应的目标账号和目标用户特征向量的对应关系存储至本地,以便对该目标用户进行身份验证。In implementation, after the server obtains the target user feature vector corresponding to the target user, it may store the corresponding relationship between the target account corresponding to the target user and the target user feature vector locally, so as to authenticate the target user.

本申请实施例还提供了一种身份验证多任务模型的训练方法,如图7所示,具体处理过程如下:The embodiment of the present application also provides a training method for an identity verification multi-task model, as shown in FIG. 7 , and the specific processing process is as follows:

步骤701,初始化待训练的身份验证多任务模型。Step 701, initialize the identity verification multi-task model to be trained.

在实施中,服务器中可以预先存储有待训练的身份验证多任务模型。当服务器需要对待训练的身份验证多任务模型进行训练时,服务器可以随机初始化待训练的身份验证多任务模型中的参数。In implementation, the identity verification multi-task model to be trained may be stored in advance in the server. When the server needs to train the identity verification multi-task model to be trained, the server can randomly initialize parameters in the identity verification multi-task model to be trained.

步骤702A,获取预先存储的第一训练样本集合。Step 702A, obtain a pre-stored first training sample set.

其中,第一训练样本集合包括多个样本用户标识和每个样本用户标识对应的第一样本语音数据。Wherein, the first training sample set includes a plurality of sample user identifiers and first sample speech data corresponding to each sample user identifier.

步骤702B,获取预先存储的第二训练样本集合。Step 702B, acquiring a pre-stored second training sample set.

其中,第二训练样本集合包括多个第二样本语音帧和每个第二样本语音帧对应的样本发音单元。Wherein, the second training sample set includes a plurality of second sample speech frames and a sample pronunciation unit corresponding to each second sample speech frame.

在实施中,服务器中可以预先存储有多个第一训练样本集合和多个第二训练样本集合。其中,第一训练样本集合包括多个样本用户标识和每个样本用户标识对应的第一样本语音数据;第二训练样本集合包括多个第二样本语音帧和每个第二样本语音帧对应的样本发音单元。服务器对待训练的身份验证多任务模型中的参数进行初始化后,可以获取预先存储的第一预设数目个第一训练样本集合和第二预设数目个第二训练样本集合。In implementation, multiple first training sample sets and multiple second training sample sets may be pre-stored in the server. The first training sample set includes multiple sample user IDs and the first sample speech data corresponding to each sample user ID; the second training sample set includes multiple second sample speech frames and the sample pronunciation unit corresponding to each second sample speech frame. After the server initializes the parameters in the identity verification multi-task model to be trained, it may acquire a first preset number of pre-stored first training sample sets and a second preset number of pre-stored second training sample sets.

步骤703,针对第一训练样本集合中的每个第一样本语音数据,根据预设的分段算法,将该第一样本语音数据划分为至少一个第一样本语音帧。Step 703: For each first sample speech data in the first training sample set, divide the first sample speech data into at least one first sample speech frame according to a preset segmentation algorithm.

在实施中,步骤703的处理过程与步骤202的处理过程类似,此处不再赘述。In implementation, the processing process of step 703 is similar to the processing process of step 202, and will not be repeated here.

步骤704A,针对该第一样本语音数据对应的每个第一样本语音帧,根据预设的声学特征提取算法,提取该第一样本语音帧对应的声学特征向量。Step 704A, for each first sample speech frame corresponding to the first sample speech data, extract an acoustic feature vector corresponding to the first sample speech frame according to a preset acoustic feature extraction algorithm.

步骤704B,针对第二训练样本集合中的每个第二样本语音帧,根据预设的声学特征提取算法,提取该第二样本语音帧对应的声学特征向量。Step 704B, for each second sample speech frame in the second training sample set, extract an acoustic feature vector corresponding to the second sample speech frame according to a preset acoustic feature extraction algorithm.

在实施中,步骤704A和步骤704B的处理过程与步骤203的处理过程类似,此处不再赘述。In implementation, the processing procedures of step 704A and step 704B are similar to the processing procedure of step 203, and will not be repeated here.

步骤705A,将该第一样本语音数据对应的各第一样本语音帧的声学特征向量输入至待训练的身份验证多任务模型,输出该第一样本语音数据对应的第二后验概率集合。Step 705A, input the acoustic feature vectors of the first sample speech frames corresponding to the first sample speech data into the identity verification multi-task model to be trained, and output the second posterior probability set corresponding to the first sample speech data.

其中,第二后验概率集合包括各样本用户标识对应的后验概率。Wherein, the second posterior probability set includes the posterior probability corresponding to each sample user identifier.

在实施中,服务器得到该第一样本语音数据对应的各第一样本语音帧的声学特征向量后,可以将该第一样本语音数据对应的各第一样本语音帧的声学特征向量输入至该身份验证多任务模型。该身份验证多任务模型则可以输出该第一样本语音数据对应的第二后验概率集合。其中,该第二后验概率集合包括各样本用户标识对应的后验概率。In implementation, after the server obtains the acoustic feature vectors of the first sample speech frames corresponding to the first sample speech data, it can input those acoustic feature vectors into the identity verification multi-task model. The identity verification multi-task model can then output the second posterior probability set corresponding to the first sample speech data, where the second posterior probability set includes the posterior probability corresponding to each sample user ID.

步骤705B,将该第二样本语音帧对应的声学特征向量输入至待训练的身份验证多任务模型,输出该第二样本语音帧对应的第三后验概率集合。Step 705B: Input the acoustic feature vector corresponding to the second sample speech frame into the identity verification multi-task model to be trained, and output the third posterior probability set corresponding to the second sample speech frame.

其中,第三后验概率集合包括各样本发音单元对应的后验概率。Wherein, the third posterior probability set includes the posterior probability corresponding to each sample pronunciation unit.

在实施中,针对每个第二样本语音帧,服务器得到该第二样本语音帧对应的声学特征向量后,可以将该第二样本语音帧对应的声学特征向量输入至该身份验证多任务模型。该身份验证多任务模型则可以输出该第二样本语音帧对应的第三后验概率集合。其中,该第三后验概率集合包括各样本发音单元对应的后验概率。In an implementation, for each second sample speech frame, after the server obtains the acoustic feature vector corresponding to the second sample speech frame, the acoustic feature vector corresponding to the second sample speech frame may be input into the identity verification multi-task model. The identity verification multi-task model may output a third posterior probability set corresponding to the second sample speech frame. Wherein, the third posterior probability set includes the posterior probability corresponding to each sample pronunciation unit.
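Steps 705A and 705B both pass acoustic feature vectors through the same model. Below is a minimal, non-authoritative sketch of such a multi-task forward pass, assuming one ReLU shared hidden layer, a softmax speech-recognition branch and a linear voiceprint branch; the model described elsewhere in this application additionally feeds a pronunciation feature vector from the speech branch into the voiceprint branch, which is omitted here, and all weights and dimensions are illustrative:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(vec, weights):
    """Apply a linear layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def multitask_forward(acoustic_vec, shared_w, asr_w, spk_w):
    """Shared hidden layer feeding a speech-recognition head and a voiceprint head."""
    shared = [max(0.0, h) for h in linear(acoustic_vec, shared_w)]  # shared hidden layer (ReLU)
    unit_posteriors = softmax(linear(shared, asr_w))   # posterior per pronunciation unit
    user_vec = linear(shared, spk_w)                   # intermediate user feature vector
    return user_vec, unit_posteriors

# toy 2-dim acoustic vector, 2-unit shared layer, 3 pronunciation units, 1-dim user vector
user_vec, unit_posteriors = multitask_forward(
    [1.0, 2.0],
    shared_w=[[1.0, 0.0], [0.0, 1.0]],
    asr_w=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    spk_w=[[0.5, 0.5]],
)
```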

步骤706A,根据各第一样本语音数据对应的样本用户标识的后验概率,确定第一训练样本集合对应的第一代价函数。Step 706A: Determine the first cost function corresponding to the first training sample set according to the posterior probability of the sample user ID corresponding to each first sample speech data.

在实施中,服务器得到各第一样本语音数据对应的第二后验概率集合后,针对每个第一样本语音数据,服务器可以在该第一样本语音数据对应的第二后验概率集合中确定该第一样本语音数据对应的样本用户标识的后验概率。然后,服务器可以根据各第一样本语音数据对应的样本用户标识的后验概率,确定第一训练样本集合对应的第一代价函数。In implementation, after the server obtains the second posterior probability set corresponding to each first sample speech data, then for each first sample speech data, the server can determine, within the second posterior probability set corresponding to that data, the posterior probability of the sample user ID corresponding to that data. Then, the server can determine the first cost function corresponding to the first training sample set according to the posterior probabilities of the sample user IDs corresponding to the first sample speech data.

步骤706B,根据各第二样本语音帧对应的样本发音单元的后验概率,确定第二训练样本集合对应的第二代价函数。Step 706B: Determine a second cost function corresponding to the second training sample set according to the posterior probability of the sample pronunciation unit corresponding to each second sample speech frame.

在实施中,服务器得到各第二样本语音帧对应的第三后验概率集合后,针对每个第二样本语音帧,服务器可以在该第二样本语音帧对应的第三后验概率集合中确定该第二样本语音帧对应的样本发音单元的后验概率。然后,服务器可以根据各第二样本语音帧对应的样本发音单元的后验概率,确定第二训练样本集合对应的第二代价函数。In implementation, after the server obtains the third posterior probability set corresponding to each second sample speech frame, then for each second sample speech frame, the server can determine, within the third posterior probability set corresponding to that frame, the posterior probability of the sample pronunciation unit corresponding to that frame. Then, the server can determine the second cost function corresponding to the second training sample set according to the posterior probabilities of the sample pronunciation units corresponding to the second sample speech frames.
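The concrete form of the two cost functions is not specified in this passage. A standard choice for both classification tasks, sketched here purely as an assumption, is the cross-entropy cost: the average negative log posterior assigned to the correct label (the sample user ID for the first task, the sample pronunciation unit for the second):

```python
import math

def cross_entropy_cost(label_posteriors):
    """Average negative log posterior of the correct label.

    For the first cost function, pass the posterior of each first sample speech
    data's true sample user ID; for the second, the posterior of each second
    sample speech frame's true sample pronunciation unit.
    """
    eps = 1e-12  # guard against log(0)
    return -sum(math.log(max(p, eps)) for p in label_posteriors) / len(label_posteriors)
```

A perfectly confident model (posterior 1.0 for every correct label) has zero cost; lower posteriors on the correct labels raise the cost, which is what the parameter update then minimizes.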

步骤707A,根据第一代价函数和预设的第一参数更新算法,更新待训练的身份验证多任务模型中多任务共享隐含层对应的参数、声纹识别网络对应的参数和语音识别网络对应的参数。Step 707A, according to the first cost function and the preset first parameter update algorithm, update the parameters corresponding to the multi-task shared hidden layer, the parameters corresponding to the voiceprint recognition network, and the parameters corresponding to the speech recognition network in the identity verification multi-task model to be trained.

在实施中,服务器中可以预先存储有第一参数更新算法。该第一参数更新算法可以为随机梯度下降法。服务器得到第一代价函数后,可以根据第一代价函数和第一参数更新算法,更新待训练的身份验证多任务模型中多任务共享隐含层对应的参数、声纹识别网络对应的参数和语音识别网络对应的参数。In implementation, the first parameter update algorithm may be pre-stored in the server; it may be, for example, stochastic gradient descent. After obtaining the first cost function, the server can update, according to the first cost function and the first parameter update algorithm, the parameters corresponding to the multi-task shared hidden layer, the voiceprint recognition network, and the speech recognition network in the identity verification multi-task model to be trained.

步骤707B,根据第二代价函数和预设的第二参数更新算法,更新待训练的身份验证多任务模型中多任务共享隐含层对应的参数和语音识别网络对应的参数。Step 707B, according to the second cost function and the preset second parameter update algorithm, update the parameters corresponding to the multi-task shared hidden layer and the parameters corresponding to the speech recognition network in the identity verification multi-task model to be trained.

在实施中,服务器中可以预先存储有第二参数更新算法。该第二参数更新算法可以为随机梯度下降法。服务器得到第二代价函数后,可以根据第二代价函数和第二参数更新算法,更新待训练的身份验证多任务模型中多任务共享隐含层对应的参数和语音识别网络对应的参数。In implementation, the second parameter update algorithm may be pre-stored in the server. The second parameter update algorithm may be a stochastic gradient descent method. After obtaining the second cost function, the server can update parameters corresponding to the multi-task shared hidden layer and parameters corresponding to the speech recognition network in the identity verification multi-task model to be trained according to the second cost function and the second parameter update algorithm.
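Steps 707A and 707B differ only in which parameter groups each cost function is allowed to update: the first task's gradient updates the shared hidden layer and both branches, while the second task's gradient leaves the voiceprint branch untouched. A minimal sketch of that selective stochastic-gradient step, with hypothetical one-weight parameter groups and an illustrative learning rate:

```python
def sgd_update(params, grads, lr, groups):
    """One stochastic-gradient step applied only to the named parameter groups."""
    updated = {}
    for name, weights in params.items():
        if name in groups and name in grads:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # group excluded from this update
    return updated

params = {"shared": [1.0], "asr": [1.0], "spk": [1.0]}
grads = {"shared": [0.5], "asr": [0.5], "spk": [0.5]}
# step 707A: the first cost function updates all three groups
after_a = sgd_update(params, grads, lr=0.1, groups={"shared", "asr", "spk"})
# step 707B: the second cost function leaves the voiceprint branch ("spk") untouched
after_b = sgd_update(params, grads, lr=0.1, groups={"shared", "asr"})
```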

本申请实施例还提供了一种身份验证多任务模型的验证方法,如图8所示,具体处理过程如下:The embodiment of the present application also provides a verification method for an identity verification multi-task model, as shown in FIG. 8 , and the specific processing process is as follows:

步骤801,获取预先存储的多个验证样本集合。Step 801, obtaining a plurality of verification sample sets stored in advance.

其中,每个验证样本集合包括多个样本用户标识和每个样本用户标识对应的第二样本语音数据。Wherein, each verification sample set includes a plurality of sample user identifiers and the second sample speech data corresponding to each sample user identifier.

在实施中,服务器中可以预先存储有多个验证样本集合。其中,每个验证样本集合包括多个样本用户标识和每个样本用户标识对应的第二样本语音数据,该多个样本用户标识为上述第一训练样本集合中的样本用户标识。当服务器需要对待验证的身份验证多任务模型进行验证时,服务器可以获取预先存储的多个验证样本集合。In implementation, multiple verification sample sets may be pre-stored in the server. Each verification sample set includes multiple sample user IDs and the second sample speech data corresponding to each sample user ID, where the sample user IDs are those in the first training sample set described above. When the server needs to verify the identity verification multi-task model to be verified, it can acquire the multiple pre-stored verification sample sets.

步骤802,针对每个验证样本集合中的每个第二样本语音数据,根据预设的分段算法,将该第二样本语音数据划分为至少一个第三样本语音帧。Step 802, for each second sample speech data in each verification sample set, divide the second sample speech data into at least one third sample speech frame according to a preset segmentation algorithm.

在实施中,步骤802的处理过程与步骤703的处理过程类似,此处不再赘述。In implementation, the processing process of step 802 is similar to the processing process of step 703, and will not be repeated here.

步骤803,针对该第二样本语音数据对应的每个第三样本语音帧,根据预设的声学特征提取算法,提取该第三样本语音帧对应的声学特征向量。Step 803, for each third sample speech frame corresponding to the second sample speech data, extract an acoustic feature vector corresponding to the third sample speech frame according to a preset acoustic feature extraction algorithm.

在实施中,步骤803的处理过程与步骤704A及步骤704B的处理过程类似,此处不再赘述。In implementation, the processing process of step 803 is similar to the processing process of step 704A and step 704B, and will not be repeated here.

步骤804,将该第二样本语音数据对应的各第三样本语音帧的声学特征向量输入至待验证的身份验证多任务模型,输出该第二样本语音数据对应的第四后验概率集合。Step 804: Input the acoustic feature vectors of the third sample speech frames corresponding to the second sample speech data into the identity verification multi-task model to be verified, and output the fourth posterior probability set corresponding to the second sample speech data.

其中,第四后验概率集合包括各样本用户标识对应的后验概率。Wherein, the fourth posterior probability set includes the posterior probability corresponding to each sample user identifier.

在实施中,步骤804的处理过程与步骤705A的处理过程类似,此处不再赘述。In implementation, the processing process of step 804 is similar to the processing process of step 705A, and will not be repeated here.

步骤805,如果该第二样本语音数据对应的样本用户标识的后验概率为该第二样本语音数据对应的第四后验概率集合中的最大值,则确定该第二样本语音数据为目标样本语音数据。Step 805, if the posterior probability of the sample user identity corresponding to the second sample voice data is the maximum value in the fourth posterior probability set corresponding to the second sample voice data, then determine that the second sample voice data is the target sample voice data.

在实施中,服务器得到该第二样本语音数据对应的第四后验概率集合后,可以在第四后验概率集合中确定该第二样本语音数据对应的样本用户标识的后验概率是否为最大值。如果该第二样本语音数据对应的样本用户标识的后验概率为最大值,则服务器可以将该第二样本语音数据确定为目标样本语音数据。In implementation, after the server obtains the fourth posterior probability set corresponding to the second sample speech data, it can determine whether the posterior probability of the sample user ID corresponding to the second sample speech data is the maximum value in that fourth posterior probability set. If it is, the server can determine the second sample speech data as the target sample speech data.

步骤806,确定该验证样本集合中目标样本语音数据的数目与该验证样本集合中第二样本语音数据的总数目的比值,作为该验证样本集合的准确率。Step 806: Determine the ratio of the number of target sample speech data in the verification sample set to the total number of second sample speech data in the verification sample set as the accuracy rate of the verification sample set.

在实施中,针对每个验证样本集合,服务器确定出该验证样本集合中的各目标样本语音数据后,可以进一步计算该验证样本集合中目标样本语音数据的数目与该验证样本集合中第二样本语音数据的总数目的比值,得到该验证样本集合的准确率。In implementation, for each verification sample set, after the server determines each target sample speech data in the verification sample set, it can further compute the ratio of the number of target sample speech data in the verification sample set to the total number of second sample speech data in the verification sample set, obtaining the accuracy rate of the verification sample set.

步骤807,根据各验证样本集合的准确率,确定各验证样本集合对应的准确率的变化率,如果存在连续预设数目个验证样本集合对应的准确率的变化率小于或等于预设的变化率阈值,则确定待验证的身份验证多任务模型训练完成。Step 807, determine, according to the accuracy rate of each verification sample set, the change rate of the accuracy rate corresponding to each verification sample set; if the change rates corresponding to a preset number of consecutive verification sample sets are all less than or equal to a preset change rate threshold, determine that the training of the identity verification multi-task model to be verified is complete.

在实施中,服务器得到各验证样本集合的准确率后,可以计算各验证样本集合对应的变化率。如果存在连续预设数目个验证样本集合的变化率小于或等于预设的变化率阈值,则服务器可以确定待验证的身份验证多任务模型训练完成。In implementation, after the server obtains the accuracy rate of each verification sample set, it can calculate the change rate corresponding to each verification sample set. If there are a preset number of consecutive verification sample sets whose change rates are less than or equal to the preset change rate threshold, the server may determine that the training of the identity verification multi-task model to be verified is completed.
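The passage does not define how the change rate of the accuracy is computed. Under the assumption that it is the absolute difference between consecutive verification-set accuracies, the stopping rule of step 807 can be sketched as follows; the `consecutive` count and `threshold` are illustrative stand-ins for the preset number and preset change rate threshold:

```python
def training_complete(accuracies, consecutive=3, threshold=0.01):
    """True if the last `consecutive` accuracy change rates are all <= threshold.

    accuracies: accuracy of each verification sample set, in evaluation order.
    """
    if len(accuracies) < consecutive + 1:
        return False  # not enough sets yet to observe `consecutive` change rates
    changes = [abs(b - a) for a, b in zip(accuracies, accuracies[1:])]
    return all(c <= threshold for c in changes[-consecutive:])
```

Once the accuracy curve flattens for the required number of consecutive verification sample sets, the model is declared trained; while it is still climbing, training continues.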

本申请实施例还提供了一种身份验证的装置,如图9所示,该装置包括:The embodiment of the present application also provides an identity verification device, as shown in Figure 9, the device includes:

第一获取模块910,用于获取目标用户根据目标动态验证码输入的语音数据;The first obtaining module 910 is used to obtain the voice data input by the target user according to the target dynamic verification code;

第一划分模块920,用于根据预设的分段算法,将语音数据划分为至少一个语音帧;The first division module 920 is used to divide the voice data into at least one voice frame according to a preset segmentation algorithm;

第一提取模块930,用于针对每个语音帧,根据预设的声学特征提取算法,提取该语音帧对应的声学特征向量;The first extraction module 930 is used for extracting the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm for each speech frame;

第一输出模块940,用于将该语音帧对应的声学特征向量输入至预先训练的身份验证多任务模型,输出该语音帧对应的中间用户特征向量和第一后验概率集合,第一后验概率集合包括各预设发音单元对应的后验概率;The first output module 940 is used to input the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model, and output the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, where the first posterior probability set includes the posterior probability corresponding to each preset pronunciation unit;

第一确定模块950,用于根据各语音帧对应的中间用户特征向量和预设的池化算法,确定目标用户对应的第一用户特征向量;The first determination module 950 is used to determine the first user feature vector corresponding to the target user according to the intermediate user feature vector corresponding to each speech frame and the preset pooling algorithm;

验证模块960,用于根据目标用户对应的第一用户特征向量和各语音帧对应的第一后验概率集合,对目标用户进行身份验证。The verification module 960 is configured to perform identity verification on the target user according to the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame.

作为一种可选的实施方式,身份验证多任务模型包括多任务共享隐含层、声纹识别网络和语音识别网络;As an optional implementation, the identity verification multi-task model includes a multi-task shared hidden layer, a voiceprint recognition network and a speech recognition network;

第一输出模块940,具体用于:The first output module 940 is specifically used for:

将该语音帧对应的声学特征向量输入至多任务共享隐含层,输出该语音帧对应的中间特征向量;Input the acoustic feature vector corresponding to the speech frame to the multi-task sharing hidden layer, and output the intermediate feature vector corresponding to the speech frame;

将该语音帧对应的中间特征向量输入至语音识别网络,输出该语音帧对应的发音特征向量和第一后验概率集合;The intermediate feature vector corresponding to the speech frame is input to the speech recognition network, and the pronunciation feature vector and the first posterior probability set corresponding to the speech frame are output;

将该语音帧对应的中间特征向量和发音特征向量输入至声纹识别网络,输出该语音帧对应的中间用户特征向量。The intermediate feature vector and pronunciation feature vector corresponding to the voice frame are input to the voiceprint recognition network, and the intermediate user feature vector corresponding to the voice frame is output.

作为一种可选的实施方式,验证模块960,具体用于:As an optional implementation manner, the verification module 960 is specifically used for:

根据各语音帧对应的第一后验概率集合,确定目标用户对应的目标动态验证码分数;Determine the target dynamic verification code score corresponding to the target user according to the first posterior probability set corresponding to each speech frame;

如果第一用户特征向量与预先存储的目标用户对应的第二用户特征向量的相似度大于或等于预设的相似度阈值,且目标动态验证码分数大于或等于预设的动态验证码分数阈值,则确定目标用户为合法用户;If the similarity between the first user feature vector and the pre-stored second user feature vector corresponding to the target user is greater than or equal to the preset similarity threshold, and the target dynamic verification code score is greater than or equal to the preset dynamic verification code score threshold, determine that the target user is a legitimate user;

如果第一用户特征向量与第二用户特征向量的相似度小于预设的相似度阈值,或者目标动态验证码分数小于预设的动态验证码分数阈值,则确定目标用户为非法用户。If the similarity between the first user feature vector and the second user feature vector is smaller than a preset similarity threshold, or the target dynamic verification code score is smaller than the preset dynamic verification code score threshold, it is determined that the target user is an illegal user.
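The two-condition decision of the verification module can be sketched as follows. Cosine similarity is assumed as the similarity measure (the passage does not name one), and both thresholds are illustrative; in particular, since the dynamic verification code score defined below is at most zero, a negative score threshold is used here:

```python
def cosine_similarity(a, b):
    """Cosine similarity between two user feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def is_legal_user(first_vec, stored_vec, captcha_score,
                  sim_threshold=0.7, score_threshold=-1.0):
    """Legal only if BOTH the voiceprint check and the verification-code check pass."""
    sim = cosine_similarity(first_vec, stored_vec)
    return sim >= sim_threshold and captcha_score >= score_threshold
```

Failing either condition (wrong voice, or right voice reading the wrong code, e.g. a replayed old recording) marks the target user as illegal, which is the point of pairing voiceprint matching with a dynamic verification code.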

作为一种可选的实施方式,验证模块960,具体用于:As an optional implementation manner, the verification module 960 is specifically used for:

获取目标动态验证码对应的发音单元序列;Obtain the pronunciation unit sequence corresponding to the target dynamic verification code;

根据各语音帧对应的第一后验概率集合、发音单元序列和预设的强制对齐算法,确定各语音帧对应的目标发音单元;According to the first posterior probability set corresponding to each speech frame, the pronunciation unit sequence and the preset forced alignment algorithm, determine the target pronunciation unit corresponding to each speech frame;

针对每个语音帧,在该语音帧对应的第一后验概率集合中,确定该语音帧对应的目标发音单元的后验概率,并确定该目标发音单元的后验概率与预先存储的该目标发音单元的先验概率的乘积,作为该目标发音单元的似然值;For each speech frame, determine, in the first posterior probability set corresponding to the speech frame, the posterior probability of the target pronunciation unit corresponding to the speech frame, and determine the product of that posterior probability and the pre-stored prior probability of the target pronunciation unit as the likelihood value of the target pronunciation unit;

根据各语音帧对应的目标发音单元的似然值,确定目标用户对应的目标动态验证码分数。According to the likelihood value of the target pronunciation unit corresponding to each speech frame, the target dynamic verification code score corresponding to the target user is determined.

作为一种可选的实施方式,验证模块960,具体用于:As an optional implementation manner, the verification module 960 is specifically used for:

根据目标动态验证码和预设的分词算法,确定目标动态验证码对应的单词集合;Determine the word set corresponding to the target dynamic verification code according to the target dynamic verification code and the preset word segmentation algorithm;

针对单词集合中的每个单词,根据预先存储的单词和发音单元序列的对应关系,确定该单词对应的发音单元序列;For each word in the word set, determine the pronunciation unit sequence corresponding to the word according to the correspondence between the pre-stored word and the pronunciation unit sequence;

将各单词对应的发音单元序列按照各单词在目标动态验证码中的顺序进行排序,得到目标动态验证码对应的发音单元序列。The pronunciation unit sequence corresponding to each word is sorted according to the order of each word in the target dynamic verification code, and the pronunciation unit sequence corresponding to the target dynamic verification code is obtained.
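The lexicon lookup described above can be sketched as follows; the lexicon entries and pronunciation units below are purely hypothetical placeholders for the pre-stored correspondence between words and pronunciation unit sequences:

```python
# Hypothetical lexicon: word -> pronunciation unit sequence.
LEXICON = {
    "three": ["th", "r", "iy"],
    "seven": ["s", "eh", "v", "ah", "n"],
}

def captcha_to_unit_sequence(words, lexicon=LEXICON):
    """Concatenate each word's unit sequence in the order the words appear
    in the target dynamic verification code."""
    units = []
    for w in words:
        units.extend(lexicon[w])
    return units

# a dynamic verification code "seven three", already segmented into words
sequence = captcha_to_unit_sequence(["seven", "three"])
```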

作为一种可选的实施方式,验证模块960,具体用于:As an optional implementation manner, the verification module 960 is specifically used for:

针对每个语音帧,确定该语音帧对应的目标发音单元的似然值与该语音帧对应的各预设发音单元的似然值中的最大似然值的差值,作为该语音帧对应的动态验证码分数;For each speech frame, determine the difference between the likelihood value of the target pronunciation unit corresponding to the speech frame and the maximum of the likelihood values of the preset pronunciation units corresponding to the speech frame, as the dynamic verification code score corresponding to the speech frame;

确定各语音帧对应的动态验证码分数的平均值,作为目标用户对应的目标动态验证码分数。The average value of the dynamic verification code scores corresponding to each speech frame is determined as the target dynamic verification code score corresponding to the target user.
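Combining the likelihood computation described above (posterior multiplied by the pre-stored prior) with the per-frame score (target likelihood minus the maximum likelihood) and the final averaging, a hedged sketch with toy numbers:

```python
def captcha_score(frame_posteriors, priors, target_units):
    """Mean over frames of (likelihood of target unit - max likelihood of any unit).

    frame_posteriors: per frame, the posterior of every preset pronunciation unit;
    priors: the pre-stored prior probability of every preset pronunciation unit;
    target_units: the force-aligned target unit index for each frame.
    """
    per_frame = []
    for posteriors, target in zip(frame_posteriors, target_units):
        likelihoods = [p * q for p, q in zip(posteriors, priors)]  # posterior x prior
        per_frame.append(likelihoods[target] - max(likelihoods))   # <= 0; 0 when target wins
    return sum(per_frame) / len(per_frame)

# two frames, two pronunciation units, uniform priors; the target unit is 0 in both frames
score = captcha_score(
    frame_posteriors=[[0.8, 0.2], [0.3, 0.7]],
    priors=[0.5, 0.5],
    target_units=[0, 0],
)
```

Each frame's score is zero when the aligned target unit is the most likely one and negative otherwise, so the averaged score stays close to zero only when the spoken audio matches the expected verification code.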

作为一种可选的实施方式,该装置还包括:As an optional implementation, the device also includes:

第二获取模块,用于获取预先存储的第一训练样本集合,第一训练样本集合包括多个样本用户标识和每个样本用户标识对应的第一样本语音数据;The second obtaining module is used to obtain a pre-stored first training sample set, the first training sample set includes a plurality of sample user identifications and first sample voice data corresponding to each sample user identification;

第二划分模块,用于针对第一训练样本集合中的每个第一样本语音数据,根据预设的分段算法,将该第一样本语音数据划分为至少一个第一样本语音帧;The second division module is used to divide, for each first sample speech data in the first training sample set, the first sample speech data into at least one first sample speech frame according to a preset segmentation algorithm;

第二提取模块,用于针对该第一样本语音数据对应的每个第一样本语音帧,根据预设的声学特征提取算法,提取该第一样本语音帧对应的声学特征向量;The second extraction module is used to extract an acoustic feature vector corresponding to the first sample speech frame according to a preset acoustic feature extraction algorithm for each first sample speech frame corresponding to the first sample speech data;

第二输出模块,用于将该第一样本语音数据对应的各第一样本语音帧的声学特征向量输入至待训练的身份验证多任务模型,输出该第一样本语音数据对应的第二后验概率集合,第二后验概率集合包括各样本用户标识对应的后验概率;The second output module is used to input the acoustic feature vectors of the first sample speech frames corresponding to the first sample speech data into the identity verification multi-task model to be trained, and output the second posterior probability set corresponding to the first sample speech data, where the second posterior probability set includes the posterior probability corresponding to each sample user ID;

第二确定模块,用于根据各第一样本语音数据对应的样本用户标识的后验概率,确定第一训练样本集合对应的第一代价函数;The second determination module is used to determine the first cost function corresponding to the first training sample set according to the posterior probability of the sample user identification corresponding to each first sample speech data;

第一更新模块,用于根据第一代价函数和预设的第一参数更新算法,更新待训练的身份验证多任务模型中多任务共享隐含层对应的参数、声纹识别网络对应的参数和语音识别网络对应的参数。The first update module is used to update, according to the first cost function and the preset first parameter update algorithm, the parameters corresponding to the multi-task shared hidden layer, the parameters corresponding to the voiceprint recognition network, and the parameters corresponding to the speech recognition network in the identity verification multi-task model to be trained.

作为一种可选的实施方式,该装置还包括:As an optional implementation, the device also includes:

第三获取模块,用于获取预先存储的第二训练样本集合,第二训练样本集合包括多个第二样本语音帧和每个第二样本语音帧对应的样本发音单元;The third obtaining module is used to obtain a pre-stored second training sample set, the second training sample set includes a plurality of second sample speech frames and sample pronunciation units corresponding to each second sample speech frame;

第三提取模块,用于针对第二训练样本集合中的每个第二样本语音帧,根据预设的声学特征提取算法,提取该第二样本语音帧对应的声学特征向量;The third extraction module is used to extract an acoustic feature vector corresponding to the second sample speech frame according to a preset acoustic feature extraction algorithm for each second sample speech frame in the second training sample set;

第三输出模块,用于将该第二样本语音帧对应的声学特征向量输入至待训练的身份验证多任务模型,输出该第二样本语音帧对应的第三后验概率集合,第三后验概率集合包括各样本发音单元对应的后验概率;The third output module is used to input the acoustic feature vector corresponding to the second sample speech frame into the identity verification multi-task model to be trained, and output the third posterior probability set corresponding to the second sample speech frame, where the third posterior probability set includes the posterior probability corresponding to each sample pronunciation unit;

第三确定模块,用于根据各第二样本语音帧对应的样本发音单元的后验概率,确定第二训练样本集合对应的第二代价函数;The third determination module is used to determine the second cost function corresponding to the second training sample set according to the posterior probability of the sample pronunciation unit corresponding to each second sample speech frame;

第二更新模块,用于根据第二代价函数和预设的第二参数更新算法,更新待训练的身份验证多任务模型中多任务共享隐含层对应的参数和语音识别网络对应的参数。The second update module is used to update the parameters corresponding to the multi-task shared hidden layer and the parameters corresponding to the speech recognition network in the identity verification multi-task model to be trained according to the second cost function and the preset second parameter update algorithm.

作为一种可选的实施方式,该装置还包括:As an optional implementation, the device also includes:

第四获取模块,用于获取预先存储的多个验证样本集合,每个验证样本集合包括多个样本用户标识和每个样本用户标识对应的第二样本语音数据;A fourth acquisition module, configured to acquire a plurality of pre-stored verification sample sets, each verification sample set including a plurality of sample user identifiers and second sample voice data corresponding to each sample user identifier;

第三划分模块,用于针对每个验证样本集合中的每个第二样本语音数据,根据预设的分段算法,将该第二样本语音数据划分为至少一个第三样本语音帧;The third dividing module is used to divide the second sample speech data into at least one third sample speech frame according to a preset segmentation algorithm for each second sample speech data in each verification sample set;

The fourth extraction module is configured to, for each third sample speech frame corresponding to the second sample voice data, extract the acoustic feature vector corresponding to that third sample speech frame according to a preset acoustic feature extraction algorithm;

The fourth output module is configured to input the acoustic feature vectors of the third sample speech frames corresponding to the second sample voice data into the identity verification multi-task model to be verified, and to output the fourth posterior probability set corresponding to the second sample voice data, where the fourth posterior probability set includes the posterior probability corresponding to each sample user identifier;

The fourth determination module is configured to determine that the second sample voice data is target sample voice data if the posterior probability of the sample user identifier corresponding to the second sample voice data is the maximum value in the fourth posterior probability set corresponding to that second sample voice data;

The fifth determination module is configured to determine the ratio of the number of target sample voice data in the verification sample set to the total number of second sample voice data in the verification sample set as the accuracy rate of the verification sample set;

The sixth determination module is configured to determine the change rate of the accuracy rate corresponding to each verification sample set according to the accuracy rates of the verification sample sets, and to determine that training of the identity verification multi-task model to be verified is complete if the change rates of the accuracy rates corresponding to a consecutive preset number of verification sample sets are less than or equal to a preset change rate threshold.
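The convergence check described by the fifth and sixth determination modules (accuracy per verification sample set, then a change-rate test over consecutive sets) can be sketched as follows; the `patience` and `rate_threshold` values are illustrative placeholders for the preset number and the preset change rate threshold:

```python
def validation_accuracy(posterior_sets, true_ids):
    """Accuracy of one verification sample set: the fraction of
    utterances whose true sample user identifier receives the highest
    posterior (target sample voice data / total voice data)."""
    hits = sum(1 for probs, uid in zip(posterior_sets, true_ids)
               if max(probs, key=probs.get) == uid)
    return hits / len(true_ids)

def training_converged(accuracies, patience=3, rate_threshold=0.01):
    """True once the accuracy change rate has stayed at or below the
    threshold for `patience` consecutive verification sample sets."""
    if len(accuracies) <= patience:
        return False
    recent = accuracies[-(patience + 1):]
    changes = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return all(c <= rate_threshold for c in changes)
```

For example, an accuracy history of 0.70, 0.90, 0.905, 0.903, 0.904 converges under these placeholder settings, because the last three changes are all within 0.01.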

An embodiment of the present application provides an identity verification device. The server acquires the voice data input by the target user according to the target dynamic verification code and divides the voice data into at least one speech frame according to a preset segmentation algorithm. Then, for each speech frame, the server determines the acoustic feature vector corresponding to that speech frame according to a preset acoustic feature extraction algorithm. Next, the server inputs the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model, which outputs the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, where the first posterior probability set includes the posterior probability corresponding to each preset pronunciation unit. Finally, the server determines the first user feature vector corresponding to the target user according to the intermediate user feature vectors of the speech frames and a preset pooling algorithm, and performs identity verification on the target user according to the first user feature vector and the first posterior probability set of each speech frame. In this way, the server does not need to deploy two separate models, a voiceprint recognition model and a speech recognition model with different structures and parameters; it only needs to deploy a single identity verification multi-task model to process the user's voice data, which reduces the computational complexity of the server and improves its processing efficiency.
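Two of the preset steps in this pipeline, frame segmentation and pooling, can be sketched as follows. The frame length, hop size, and the use of mean pooling are illustrative assumptions, since the patent leaves the segmentation and pooling algorithms unspecified:

```python
import numpy as np

def split_frames(samples, frame_len=400, hop=160):
    """Preset segmentation algorithm: fixed-length overlapping frames
    (e.g. 25 ms windows with a 10 ms hop at 16 kHz; sizes illustrative)."""
    n = max(1, 1 + (len(samples) - frame_len) // hop)
    return [samples[i * hop: i * hop + frame_len] for i in range(n)]

def mean_pool(frame_vectors):
    """Preset pooling algorithm (assumed here to be mean pooling):
    average the per-frame intermediate user feature vectors into the
    target user's first user feature vector."""
    return np.mean(np.stack(frame_vectors), axis=0)
```

With these placeholder sizes, a 1600-sample utterance yields 8 overlapping frames, and pooling collapses the per-frame vectors into a single fixed-length user representation regardless of utterance length.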

In one embodiment, a computer device, as shown in FIG. 10, includes a memory and a processor. The memory stores a computer program that can run on the processor, and when the processor executes the computer program, the steps of any of the identity verification methods described above are implemented.

In one embodiment, a computer-readable storage medium stores a computer program thereon, and when the computer program is executed by a processor, the steps of any of the identity verification methods described above are implemented.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program, and the computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all belong to the protection scope of the present application. Therefore, the protection scope of this patent application shall be determined by the appended claims.

Claims (10)

1. An identity verification method, characterized in that the method comprises:
obtaining the voice data input by a target user according to a target dynamic verification code;
dividing the voice data into at least one speech frame according to a preset segmentation algorithm;
for each speech frame, extracting the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm;
inputting the acoustic feature vector corresponding to the speech frame into a pre-trained identity verification multi-task model, and outputting the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, the first posterior probability set including the posterior probability corresponding to each preset pronunciation unit;
determining the first user feature vector corresponding to the target user according to the intermediate user feature vector corresponding to each speech frame and a preset pooling algorithm;
performing identity verification on the target user according to the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame.
2. The method according to claim 1, characterized in that the identity verification multi-task model comprises a multi-task shared hidden layer, a voiceprint recognition network, and a speech recognition network;
the inputting of the acoustic feature vector corresponding to the speech frame into the pre-trained identity verification multi-task model and the outputting of the intermediate user feature vector and the first posterior probability set corresponding to the speech frame comprise:
inputting the acoustic feature vector corresponding to the speech frame into the multi-task shared hidden layer, and outputting the intermediate feature vector corresponding to the speech frame;
inputting the intermediate feature vector corresponding to the speech frame into the speech recognition network, and outputting the pronunciation feature vector and the first posterior probability set corresponding to the speech frame;
inputting the intermediate feature vector and the pronunciation feature vector corresponding to the speech frame into the voiceprint recognition network, and outputting the intermediate user feature vector corresponding to the speech frame.
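The data flow of claim 2 can be sketched as a per-frame forward pass. All weights, layer sizes, and activations here are illustrative placeholders, and concatenating the two vectors fed to the voiceprint network is one possible reading of "inputting the intermediate feature vector and the pronunciation feature vector":

```python
import numpy as np

def forward(feat, params):
    """Per-frame forward pass through the three sub-networks of claim 2."""
    # multi-task shared hidden layer -> intermediate feature vector
    mid = np.tanh(feat @ params['shared'])
    # speech recognition network -> pronunciation feature vector and
    # first posterior probability set over the preset pronunciation units
    pron = np.tanh(mid @ params['asr_hidden'])
    logits = pron @ params['asr_out']
    post = np.exp(logits - logits.max())
    post /= post.sum()
    # voiceprint recognition network takes BOTH the intermediate feature
    # vector and the pronunciation feature vector -> intermediate user
    # feature vector
    user_vec = np.tanh(np.concatenate([mid, pron]) @ params['vpr'])
    return user_vec, post
```

The single matrix per sub-network is a simplification; the point is the wiring: the shared layer feeds the speech recognition branch, and the voiceprint branch consumes both the shared output and the pronunciation features.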
3. The method according to claim 1, characterized in that the performing of identity verification on the target user according to the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame comprises:
determining the target dynamic verification code score corresponding to the target user according to the first posterior probability set corresponding to each speech frame;
if the similarity between the first user feature vector and a pre-stored second user feature vector corresponding to the target user is greater than or equal to a preset similarity threshold, and the target dynamic verification code score is greater than or equal to a preset dynamic verification code score threshold, determining that the target user is a legitimate user;
if the similarity between the first user feature vector and the second user feature vector is less than the preset similarity threshold, or the target dynamic verification code score is less than the preset dynamic verification code score threshold, determining that the target user is an illegitimate user.
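The decision rule of claim 3 can be sketched with cosine similarity as the similarity measure (an assumption, since the claim does not fix one) and illustrative threshold values:

```python
import numpy as np

def verify(first_user_vec, second_user_vec, code_score,
           sim_threshold=0.7, score_threshold=-0.05):
    """Claim 3 decision: legitimate only if BOTH the voiceprint
    similarity and the dynamic verification code score clear their
    preset thresholds (the threshold values here are illustrative)."""
    sim = np.dot(first_user_vec, second_user_vec) / (
        np.linalg.norm(first_user_vec) * np.linalg.norm(second_user_vec))
    return bool(sim >= sim_threshold and code_score >= score_threshold)
```

The conjunction matters: a matching voiceprint with a failed verification-code score (a replayed recording of the wrong phrase, say) is rejected, and so is the right phrase spoken by the wrong voice.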
4. The method according to claim 3, characterized in that the determining of the target dynamic verification code score corresponding to the target user according to the first posterior probability set corresponding to each speech frame comprises:
obtaining the pronunciation unit sequence corresponding to the target dynamic verification code;
determining the target pronunciation unit corresponding to each speech frame according to the first posterior probability set corresponding to each speech frame, the pronunciation unit sequence, and a preset forced alignment algorithm;
for each speech frame, determining the posterior probability of the target pronunciation unit corresponding to the speech frame in the first posterior probability set corresponding to the speech frame, and determining the product of the posterior probability of the target pronunciation unit and the pre-stored prior probability of the target pronunciation unit as the likelihood value of the target pronunciation unit;
determining the target dynamic verification code score corresponding to the target user according to the likelihood value of the target pronunciation unit corresponding to each speech frame.
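Given a forced alignment that assigns each speech frame its target pronunciation unit, the likelihood computation of claim 4 (the posterior multiplied by the pre-stored prior) can be sketched as follows; the alignment itself is assumed already computed, since the claim leaves the forced alignment algorithm as a preset:

```python
def frame_likelihoods(posterior_sets, aligned_units, priors):
    """Claim 4: for each speech frame, the likelihood value of its
    target pronunciation unit is the product of that unit's posterior
    (taken from the first posterior probability set) and the unit's
    pre-stored prior probability."""
    return [post[unit] * priors[unit]
            for post, unit in zip(posterior_sets, aligned_units)]
```

For instance, a frame aligned to unit 'a' with posterior 0.8 and prior 0.5 gets likelihood 0.4.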
5. The method according to claim 4, characterized in that the obtaining of the pronunciation unit sequence corresponding to the target dynamic verification code comprises:
determining the word set corresponding to the target dynamic verification code according to the target dynamic verification code and a preset word segmentation algorithm;
for each word in the word set, determining the pronunciation unit sequence corresponding to the word according to a pre-stored correspondence between words and pronunciation unit sequences;
sorting the pronunciation unit sequences corresponding to the words according to the order of each word in the target dynamic verification code, to obtain the pronunciation unit sequence corresponding to the target dynamic verification code.
6. The method according to claim 4, characterized in that the determining of the target dynamic verification code score corresponding to the target user according to the likelihood value of the target pronunciation unit corresponding to each speech frame comprises:
for each speech frame, determining the difference between the likelihood value of the target pronunciation unit corresponding to the speech frame and the maximum likelihood value among the likelihood values of the preset pronunciation units corresponding to the speech frame, as the dynamic verification code score corresponding to the speech frame;
determining the average value of the dynamic verification code scores corresponding to the speech frames as the target dynamic verification code score corresponding to the target user.
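The scoring rule of claim 6 (per-frame difference from the maximum likelihood, then averaged over frames) can be sketched as:

```python
def dynamic_code_score(target_likelihoods, all_unit_likelihoods):
    """Claim 6: per frame, score = target-unit likelihood minus the
    maximum likelihood over all preset pronunciation units for that
    frame; the target dynamic verification code score is the average
    of the per-frame scores."""
    per_frame = [lt - max(la)
                 for lt, la in zip(target_likelihoods, all_unit_likelihoods)]
    return sum(per_frame) / len(per_frame)
```

Because the target unit is one of the preset units, each per-frame score is at most 0, reaching 0 exactly when the aligned target unit is also the most likely unit for that frame; the score threshold of claim 3 is therefore naturally non-positive.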
7. The method according to claim 2, characterized in that the method further comprises:
obtaining a pre-stored first training sample set, the first training sample set including a plurality of sample user identifiers and the first sample voice data corresponding to each sample user identifier;
for each first sample voice data in the first training sample set, dividing the first sample voice data into at least one first sample speech frame according to a preset segmentation algorithm;
for each first sample speech frame corresponding to the first sample voice data, extracting the acoustic feature vector corresponding to the first sample speech frame according to a preset acoustic feature extraction algorithm;
inputting the acoustic feature vectors of the first sample speech frames corresponding to the first sample voice data into the identity verification multi-task model to be trained, and outputting the second posterior probability set corresponding to the first sample voice data, the second posterior probability set including the posterior probability corresponding to each sample user identifier;
determining the first cost function corresponding to the first training sample set according to the posterior probability of the sample user identifier corresponding to each first sample voice data;
updating the parameters of the multi-task shared hidden layer, the parameters of the voiceprint recognition network, and the parameters of the speech recognition network in the identity verification multi-task model to be trained according to the first cost function and a preset first parameter update algorithm.
8. The method according to claim 2 or 7, characterized in that the method further comprises:
obtaining a pre-stored second training sample set, the second training sample set including a plurality of second sample speech frames and the sample pronunciation unit corresponding to each second sample speech frame;
for each second sample speech frame in the second training sample set, extracting the acoustic feature vector corresponding to the second sample speech frame according to the preset acoustic feature extraction algorithm;
inputting the acoustic feature vector corresponding to the second sample speech frame into the identity verification multi-task model to be trained, and outputting the third posterior probability set corresponding to the second sample speech frame, the third posterior probability set including the posterior probability corresponding to each sample pronunciation unit;
determining the second cost function corresponding to the second training sample set according to the posterior probability of the sample pronunciation unit corresponding to each second sample speech frame;
updating the parameters of the multi-task shared hidden layer and the parameters of the speech recognition network in the identity verification multi-task model to be trained according to the second cost function and a preset second parameter update algorithm.
9. The method according to claim 7, characterized in that the method further comprises:
obtaining a plurality of pre-stored verification sample sets, each verification sample set including a plurality of sample user identifiers and the second sample voice data corresponding to each sample user identifier;
for each second sample voice data in each verification sample set, dividing the second sample voice data into at least one third sample speech frame according to a preset segmentation algorithm;
for each third sample speech frame corresponding to the second sample voice data, extracting the acoustic feature vector corresponding to the third sample speech frame according to a preset acoustic feature extraction algorithm;
inputting the acoustic feature vectors of the third sample speech frames corresponding to the second sample voice data into the identity verification multi-task model to be verified, and outputting the fourth posterior probability set corresponding to the second sample voice data, the fourth posterior probability set including the posterior probability corresponding to each sample user identifier;
if the posterior probability of the sample user identifier corresponding to the second sample voice data is the maximum value in the fourth posterior probability set corresponding to the second sample voice data, determining that the second sample voice data is target sample voice data;
determining the ratio of the number of target sample voice data in the verification sample set to the total number of second sample voice data in the verification sample set as the accuracy rate of the verification sample set;
determining the change rate of the accuracy rate corresponding to each verification sample set according to the accuracy rates of the verification sample sets, and if the change rates of the accuracy rates corresponding to a consecutive preset number of verification sample sets are less than or equal to a preset change rate threshold, determining that training of the identity verification multi-task model to be verified is complete.
10. An identity verification device, characterized in that the device comprises:
a first acquisition module, configured to obtain the voice data input by a target user according to a target dynamic verification code;
a first division module, configured to divide the voice data into at least one speech frame according to a preset segmentation algorithm;
a first extraction module, configured to, for each speech frame, extract the acoustic feature vector corresponding to the speech frame according to a preset acoustic feature extraction algorithm;
a first output module, configured to input the acoustic feature vector corresponding to the speech frame into a pre-trained identity verification multi-task model, and output the intermediate user feature vector and the first posterior probability set corresponding to the speech frame, the first posterior probability set including the posterior probability corresponding to each preset pronunciation unit;
a first determination module, configured to determine the first user feature vector corresponding to the target user according to the intermediate user feature vector corresponding to each speech frame and a preset pooling algorithm;
a verification module, configured to perform identity verification on the target user according to the first user feature vector corresponding to the target user and the first posterior probability set corresponding to each speech frame.
CN201910711306.0A 2019-08-02 2019-08-02 Authentication method, device, computer equipment and storage medium Active CN110379433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910711306.0A CN110379433B (en) 2019-08-02 2019-08-02 Authentication method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110379433A true CN110379433A (en) 2019-10-25
CN110379433B CN110379433B (en) 2021-10-08

Family

ID=68257916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910711306.0A Active CN110379433B (en) 2019-08-02 2019-08-02 Authentication method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110379433B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312256A (en) * 2019-10-31 2020-06-19 平安科技(深圳)有限公司 Voice identity recognition method and device and computer equipment
CN111599382A (en) * 2020-07-27 2020-08-28 深圳市声扬科技有限公司 Voice analysis method, device, computer equipment and storage medium
CN112927687A (en) * 2021-01-25 2021-06-08 珠海格力电器股份有限公司 Method, device and system for controlling functions of equipment and storage medium
CN114023315A (en) * 2021-11-24 2022-02-08 北京有竹居网络技术有限公司 Voice recognition method and device, readable medium and electronic equipment
WO2022227223A1 (en) * 2021-04-27 2022-11-03 平安科技(深圳)有限公司 Voice verification model training method and apparatus, and computer device

Citations (13)

Publication number Priority date Publication date Assignee Title
US20070198257A1 (en) * 2006-02-20 2007-08-23 Microsoft Corporation Speaker authentication
CN104424419A (en) * 2013-08-30 2015-03-18 鸿富锦精密工业(武汉)有限公司 Encrypting and decrypting method and system based on voiceprint recognition technology
CN104484319A (en) * 2010-09-24 2015-04-01 新加坡国立大学 Methods and systems for automated text correction
CN104834849A (en) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 Dual-factor identity authentication method and system based on voiceprint recognition and face recognition
CN106971713A (en) * 2017-01-18 2017-07-21 清华大学 Speaker's labeling method and system based on density peaks cluster and variation Bayes
CN107104803A (en) * 2017-03-31 2017-08-29 清华大学 It is a kind of to combine the user ID authentication method confirmed with vocal print based on numerical password
CN107481736A (en) * 2017-08-14 2017-12-15 广东工业大学 A voiceprint identity authentication device and its authentication optimization method and system
CN108140386A (en) * 2016-07-15 2018-06-08 谷歌有限责任公司 Speaker verification
CN109313892A (en) * 2017-05-17 2019-02-05 北京嘀嘀无限科技发展有限公司 Robust language recognition method and system
CN109428719A (en) * 2017-08-22 2019-03-05 阿里巴巴集团控股有限公司 A kind of authentication method, device and equipment
US20190122669A1 (en) * 2016-06-01 2019-04-25 Baidu Online Network Technology (Beijing) Co., Ltd. Methods and devices for registering voiceprint and for authenticating voiceprint
US20190206410A1 (en) * 2016-08-22 2019-07-04 Telefonaktiebolaget Lm Ericsson (Publ) Systems, Apparatuses, and Methods for Speaker Verification using Artificial Neural Networks
US10347241B1 (en) * 2018-03-23 2019-07-09 Microsoft Technology Licensing, Llc Speaker-invariant training via adversarial learning

Patent Citations (14)

Publication number Priority date Publication date Assignee Title
CN102646416A (en) * 2006-02-20 2012-08-22 微软公司 Speaker authentication
US20070198257A1 (en) * 2006-02-20 2007-08-23 Microsoft Corporation Speaker authentication
CN104484319A (en) * 2010-09-24 2015-04-01 新加坡国立大学 Methods and systems for automated text correction
CN104424419A (en) * 2013-08-30 2015-03-18 鸿富锦精密工业(武汉)有限公司 Encrypting and decrypting method and system based on voiceprint recognition technology
CN104834849A (en) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 Dual-factor identity authentication method and system based on voiceprint recognition and face recognition
US20190122669A1 (en) * 2016-06-01 2019-04-25 Baidu Online Network Technology (Beijing) Co., Ltd. Methods and devices for registering voiceprint and for authenticating voiceprint
CN108140386A (en) * 2016-07-15 2018-06-08 谷歌有限责任公司 Speaker verification
US20190206410A1 (en) * 2016-08-22 2019-07-04 Telefonaktiebolaget Lm Ericsson (Publ) Systems, Apparatuses, and Methods for Speaker Verification using Artificial Neural Networks
CN106971713A (en) * 2017-01-18 2017-07-21 清华大学 Speaker's labeling method and system based on density peaks cluster and variation Bayes
CN107104803A (en) * 2017-03-31 2017-08-29 清华大学 It is a kind of to combine the user ID authentication method confirmed with vocal print based on numerical password
CN109313892A (en) * 2017-05-17 2019-02-05 北京嘀嘀无限科技发展有限公司 Robust language recognition method and system
CN107481736A (en) * 2017-08-14 2017-12-15 广东工业大学 A voiceprint identity authentication device and its authentication optimization method and system
CN109428719A (en) * 2017-08-22 2019-03-05 阿里巴巴集团控股有限公司 A kind of authentication method, device and equipment
US10347241B1 (en) * 2018-03-23 2019-07-09 Microsoft Technology Licensing, Llc Speaker-invariant training via adversarial learning

Non-Patent Citations (2)

Title
ZHEN HUANG ET AL.: "Rapid Adaptation for Deep Neural Networks through Multi-Task Learning", 《INTERSPEECH 2015》 *
田垚 等: "基于深度神经网络和Bottleneck特征的说话人识别系统", 《清华大学学报(自然科学版)》 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN111312256A (en) * 2019-10-31 2020-06-19 平安科技(深圳)有限公司 Voice identity recognition method and device and computer equipment
CN111312256B (en) * 2019-10-31 2024-05-10 平安科技(深圳)有限公司 Voice identification method and device and computer equipment
CN111599382A (en) * 2020-07-27 2020-08-28 深圳市声扬科技有限公司 Voice analysis method, device, computer equipment and storage medium
CN111599382B (en) * 2020-07-27 2020-10-27 深圳市声扬科技有限公司 Voice analysis method, device, computer equipment and storage medium
CN112927687A (en) * 2021-01-25 2021-06-08 珠海格力电器股份有限公司 Method, device and system for controlling functions of equipment and storage medium
WO2022227223A1 (en) * 2021-04-27 2022-11-03 平安科技(深圳)有限公司 Voice verification model training method and apparatus, and computer device
CN114023315A (en) * 2021-11-24 2022-02-08 北京有竹居网络技术有限公司 Voice recognition method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
CN110379433B (en) 2021-10-08

Similar Documents

Publication Publication Date Title
AU2023263421B2 (en) End-to-end speaker recognition using deep neural network
JP7109634B2 (en) Identity authentication method and device
CN110379433B (en) Authentication method, device, computer equipment and storage medium
CN107104803B (en) User identity authentication method based on digital password and voiceprint joint confirmation
JP2023511104A (en) A Robust Spoofing Detection System Using Deep Residual Neural Networks
WO2017215558A1 (en) Voiceprint recognition method and device
WO2017114307A1 (en) Voiceprint authentication method capable of preventing recording attack, server, terminal, and system
WO2017162053A1 (en) Identity authentication method and device
US20040111261A1 (en) Computationally efficient method and apparatus for speaker recognition
WO2017206375A1 (en) Voiceprint registration and authentication methods and devices
CN111199741A (en) Voiceprint identification method, voiceprint verification method, voiceprint identification device, computing device and medium
WO2007147042A2 (en) Voice-based multimodal speaker authentication using adaptive training and applications thereof
JPH1173195A (en) Method for authenticating speaker's proposed identification
US11081115B2 (en) Speaker recognition
US7050973B2 (en) Speaker recognition using dynamic time warp template spotting
US20190325880A1 (en) System for text-dependent speaker recognition and method thereof
KR20240132372A (en) Speaker Verification Using Multi-Task Speech Models
WO2022236386A1 (en) Access control system
CN109273012A (en) An identity authentication method based on speaker recognition and digital speech recognition
US20240126851A1 (en) Authentication system and method
WO2018137426A1 (en) Method and apparatus for recognizing voice information of user
Georgescu et al. GMM-UBM modeling for speaker recognition on a Romanian large speech corpora
CN112530441A (en) Method and device for authenticating legal user, computer equipment and storage medium
TWI778234B (en) Speaker verification system
JP2001265387A (en) Speaker verification apparatus and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant