
WO2017215558A1 - Voiceprint recognition method and device - Google Patents


Info

Publication number
WO2017215558A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
voice information
verification
voice
voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/087911
Other languages
French (fr)
Chinese (zh)
Inventor
李为
钱柄桦
金星明
李科
吴富章
吴永坚
黄飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of WO2017215558A1 publication Critical patent/WO2017215558A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 — Speaker identification or verification techniques
    • G10L 17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 — Training, enrolment or model building
    • G10L 17/06 — Decision making techniques; Pattern matching strategies
    • G10L 17/08 — Use of distortion metrics or a particular distance between probe pattern and reference templates

Definitions

  • the present invention relates to the field of voice recognition technology, and in particular, to a voiceprint recognition method and apparatus.
  • Voiceprint recognition is a method of biometric information recognition, including user registration and user identification.
  • the registration phase maps speech through a series of processes to a user model.
  • in the identification phase, the speech of an unknown speaker is matched against the registered user model by similarity, and the identity of the unknown voice is thereby judged to be consistent or not with the identity of the registered voice.
  • existing voiceprint modeling methods usually model at the text-independent level to describe the speaker's identity characteristics, but such text-independent modeling has difficulty meeting accuracy requirements when the user reads different content.
  • the embodiment of the invention provides a voiceprint recognition method and device, which can effectively improve the accuracy of voiceprint recognition.
  • a first aspect of the embodiments of the present invention provides a voiceprint recognition method, where the method includes: obtaining verification voice information generated when a verification user reads a first character string; and calculating the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in preset registration voice information;
  • when the similarity reaches a preset threshold, the verification user is determined as the registered user corresponding to the registration voice information.
  • before calculating the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in the preset registration voice information, the method further includes:
  • performing voice recognition on the verification voice information to obtain the voice segments, respectively contained in the verification voice information, that correspond to the plurality of characters in the first character string, and extracting the voiceprint features of the voice segments corresponding to the respective characters.
  • calculating the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in the preset registration voice information includes:
  • before the obtaining of the verification voice information generated when the verification user reads the first character string, the method further includes:
  • the feature vector corresponding to each character in the registered voice information is obtained according to the voiceprint feature corresponding to each character in the registered voice information, combined with the common background model corresponding to the preset corresponding character.
  • training, according to the voiceprint feature of the voice segment corresponding to each character in combination with the common background model corresponding to the preset corresponding character, to obtain the feature vector corresponding to each character in the verification voice information includes:
  • using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, so as to obtain the feature vector corresponding to each character in the verification voice information.
  • using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and adjusting, with the maximum a posteriori probability algorithm, the mean supervector of the common background model corresponding to the preset corresponding character so as to obtain the feature vector corresponding to each character in the verification voice information includes:
  • using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, the maximum a posteriori probability algorithm is used to adjust the mean supervector of the universal background model corresponding to the preset corresponding character and, in combination with the preset supervector subspace matrix, the feature vector corresponding to each character in the verification voice information is obtained.
  • using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and, in combination with the preset supervector subspace matrix, obtaining the feature vector corresponding to each character in the verification voice information includes:
  • the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted universal background model corresponding to the corresponding character has the largest posterior probability:
  • M = m + Tω, where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification speech information.
  • the preset supervector subspace matrix is determined according to a correlation between the weights of the respective Gaussian components in the universal background model.
  • calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information includes:
  • the performing voice recognition on the verification voice information to obtain the voice segments respectively included in the verification voice information and corresponding to the plurality of characters in the first character string includes:
  • before the determining of the verification user as the registered user corresponding to the registration voice information, the method further includes: determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string;
  • when the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, the verification user is determined as the registered user corresponding to the registration voice information.
  • before the obtaining of the verification voice information generated when the verification user reads the first character string, the method further includes:
  • the first character string is randomly generated, and the first character string is displayed.
  • a second aspect of the embodiments of the present invention provides a voiceprint recognition apparatus, where the apparatus includes:
  • a voice acquiring module configured to obtain verification voice information generated when the verification user reads the first character string;
  • a similarity judging module configured to calculate the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in the preset registration voice information;
  • the user identification module is configured to verify the verification user as the registered user corresponding to the registered voice information when the similarity reaches a preset threshold.
  • the voiceprint recognition apparatus further includes:
  • a voice segment identification module configured to perform voice recognition on the verification voice information, and obtain a voice segment respectively included in the verification voice information corresponding to multiple characters in the first character string;
  • the voiceprint feature extraction module is configured to extract a voiceprint feature of the voice segment corresponding to each character in the verification voice information.
  • the voiceprint recognition apparatus further includes:
  • a feature model training module configured to train, according to the voiceprint feature of the voice segment corresponding to each character in combination with the common background model corresponding to the preset corresponding character, to obtain a feature vector corresponding to each character in the verification voice information;
  • the similarity judging module is configured to calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information, and to use the score as the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in the preset registration voice information.
  • the voice acquiring module is further configured to obtain registration voice information generated when a registered user reads a second character string, where the second character string has at least one character identical to a character in the first character string;
  • the feature model training module is further configured to combine the voiceprint feature corresponding to each character in the registered voice information with the common background model corresponding to the preset corresponding character to obtain the feature vector corresponding to each character in the registered voice information.
  • the feature model training module is configured to:
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, so as to obtain the feature vector corresponding to each character in the verification voice information.
  • the feature model training module is configured to:
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character; combined with the preset supervector subspace matrix, the feature vector corresponding to each character in the verification voice information is obtained.
  • the feature model training module is specifically configured to:
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted universal background model corresponding to the corresponding character has the largest posterior probability:
  • M = m + Tω, where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification speech information.
  • the preset super-vector sub-space matrix is determined according to a correlation between respective dimension vectors in the mean super-vector of the Gaussian mixture model.
  • the similarity determination module is configured to:
  • the voice segment identification module includes:
  • a valid segment identification unit configured to identify a valid speech segment and an invalid speech segment in the verification speech information
  • the voice recognition unit is configured to perform voice recognition on the valid voice segment to obtain a voice segment corresponding to multiple characters in the first character string.
  • the voiceprint recognition apparatus further includes:
  • a character order determining module configured to determine whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string;
  • the user identification module is further configured to determine, in the case that the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, the verification user as the registered user corresponding to the registered voice information.
  • the voiceprint recognition apparatus further includes:
  • a string display module configured to randomly generate the first string, and display the first string.
  • a third aspect of the embodiments of the present invention further provides a voiceprint recognition apparatus, including:
  • a processor for invoking the computer executable program code to perform the following operations:
  • obtaining verification voice information generated when the verification user reads the first character string; calculating the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in the preset registration voice information; and, when the similarity reaches a preset threshold, determining the verification user as the registered user corresponding to the registered voice information.
  • before the obtaining of the verification voice information generated when the verification user reads the first character string, the processor further calls the computer executable program code to perform the following operations:
  • the voiceprint features of the voice segments corresponding to the respective characters are extracted.
  • the processor invokes the computer executable program code to perform the following operations to calculate the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and the voiceprint feature of the voice segment corresponding to the corresponding character in the preset registered voice information:
  • before the obtaining of the verification voice information generated when the verification user reads the first character string, the processor further calls the computer executable program code to perform the following operations:
  • the feature vector corresponding to each character in the registered voice information is obtained according to the voiceprint feature corresponding to each character in the registered voice information, combined with the common background model corresponding to the preset corresponding character.
  • the processor invokes the computer executable program code to perform the following operations to train, according to the voiceprint feature of the voice segment corresponding to each character in combination with the common background model corresponding to the preset corresponding character, to obtain the feature vector corresponding to each character in the verification voice information:
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, so as to obtain the feature vector corresponding to each character in the verification voice information.
  • the processor invokes the computer executable program code to perform the following operations to use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and adjust, with the maximum a posteriori probability algorithm, the mean supervector of the universal background model corresponding to the preset corresponding character, thereby obtaining a feature vector corresponding to each character in the verification voice information:
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, in combination with the preset supervector subspace matrix, to obtain the feature vector corresponding to each character in the verification voice information.
  • the processor invokes the computer executable program code to perform the following operations to use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and, in combination with the preset supervector subspace matrix, obtain the feature vector corresponding to each character in the verification voice information:
  • the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted universal background model corresponding to the corresponding character has the largest posterior probability:
  • M = m + Tω, where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification speech information.
  • the preset supervector subspace matrix is determined according to a correlation between the weights of the respective Gaussian components in the universal background model.
  • the processor calls the computer executable program code to perform the following operations to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information:
  • the processor calls the computer executable program code to perform the following operations to perform voice recognition on the verification voice information and obtain the voice segments, respectively contained in the verification voice information, that correspond to the multiple characters in the first character string:
  • before the determining of the verification user as the registered user corresponding to the registered voice information, the processor further invokes the computer executable program code to perform the following operations:
  • if the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, the verification user is determined as the registered user corresponding to the registered voice information.
  • before the obtaining of the verification voice information generated when the verification user reads the first character string, the processor further calls the computer executable program code to perform the following operations:
  • the first character string is randomly generated, and the first character string is displayed through the user interface.
  • a fourth aspect of the embodiments of the present invention further provides a storage medium, where the storage medium stores a computer program, and the computer program is used to perform the voiceprint recognition method according to any implementation manner of the first aspect of the embodiments of the present invention.
  • in the embodiments of the present invention, the voiceprint feature of the voice segment corresponding to each character in the verification voice information of the user is obtained, the UBM training corresponding to the preset corresponding character is combined to obtain the feature vector corresponding to each character in the verification voice information, and the feature vector corresponding to each character in the verification voice information is compared with the feature vector of the corresponding character in the registered voice information, thereby determining the user identity of the verification user. Because the user feature vectors used for comparison correspond to specific characters, the voiceprint features of the different characters read aloud by the user are fully considered, which can effectively improve the accuracy of voiceprint recognition.
  • FIG. 1 is a schematic diagram showing the stages of a voiceprint recognition method in an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of a voiceprint recognition method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram showing the principle of identifying a voice segment corresponding to a plurality of characters from voice information in the embodiment of the present invention
  • FIG. 4 is a schematic diagram of a principle for acquiring feature vectors corresponding to respective characters from voice information according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a voiceprint registration process of a registered user in an embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of a voiceprint recognition method in another embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a voiceprint recognition apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a voice segment identification module in an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another voiceprint recognition apparatus in an embodiment of the present invention.
  • Embodiments of the present invention provide a voiceprint recognition method and apparatus.
  • the voiceprint recognition method and apparatus can be applied to all scenes or devices that need to identify an unknown user.
  • the characters in the string used for voiceprint recognition may be Arabic numerals, English letters or other language characters.
  • in the embodiments of the present invention, the characters are exemplified by Arabic numerals.
  • the voiceprint recognition method in the embodiment of the present invention can be divided into two stages, as shown in FIG. 1:
  • in the voiceprint registration stage, the registered user can read a registration string (i.e., the second character string appearing later), and the voiceprint recognition device collects the registered voice information when the registered user reads the registration string, and then performs voiceprint feature extraction and voiceprint model training on the registered voice information in combination with the Universal Background Model (UBM) corresponding to the preset corresponding character, to obtain the feature vectors corresponding to the respective characters in the registered voice information.
  • the voiceprint recognition device can respectively save the feature vectors corresponding to the plurality of characters in the registered voice information that the different registered users read in the voiceprint registration stage in the model library of the voiceprint recognition device.
  • for example, the registration string is the numeric string 0185851, which includes the four distinct numbers "0", "1", "5", and "8". The voiceprint recognition device performs voiceprint feature extraction and voiceprint model training according to the voice segment corresponding to each character in the registered voice information, obtains the voiceprint features of the voice segments corresponding to "0", "1", "5", and "8", and then, in combination with the UBM training corresponding to the preset corresponding characters, obtains the feature vectors corresponding to the respective characters in the registered voice information, including a feature vector corresponding to the number "0", a feature vector corresponding to the number "1", a feature vector corresponding to the number "5", and a feature vector corresponding to the number "8".
  • in the identity recognition stage, the verification user reads a verification string (i.e., the first character string appearing later; the second character string has at least one character identical to a character in the first character string). The voiceprint recognition device collects the verification voice information when the verification user reads the verification character string, then performs voice recognition on the verification voice information to obtain the voice segments, respectively contained in the verification voice information, that correspond to the plurality of characters in the verification string, and then performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character; the voiceprint feature of the voice segment corresponding to each character, combined with the UBM training corresponding to the preset corresponding character, is used to obtain the feature vector corresponding to each character in the verification voice information. Finally, the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information is calculated; if the similarity score reaches a preset verification threshold, the verification user is determined as the registered user corresponding to the registration voice information.
  • for example, the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training according to the voice segment corresponding to each character in the verification voice information generated when the verification user reads aloud, obtains the GMMs corresponding to "0", "1", "5", and "8", and then, combined with the preset UBMs corresponding to the corresponding characters, calculates the feature vectors of the verification voice information of the verification user, including the feature vector corresponding to the number "0", the feature vector corresponding to the number "1", the feature vector corresponding to the number "5", and the feature vector corresponding to the number "8". It then respectively calculates the similarity scores between the feature vectors corresponding to "0", "1", "5", and "8" in the verification voice information and the feature vectors corresponding to "0", "1", "5", and "8" in the registered voice information, and if the similarity scores reach a preset verification threshold, the verification user is determined as the registered user corresponding to the registered voice information.
  • the voiceprint registration phase of the registered user and the identity recognition phase of the verification user may be implemented in the same device or apparatus, or may be implemented in different devices or apparatuses; for example, the voiceprint registration phase of the registered user is implemented in a first device, and the first device sends the feature vectors corresponding to the multiple characters in the registered voice information to a second device, so that the identity recognition phase of the verification user can be implemented in the second device.
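To make the relationship between the two phases concrete, the following Python sketch shows only the control flow described above. The callable `char_vector` is a hypothetical stand-in for the per-character feature extraction and UBM/MAP training steps, and thresholding the mean per-character cosine score is one plausible reading of the verification rule, not the patent's exact formulation.

```python
from typing import Callable, Dict, List
import numpy as np

# `char_vector(char, segment)` stands in for the per-character pipeline described
# above (voiceprint feature extraction followed by UBM/MAP training).
CharVector = Callable[[str, np.ndarray], np.ndarray]

def enroll(segments: Dict[str, np.ndarray], char_vector: CharVector) -> Dict[str, np.ndarray]:
    """Registration phase: one feature vector per distinct character of the second string."""
    return {char: char_vector(char, seg) for char, seg in segments.items()}

def verify(segments: Dict[str, np.ndarray], enrolled: Dict[str, np.ndarray],
           char_vector: CharVector, threshold: float) -> bool:
    """Identity recognition phase: score per-character vectors against the enrolled ones."""
    scores: List[float] = []
    for char, seg in segments.items():
        if char in enrolled:                      # only characters shared with registration
            v, w = char_vector(char, seg), enrolled[char]
            scores.append(float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))))
    return bool(scores) and float(np.mean(scores)) >= threshold
```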
  • FIG. 2 is a schematic flow chart of a method for identifying a voiceprint according to an embodiment of the present invention. As shown in the figure, the flow of the voiceprint recognition method in the embodiment may include:
  • the verification user is a user whose identity needs to be verified by the voiceprint recognition device.
  • the first character string is a character string used for verifying the identity of the verification user; it may be randomly generated, or may be a preset fixed string, for example, a string that is at least partially identical to the second character string corresponding to the pre-generated registration voice information.
  • the first character string may include m characters, among which n characters are different from each other, where m and n are positive integers and m ≥ n.
  • for example, the first string is "12358948", a total of 8 characters, including 7 different characters: "1", "2", "3", "4", "5", "8", and "9".
  • the voiceprint recognition device may generate and display the first character string for the verification user to read aloud according to the displayed first character string.
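A minimal sketch of randomly generating and displaying the first character string, assuming a digit alphabet as in the examples; the length and the way the string is presented are illustrative, not values specified by the patent.

```python
import random

def generate_first_string(length: int = 8, alphabet: str = "0123456789") -> str:
    """Randomly generate the first character string (verification string)."""
    return "".join(random.choice(alphabet) for _ in range(length))

first_string = generate_first_string()
print("Please read aloud:", first_string)  # displayed for the verification user to read
```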
  • S202 Perform speech recognition on the verification voice information to obtain a voice segment respectively included in the verification voice information and corresponding to multiple characters in the first character string.
  • the voiceprint recognition device can divide the verification voice information into voice segments corresponding to multiple characters through voice recognition and voice intensity filtering, and optionally remove the invalid voice segments so that they do not participate in subsequent processing.
  • the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or the PLP (Perceptual Linear Predictive coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
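As an illustration, MFCC extraction for one per-character voice segment could look like the following, assuming the `librosa` library and a 16 kHz mono waveform; the coefficient count is an arbitrary choice, not a value from the patent.

```python
import librosa
import numpy as np

def voiceprint_features(segment: np.ndarray, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Extract MFCCs from one per-character voice segment as its voiceprint feature."""
    # librosa returns shape (n_mfcc, n_frames); transpose so each row is one frame's feature vector.
    return librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc).T
```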
  • the universal background model (UBM) in the embodiment of the present invention is a Gaussian mixture model trained on a mixture of a plurality of speech segments from a plurality of speakers, and represents the distribution of the speech of the corresponding digit in the feature space; because the data come from a large number of speakers, it does not represent a specific speaker and has an identity-independent nature, so it can be regarded as a general background model.
  • for example, speech samples from more than 1000 speakers with a total duration of more than 20 hours, in which the occurrence frequency of each character is relatively balanced, can be used to train the UBM.
  • the mathematical expression of the UBM is:
  • P(x) = Σ_{i=1}^{C} a_i · N(x; μ_i, Σ_i)   (1)
  • where P(x) represents the probability distribution of the UBM, C represents the total number of Gaussian components in the UBM, a_i represents the weight of the i-th Gaussian component, μ_i represents the mean of the i-th Gaussian component, Σ_i represents the variance of the i-th Gaussian component, N(·) represents a Gaussian distribution, and x represents the voiceprint feature of the input sample.
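A direct numerical reading of formula (1), assuming diagonal covariances; in practice the per-character UBM itself would be trained by EM on the pooled multi-speaker data (for example with `sklearn.mixture.GaussianMixture`), which this sketch does not show.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ubm_density(x: np.ndarray, weights: np.ndarray,
                means: np.ndarray, variances: np.ndarray) -> float:
    """Evaluate formula (1): P(x) = sum_i a_i * N(x; mu_i, Sigma_i).

    weights: (C,), means/variances: (C, D) for a diagonal-covariance UBM, x: (D,)."""
    p = 0.0
    for a_i, mu_i, var_i in zip(weights, means, variances):
        p += a_i * multivariate_normal.pdf(x, mean=mu_i, cov=np.diag(var_i))
    return float(p)
```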
  • the voiceprint recognition device may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and adopt a maximum a posteriori probability (MAP) algorithm to adjust the parameters of the common background model corresponding to the preset corresponding character; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the parameters of the common background model corresponding to the preset corresponding character are continuously adjusted so that the posterior probability P(x) is the largest, and the feature vector corresponding to the corresponding character in the verification voice information can then be determined according to the parameters that maximize the posterior probability P(x).
  • the mean supervector of the UBM model is defined as the concatenation of the mean vectors of its C Gaussian components, i.e. m = [μ_1ᵀ, μ_2ᵀ, …, μ_Cᵀ]ᵀ.
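A minimal sketch of forming the mean supervector from the component means of a per-character UBM:

```python
import numpy as np

def mean_supervector(means: np.ndarray) -> np.ndarray:
    """Stack the C component means (each D-dimensional) of a GMM/UBM
    into a single C*D-dimensional mean supervector."""
    return np.asarray(means).reshape(-1)
```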
  • specifically, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and adopt the maximum a posteriori probability (MAP) algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector is continuously adjusted so that the posterior probability P(x) is maximized, and the mean supervector that maximizes the posterior probability P(x) can be used as the feature vector corresponding to the corresponding character in the verification speech information.
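The text only states that the mean supervector is adjusted under a maximum a posteriori criterion; the sketch below uses the standard relevance-MAP update for GMM means, which is one common realization of that criterion, so the exact update equations are an assumption rather than the patent's disclosed formulas.

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_adapt_means(features: np.ndarray, weights: np.ndarray, means: np.ndarray,
                    variances: np.ndarray, relevance: float = 16.0) -> np.ndarray:
    """Relevance-MAP adaptation of the UBM component means (diagonal covariances).

    features: (N, D) voiceprint feature frames of one character's voice segment.
    weights: (C,), means/variances: (C, D) -- the per-character UBM.
    Returns the adapted means; stacking them gives the adapted mean supervector M."""
    C = len(weights)
    # Frame-level log-likelihoods per component, shape (N, C).
    log_lik = np.stack(
        [np.log(weights[i]) + multivariate_normal.logpdf(features, mean=means[i], cov=np.diag(variances[i]))
         for i in range(C)],
        axis=1,
    )
    # Responsibilities gamma[n, i] via a numerically stable softmax.
    gamma = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)

    n_c = gamma.sum(axis=0)                                        # soft frame counts per component
    e_c = (gamma.T @ features) / np.maximum(n_c[:, None], 1e-10)   # posterior mean per component
    alpha = (n_c / (n_c + relevance))[:, None]                     # adaptation coefficients
    return alpha * e_c + (1.0 - alpha) * means                     # MAP-adapted means
```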
  • further, the adjustment of the mean supervector can be limited to a subspace by probabilistic principal component analysis (PPCA); the voiceprint recognition device can use the voiceprint features of the voice segments corresponding to each character in the verification voice information as the training sample data, use the maximum a posteriori probability algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.
  • specifically, the mean supervector of the common background model corresponding to the preset corresponding character may be adjusted by using the following formula, so that the posterior probability of the universal background model corresponding to the adjusted corresponding character is the largest:
  • M = m + Tω
  • where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector in (1) is adjusted by continuously adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can be used as the feature vector corresponding to the corresponding character in the verification speech information.
  • optionally, the supervector subspace matrix T is determined according to the correlation between the respective dimension vectors in the mean supervector of the Gaussian mixture model.
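Given an adapted mean supervector, the factor ω in M = m + Tω can be illustrated by a plain least-squares solve; a faithful implementation would instead estimate ω's posterior under the PPCA-style model with per-component occupation weighting, so treat this purely as a sketch of the algebraic relation.

```python
import numpy as np

def estimate_character_vector(adapted_supervector: np.ndarray,
                              m: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Recover omega from M = m + T @ omega by ordinary least squares.

    adapted_supervector, m: (C*D,) mean supervectors; T: (C*D, R) subspace matrix."""
    omega, _, _, _ = np.linalg.lstsq(T, adapted_supervector - m, rcond=None)
    return omega  # used as the feature vector of the corresponding character
```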
  • the voiceprint recognition device can obtain the registered voice information of the registered user in the voiceprint registration stage and, through voiceprint feature extraction and voiceprint model training similar to those of this embodiment, obtain the feature vectors corresponding to the characters in the registered voice information.
  • the registration voice information may be acquired by the voiceprint recognition device when the registered user reads the second character string, where the second character string has at least one character identical to a character in the first character string, that is, the second character string corresponding to the registration voice information is at least partially identical to the first character string.
  • the voiceprint recognition device may also acquire the feature vectors corresponding to the respective characters in the registered voice information from the outside; that is, after the registered user inputs the registered voice information through another device, the other device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registered voice information, and the voiceprint recognition device acquires these feature vectors from the other device or server for comparison with the feature vectors corresponding to the respective characters in the verification voice information in the identity recognition phase of the verification user.
  • the similarity score is obtained by the voiceprint recognition device comparing the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the corresponding character in the preset registered voice information, and measures the degree of similarity between the two feature vectors of the same character.
  • optionally, a cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information may be calculated, and the cosine distance value is used as the similarity score; that is, the similarity score between the corresponding feature vectors in the verification voice information and the registered voice information is calculated by the following formula:
  • score_i = ( ω_i(tar) · ω_i(test) ) / ( ‖ω_i(tar)‖ · ‖ω_i(test)‖ )
  • where the subscript i indicates the i-th character common to the verification voice information and the registration voice information, ω_i(tar) indicates the feature vector of that character in the verification voice information, and ω_i(test) indicates the feature vector of that character in the registered voice information.
  • the verification user may be identified according to the similarity between the feature vectors of the characters of the verification user and the feature vectors of the corresponding characters of each registered user: if the feature vectors of the corresponding characters of a registered user have the highest similarity scores with the feature vectors of the characters of the verification voice and the similarity reaches the preset verification threshold, that registered user is taken as the identification result of the verification user.
  • if the same character appears in the verification voice information more than once, for example if the character 0 appears twice, the average of the similarity scores between the feature vectors obtained from the voice segments of the two occurrences of character 0 and the feature vector of character 0 in the preset registration voice information is used as the similarity score between the feature vector of character 0 in the verification voice information and the feature vector of character 0 in the registered voice information, and so on.
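A small sketch of the per-character cosine scoring with averaging over repeated occurrences, assuming the per-character feature vectors have already been obtained as described above:

```python
import numpy as np
from typing import Dict, List

def cosine_score(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine distance value between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def per_character_scores(verification: Dict[str, List[np.ndarray]],
                         registration: Dict[str, np.ndarray]) -> Dict[str, float]:
    """For each character shared by both utterances, score the verification feature
    vector(s) against the enrolled vector; repeated occurrences are averaged."""
    scores = {}
    for char, vectors in verification.items():
        if char in registration:
            occurrence_scores = [cosine_score(v, registration[char]) for v in vectors]
            scores[char] = sum(occurrence_scores) / len(occurrence_scores)
    return scores
```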
  • in this embodiment, the voiceprint feature of the voice segment corresponding to each character in the verification voice information is obtained, the UBM training corresponding to the preset corresponding character is combined to obtain the feature vector corresponding to each character in the verification voice information, and the feature vector corresponding to each character in the verification voice information is compared with the feature vector of the corresponding character in the registered voice information, thereby determining the user identity of the verification user; because the user feature vectors used for comparison correspond to specific characters, the voiceprint features of the different characters read aloud by the user are fully considered, so the accuracy of voiceprint recognition can be effectively improved.
  • FIG. 5 is a schematic diagram of a voiceprint registration process of a registered user in the embodiment of the present invention. As shown in the figure, the voiceprint registration process in this embodiment may include:
  • the registered user is a user whose legal identity has been determined.
  • the second character string is a string used to collect the voiceprint feature vector of the registered user, and may be randomly generated or a fixed string.
  • the second character string may also include m characters, among which n characters are different from each other, where m and n are positive integers and m ≥ n.
  • the voiceprint recognition device may generate and display the second character string for the registered user to read aloud according to the displayed second character string.
  • the voiceprint recognition device can divide the registration voice information into voice segments corresponding to multiple characters through voice recognition and voice intensity filtering, and optionally remove the invalid voice segments so that they do not participate in subsequent processing.
  • the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or the PLP (Perceptual Linear Predictive coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
  • UBM can refer to the previous embodiment.
  • This step of the voiceprint registration process is similar to S204 of the voiceprint recognition process.
  • the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the registered voice information as the training sample data, and adopt the maximum a posteriori probability (Maximum A Posteriori, MAP) algorithm to adjust the parameters of the common background model corresponding to the preset corresponding character; that is, after the voiceprint feature of the voice segment corresponding to each character in the registered voice information is substituted into formula (1) as an input sample, the parameters of the common background model corresponding to the preset corresponding character are continuously adjusted so that the posterior probability P(x) is the largest, and the feature vector corresponding to the corresponding character in the registered voice information can be determined according to the parameters that maximize the posterior probability P(x).
  • specifically, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the registered voice information as the training sample data, and use the MAP algorithm to adjust the mean supervector of the universal background model corresponding to the preset corresponding character; that is, after the voiceprint feature of the voice segment corresponding to each character in the registered voice information is substituted into formula (1) as an input sample, the mean supervector is continuously adjusted so that the posterior probability P(x) is maximized, and the mean supervector that maximizes the posterior probability P(x) can be used as the feature vector corresponding to the corresponding character in the registered speech information.
  • further, the mean supervector of the universal background model corresponding to the preset corresponding character may be adjusted by using the following formula, so that the posterior probability of the universal background model corresponding to the adjusted corresponding character is the largest:
  • M = m + Tω
  • where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the registered voice information; that is, after the voiceprint feature of the voice segment corresponding to each character in the registered voice information is substituted into formula (1) as an input sample, the mean supervector in (1) is adjusted by continuously adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can be used as the feature vector corresponding to the corresponding character in the registered speech information.
  • FIG. 6 is a schematic flow chart of a voiceprint recognition method according to another embodiment of the present invention, as shown in the figure.
  • the voiceprint recognition method in the embodiment may include the following processes:
  • the verification voice may be divided according to sound intensity, and voice segments with low sound intensity are regarded as invalid voice segments (for example, silent segments and impulse noise).
  • a voice segment respectively corresponding to a plurality of characters in the first character string can be obtained by voice recognition.
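One simple way to realize the intensity-based filtering described here is a short-time-energy gate; the thresholds and frame sizes below are illustrative only, and the patent does not prescribe a particular scheme.

```python
import numpy as np

def valid_segments(signal: np.ndarray, sr: int = 16000, frame_ms: int = 25,
                   hop_ms: int = 10, energy_ratio: float = 0.1) -> list:
    """Split a waveform into valid speech segments by short-time energy.

    Frames whose energy falls below `energy_ratio` times the average energy are
    treated as invalid; contiguous runs of valid frames are returned as segments."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    starts = range(0, max(len(signal) - frame, 0), hop)
    energies = np.array([np.mean(signal[s:s + frame] ** 2) for s in starts])
    if energies.size == 0:
        return []
    valid = energies > energy_ratio * energies.mean()

    segments, seg_start = [], None
    for idx, ok in enumerate(valid):
        if ok and seg_start is None:
            seg_start = idx
        elif not ok and seg_start is not None:
            segments.append(signal[seg_start * hop: idx * hop + frame])
            seg_start = None
    if seg_start is not None:
        segments.append(signal[seg_start * hop:])
    return segments
```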
  • optionally, a different first character string may be randomly generated each time; in this step, it is determined whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; if not, it can be judged that the voiceprint recognition fails, and if the orderings are consistent, the subsequent process is performed.
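The ordering check itself reduces to comparing the recognized character sequence with the displayed first character string, for example:

```python
def ordering_consistent(recognized_chars: str, first_string: str) -> bool:
    """Check that the characters recognized from the verification voice appear
    in the same order as the characters of the displayed first character string."""
    return list(recognized_chars) == list(first_string)

# If the displayed string was "12358948" and the recognizer returns the digits
# in a different order, recognition fails before any similarity is scored.
assert ordering_consistent("12358948", "12358948")
assert not ordering_consistent("12359848", "12358948")
```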
  • the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or the PLP (Perceptual Linear Predictive coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, so as to estimate the feature vector corresponding to each character in the verification voice information.
  • specifically, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and use the Maximum A Posteriori (MAP) algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character.
  • optionally, the voiceprint recognition apparatus may adjust the mean supervector of the common background model corresponding to the preset corresponding character by using the following formula, so that the posterior probability of the universal background model corresponding to the adjusted corresponding character is the largest:
  • M = m + Tω
  • where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector in (1) is adjusted by continuously adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can be used as the feature vector corresponding to the corresponding character in the verification speech information.
  • the voiceprint recognition apparatus may calculate a cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and use the cosine distance value as the similarity score; that is, the similarity score between the corresponding feature vectors in the verification voice information and the registered voice information is calculated by the following formula:
  • score_i = ( ω_i(tar) · ω_i(test) ) / ( ‖ω_i(tar)‖ · ‖ω_i(test)‖ )
  • where the subscript i indicates the i-th character common to the verification voice information and the registration voice information, ω_i(tar) indicates the feature vector of that character in the verification voice information, and ω_i(test) indicates the feature vector of that character in the registered voice information.
  • the verification user may be identified according to the similarity between the feature vectors of the characters of the verification user and the feature vectors of the corresponding characters of each registered user: if the feature vectors of the corresponding characters of a registered user have the highest similarity scores with the feature vectors of the characters of the verification voice and the similarity reaches the preset verification threshold, that registered user is taken as the identification result of the verification user.
  • in this way, the accuracy of verifying the user identity can be further improved.
  • FIG. 7 is a schematic structural diagram of a voiceprint identifying apparatus according to an embodiment of the present invention. As shown in the figure, the voiceprint identifying apparatus in this embodiment may include:
  • the voice obtaining module 710 is configured to obtain verification voice information generated when the verification user reads the first character string.
  • the verification user is a user whose identity needs to be verified by the voiceprint recognition device.
  • the first character string is a character string used for verifying the identity of the verification user; it may be randomly generated, or may be a preset fixed string, for example, a string that is at least partially identical to the second character string corresponding to the pre-generated registration voice information.
  • the first character string may include m characters, among which n characters are different from each other, where m and n are positive integers and m ≥ n.
  • for example, the first string is "12358948", a total of 8 characters, including 7 different characters: "1", "2", "3", "4", "5", "8", and "9".
  • the voice segment identification module 720 is configured to perform voice recognition on the verification voice information to obtain voice segments respectively corresponding to the plurality of characters in the first character string included in the verification voice information.
  • the voice segment identification module 720 can divide the verification voice information into voice segments corresponding to multiple characters through voice recognition and voice intensity filtering, and optionally remove the invalid voice segments so that they do not participate in subsequent processing.
  • the voice segment identification module may further include:
  • the valid segment identification unit 721 is configured to identify a valid speech segment and an invalid speech segment in the verification speech information.
  • the effective segment recognizing unit 721 can divide the verification speech according to the sound intensity, and treat the speech segment with a small sound intensity as an invalid speech segment (for example, including a silent segment and impulse noise).
  • a voice recognition unit 722 configured to perform voice recognition on the valid voice segment to obtain voice segments respectively corresponding to the plurality of characters in the first character string.
  • the voiceprint feature extraction module 730 is configured to extract a voiceprint feature of the voice segment corresponding to each character in the verification voice information.
  • the voiceprint feature extraction module 730 can extract the MFCC (Mel Frequency Cepstrum Coefficient) or the PLP (Perceptual Linear Predictive coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
  • the feature model training module 740 is configured to train, according to the voiceprint feature of the voice segment corresponding to each character in combination with the common background model corresponding to the preset corresponding character, to obtain a feature vector corresponding to each character in the verification voice information.
  • the feature model training module 740 may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and adopt a maximum a posteriori probability (MAP) algorithm to adjust the parameters of the common background model corresponding to the preset corresponding character; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the parameters of the common background model corresponding to the preset corresponding character are continuously adjusted so that the posterior probability P(x) is the largest, and the feature model training module 740 can then determine the feature vector corresponding to the corresponding character in the verification voice information according to the parameters that maximize the posterior probability P(x).
  • specifically, the feature model training module 740 can use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and use the MAP algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the posterior probability P(x) is maximized by continuously adjusting the mean supervector, and the feature model training module 740 may use the mean supervector that maximizes the posterior probability P(x) as the feature vector corresponding to the corresponding character in the verification speech information.
  • further, the feature model training module 740 may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, adopt the maximum a posteriori probability algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.
  • specifically, the feature model training module 740 may adjust the mean supervector of the common background model corresponding to the preset corresponding character by using the following formula, so that the posterior probability of the common background model corresponding to the adjusted corresponding character is the largest:
  • M = m + Tω
  • where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information; that is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector in (1) is adjusted by continuously adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) can be used as the feature vector corresponding to the corresponding character in the verification speech information.
  • the super-vector subspace matrix T is determined according to the correlation between each dimension vector in the mean supervector of the Gaussian mixture model.
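One hedged way to illustrate the relation M = m + Tω is to recover ω from an adapted mean supervector by ordinary least squares, as sketched below. The patent's formulation instead chooses ω so as to maximize the posterior probability P(x) given the Gaussian occupation statistics, so this is only an approximation for illustration, and the matrix T is assumed to be available from offline training.

```python
import numpy as np

def estimate_omega(m: np.ndarray, T: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Given the relation M = m + T @ omega, recover the low-dimensional feature
    vector omega from an adapted mean supervector M.
    m: UBM mean supervector [D]; T: supervector subspace matrix [D, d]; M: adapted supervector [D].
    Plain least-squares solution; the patent's MAP formulation would additionally
    weight the statistics by the per-component occupation counts."""
    omega, *_ = np.linalg.lstsq(T, M - m, rcond=None)
    return omega
```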
• the similarity determining module 750 is configured to calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information.
• the voiceprint recognition device can obtain the registration voice information of a registered user in the voiceprint registration stage and process it through the voice segment identification module 720, the voiceprint feature extraction module 730 and the feature model training module 740. The registration voice information may be obtained by the voiceprint recognition device when the registered user reads a second character string aloud, the second character string having at least one character in common with the first character string; that is, the second character string corresponding to the registration voice information is at least partially identical to the first character string.
• the voiceprint recognition device may also acquire the feature vectors corresponding to the respective characters in the registration voice information from another device or a server; that is, after the registered user inputs the registration voice information on another device, that device or the server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registration voice information, and the voiceprint recognition device retrieves those feature vectors so that, in the identification stage of the verification user, the similarity judgment module 750 can compare them with the feature vectors corresponding to the respective characters in the verification voice information.
• the similarity score is obtained by the voiceprint recognition device comparing the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the corresponding character in the preset registration voice information, and measures the degree of similarity between the two feature vectors of the same character.
• the similarity determination module 750 may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and use this cosine distance as the similarity score; that is, the similarity score between a feature vector in the verification voice information and the corresponding feature vector in the registration voice information is calculated by the following formula:
• score_i = <ω_i(tar), ω_i(test)> / (||ω_i(tar)|| · ||ω_i(test)||),
• where the subscript i indicates the i-th character common to the verification voice information and the registration voice information, ω_i(tar) indicates the feature vector of that character in the verification voice information, and ω_i(test) indicates the feature vector of that character in the registration voice information. If the same character appears more than once in the verification voice information, for example if the characters 0, 1, 5 and 8 each appear more than once in the verification voice information shown in FIG. 2, then the average of the similarity scores between the feature vectors obtained from the several voice segments of character 0 and the feature vector of character 0 in the preset registration voice information is used as the similarity score between the feature vector of character 0 in the verification voice information and the feature vector of character 0 in the preset registration voice information, and so on for the other characters.
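A small sketch of the per-character cosine scoring, including the averaging over repeated occurrences of the same character described above; the dictionary layout of the inputs is an assumption made for illustration.

```python
import numpy as np

def char_scores(verify_vecs: dict, enroll_vecs: dict) -> dict:
    """Cosine similarity per character.
    verify_vecs maps a character to the list of feature vectors extracted from its
    (possibly repeated) segments in the verification utterance; enroll_vecs maps a
    character to its single registered feature vector."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {}
    for ch, vecs in verify_vecs.items():
        if ch in enroll_vecs:                       # only characters common to both strings are scored
            scores[ch] = float(np.mean([cos(v, enroll_vecs[ch]) for v in vecs]))
    return scores
```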
• the user identification module 760 is configured to determine the verification user as the registered user corresponding to the registration voice information if the similarity score reaches a preset verification threshold.
• the user identification module 760 may take the average of the per-character similarity scores calculated by the similarity determination module 750, and if this average reaches the corresponding preset verification threshold, determine the verification user as the registered user corresponding to the registration voice information. If there are multiple registered users, such as the registered users A, B and C shown in FIG. 1, the user identification module 760 may compare the feature vector of each character of the verification user with the feature vector of the corresponding character of every registered user; if the feature vectors of a particular registered user have the highest similarity scores with the feature vectors of the verification voice and the similarity reaches the preset verification threshold, that registered user is taken as the identification result for the verification user.
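Reusing the char_scores helper from the previous sketch, the identification decision over several registered users can be illustrated as follows; the threshold value of 0.6 is a placeholder, not a value taken from the patent.

```python
def identify(verify_vecs: dict, enrolled_users: dict, threshold: float = 0.6):
    """enrolled_users maps a user id to {character: registered feature vector}.
    Returns the best-matching registered user, or None if no user reaches the threshold."""
    best_user, best_score = None, -1.0
    for user, enroll_vecs in enrolled_users.items():
        per_char = char_scores(verify_vecs, enroll_vecs)
        if not per_char:
            continue
        avg = sum(per_char.values()) / len(per_char)   # average over the characters common to both
        if avg > best_score:
            best_user, best_score = user, avg
    return best_user if best_score >= threshold else None
```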
• the voice acquiring module 710 is further configured to obtain the registration voice information generated by the registered user reading the second character string aloud, the second character string having at least one character in common with the first character string;
• the voice segment identification module 720 is further configured to perform voice recognition on the registration voice information to obtain the voice segments contained in the registration voice information that respectively correspond to the characters in the second character string;
• the voiceprint feature extraction module 730 is further configured to extract the voiceprint feature of the voice segment corresponding to each character in the registration voice information;
• the feature model training module 740 is further configured to train, according to the voiceprint features of the voice segments corresponding to the respective characters in the registration voice information and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the registration voice information.
  • the voiceprint recognition apparatus may further include:
  • the character order determining module 770 is configured to determine whether the order of the voice segments of the plurality of characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.
• since a different first character string may be randomly generated each time, this module determines whether the ordering of the voice segments of the plurality of characters in the verification voice information matches the ordering of the corresponding characters in the first character string; if not, it may be determined that the voiceprint recognition fails, and if the ordering is consistent with that of the first character string, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information. A small sketch of this ordering check, together with the random generation of the first character string, follows the next module description.
  • the voiceprint recognition apparatus may further include:
  • the string display module 700 is configured to randomly generate the first string and display it.
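The random generation of the first character string and the ordering check performed by the character order determining module 770 can be sketched together as follows; the prompt length of 8 digits is an illustrative assumption.

```python
import random

def generate_prompt(length: int = 8) -> str:
    """Randomly generate the first character string (a digit prompt) to display to the user."""
    return ''.join(random.choice('0123456789') for _ in range(length))

def ordering_matches(recognized_chars: list, prompt: str) -> bool:
    """Compare the temporal order of the characters recognized from the verification
    utterance with the order of the displayed prompt; a mismatch fails the recognition."""
    return recognized_chars == list(prompt)
```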
• in this embodiment, the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information of the verification user are obtained, the feature vectors corresponding to the respective characters in the verification voice information are obtained through training with the universal background models (UBM) of the preset corresponding characters, and the feature vector corresponding to each character in the verification voice information is compared with the feature vector of the corresponding character in the registration voice information, thereby determining the user identity of the verification user. The user feature vectors used for comparison correspond to specific characters, which fully takes into account the voiceprint characteristics of the user when reading different characters aloud, so the accuracy of voiceprint recognition can be effectively improved.
  • FIG. 9 is a schematic structural diagram of another voiceprint recognition apparatus according to an embodiment of the present invention.
  • the voiceprint recognition apparatus 1000 may include at least one processor 1001 (eg, a CPU), a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display, a keyboard, a microphone, and the like.
  • the user interface 1003 may further include a standard wired interface and a wireless interface.
  • the memory 1005 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
• the memory 1005 may store an operating system, a user interface module, and computer executable program code (e.g., a voiceprint recognition program).
• the processor 1001 can be used to call the computer executable program code stored in the memory 1005, and specifically performs the following steps:
• obtaining, through the user interface 1003, the verification voice information generated by the verification user reading the first character string aloud; calculating the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information; and, when the similarity reaches a preset threshold, verifying the verification user as the registered user corresponding to the registration voice information.
• the processor 1001 also calls the computer executable program code to perform the following operations: performing voice recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the characters in the first character string; and extracting the voiceprint feature of the voice segment corresponding to each character.
• the processor 1001 invokes the computer executable program code to perform the following operations to calculate the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information: obtaining, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the verification voice information; and calculating the similarity scores between the feature vectors corresponding to the respective characters in the verification voice information and the feature vectors corresponding to the corresponding characters in the preset registration voice information, as the similarity.
• before obtaining the verification voice information generated by the verification user reading the first character string, the processor 1001 further invokes the computer executable program code to perform the following operations: acquiring the registration voice information generated by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string; performing voice recognition on the registration voice information to obtain the voice segments contained in the registration voice information that respectively correspond to the characters in the second character string; extracting the voiceprint feature of the voice segment corresponding to each character in the registration voice information; and training, according to the voiceprint features of the voice segments corresponding to the respective characters in the registration voice information and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the registration voice information.
• the processor 1001 invokes the computer executable program code to perform the following operations to obtain, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the verification voice information: using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, and adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with a maximum a posteriori probability algorithm, to obtain the feature vector corresponding to each character in the verification voice information.
• the processor 1001 invokes the computer executable program code to perform the following operations to use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data and adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, thereby obtaining the feature vector corresponding to each character in the verification voice information: using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and combining it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.
• the processor 1001 invokes the computer executable program code to perform the following operations to use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and combine it with the preset supervector subspace matrix to obtain the feature vectors corresponding to the respective characters in the verification voice information: using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, and adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:
• M = m + Tω, where M represents the adjusted mean supervector of the universal background model of a character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.
  • the preset super-vector sub-space matrix is determined according to a correlation between weights of respective Gaussian modules in the universal background model.
• the processor 1001 invokes the computer executable program code to perform the following operations to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information: calculating the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and determining the cosine distance as the similarity score.
• the processor 1001 invokes the computer executable program code to perform the following operations to carry out voice recognition on the verification voice information and obtain the voice segments contained in the verification voice information that respectively correspond to the characters in the first character string: identifying the valid voice segments and the invalid voice segments in the verification voice information; and performing voice recognition on the valid voice segments to obtain the voice segments respectively corresponding to the characters in the first character string.
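The patent does not specify how valid and invalid voice segments are separated, so the sketch below uses a simple frame-energy heuristic purely for illustration; a real system would typically use a trained voice activity detector before aligning each valid region to a character.

```python
import numpy as np

def split_valid_segments(signal: np.ndarray, sr: int, frame_ms: int = 25,
                         hop_ms: int = 10, energy_floor: float = 0.02):
    """Energy-based voice-activity sketch: returns (start, end) sample ranges of valid
    (voiced) speech, discarding silence/noise as invalid segments."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    segments, start = [], None
    for i in range(0, len(signal) - frame, hop):
        active = np.sqrt(np.mean(signal[i:i + frame] ** 2)) > energy_floor
        if active and start is None:
            start = i                      # voiced region begins
        elif not active and start is not None:
            segments.append((start, i + frame))
            start = None                   # voiced region ends
    if start is not None:
        segments.append((start, len(signal)))
    return segments
```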
• before the verification user is determined as the registered user corresponding to the registration voice information, the processor 1001 further invokes the computer executable program code to perform the following operations: determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; and, when the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, determining the verification user as the registered user corresponding to the registration voice information.
• before obtaining the verification voice information generated by the verification user reading the first character string, the processor 1001 further invokes the computer executable program code to perform the following operations: randomly generating the first character string and displaying the first character string.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).


Abstract

A voiceprint recognition method and device. The method comprises: obtaining authentication voice information generated by reading a first character string by an authenticated user (S201); performing voice recognition on the authentication voice information to obtain voice segments included in the authentication voice information and respectively corresponding to a plurality of characters in the first character string (S202); extracting a voiceprint feature of the voice segment corresponding to each character (S203); obtaining, according to the voiceprint feature of the voice segment corresponding to each character, a feature vector corresponding to each character in the authentication voice information on the basis of preset universal background model training corresponding to a corresponding character (S204); and calculating a similarity score between the feature vector corresponding to each character in the authentication voice information and the feature vector corresponding to a corresponding character in preset registration voice information, and if the similarity score reaches a preset authentication threshold, determining the authenticated user to be a registered user corresponding to the registration voice information (S205). The voiceprint recognition accuracy can be effectively improved.

Description

Voiceprint recognition method and device

The present application claims priority to Chinese patent application No. 201610416650.3, entitled "A voiceprint recognition method and apparatus", filed on June 12, 2016, the contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of voice recognition technology, and in particular, to a voiceprint recognition method and apparatus.

Background

Voiceprint recognition is a method of biometric identification that includes two stages: user registration and user identity verification. In the registration stage, speech is mapped to a user model through a series of processing steps. In the identification stage, a piece of speech of unknown identity is matched against the model for similarity, and it is then judged whether the identity of the unknown speech is consistent with the identity of the registered speech. Existing voiceprint modeling methods usually model at a text-independent level to describe the speaker's identity characteristics, but text-independent modeling has low recognition accuracy when users read different content aloud, and it is difficult to meet requirements.

Summary of the invention

In view of this, embodiments of the present invention provide a voiceprint recognition method and apparatus, which can effectively improve the accuracy of voiceprint recognition.

In order to solve the above technical problem, a first aspect of the embodiments of the present invention provides a voiceprint recognition method, where the method includes:

obtaining verification voice information generated by a verification user reading a first character string aloud;

calculating the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in preset registration voice information; and

when the similarity reaches a preset threshold, verifying the verification user as the registered user corresponding to the registration voice information.

In an implementation manner, before the calculating of the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information, the method further includes:

performing voice recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the characters in the first character string; and

extracting the voiceprint feature of the voice segment corresponding to each character.

In an implementation manner, the calculating of the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information includes:

obtaining, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the verification voice information; and

calculating the similarity scores between the feature vectors corresponding to the respective characters in the verification voice information and the feature vectors corresponding to the corresponding characters in the preset registration voice information, as the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information.

In an implementation manner, before the obtaining of the verification voice information generated by the verification user reading the first character string, the method further includes:

obtaining registration voice information generated by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string; and

obtaining, according to the voiceprint features corresponding to the respective characters in the registration voice information and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the registration voice information.

In an implementation manner, the obtaining, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, of the feature vectors corresponding to the respective characters in the verification voice information includes:

using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, and adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with a maximum a posteriori probability algorithm, to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation manner, the using of the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data and the adjusting of the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, to obtain the feature vectors corresponding to the respective characters in the verification voice information, includes:

using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and combining it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation manner, the using of the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, the adjusting of the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and the combining with the preset supervector subspace matrix to obtain the feature vectors corresponding to the respective characters in the verification voice information includes:

using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, and adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M represents the adjusted mean supervector of the universal background model of a character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.

In an implementation manner, the preset supervector subspace matrix is determined according to the correlation between the weights of the respective Gaussian modules in the universal background model.

In an implementation manner, the calculating of the similarity scores between the feature vectors corresponding to the respective characters in the verification voice information and the feature vectors corresponding to the corresponding characters in the preset registration voice information includes:

calculating the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and determining the cosine distance as the similarity score.

In an implementation manner, the performing of voice recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the plurality of characters in the first character string includes:

identifying the valid voice segments and the invalid voice segments in the verification voice information; and

performing voice recognition on the valid voice segments to obtain the voice segments respectively corresponding to the plurality of characters in the first character string.

In an implementation manner, before the determining of the verification user as the registered user corresponding to the registration voice information, the method further includes:

determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; and

when the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, determining the verification user as the registered user corresponding to the registration voice information.

In an implementation manner, before the obtaining of the verification voice information generated by the verification user reading the first character string, the method further includes:

randomly generating the first character string and displaying the first character string.

Correspondingly, a second aspect of the embodiments of the present invention provides a voiceprint recognition apparatus, where the apparatus includes:

a voice acquiring module, configured to obtain verification voice information generated by a verification user reading a first character string aloud;

a similarity judgment module, configured to calculate the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in preset registration voice information; and

a user identification module, configured to verify the verification user as the registered user corresponding to the registration voice information when the similarity reaches a preset threshold.

In an implementation manner, the voiceprint recognition apparatus further includes:

a voice segment identification module, configured to perform voice recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the plurality of characters in the first character string; and

a voiceprint feature extraction module, configured to extract the voiceprint feature of the voice segment corresponding to each character in the verification voice information.

In an implementation manner, the voiceprint recognition apparatus further includes:

a feature model training module, configured to obtain, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the verification voice information;

the similarity judgment module is configured to calculate the similarity scores between the feature vectors corresponding to the respective characters in the verification voice information and the feature vectors corresponding to the corresponding characters in the preset registration voice information, as the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information.

In an implementation manner, the voice acquiring module is further configured to obtain registration voice information generated by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string;

the feature model training module is further configured to obtain, according to the voiceprint features corresponding to the respective characters in the registration voice information and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the registration voice information.

In an implementation manner, the feature model training module is configured to:

use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, and adjust the mean supervector of the universal background model corresponding to the preset corresponding character with a maximum a posteriori probability algorithm, to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation manner, the feature model training module is configured to:

use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and combine it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation manner, the feature model training module is specifically configured to:

use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, and adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M represents the adjusted mean supervector of the universal background model of a character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.

In an implementation manner, the preset supervector subspace matrix is determined according to the correlation between the dimension vectors of the mean supervector of the Gaussian mixture model.

In an implementation manner, the similarity judgment module is configured to:

calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and determine the cosine distance as the similarity score.

In an implementation manner, the voice segment identification module includes:

a valid segment identification unit, configured to identify the valid voice segments and the invalid voice segments in the verification voice information; and

a voice recognition unit, configured to perform voice recognition on the valid voice segments to obtain the voice segments respectively corresponding to the plurality of characters in the first character string.

In an implementation manner, the voiceprint recognition apparatus further includes:

a character order determining module, configured to determine whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string;

the user identification module is further configured to determine the verification user as the registered user corresponding to the registration voice information when the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string.

In an implementation manner, the voiceprint recognition apparatus further includes:

a string display module, configured to randomly generate the first character string and display the first character string.

A third aspect of the embodiments of the present invention further provides a voiceprint recognition apparatus, including:

a user interface, configured to obtain voice information;

a memory, storing computer executable program code; and

a processor, configured to call the computer executable program code to perform the following operations:

obtaining, through the user interface, verification voice information generated by a verification user reading a first character string aloud;

calculating the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in preset registration voice information; and

when the similarity reaches a preset threshold, verifying the verification user as the registered user corresponding to the registration voice information.

In an implementation manner, before the obtaining of the verification voice information generated by the verification user reading the first character string, the processor further calls the computer executable program code to perform the following operations:

performing voice recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the characters in the first character string; and

extracting the voiceprint feature of the voice segment corresponding to each character.

In an implementation manner, the processor calls the computer executable program code to perform the following operations to calculate the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information:

obtaining, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the verification voice information; and

calculating the similarity scores between the feature vectors corresponding to the respective characters in the verification voice information and the feature vectors corresponding to the corresponding characters in the preset registration voice information, as the similarity between the voice segments corresponding to the respective characters in the verification voice information and the voiceprint features of the voice segments corresponding to the corresponding characters in the preset registration voice information.

In an implementation manner, before the obtaining of the verification voice information generated by the verification user reading the first character string, the processor further calls the computer executable program code to perform the following operations:

obtaining registration voice information generated by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string; and

obtaining, according to the voiceprint features corresponding to the respective characters in the registration voice information and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the registration voice information.

In an implementation manner, the processor calls the computer executable program code to perform the following operations to obtain, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to the preset corresponding characters, the feature vectors corresponding to the respective characters in the verification voice information:

using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, and adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with a maximum a posteriori probability algorithm, to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation manner, the processor calls the computer executable program code to perform the following operations to use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data and adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, thereby obtaining the feature vectors corresponding to the respective characters in the verification voice information:

using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and combining it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation manner, the processor calls the computer executable program code to perform the following operations to use the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, adjust the mean supervector of the universal background model corresponding to the preset corresponding character with the maximum a posteriori probability algorithm, and combine it with the preset supervector subspace matrix to obtain the feature vectors corresponding to the respective characters in the verification voice information:

using the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, and adjusting the mean supervector of the universal background model corresponding to the preset corresponding character with the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M represents the adjusted mean supervector of the universal background model of a character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.

In an implementation manner, the preset supervector subspace matrix is determined according to the correlation between the weights of the respective Gaussian modules in the universal background model.

In an implementation manner, the processor calls the computer executable program code to perform the following operations to calculate the similarity scores between the feature vectors corresponding to the respective characters in the verification voice information and the feature vectors corresponding to the corresponding characters in the preset registration voice information:

calculating the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and determining the cosine distance as the similarity score.

In an implementation manner, the processor calls the computer executable program code to perform the following operations to carry out voice recognition on the verification voice information and obtain the voice segments contained in the verification voice information that respectively correspond to the plurality of characters in the first character string:

identifying the valid voice segments and the invalid voice segments in the verification voice information; and

performing voice recognition on the valid voice segments to obtain the voice segments respectively corresponding to the plurality of characters in the first character string.

In an implementation manner, before the determining of the verification user as the registered user corresponding to the registration voice information, the processor further calls the computer executable program code to perform the following operations:

determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; and

when the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, determining the verification user as the registered user corresponding to the registration voice information.

In an implementation manner, before the obtaining of the verification voice information generated by the verification user reading the first character string, the processor further calls the computer executable program code to perform the following operations:

randomly generating the first character string, and displaying the first character string through the user interface.

Correspondingly, a fourth aspect of the embodiments of the present invention further provides a storage medium, where the storage medium stores a computer program, and the computer program is used to perform the voiceprint recognition method according to any one of the implementation manners of the first aspect of the embodiments of the present invention.

In this embodiment, the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information of the verification user are obtained, the feature vectors corresponding to the respective characters in the verification voice information are obtained through training with the universal background models (UBM) of the preset corresponding characters, and the feature vector corresponding to each character in the verification voice information is compared with the feature vector of the corresponding character in the registration voice information, thereby determining the user identity of the verification user. The user feature vectors used for comparison correspond to specific characters, which fully takes into account the voiceprint characteristics of the user when reading different characters aloud, so the accuracy of voiceprint recognition can be effectively improved.

DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those of ordinary skill in the art from these drawings without creative effort.

FIG. 1 is a schematic overview of the stages of a voiceprint recognition method in an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the principle of recognizing voice segments corresponding to a plurality of characters from voice information in an embodiment of the present invention;

FIG. 4 is a schematic diagram of the principle of obtaining the feature vectors corresponding to the respective characters from voice information in an embodiment of the present invention;

FIG. 5 is a schematic diagram of the voiceprint registration process of a registered user in an embodiment of the present invention;

FIG. 6 is a schematic flowchart of a voiceprint recognition method in another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a voiceprint recognition apparatus in an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a voice segment identification module in an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of another voiceprint recognition apparatus in an embodiment of the present invention.

具体实施方式detailed description

下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

本发明实施例提供了一种声纹识别方法及装置。该声纹识别方法及装置可应用于所有需要识别未知用户身份的场景或设备中。用于进行声纹识别的字符串中的字符可以是阿拉伯数字、英文字母或其他语言字符等。为了简化描述,本发明实施例中的字符以阿拉伯数字为例进行举例说明。Embodiments of the present invention provide a voiceprint recognition method and apparatus. The voiceprint recognition method and apparatus can be applied to all scenes or devices that need to identify an unknown user. The characters in the string used for voiceprint recognition may be Arabic numerals, English letters or other language characters. In order to simplify the description, the characters in the embodiments of the present invention are exemplified by taking Arabic numerals as an example.

本发明实施例中的声纹识别方法可以分为两个阶段,如图1所示:The voiceprint recognition method in the embodiment of the present invention can be divided into two stages, as shown in FIG. 1:

1)注册用户的声纹注册阶段1) Registered user's voiceprint registration stage

在声纹注册阶段,注册用户可以朗读一个注册字符串(即后文出现的第二字符串),声纹识别装置采集该注册用户在朗读该注册字符串时的注册语音信息,然后对注册语音信息进行声音识别得到所述注册语音信息中包含的分别与所述注册字符串中的多个字符对应的语音片段,进而对各个字符对应的语音片段进行声纹特征提取和声纹模型训练,包括根据所述各个字符对应的语音片段的声纹特征,结合预设的相应字符对应的通用背景模型(Universal Background Model,UBM,即GMM-UBM)训练得到注册语音信息中各个字符对应的特征向量,然后声纹识别装置可以分别为不同的注册用户将其在声纹注册阶段朗读的注册语音信息中的多个字符对应的特征向量保存在声纹识别装置的模型库中。In the voiceprint registration phase, the registered user can read a registration string (ie, the second character string appearing later), and the voiceprint recognition device collects the registered voice information when the registered user reads the registration string, and then registers the voice. Performing voice recognition on the information to obtain voice segments respectively corresponding to the plurality of characters in the registration string, and performing voiceprint feature extraction and voiceprint model training on the voice segments corresponding to each character, including And obtaining, according to the voiceprint feature of the voice segment corresponding to each character, a feature vector corresponding to each character in the registered voice information by using a Universal Background Model (UBM, ie, GMM-UBM) corresponding to the preset corresponding character, Then, the voiceprint recognition device can respectively save the feature vectors corresponding to the plurality of characters in the registered voice information that the different registered users read in the voiceprint registration stage in the model library of the voiceprint recognition device.

For example, if the registration string is the digit string 0185851, it contains four distinct digits: "0", "1", "5", and "8". The voiceprint recognition apparatus performs voiceprint feature extraction and voiceprint model training on the voice segments corresponding to the characters of the registration voice information, obtains the voiceprint features of the voice segments corresponding to "0", "1", "5", and "8", and then trains against the preset UBMs of the corresponding characters to obtain the feature vector corresponding to each character in the registration voice information, including a feature vector for the digit "0", a feature vector for the digit "1", a feature vector for the digit "5", and a feature vector for the digit "8".

2) Identity recognition stage for a verification user

In the identity recognition stage, a verification user, i.e., a user of unknown identity, reads a verification string aloud (i.e., the first character string mentioned below; the second character string and the first character string share at least one identical character). The voiceprint recognition apparatus collects the verification voice information produced when the verification user reads the verification string, performs speech recognition on it to obtain the voice segments that respectively correspond to the characters of the verification string, and performs voiceprint feature extraction and voiceprint model training on the voice segment of each character, including training, based on the voiceprint features of the voice segments of the respective characters and the preset UBM of each corresponding character, to obtain the feature vector corresponding to each character in the verification voice information. Finally, the apparatus calculates a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information; if the similarity score reaches a preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.

For example, if the verification string is the digit string 85851510, the voiceprint recognition apparatus performs voiceprint feature extraction and voiceprint model training on the voice segments corresponding to the characters of the verification voice information produced when the verification user reads aloud, obtains the GMMs corresponding to "0", "1", "5", and "8", and then, in combination with the preset UBMs of the corresponding characters, computes the feature vectors of the verification user's verification voice information, including feature vectors corresponding to the digits "0", "1", "5", and "8". The apparatus then calculates the similarity scores between the feature vectors of "0", "1", "5", and "8" in the verification voice information and the feature vectors of "0", "1", "5", and "8" in the registration voice information; if the similarity score reaches the preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.

It should be noted that the voiceprint registration stage for a registered user and the identity recognition stage for a verification user may be implemented in the same device or apparatus, or in different devices or apparatuses. For example, the voiceprint registration stage may be performed on a first device, which then sends the feature vectors corresponding to the characters of the registration voice information to a second device, so that the identity recognition stage can be performed on the second device.

The two stages above are described in detail below through specific embodiments.

FIG. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include the following steps:

S201: Acquire verification voice information produced by a verification user reading a first character string aloud.

The verification user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition apparatus. The first character string is a string used for the verification user's identity verification; it may be randomly generated, or it may be a preset fixed string, for example a string at least partially identical to the second character string corresponding to pre-generated registration voice information. Specifically, the first character string may contain m characters, of which n are mutually distinct, where m and n are positive integers and m ≥ n.

For example, the first character string "12358948" has 8 characters in total and includes 7 mutually distinct characters: "1", "2", "3", "4", "5", "8", and "9".

In an optional embodiment, the voiceprint recognition apparatus may generate and display the first character string so that the verification user reads aloud according to the displayed string.
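As a simple illustration of this step, the following sketch (assuming Python; the function name and the digit-only alphabet are illustrative choices, not part of the embodiment) generates a random first character string for display:

```python
import random

def generate_challenge_string(length=8, alphabet="0123456789"):
    """Randomly draw `length` characters to form the first character string."""
    return "".join(random.choice(alphabet) for _ in range(length))

print(generate_challenge_string())  # e.g. "12358948"
```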

S202: Perform speech recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the characters of the first character string.

As shown in FIG. 3, the voiceprint recognition apparatus may divide the verification voice information into the voice segments corresponding to the individual characters through speech recognition and sound intensity filtering. Optionally, invalid voice segments may also be discarded so that they do not take part in subsequent processing.
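The sound intensity filtering mentioned above can be realized, for instance, as a simple short-time-energy screen; the sketch below (assuming Python with NumPy, with the frame lengths and threshold ratio as illustrative values) marks low-energy frames as invalid:

```python
import numpy as np

def valid_frame_mask(signal, sr, frame_ms=25, hop_ms=10, threshold_ratio=0.1):
    """Return a boolean mask over frames; False marks low-energy (silence/noise) frames."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies = np.array([np.sum(signal[i:i + frame] ** 2)
                         for i in range(0, max(len(signal) - frame, 1), hop)])
    threshold = threshold_ratio * energies.mean()
    return energies >= threshold
```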

S203: Extract the voiceprint feature of the voice segment corresponding to each character.

Specifically, the voiceprint recognition apparatus may extract Mel Frequency Cepstrum Coefficients (MFCC) or Perceptual Linear Predictive (PLP) coefficients from the voice segment corresponding to each character, and use them as the voiceprint feature of that character's voice segment.
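For the MFCC variant, a minimal extraction sketch might look as follows (assuming Python with the librosa library; the number of coefficients is an illustrative choice):

```python
import librosa

def extract_mfcc(segment, sr, n_mfcc=13):
    """Extract frame-level MFCC voiceprint features for one character's voice segment."""
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.T  # (frames, n_mfcc)
```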

S204: According to the voiceprint features of the voice segments corresponding to the respective characters, train against the preset universal background model of each corresponding character to obtain the feature vector corresponding to each character in the verification voice information.

The universal background model (UBM) in the embodiments of the present invention is a Gaussian mixture model trained by pooling the voice segments of a specific digit spoken by a large number of speakers; it characterizes the distribution, in feature space, of speech for that digit. Because the training data come from a large number of speakers, the model does not characterize any particular speaker; it is identity-independent and can therefore be regarded as a universal background model. Illustratively, the UBM may be trained from voice samples of more than 1000 speakers with a total duration of more than 20 hours, in which the occurrence frequencies of the individual characters are relatively balanced. The mathematical expression of the UBM is:

P(x) = Σ_{i=1}^{C} a_i · N(x | μ_i, Σ_i)    …… Equation (1)

where P(x) denotes the probability distribution of the UBM, C denotes the number of Gaussian components summed in the UBM, a_i denotes the weight of the i-th Gaussian component, μ_i denotes the mean of the i-th Gaussian component, Σ_i denotes the covariance of the i-th Gaussian component, N(·) denotes a Gaussian distribution, and x denotes the voiceprint feature of an input sample.
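By way of illustration only, a per-character UBM of the form of Equation (1) could be fitted offline from pooled multi-speaker features; the sketch below assumes Python with scikit-learn's GaussianMixture and is not the apparatus's prescribed training procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(feature_matrices, n_components=64):
    """Fit a per-character UBM from MFCC frames pooled over many speakers.

    feature_matrices: list of (frames, dim) arrays, one per speaker, all for the same character.
    """
    pooled = np.vstack(feature_matrices)
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag", max_iter=200)
    ubm.fit(pooled)
    # ubm.weights_, ubm.means_, ubm.covariances_ play the roles of a_i, mu_i, Sigma_i in Equation (1)
    return ubm
```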

The voiceprint recognition apparatus may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data, and adjust the parameters of the preset universal background model of the corresponding character using a Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the parameters of the preset universal background model of the corresponding character are adjusted iteratively so that the posterior probability P(x) is maximized, and the feature vector corresponding to the character in the verification voice information can then be determined from the parameters that maximize the posterior probability P(x).
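One widely used concrete form of such MAP adjustment is relevance-MAP adaptation of the component means; the following sketch (Python with NumPy, a diagonal-covariance UBM, and an illustrative relevance factor) shows the idea and is not claimed to be the exact update used by the apparatus:

```python
import numpy as np

def map_adapt_means(weights, means, covars, features, relevance=16.0):
    """Adapt UBM component means to one character's frames (relevance-MAP, diagonal covariances)."""
    diff = features[:, None, :] - means[None, :, :]                     # (T, C, D)
    log_gauss = -0.5 * np.sum(diff ** 2 / covars[None, :, :]
                              + np.log(2 * np.pi * covars[None, :, :]), axis=2)
    log_post = np.log(weights)[None, :] + log_gauss                    # unnormalized log posteriors
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                            # (T, C) responsibilities

    n_c = post.sum(axis=0)                                             # soft counts per component
    f_c = post.T @ features                                            # first-order statistics (C, D)
    alpha = n_c / (n_c + relevance)
    return alpha[:, None] * (f_c / np.maximum(n_c, 1e-10)[:, None]) + (1.0 - alpha)[:, None] * means
```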

Since a large number of experiments and published studies have verified that the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the mean supervector of the UBM is defined as:

m = [μ_1ᵀ, μ_2ᵀ, …, μ_Cᵀ]ᵀ, i.e., the means of the C Gaussian components concatenated into a single high-dimensional vector.

Therefore, the voiceprint recognition apparatus may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data and adjust the mean supervector of the preset universal background model of the corresponding character using the Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the mean supervector is adjusted iteratively so that the posterior probability P(x) is maximized, and the mean supervector that maximizes the posterior probability P(x) is used as the feature vector corresponding to the character in the verification voice information.
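Continuing the illustrative helpers above, the adapted component means can simply be stacked to form the supervector used as the character's feature vector (a sketch, not the apparatus's mandated data layout):

```python
def mean_supervector(adapted_means):
    """Stack the adapted component means (C, D) into one (C*D,) mean supervector."""
    return adapted_means.reshape(-1)

# Example usage with the hypothetical helpers above:
# adapted = map_adapt_means(ubm.weights_, ubm.means_, ubm.covariances_, char_features)
# feature_vector = mean_supervector(adapted)
```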

In another optional embodiment, in order to mitigate the slow convergence caused by the high dimensionality of the supervector, the range of variation of the mean supervector is restricted to a subspace by probabilistic principal component analysis (PPCA). The voiceprint recognition apparatus may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data, adjust the mean supervector of the preset universal background model of the corresponding character using the maximum a posteriori algorithm, and combine it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the mean supervector of the preset universal background model of the corresponding character may be adjusted using the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M denotes the adjusted mean supervector of the universal background model of a given character, m denotes the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the mean supervector in Equation (1) can be adjusted by iteratively adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) is used as the feature vector corresponding to the corresponding character in the verification voice information. The supervector subspace matrix T is determined according to the correlations between the dimension vectors of the mean supervector of the Gaussian mixture model.
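As a rough illustration of estimating ω under the model M = m + Tω, one simple point estimate is a least-squares projection of the adapted supervector onto the subspace spanned by T; the sketch below (Python with NumPy) ignores the per-component posterior weighting that a full subspace estimator would use, so it should be read as a simplification under stated assumptions rather than the embodiment's exact estimator:

```python
import numpy as np

def estimate_omega(adapted_supervector, ubm_supervector, T):
    """Least-squares estimate of omega in M = m + T*omega.

    adapted_supervector: (C*D,) supervector M obtained after MAP adaptation.
    ubm_supervector:     (C*D,) supervector m of the unadapted UBM.
    T:                   (C*D, R) preset supervector subspace matrix.
    """
    residual = adapted_supervector - ubm_supervector
    omega, *_ = np.linalg.lstsq(T, residual, rcond=None)
    return omega  # (R,) low-dimensional feature vector for this character
```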

S205: Calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information; if the similarity score reaches a preset verification threshold, determine that the verification user is the registered user corresponding to the registration voice information.

Specifically, the voiceprint recognition apparatus may acquire the registered user's registration voice information in the voiceprint registration stage and, through voiceprint feature extraction and voiceprint model training similar to those in this embodiment, obtain the feature vector corresponding to the voice segment of each character in the registration voice information. The registration voice information may be registration voice information produced by a registered user reading a second character string aloud and acquired by the voiceprint recognition apparatus, where the second character string and the first character string share at least one identical character, i.e., the second character string corresponding to the registration voice information is at least partially identical to the first character string. Further, in an optional embodiment, the voiceprint recognition apparatus may also obtain the feature vectors corresponding to the characters of the registration voice information from an external source: after the registered user records the registration voice information on another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registration voice information, and the voiceprint recognition apparatus acquires these feature vectors from the other device or the server so that, in the identity recognition stage, they can be compared with the feature vectors corresponding to the characters in the verification voice information.

In a specific implementation, the similarity score is a value obtained by the voiceprint recognition apparatus after comparing the feature vector corresponding to a character in the verification voice information with the feature vector corresponding to the same character in the preset registration voice information, and it measures the degree of similarity between the two feature vectors of that character. In an optional embodiment, the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information may be calculated and used as the similarity score; that is, the similarity score between a character's feature vector in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:

score_i = (ω_i(tar) · ω_i(test)) / (‖ω_i(tar)‖ · ‖ω_i(test)‖)

where the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to that character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to that character in the registration voice information. If the verification voice information and the registration voice information share multiple identical characters, the similarity scores of the individual characters calculated by the above formula may be averaged; if the average similarity score of the characters reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, for example registered users A, B, and C shown in FIG. 1, the decision can be made according to the similarity between the feature vector of a character of the verification user and the feature vector of the same character of each registered user: the registered user whose character feature vectors have the highest similarity scores with those of the verification voice, with the similarity reaching the preset verification threshold, is taken as the identification result for the verification user.
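A compact sketch of this per-character cosine scoring (Python with NumPy; the variable names are illustrative, and repeated occurrences of a character are handled by the averaging described in the next paragraph):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_score(verify_vectors, enroll_vectors):
    """Average per-character cosine similarity over the characters shared by both utterances.

    verify_vectors / enroll_vectors: dict mapping character -> list of feature vectors.
    """
    shared = set(verify_vectors) & set(enroll_vectors)
    scores = []
    for ch in shared:
        pair_scores = [cosine(v, e) for v in verify_vectors[ch] for e in enroll_vectors[ch]]
        scores.append(np.mean(pair_scores))   # average over repeated occurrences of the character
    return float(np.mean(scores)) if scores else 0.0
```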

In an optional embodiment, if the same character occurs more than once in the verification voice information, for example if "0", "1", "5", and "8" each occur twice in the verification voice information as shown in FIG. 2, the average of the similarity scores between the feature vectors obtained from the two voice segments of character "0" and the feature vector of character "0" in the preset registration voice information may be taken as the similarity score between the feature vector of character "0" in this verification voice information and the feature vector of character "0" in the preset registration voice information, and so on for the other characters.

It should be noted that there are many other ways to measure the similarity between two feature vectors; the above is only one implementation provided by the present invention. On the basis of the solutions disclosed herein, a person skilled in the art can, without creative effort, derive further ways of calculating the similarity scores of the feature vectors of the characters shared by the verification voice information and the registration voice information, which need not be exhaustively enumerated here.

Thus, in this embodiment, the voiceprint features of the voice segments corresponding to the characters in the verification user's verification voice information are obtained, the feature vectors corresponding to the characters in the verification voice information are obtained by training against the preset UBMs of the corresponding characters, and the user identity of the verification user is determined by comparing the similarity between the feature vector of each character in the verification voice information and the feature vector of the same character in the registration voice information. Because the user feature vectors used for comparison correspond to specific characters, the voiceprint characteristics of the user when reading different characters are fully taken into account, which can effectively improve the accuracy of voiceprint recognition.

FIG. 5 is a schematic diagram of the voiceprint registration procedure for a registered user in an embodiment of the present invention. As shown in the figure, the voiceprint registration procedure in this embodiment may include the following steps:

S501: Acquire registration voice information produced by a registered user reading a second character string aloud, where the second character string and the first character string share at least one identical character.

The registered user is a user whose legitimate identity has been established. The second character string is a string used to collect the registered user's voiceprint feature vectors; it may be randomly generated or a preset fixed string. Specifically, the second character string may also contain m characters, of which n are mutually distinct, where m and n are positive integers and m ≥ n.

In an optional embodiment, the voiceprint recognition apparatus may generate and display the second character string so that the registered user reads aloud according to the displayed string.

S502: Perform speech recognition on the registration voice information to obtain the voice segments contained in the registration voice information that respectively correspond to the characters of the second character string.

The voiceprint recognition apparatus may divide the registration voice information into the voice segments corresponding to the individual characters through speech recognition and sound intensity filtering. Optionally, invalid voice segments may also be discarded so that they do not take part in subsequent processing.

S503: Extract the voiceprint feature of the voice segment corresponding to each character in the registration voice information.

Specifically, the voiceprint recognition apparatus may extract MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients from the voice segment corresponding to each character, and use them as the voiceprint feature of that character's voice segment.

S504: According to the voiceprint features of the voice segments corresponding to the characters in the registration voice information, train against the preset universal background model of each corresponding character to obtain the feature vector corresponding to each character in the registration voice information.

For the expression of the UBM, reference may be made to the foregoing embodiment. This step of the voiceprint registration procedure is similar to S204 of the voiceprint recognition procedure: the voiceprint recognition apparatus may use the voiceprint features of the voice segments corresponding to the characters in the registration voice information as training sample data and adjust the parameters of the preset universal background model of the corresponding character using the Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the registration voice information are substituted into Equation (1) as input samples, the parameters of the preset universal background model of the corresponding character are adjusted iteratively so that the posterior probability P(x) is maximized, and the feature vector corresponding to the character in the registration voice information can then be determined from the parameters that maximize the posterior probability P(x).

Since the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the voiceprint recognition apparatus may use the voiceprint features of the voice segments corresponding to the characters in the registration voice information as training sample data and adjust the mean supervector of the preset universal background model of the corresponding character using the Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the registration voice information are substituted into Equation (1) as input samples, the mean supervector is adjusted iteratively so that the posterior probability P(x) is maximized, and the mean supervector that maximizes the posterior probability P(x) is used as the feature vector corresponding to the character in the registration voice information.

In another optional embodiment, the mean supervector of the preset universal background model of the corresponding character may be adjusted using the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M denotes the adjusted mean supervector of the universal background model of a given character, m denotes the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the registration voice information. That is, after the voiceprint features of the voice segments corresponding to the characters in the registration voice information are substituted into Equation (1) as input samples, the mean supervector in Equation (1) can be adjusted by iteratively adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) is used as the feature vector corresponding to the corresponding character in the registration voice information.

FIG. 6 is a schematic flowchart of a voiceprint recognition method in another embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include the following procedure:

S601: Randomly generate a first character string and display it.

S602: Acquire verification voice information produced by a verification user reading the first character string aloud.

S603: Identify the valid voice segments and the invalid voice segments in the verification voice information.

Specifically, the verification voice may be divided according to sound intensity, and voice segments with low sound intensity are treated as invalid voice segments (including, for example, silent segments and impulse noise).

S604: Perform speech recognition on the valid voice segments to obtain the voice segments respectively corresponding to the characters of the first character string.

Through speech recognition, the voice segments respectively corresponding to the characters of the first character string can be obtained.

S605: Determine that the order of the voice segments of the characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

In order to effectively prevent a registered user's voice information from being surreptitiously recorded or illegally copied and then used for voiceprint recognition, a different first character string may be randomly generated each time, and this step determines whether the order of the voice segments of the characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If not, it can be determined that voiceprint recognition has failed; if the order is consistent with that of the corresponding characters in the first character string, the subsequent procedure is performed.
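A minimal sketch of this ordering check (Python; it assumes the speech recognizer returns the recognized characters in temporal order, and the function name is illustrative):

```python
def order_matches(recognized_chars, challenge_string):
    """Return True if the recognized characters appear in the same order as the displayed string."""
    return list(recognized_chars) == list(challenge_string)

# Example: order_matches("85851510", "85851510") -> True; a replayed recording made for a
# different challenge string would fail this check.
```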

S606: Extract the voiceprint feature of the voice segment corresponding to each character.

Specifically, the voiceprint recognition apparatus may extract MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients from the voice segment corresponding to each character, and use them as the voiceprint feature of that character's voice segment.

S607: Using the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data, adjust the mean supervector of the preset universal background model of the corresponding character with the maximum a posteriori algorithm, and thereby estimate the feature vector corresponding to each character in the verification voice information.

Since a large number of experiments and published studies have verified that the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the voiceprint recognition apparatus may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data and adjust the mean supervector of the preset universal background model of the corresponding character using the Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the mean supervector is adjusted iteratively so that the posterior probability P(x) is maximized, and the mean supervector that maximizes the posterior probability P(x) is used as the feature vector corresponding to the character in the verification voice information.

In another optional embodiment, in order to mitigate the slow convergence caused by the high dimensionality of the supervector, the voiceprint recognition apparatus may adjust the mean supervector of the preset universal background model of the corresponding character using the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M denotes the adjusted mean supervector of the universal background model of a given character, m denotes the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the mean supervector in Equation (1) can be adjusted by iteratively adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) is used as the feature vector corresponding to the corresponding character in the verification voice information.

S608: Calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information; if the similarity score reaches the preset verification threshold, determine that the verification user is the registered user corresponding to the registration voice information.

In this embodiment, the voiceprint recognition apparatus may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information, and use the cosine distance as the similarity score; that is, the similarity score between a character's feature vector in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:

score_i = (ω_i(tar) · ω_i(test)) / (‖ω_i(tar)‖ · ‖ω_i(test)‖)

where the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to that character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to that character in the registration voice information. If the verification voice information and the registration voice information share multiple identical characters, the similarity scores of the individual characters calculated by the above formula may be averaged; if the average similarity score of the characters reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, for example registered users A, B, and C shown in FIG. 1, the decision can be made according to the similarity between the feature vector of a character of the verification user and the feature vector of the same character of each registered user: the registered user whose character feature vectors have the highest similarity scores with those of the verification voice, with the similarity reaching the preset verification threshold, is taken as the identification result for the verification user.
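For the multi-user case, the decision rule sketched above can be written compactly as follows (Python; `similarity_score` refers to the illustrative helper shown earlier, and the threshold value is an assumption, not a value prescribed by the embodiment):

```python
def identify_user(verify_vectors, enrolled_users, threshold=0.5):
    """Pick the registered user whose enrolled vectors score highest, if the score reaches the threshold.

    enrolled_users: dict mapping user id -> per-character feature vectors of that registered user.
    Returns the best-matching user id, or None if no score reaches the threshold.
    """
    best_user, best_score = None, float("-inf")
    for user_id, enroll_vectors in enrolled_users.items():
        score = similarity_score(verify_vectors, enroll_vectors)
        if score > best_score:
            best_user, best_score = user_id, score
    return best_user if best_score >= threshold else None
```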

Thus, in this embodiment, by comparing the similarity between the feature vector of each character in the verification voice information and the feature vector of the same character in the registration voice information, and by additionally checking the temporal order of the voice segments, the accuracy of determining the verification user's identity can be further ensured.

FIG. 7 is a schematic structural diagram of a voiceprint recognition apparatus in an embodiment of the present invention. As shown in the figure, the voiceprint recognition apparatus in this embodiment may include:

a voice acquisition module 710, configured to acquire verification voice information produced by a verification user reading a first character string aloud.

The verification user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition apparatus. The first character string is a string used for the verification user's identity verification; it may be randomly generated, or it may be a preset fixed string, for example a string at least partially identical to the second character string corresponding to pre-generated registration voice information. Specifically, the first character string may contain m characters, of which n are mutually distinct, where m and n are positive integers and m ≥ n.

For example, the first character string "12358948" has 8 characters in total and includes 7 mutually distinct characters: "1", "2", "3", "4", "5", "8", and "9".

a voice segment identification module 720, configured to perform speech recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the characters of the first character string.

As shown in FIG. 3, the voice segment identification module 720 may divide the verification voice information into the voice segments corresponding to the individual characters through speech recognition and sound intensity filtering. Optionally, invalid voice segments may also be discarded so that they do not take part in subsequent processing.

In an optional embodiment, as shown in FIG. 8, the voice segment identification module may further include:

a valid segment identification unit 721, configured to identify the valid voice segments and the invalid voice segments in the verification voice information.

Specifically, the valid segment identification unit 721 may divide the verification voice according to sound intensity and treat voice segments with low sound intensity as invalid voice segments (including, for example, silent segments and impulse noise).

a voice recognition unit 722, configured to perform speech recognition on the valid voice segments to obtain the voice segments respectively corresponding to the characters of the first character string.

a voiceprint feature extraction module 730, configured to extract the voiceprint feature of the voice segment corresponding to each character in the verification voice information.

Specifically, the voiceprint feature extraction module 730 may extract MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients from the voice segment corresponding to each character, and use them as the voiceprint feature of that character's voice segment.

a feature model training module 740, configured to train, according to the voiceprint features of the voice segments corresponding to the respective characters and the preset universal background model of each corresponding character, to obtain the feature vector corresponding to each character in the verification voice information.

The feature model training module 740 may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data and adjust the parameters of the preset universal background model of the corresponding character using a Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the parameters of the preset universal background model of the corresponding character are adjusted iteratively so that the posterior probability P(x) is maximized, and the feature model training module 740 can then determine the feature vector corresponding to the character in the verification voice information from the parameters that maximize the posterior probability P(x).

Since a large number of experiments and published studies have verified that the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the mean supervector of the UBM is defined as:

m = [μ_1ᵀ, μ_2ᵀ, …, μ_Cᵀ]ᵀ, i.e., the means of the C Gaussian components concatenated into a single high-dimensional vector.

Accordingly, the feature model training module 740 may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data and adjust the mean supervector of the preset universal background model of the corresponding character using the Maximum A Posteriori (MAP) algorithm. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the mean supervector is adjusted iteratively so that the posterior probability P(x) is maximized, and the feature model training module 740 may use the mean supervector that maximizes the posterior probability P(x) as the feature vector corresponding to the character in the verification voice information.

In another optional embodiment, in order to mitigate the slow convergence caused by the high dimensionality of the supervector, the range of variation of the mean supervector is restricted to a subspace by probabilistic principal component analysis (PPCA). The feature model training module 740 may use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data, adjust the mean supervector of the preset universal background model of the corresponding character using the maximum a posteriori algorithm, and combine it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the feature model training module 740 may adjust the mean supervector of the preset universal background model of the corresponding character using the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M denotes the adjusted mean supervector of the universal background model of a given character, m denotes the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information. That is, after the voiceprint features of the voice segments corresponding to the characters in the verification voice information are substituted into Equation (1) as input samples, the mean supervector in Equation (1) can be adjusted by iteratively adjusting ω so that the posterior probability P(x) is maximized, and the ω that maximizes the posterior probability P(x) is used as the feature vector corresponding to the corresponding character in the verification voice information. The supervector subspace matrix T is determined according to the correlations between the dimension vectors of the mean supervector of the Gaussian mixture model.

a similarity judgment module 750, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information.

Specifically, the voiceprint recognition apparatus may acquire the registered user's registration voice information in the voiceprint registration stage and, through the voice segment identification module 720, the voiceprint feature extraction module 730, and the feature model training module 740, obtain the feature vector corresponding to the voice segment of each character in the registration voice information. The registration voice information may be registration voice information produced by a registered user reading a second character string aloud and acquired by the voiceprint recognition apparatus, where the second character string and the first character string share at least one identical character, i.e., the second character string corresponding to the registration voice information is at least partially identical to the first character string. Further, in an optional embodiment, the voiceprint recognition apparatus may also obtain the feature vectors corresponding to the characters of the registration voice information from an external source: after the registered user records the registration voice information on another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registration voice information, and the voiceprint recognition apparatus acquires these feature vectors from the other device or the server so that, in the identity recognition stage, the similarity judgment module 750 can compare them with the feature vectors corresponding to the characters in the verification voice information.

In a specific implementation, the similarity score is a value obtained by the voiceprint recognition apparatus after comparing the feature vector corresponding to a character in the verification voice information with the feature vector corresponding to the same character in the preset registration voice information, and it measures the degree of similarity between the two feature vectors of that character. In an optional embodiment, the similarity judgment module 750 may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information, and use the cosine distance as the similarity score; that is, the similarity score between a character's feature vector in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:

score_i = (ω_i(tar) · ω_i(test)) / (‖ω_i(tar)‖ · ‖ω_i(test)‖)

where the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to that character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to that character in the registration voice information. In an optional embodiment, if the same character occurs more than once in the verification voice information, for example if "0", "1", "5", and "8" each occur twice in the verification voice information as shown in FIG. 2, the average of the similarity scores between the feature vectors obtained from the two voice segments of character "0" and the feature vector of character "0" in the preset registration voice information may be taken as the similarity score between the feature vector of character "0" in this verification voice information and the feature vector of character "0" in the preset registration voice information, and so on for the other characters.

It should be noted that there are many other ways to measure the similarity between two feature vectors; the above is only one implementation provided by the present invention. On the basis of the solutions disclosed herein, a person skilled in the art can, without creative effort, derive further ways of calculating the similarity scores of the feature vectors of the characters shared by the verification voice information and the registration voice information, which need not be exhaustively enumerated here.

a user identification module 760, configured to determine, if the similarity score reaches a preset verification threshold, that the verification user is the registered user corresponding to the registration voice information.

If the verification voice information and the registration voice information share multiple identical characters, the user identification module 760 may average the similarity scores of the individual characters calculated by the similarity judgment module 750; if the average similarity score of the characters reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, for example registered users A, B, and C shown in FIG. 1, the user identification module 760 can decide according to the similarity between the feature vector of a character of the verification user and the feature vector of the same character of each registered user: the registered user whose character feature vectors have the highest similarity scores with those of the verification voice, with the similarity reaching the preset verification threshold, is taken as the identification result for the verification user.

Further, in an optional embodiment, the voice acquisition module 710 is further configured to acquire registration voice information generated by a registered user reading a second character string aloud, the second character string sharing at least one identical character with the first character string;

the voice segment recognition module 720 is further configured to perform speech recognition on the registration voice information to obtain the voice segments contained therein that respectively correspond to the multiple characters in the second character string;

the voiceprint feature extraction module 730 is further configured to extract the voiceprint features of the voice segments corresponding to the respective characters in the registration voice information;

the feature model training module 740 is further configured to train, according to the voiceprint features of the voice segments corresponding to the respective characters in the registration voice information and in combination with the preset universal background model corresponding to each character, the feature vector corresponding to each character in the registration voice information.

In an optional embodiment, the voiceprint recognition apparatus may further include:

a character order determination module 770, configured to determine whether the order of the voice segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

To effectively prevent a registered user's voice from being surreptitiously recorded or illegally copied and then used for voiceprint recognition, a different first character string may be generated at random each time, and this step checks whether the order of the voice segments of the multiple characters in the verification voice information matches the order of the corresponding characters in the first character string. If it does not match, voiceprint recognition may be judged to have failed; if it matches, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information.
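
The order check performed by module 770 can be illustrated as follows; treating the recognizer output as a time-ordered list of (character, segment) pairs is an assumption of this sketch.

def order_matches(recognized_segments, prompt_string):
    # recognized_segments: list of (character, segment) pairs in the order spoken.
    spoken_order = [ch for ch, _segment in recognized_segments]
    # The spoken characters must appear in exactly the order of the random prompt.
    return spoken_order == list(prompt_string)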

In an optional embodiment, the voiceprint recognition apparatus may further include:

a character string display module 700, configured to randomly generate the first character string and display it.

Thus, in this embodiment, the voiceprint features of the voice segments corresponding to the respective characters in the verification user's verification voice information are obtained and, in combination with the preset UBM of each corresponding character, trained into the feature vector corresponding to each character in the verification voice information; the identity of the verification user is then determined by comparing the feature vector of each character in the verification voice information with the feature vector of the corresponding character in the registration voice information. Because the user feature vectors used for comparison correspond to specific characters, the voiceprint characteristics of the user when reading different characters are fully taken into account, which can effectively improve the accuracy of voiceprint recognition.

In a practical test, with training samples from 1,000 speakers and 290,000 trials (about 10,000 identity-matching trials and about 280,000 non-matching trials), a recall rate of 79.8% was achieved at a one-in-a-thousand error rate, with an equal error rate (EER) of 3.39%; compared with the traditional text-independent modeling approach, voiceprint recognition performance improved by more than 40%.
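
The figures above are reported results; purely as background on the metric, the following generic sketch shows one way an equal error rate could be computed from lists of matching and non-matching trial scores. It is not the evaluation code of the embodiment, and it expects NumPy arrays as input.

import numpy as np

def equal_error_rate(match_scores, nonmatch_scores):
    # Sweep the decision threshold over all observed scores and return the point
    # where the false rejection rate and the false acceptance rate are closest.
    thresholds = np.sort(np.concatenate([match_scores, nonmatch_scores]))
    best_gap, eer = 1.0, 0.0
    for t in thresholds:
        frr = float(np.mean(match_scores < t))       # genuine trials rejected
        far = float(np.mean(nonmatch_scores >= t))   # impostor trials accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer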

FIG. 9 is a schematic structural diagram of another voiceprint recognition apparatus according to an embodiment of the present invention. As shown in FIG. 9, the voiceprint recognition apparatus 1000 may include at least one processor 1001 (for example, a CPU), a user interface 1003, a memory 1005, and at least one communication bus 1002, where the communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display, a keyboard, a microphone, and the like; optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The memory 1005 may be a high-speed RAM memory, or a non-volatile memory such as at least one disk memory. As shown in FIG. 9, the memory 1005, as a computer storage medium, may include an operating system, a user interface module, and computer-executable program code (for example, a voiceprint recognition program).

In the voiceprint recognition apparatus 1000 shown in FIG. 9, the processor 1001 may be configured to call the computer-executable program code stored in the memory 1005 and specifically perform the following steps:

acquiring, through the user interface, verification voice information generated by a verification user reading a first character string aloud;

calculating the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and the voice segment corresponding to the corresponding character in preset registration voice information;

when the similarity reaches a preset threshold, verifying the verification user as the registered user corresponding to the registration voice information.

In an implementation, before calculating the similarity between the voiceprint features, the processor 1001 further calls the computer-executable program code to perform the following operations:

performing speech recognition on the verification voice information to obtain the voice segments contained therein that respectively correspond to the characters in the first character string;

extracting the voiceprint features of the voice segments corresponding to the respective characters.

In an implementation, the processor 1001 calls the computer-executable program code to perform the following operations to calculate the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and the voice segment corresponding to the corresponding character in the preset registration voice information:

training, according to the voiceprint features of the voice segments corresponding to the respective characters and in combination with the preset universal background model corresponding to each character, the feature vector corresponding to each character in the verification voice information;

calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, as the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and the voice segment corresponding to the corresponding character in the preset registration voice information.

In an implementation, before acquiring the verification voice information generated by the verification user reading the first character string, the processor 1001 further calls the computer-executable program code to perform the following operations: acquiring registration voice information generated by a registered user reading a second character string aloud, the second character string sharing at least one identical character with the first character string; performing speech recognition on the registration voice information to obtain the voice segments contained therein that respectively correspond to the multiple characters in the second character string; extracting the voiceprint features of the voice segments corresponding to the respective characters in the registration voice information; and training, according to the voiceprint features of the voice segments corresponding to the respective characters in the registration voice information and in combination with the preset universal background model corresponding to each character, the feature vector corresponding to each character in the registration voice information.

In an implementation, the processor 1001 calls the computer-executable program code to perform the following operations so as to train, according to the voiceprint features of the voice segments corresponding to the respective characters and in combination with the preset universal background model corresponding to each character, the feature vector corresponding to each character in the verification voice information: taking the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as training sample data, and adjusting the mean supervector of the preset universal background model corresponding to each character by using a maximum a posteriori (MAP) algorithm, to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation, the processor 1001 calls the computer-executable program code to perform the following operations so as to take the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data and adjust the mean supervector of the preset universal background model corresponding to each character by using the maximum a posteriori algorithm, thereby obtaining the feature vector corresponding to each character in the verification voice information: taking the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, adjusting the mean supervector of the preset universal background model corresponding to each character by using the maximum a posteriori algorithm, and combining a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.

In an implementation, the processor 1001 calls the computer-executable program code to perform the following operations so as to take the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, adjust the mean supervector of the preset universal background model corresponding to each character by using the maximum a posteriori algorithm, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information: taking the voiceprint features of the voice segments corresponding to the respective characters in the verification voice information as the training sample data, and adjusting the mean supervector of the preset universal background model corresponding to each character by using the following formula, such that the posterior probability of the adjusted universal background model of the corresponding character is maximized: M = m + Tω, where M denotes the mean supervector of the adjusted universal background model of a given character, m denotes the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.
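
The relation M = m + Tω can be illustrated with a deliberately simplified Python sketch: the adapted supervector is approximated here by relevance-MAP adaptation of the component means with hard frame assignment, and ω is then recovered by least squares against T. Both simplifications, and the relevance factor of 16, are assumptions of this sketch rather than the exact estimator of the embodiment.

import numpy as np

def map_adapt_supervector(frames, ubm_means, relevance=16.0):
    # frames: (N, D) voiceprint features of one character's segment.
    # ubm_means: (C, D) component means of that character's universal background model.
    distances = ((frames[:, None, :] - ubm_means[None, :, :]) ** 2).sum(axis=-1)
    assign = distances.argmin(axis=1)          # hard-assign each frame to a component
    adapted = ubm_means.copy()
    for c in range(ubm_means.shape[0]):
        fc = frames[assign == c]
        if len(fc):
            alpha = len(fc) / (len(fc) + relevance)
            adapted[c] = alpha * fc.mean(axis=0) + (1 - alpha) * ubm_means[c]
    return adapted.reshape(-1)                 # adjusted mean supervector M

def estimate_omega(M, m, T):
    # Solve M = m + T @ omega for omega in the least-squares sense.
    omega, *_ = np.linalg.lstsq(T, M - m, rcond=None)
    return omega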

In an implementation, the preset supervector subspace matrix is determined according to the correlation between the weights of the Gaussian components in the universal background model.
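
The embodiment ties the subspace matrix to correlations within the universal background model; as a loosely related illustration only, one common way to obtain such a low-rank matrix T in practice is from the principal directions of development-set supervectors, as in the sketch below. The data layout and the rank of 100 are assumptions of the example.

import numpy as np

def estimate_subspace(training_supervectors, rank=100):
    # training_supervectors: (N, C*D) adapted mean supervectors from a development set.
    centered = training_supervectors - training_supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T    # T has shape (C*D, rank); the rank is an illustrative choice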

In an implementation, the processor 1001 calls the computer-executable program code to perform the following operations to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information: calculating the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and determining the cosine distance as the similarity score.

In an implementation, the processor 1001 calls the computer-executable program code to perform the following operations so as to perform speech recognition on the verification voice information and obtain the voice segments contained therein that respectively correspond to the multiple characters in the first character string: identifying valid voice segments and invalid voice segments in the verification voice information; and performing speech recognition on the valid voice segments to obtain the voice segments respectively corresponding to the multiple characters in the first character string.
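
A simple energy-based sketch of separating valid speech from invalid (silent or noisy) material is given below; the framing parameters and the energy-ratio rule are assumptions of the example rather than the detection necessarily used in the embodiment, and the samples are expected as a NumPy float array.

import numpy as np

def split_valid_frames(samples, rate, frame_ms=25, energy_ratio=0.1):
    # Frames whose energy falls below a fraction of the loudest frame are treated
    # as invalid and dropped before speech recognition is applied.
    frame_len = int(rate * frame_ms / 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return []
    energies = np.array([float(np.mean(f ** 2)) for f in frames])
    keep = energies >= energy_ratio * energies.max()
    return [f for f, k in zip(frames, keep) if k]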

In an implementation, before determining the verification user as the registered user corresponding to the registration voice information, the processor 1001 further calls the computer-executable program code to perform the following operations: determining whether the order of the voice segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string; and, when the similarity reaches the preset threshold and the order of the voice segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string, determining the verification user as the registered user corresponding to the registration voice information.

In an implementation, before acquiring the verification voice information generated by the verification user reading the first character string, the processor 1001 further calls the computer-executable program code to perform the following operations: randomly generating the first character string and displaying the first character string.
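
Randomly generating a digit prompt for display is straightforward; the digit alphabet and the length of eight characters below are illustrative choices.

import secrets

def make_prompt(length=8, alphabet="0123456789"):
    # Use a cryptographically strong RNG so prompts are hard to predict and replay.
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(make_prompt())  # e.g. "30917584", displayed for the verification user to read aloud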

A person of ordinary skill in the art will understand that all or part of the procedures of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the claims of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (27)

一种声纹识别方法,其特征在于,所述方法包括:A voiceprint recognition method, characterized in that the method comprises: 获取验证用户朗读第一字符串所产生的验证语音信息;Obtaining verification voice information generated by the verification user reading the first character string; 计算验证语音信息中各个字符对应的语音片段与预设的注册语音信息中相应字符对应的语音片段的声纹特征的相似度;Calculating a similarity of the voiceprint features of the voice segment corresponding to the corresponding character in the preset voice information in the voice information of the verification voice information; 当所述相似度达到预设的门限值时,将验证用户验证为所述注册语音信息对应的注册用户。When the similarity reaches a preset threshold, the user verification is verified as the registered user corresponding to the registered voice information. 如权利要求1所述的声纹识别方法,其特征在于,所述计算验证语音信息中各个字符对应的语音片段与预设的注册语音信息中相应字符对应的语音片段的声纹特征的相似度之前还包括:The voiceprint recognition method according to claim 1, wherein the calculating and verifying the similarity of the voiceprint features of the voice segments corresponding to the respective characters in the preset registered voice information in the voice information is verified. Previously included: 对所述验证语音信息进行语音识别,得到所述验证语音信息中包含的分别与所述第一字符串中的各个字符对应的语音片段;Performing voice recognition on the verification voice information, and obtaining a voice segment respectively corresponding to each character in the first character string included in the verification voice information; 提取各个字符对应的语音片段的声纹特征。The voiceprint features of the voice segments corresponding to the respective characters are extracted. 如权利要求1所述的声纹识别方法,其特征在于,所述计算验证语音信息中各个字符对应的语音片段与预设的注册语音信息中相应字符对应的语音片段的声纹特征的相似度包括:The voiceprint recognition method according to claim 1, wherein the calculating and verifying the similarity of the voiceprint features of the voice segments corresponding to the respective characters in the preset registered voice information in the voice information is verified. include: 根据所述各个字符对应的语音片段的声纹特征,结合预设的相应字符对应的通用背景模型训练,得到所述验证语音信息中各个字符对应的特征向量;And performing, according to the voiceprint feature of the voice segment corresponding to each character, in combination with a common background model corresponding to the preset corresponding character, obtaining a feature vector corresponding to each character in the verification voice information; 计算所述验证语音信息中各个字符对应的特征向量与所述预设的注册语音信息中相应字符对应的特征向量的相似度分数,作为验证语音信息中各个字符对应的语音片段与预设的注册语音信息中相应字符对应的语音片段的声纹特征的相似度。Calculating a similarity score of the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, as a voice segment corresponding to each character in the verification voice information and a preset registration The similarity of the voiceprint features of the voice segments corresponding to the corresponding characters in the voice information. 
如权利要求3所述的声纹识别方法,其特征在于,所述获取所述验证用户朗读所述第一字符串所产生的所述验证语音信息之前,所述方法还包括:The voiceprint recognition method according to claim 3, wherein the method further comprises: before the obtaining the verification voice information generated by the verification user reading the first character string, the method further comprising: 获取注册用户朗读第二字符串所产生的注册语音信息,所述第二字符串与 所述第一字符串拥有至少一个相同的字符;Obtaining registration voice information generated by the registered user reading the second character string, the second string and The first string has at least one of the same characters; 根据所述注册语音信息中各个字符对应的声纹特征,结合预设的相应字符对应的通用背景模型训练,得到所述注册语音信息中各个字符对应的特征向量。The feature vector corresponding to each character in the registered voice information is obtained according to the voiceprint feature corresponding to each character in the registered voice information, combined with the common background model corresponding to the preset corresponding character. 如权利要求3所述的声纹识别方法,其特征在于,所述根据所述各个字符对应的语音片段的声纹特征,结合所述预设的相应字符对应的通用背景模型训练,得到所述验证语音信息中各个字符对应的特征向量包括:The voiceprint recognition method according to claim 3, wherein the voiceprint feature according to the voice segment corresponding to each character is combined with the common background model corresponding to the preset corresponding character to obtain the Verifying the feature vectors corresponding to each character in the voice information includes: 将所述验证语音信息中各个字符对应的语音片段的声纹特征作为训练样本数据,采用最大后验概率算法对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,得到所述验证语音信息中各个字符对应的特征向量。Using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and obtain the The feature vector corresponding to each character in the voice information is verified. 如权利要求5所述的声纹识别方法,其特征在于,所述将所述验证语音信息中各个字符对应的语音片段的声纹特征作为训练样本数据,采用所述最大后验概率算法对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,从而得到所述验证语音信息中各个字符对应的特征向量包括:The voiceprint recognition method according to claim 5, wherein the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as training sample data, and the maximum posterior probability algorithm is used Adjusting the mean supervector of the common background model corresponding to the preset corresponding character, so as to obtain the feature vector corresponding to each character in the verification voice information, including: 将所述验证语音信息中各个字符对应的语音片段的声纹特征作为训练样本数据,采用所述最大后验概率算法对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,并结合预设的超向量子空间矩阵,得到所述验证语音信息中各个字符对应的特征向量。The voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character. And combining the preset super vector subspace matrix to obtain a feature vector corresponding to each character in the verification voice information. 如权利要求6所述的声纹识别方法,其特征在于,所述将所述验证语音信息中各个字符对应的语音片段的声纹特征作为所述训练样本数据,采用所述最大后验概率算法对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,并结合所述预设的超向量子空间矩阵,得到所述验证语音信息中各个字符对应的特征向量包括:The voiceprint recognition method according to claim 6, wherein the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum posterior probability algorithm is used. 
Adjusting the mean supervector of the common background model corresponding to the preset corresponding character, and combining the preset supervector subspace matrix, obtaining the feature vector corresponding to each character in the verification voice information includes: 将所述验证语音信息中各个字符对应的语音片段的声纹特征作为所述训练样本数据,采用下式对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,使得调整后的相应字符对应的通用背景模型的后验概率最大:Using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted The general background model corresponding to the corresponding character has the largest posterior probability: M=m+Tω,其中M代表调整后的某个字符的通用背景模型的均值超向量, m代表调整前的相应字符的通用背景模型的均值超向量,T为所述预设的超向量子空间矩阵,ω为所述验证语音信息中相应字符对应的特征向量。M=m+Tω, where M represents the mean supervector of the general background model of the adjusted character. m represents the mean supervector of the universal background model of the corresponding character before the adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification speech information. 如权利要求6所述的声纹识别方法,其特征在于,所述预设的超向量子空间矩阵为根据所述通用背景模型中各个高斯模块的权重之间的相关性确定得到的。The voiceprint recognition method according to claim 6, wherein the preset super-vector sub-space matrix is determined according to a correlation between weights of respective Gaussian modules in the universal background model. 如权利要求5所述的声纹识别方法,其特征在于,所述通用背景模型的数学表达式为:The voiceprint recognition method according to claim 5, wherein the mathematical expression of the universal background model is: P(x)=∑t=1...CaiN(x|μi,∑i)P(x)=∑ t=1...C a i N(x|μ i ,∑ i ) 其中,P(x)代表UBM的概率分布,C代表UBM中共有C个高斯模块进行加和,ai代表第i个高斯模块的权重,μi代表第i个高斯模块的均值,∑i代表第i个高斯模块的方差,N(x)代表高斯分布,x代表输入的样本的声纹特征。Where P(x) represents the probability distribution of UBM, C represents the total of C Gaussian modules in UBM, a i represents the weight of the i-th Gaussian module, μ i represents the mean of the i-th Gaussian module, ∑ i represents The variance of the i-th Gaussian module, N(x) represents a Gaussian distribution, and x represents the voiceprint characteristics of the input sample. 如权利要求3所述的声纹识别方法,其特征在于,所述计算所述验证语音信息中各个字符对应的特征向量与预设的注册语音信息中相应字符对应的特征向量的相似度分数包括:The voiceprint recognition method according to claim 3, wherein the calculating a similarity score of the feature vector corresponding to each character in the verification voice information and the corresponding character in the preset registration voice information includes: : 计算所述验证语音信息中各个字符对应的特征向量与预设的注册语音信息中相应字符对应的特征向量之间的余弦距离值,并将所述余弦距离值确定为所述相似度分数。Calculating a cosine distance value between a feature vector corresponding to each character in the verification voice information and a feature vector corresponding to a corresponding character in the preset registration voice information, and determining the cosine distance value as the similarity score. 如权利要求10所述的声纹识别方法,其特征在于,通过下式计算验证语音信息中各个字符对应的特征向量与预设的注册语音信息中相应字符对应的特征向量之间的余弦距离值:The voiceprint recognition method according to claim 10, wherein the cosine distance value between the feature vector corresponding to each character in the voice information and the feature vector corresponding to the corresponding character in the preset registered voice information is calculated by the following formula :
score i = (ω i (tar) · ω i (test)) / (‖ω i (tar)‖ · ‖ω i (test)‖)
其中,下标i表示第i个验证语音信息和注册语音信息中共有的字符,ωi(tar)表示该字符在验证语音信息中对应的特征向量,ωi(test)表示该字符在注册语音信息中对应的特征向量。 Wherein, the subscript i indicates a character common to the i-th verification voice information and the registration voice information, ω i (tar) indicates a corresponding feature vector of the character in the verification voice information, and ω i (test) indicates that the character is in the registered voice. The corresponding feature vector in the message.
如权利要求2所述的声纹识别方法,其特征在于,所述对所述验证语音信息进行语音识别得到所述验证语音信息中包含的分别与所述第一字符串中的多个字符对应的语音片段包括:The voiceprint recognition method according to claim 2, wherein the voice recognition of the verification voice information is performed, and the verification voice information is respectively included corresponding to a plurality of characters in the first character string. The voice clips include: 识别所述验证语音信息中的有效语音片段和无效语音片段;Identifying valid voice segments and invalid voice segments in the verification voice information; 对所述有效语音片段进行语音识别得到分别与所述第一字符串中的多个字符对应的语音片段。Performing speech recognition on the valid speech segment to obtain a speech segment respectively corresponding to a plurality of characters in the first character string. 如权利要求1所述的声纹识别方法,其特征在于,所述将所述验证用户确定为所述注册语音信息对应的所述注册用户之前,所述方法还包括:The voiceprint recognition method according to claim 1, wherein the method further comprises: before the authenticating user is determined as the registered user corresponding to the registered voice information, the method further comprises: 确定所述验证语音信息中的多个字符的语音片段的排序与所述第一字符串中的相应字符的排序是否一致;以及Determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; 在所述相似度达到预设的门限值,并且所述验证语音信息中的多个字符的语音片段的排序与所述第一字符串中的相应字符的排序一致的情况下,将所述验证用户确定为所述注册语音信息对应的所述注册用户。And if the similarity reaches a preset threshold value, and the order of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, The verification user determines the registered user corresponding to the registration voice information. 如权利要求1-11中任一项所述的声纹识别方法,其特征在于,所述获取所述验证用户朗读所述第一字符串所产生的所述验证语音信息之前,所述方法还包括:The voiceprint recognition method according to any one of claims 1 to 11, wherein the method further comprises: before acquiring the verification voice information generated by the verification user reading the first character string include: 随机生成所述第一字符串,并显示所述第一字符串。The first character string is randomly generated, and the first character string is displayed. 
一种声纹识别装置,其特征在于,所述装置包括:A voiceprint recognition device, characterized in that the device comprises: 语音获取模块,用于获取验证用户朗读第一字符串所产生的验证语音信息;a voice acquiring module, configured to obtain verification voice information generated by the user to read the first character string; 语音片段识别模块,用于对所述验证语音信息进行语音识别,得到所述验证语音信息中包含的分别与所述第一字符串中的多个字符对应的语音片段;a voice segment identification module, configured to perform voice recognition on the verification voice information, and obtain a voice segment respectively included in the verification voice information corresponding to multiple characters in the first character string; 声纹特征提取模块,用于提取验证语音信息中各个字符对应的语音片段的声纹特征;a voiceprint feature extraction module, configured to extract a voiceprint feature of a voice segment corresponding to each character in the verification voice information; 特征模型训练模块,用于根据所述各个字符对应的语音片段的声纹特征,结合预设的相应字符对应的通用背景模型训练,得到所述验证语音信息中各个 字符对应的特征向量;a feature model training module, configured to perform training according to a voice pattern feature of the voice segment corresponding to each character, and a common background model corresponding to the corresponding corresponding character, to obtain each of the verification voice information a feature vector corresponding to the character; 相似度判断模块,用于计算所述验证语音信息中各个字符对应的特征向量与预设的注册语音信息中相应字符对应的特征向量的相似度分数,作为验证语音信息中各个字符对应的语音片段与预设的注册语音信息中相应字符对应的语音片段的声纹特征的相似度;The similarity judging module is configured to calculate a similarity score of the feature vector corresponding to each character in the verification voice information and the corresponding character in the preset registration voice information, as a voice segment corresponding to each character in the verification voice information The similarity of the voiceprint features of the voice segment corresponding to the corresponding character in the preset registration voice information; 用户识别模块,用于在所述相似度达到预设的门限值时,将验证用户验证为所述注册语音信息对应的注册用户。The user identification module is configured to verify the verification user as the registered user corresponding to the registered voice information when the similarity reaches a preset threshold. 如权利要求15所述的声纹识别装置,其特征在于,A voiceprint recognition device according to claim 15, wherein: 所述语音获取模块,还用于获取注册用户朗读第二字符串所产生的注册语音信息,所述第二字符串与所述第一字符串拥有至少一个相同的字符;The voice acquiring module is further configured to obtain registration voice information generated by the registered user reading the second character string, where the second character string and the first character string have at least one of the same characters; 所述特征模型训练模块,还用于根据所述注册语音信息中各个字符对应的声纹特征,结合预设的相应字符对应的通用背景模型训练,得到所述注册语音信息中各个字符对应的特征向量。The feature model training module is further configured to: according to the voiceprint feature corresponding to each character in the registered voice information, and the common background model corresponding to the corresponding corresponding character, to obtain the feature corresponding to each character in the registered voice information. vector. 
如权利要求15所述的声纹识别装置,其特征在于,所述特征模型训练模块用于:The voiceprint recognition device according to claim 15, wherein said feature model training module is configured to: 将所述验证语音信息中各个字符对应的语音片段的声纹特征作为训练样本数据,采用下式对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,使得调整后的相应字符对应的通用背景模型的后验概率最大:The voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted corresponding The general background model corresponding to the character has the largest posterior probability: M=m+Tω,其中M代表调整后的某个字符的通用背景模型的均值超向量,m代表调整前的相应字符的通用背景模型的均值超向量,T为所述预设的超向量子空间矩阵,ω为所述验证语音信息中相应字符对应的特征向量,所述预设的超向量子空间矩阵为根据所述高斯混合模型的均值超向量中各个维度向量之间的相关性确定得到的;M=m+Tω, where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, and T is the preset supervector a spatial matrix, ω is a feature vector corresponding to a corresponding character in the verification voice information, and the preset super-vector sub-space matrix is determined according to a correlation between each dimension vector in the mean super-vector of the Gaussian mixture model. of; 所述相似度判断模块用于:The similarity judgment module is used to: 计算所述验证语音信息中各个字符对应的特征向量与所述预设的注册语音信息中相应字符对应的特征向量之间的余弦距离值,并将所述余弦距离值确定为所述相似度分数。 Calculating a cosine distance value between a feature vector corresponding to each character in the verification voice information and a feature vector corresponding to a corresponding character in the preset registration voice information, and determining the cosine distance value as the similarity score . 如权利要求15所述的声纹识别装置,其特征在于,所述语音片段识别模块包括:The voiceprint recognition device according to claim 15, wherein the voice segment recognition module comprises: 有效片段识别单元,用于识别所述验证语音信息中的有效语音片段和无效语音片段;a valid segment identification unit, configured to identify a valid speech segment and an invalid speech segment in the verification speech information; 语音识别单元,用于对所述有效语音片段进行语音识别得到分别与所述第一字符串中的多个字符对应的语音片段。The voice recognition unit is configured to perform voice recognition on the valid voice segment to obtain a voice segment corresponding to multiple characters in the first character string. 如权利要求15所述的声纹识别装置,其特征在于,还包括:The voiceprint recognition device according to claim 15, further comprising: 字符排序确定模块,用于确定所述验证语音信息中的多个字符的语音片段的排序与所述第一字符串中的相应字符的排序是否一致;a character order determining module, configured to determine whether a sorting of the voice segments of the plurality of characters in the verification voice information is consistent with a ranking of the corresponding characters in the first character string; 所述用户识别模块,还用于在所述相似度达到预设的门限值,并且所述验证语音信息中的多个字符的语音片段的排序与所述第一字符串中的相应字符的排序一致的情况下,将所述验证用户确定为所述注册语音信息对应的所述注册用户。The user identification module is further configured to: when the similarity reaches a preset threshold, and verify the order of the voice segments of the plurality of characters in the voice information and the corresponding characters in the first string In the case that the sorting is consistent, the verification user is determined as the registered user corresponding to the registered voice information. 
如权利要求15-19中任一项所述的声纹识别装置,其特征在于,所述声纹识别装置还包括:The voiceprint recognition device according to any one of claims 15 to 19, wherein the voiceprint recognition device further comprises: 字符串显示模块,用于随机生成所述第一字符串,并显示所述第一字符串。a string display module, configured to randomly generate the first string, and display the first string. 一种声纹识别装置包括:A voiceprint recognition device includes: 用户接口,用于获取语音信息;a user interface for obtaining voice information; 存储器,存储计算机可执行程序代码;以及a memory storing computer executable program code; 处理器,用于调用所述计算机可执行程序代码以执行以下操作:a processor for invoking the computer executable program code to perform the following operations: 通过所述用户接口获取验证用户朗读第一字符串所产生的验证语音信息;Acquiring, by the user interface, verification voice information generated by the user to read the first character string; 对所述验证语音信息进行语音识别,得到所述验证语音信息中包含的分别与所述第一字符串中的各个字符对应的语音片段;Performing voice recognition on the verification voice information, and obtaining a voice segment respectively corresponding to each character in the first character string included in the verification voice information; 提取各个字符对应的语音片段的声纹特征;Extracting a voiceprint feature of the voice segment corresponding to each character; 根据所述各个字符对应的语音片段的声纹特征,结合预设的相应字符对应 的通用背景模型训练,得到所述验证语音信息中各个字符对应的特征向量;Corresponding to the preset corresponding characters according to the voiceprint features of the voice segments corresponding to the respective characters The universal background model is trained to obtain a feature vector corresponding to each character in the verification voice information; 计算所述验证语音信息中各个字符对应的特征向量与所述预设的注册语音信息中相应字符对应的特征向量的相似度分数,作为验证语音信息中各个字符对应的语音片段与预设的注册语音信息中相应字符对应的语音片段的声纹特征的相似度;Calculating a similarity score of the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, as a voice segment corresponding to each character in the verification voice information and a preset registration The similarity of the voiceprint features of the voice segments corresponding to the corresponding characters in the voice information; 当所述相似度达到预设的门限值时,将验证用户验证为所述注册语音信息对应的注册用户。When the similarity reaches a preset threshold, the user verification is verified as the registered user corresponding to the registered voice information. 如权利要求21所述的声纹识别装置,其特征在于,所述获取所述验证用户朗读所述第一字符串所产生的所述验证语音信息之前,之前,所述处理器还调用所述计算机可执行程序代码执行以下操作:The voiceprint recognition apparatus according to claim 21, wherein said processor further calls said processor before said verifying said verification voice information generated by said user reading said first character string The computer executable code performs the following actions: 获取注册用户朗读第二字符串所产生的注册语音信息,所述第二字符串与所述第一字符串拥有至少一个相同的字符;Obtaining registration voice information generated by the registered user reading the second character string, the second character string having at least one of the same characters as the first character string; 根据所述注册语音信息中各个字符对应的声纹特征,结合预设的相应字符对应的通用背景模型训练,得到所述注册语音信息中各个字符对应的特征向量。The feature vector corresponding to each character in the registered voice information is obtained according to the voiceprint feature corresponding to each character in the registered voice information, combined with the common background model corresponding to the preset corresponding character. 
如权利要求21所述的声纹识别装置,其特征在于,所述处理器调用所述计算机可执行程序代码执行以下操作以根据所述各个字符对应的语音片段的声纹特征,结合所述预设的相应字符对应的通用背景模型训练,得到所述验证语音信息中各个字符对应的特征向量:A voiceprint recognition apparatus according to claim 21, wherein said processor calls said computer executable program code to perform an operation of combining said sound according to a voiceprint feature of said voice segment corresponding to said respective character The general background model corresponding to the corresponding character is trained to obtain the feature vector corresponding to each character in the verification voice information: 将所述验证语音信息中各个字符对应的语音片段的声纹特征作为所述训练样本数据,采用下式对所述预设的相应字符对应的通用背景模型的均值超向量进行调整,使得调整后的相应字符对应的通用背景模型的后验概率最大:Using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted The general background model corresponding to the corresponding character has the largest posterior probability: M=m+Tω,其中M代表调整后的某个字符的通用背景模型的均值超向量,m代表调整前的相应字符的通用背景模型的均值超向量,T为所述预设的超向量子空间矩阵,ω为所述验证语音信息中相应字符对应的特征向量,所述预设的超向量子空间矩阵为根据所述通用背景模型中各个高斯模块的权重之间的相关性确定得到的;M=m+Tω, where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, and T is the preset supervector a spatial matrix, ω is a feature vector corresponding to a corresponding character in the verification voice information, and the preset super-vector sub-space matrix is determined according to a correlation between weights of respective Gaussian modules in the universal background model; 所述处理器调用所述计算机可执行程序代码执行以下操作以计算所述验 证语音信息中各个字符对应的特征向量与预设的注册语音信息中相应字符对应的特征向量的相似度分数:The processor invoking the computer executable program code to perform the following operations to calculate the test The similarity score of the feature vector corresponding to each character in the voice information and the corresponding character in the preset registered voice information: 计算所述验证语音信息中各个字符对应的特征向量与预设的注册语音信息中相应字符对应的特征向量之间的余弦距离值,并将所述余弦距离值确定为所述相似度分数。Calculating a cosine distance value between a feature vector corresponding to each character in the verification voice information and a feature vector corresponding to a corresponding character in the preset registration voice information, and determining the cosine distance value as the similarity score. 如权利要求21所述的声纹识别装置,其特征在于,所述处理器调用所述计算机可执行程序代码执行以下操作以对所述验证语音信息进行语音识别得到所述验证语音信息中包含的分别与所述第一字符串中的多个字符对应的语音片段:A voiceprint recognition apparatus according to claim 21, wherein said processor calls said computer executable program code to perform the following operations to perform voice recognition on said verification voice information to obtain said voice information included in said verification voice information a speech segment corresponding to a plurality of characters in the first character string: 识别所述验证语音信息中的有效语音片段和无效语音片段;Identifying valid voice segments and invalid voice segments in the verification voice information; 对所述有效语音片段进行语音识别得到分别与所述第一字符串中的多个字符对应的语音片段。Performing speech recognition on the valid speech segment to obtain a speech segment respectively corresponding to a plurality of characters in the first character string. 
如权利要求21所述的声纹识别装置,其特征在于,所述将所述验证用户确定为所述注册语音信息对应的所述注册用户之前,所述处理器还调用所述计算机可执行程序代码执行以下操作:The voiceprint identifying apparatus according to claim 21, wherein said processor further calls said computer executable program before said verifying user determines said registered user corresponding to said registered voice information The code does the following: 确定所述验证语音信息中的多个字符的语音片段的排序与所述第一字符串中的相应字符的排序是否一致;以及Determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; 在所述相似度达到预设的门限值,并且所述验证语音信息中的多个字符的语音片段的排序与所述第一字符串中的相应字符的排序一致的情况下,将所述验证用户确定为所述注册语音信息对应的所述注册用户。And if the similarity reaches a preset threshold value, and the order of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, The verification user determines the registered user corresponding to the registration voice information. 如权利要求21-25中任一项所述的声纹识别装置,其特征在于,所述获取所述验证用户朗读所述第一字符串所产生的所述验证语音信息之前,所述处理器还调用所述计算机可执行程序代码执行以下操作:A voiceprint recognition apparatus according to any one of claims 21 to 25, wherein said processor is obtained before said verification voice information generated by said verification user reading said first character string The computer executable program code is also invoked to perform the following operations: 随机生成所述第一字符串,并通过所述用户接口显示所述第一字符串。The first character string is randomly generated, and the first character string is displayed through the user interface. 一种存储介质,其特征在于,所述存储介质中存储有计算机程序,所 述计算机程序用以执行如权利要求1-14中任一项所述的声纹识别方法。 A storage medium, characterized in that a computer program is stored in the storage medium The computer program is for performing the voiceprint recognition method according to any one of claims 1-14.
PCT/CN2017/087911 2016-06-12 2017-06-12 Voiceprint recognition method and device Ceased WO2017215558A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610416650.3 2016-06-12
CN201610416650.3A CN106098068B (en) 2016-06-12 2016-06-12 A kind of method for recognizing sound-groove and device

Publications (1)

Publication Number Publication Date
WO2017215558A1 true WO2017215558A1 (en) 2017-12-21

Family

ID=57846666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/087911 Ceased WO2017215558A1 (en) 2016-06-12 2017-06-12 Voiceprint recognition method and device

Country Status (2)

Country Link
CN (1) CN106098068B (en)
WO (1) WO2017215558A1 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106098068B (en) * 2016-06-12 2019-07-16 腾讯科技(深圳)有限公司 A kind of method for recognizing sound-groove and device
CN110169014A (en) * 2017-01-03 2019-08-23 诺基亚技术有限公司 Device, method and computer program product for certification
CN108447471B (en) * 2017-02-15 2021-09-10 腾讯科技(深圳)有限公司 Speech recognition method and speech recognition device
CN107610708B (en) * 2017-06-09 2018-06-19 平安科技(深圳)有限公司 Identify the method and apparatus of vocal print
CN109102812B (en) * 2017-06-21 2021-08-31 北京搜狗科技发展有限公司 Voiceprint recognition method and system and electronic equipment
CN107492379B (en) 2017-06-30 2021-09-21 百度在线网络技术(北京)有限公司 Voiceprint creating and registering method and device
CN107248410A (en) * 2017-07-19 2017-10-13 浙江联运知慧科技有限公司 The method that Application on Voiceprint Recognition dustbin opens the door
CN109559759B (en) * 2017-09-27 2021-10-08 华硕电脑股份有限公司 Electronic device with incremental registration unit and method thereof
CN110310647B (en) * 2017-09-29 2022-02-25 腾讯科技(深圳)有限公司 A voice identity feature extractor, classifier training method and related equipment
CN107886943A (en) * 2017-11-21 2018-04-06 广州势必可赢网络科技有限公司 Voiceprint recognition method and device
CN108154588B (en) * 2017-12-29 2020-11-27 深圳市艾特智能科技有限公司 Unlocking method and system, readable storage medium and intelligent device
CN110047491A (en) * 2018-01-16 2019-07-23 中国科学院声学研究所 A kind of relevant method for distinguishing speek person of random digit password and device
CN108269590A (en) * 2018-01-17 2018-07-10 广州势必可赢网络科技有限公司 Vocal cord recovery scoring method and device
CN108447489B (en) * 2018-04-17 2020-05-22 清华大学 A continuous voiceprint authentication method and system with feedback
CN110875044B (en) * 2018-08-30 2022-05-03 中国科学院声学研究所 A speaker recognition method based on word correlation score calculation
CN109117622B (en) * 2018-09-19 2020-09-01 北京容联易通信息技术有限公司 Identity authentication method based on audio fingerprints
CN109257362A (en) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN109473107B (en) * 2018-12-03 2020-12-22 厦门快商通信息技术有限公司 Text semi-correlation voiceprint recognition method and system
CN111669350A (en) * 2019-03-05 2020-09-15 阿里巴巴集团控股有限公司 Identity verification method, verification information generation method, payment method and payment device
CN110600041B (en) * 2019-07-29 2022-04-29 华为技术有限公司 Method and device for voiceprint recognition
CN110517695A (en) * 2019-09-11 2019-11-29 国微集团(深圳)有限公司 Verification method and device based on vocal print
CN110971763B (en) * 2019-12-10 2021-01-26 Oppo广东移动通信有限公司 Arrival reminding method and device, storage medium and electronic equipment
CN110956732A (en) * 2019-12-19 2020-04-03 重庆特斯联智慧科技股份有限公司 Safety entrance guard based on thing networking
CN111081256A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Digital string voiceprint password verification method and system
CN111081260A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Method and system for identifying voiceprint of awakening word
CN111597531A (en) * 2020-04-07 2020-08-28 北京捷通华声科技股份有限公司 Identity authentication method and device, electronic equipment and readable storage medium
CN111613230A (en) * 2020-06-24 2020-09-01 泰康保险集团股份有限公司 Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN112487384B (en) * 2020-11-25 2024-12-03 华为技术有限公司 Identity verification method and system
CN112820299B (en) * 2020-12-29 2021-09-14 马上消费金融股份有限公司 Voiceprint recognition model training method and device and related equipment
CN113113022A (en) * 2021-04-15 2021-07-13 吉林大学 Method for automatically identifying identity based on voiceprint information of speaker
CN113570754B (en) * 2021-07-01 2022-04-29 汉王科技股份有限公司 Voiceprint lock control method and device and electronic equipment
CN114357417B (en) * 2021-12-31 2025-04-08 中国科学院声学研究所东海研究站 A self-learning dynamic voiceprint identity authentication method based on unknown corpus
CN114782141A (en) * 2022-05-07 2022-07-22 中国工商银行股份有限公司 Product interaction method and device based on 5G message, electronic equipment and medium
CN115019808B (en) * 2022-06-01 2025-07-11 科大讯飞股份有限公司 Voiceprint extraction method, device, equipment and readable storage medium
CN116530944B (en) * 2023-07-06 2023-10-20 荣耀终端有限公司 Sound processing method and electronic equipment
CN116978368B (en) * 2023-09-25 2023-12-15 腾讯科技(深圳)有限公司 Wake-up word detection method and related device
CN120279894B (en) * 2025-05-14 2025-10-21 广东公信智能会议股份有限公司 Human-computer interaction method and system for speech recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050033573A1 (en) * 2001-08-09 2005-02-10 Sang-Jin Hong Voice registration method and system, and voice recognition method and system based on voice registration method and system
CN102238189A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
CN105096121A (en) * 2015-06-25 2015-11-25 百度在线网络技术(北京)有限公司 Voiceprint authentication method and device
CN105656887A (en) * 2015-12-30 2016-06-08 百度在线网络技术(北京)有限公司 Artificial intelligence-based voiceprint authentication method and device
CN106098068A (en) * 2016-06-12 2016-11-09 腾讯科技(深圳)有限公司 A kind of method for recognizing sound-groove and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254559A (en) * 2010-05-20 2011-11-23 盛乐信息技术(上海)有限公司 Identity authentication system and method based on vocal print
CN102314877A (en) * 2010-07-08 2012-01-11 盛乐信息技术(上海)有限公司 Voiceprint identification method for character content prompt
CN101997689B (en) * 2010-11-19 2012-08-08 吉林大学 USB (universal serial bus) identity authentication method based on voiceprint recognition and system thereof
CN102163427B (en) * 2010-12-20 2012-09-12 北京邮电大学 Method for detecting audio exceptional event based on environmental model
CN102737634A (en) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 Authentication method and device based on voice
CN103679452A (en) * 2013-06-20 2014-03-26 腾讯科技(深圳)有限公司 Payment authentication method, device thereof and system thereof
CN104282303B (en) * 2013-07-09 2019-03-29 威盛电子股份有限公司 Method for voice recognition by voiceprint recognition and electronic device thereof
CN104064189A (en) * 2014-06-26 2014-09-24 厦门天聪智能软件有限公司 Vocal print dynamic password modeling and verification method
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN104901808A (en) * 2015-04-14 2015-09-09 时代亿宝(北京)科技有限公司 Voiceprint authentication system and method based on time type dynamic password

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147767A (en) * 2018-08-16 2019-01-04 平安科技(深圳)有限公司 Digit recognition method, device, computer equipment and storage medium in voice
CN111199729A (en) * 2018-11-19 2020-05-26 阿里巴巴集团控股有限公司 Voiceprint recognition method and device
CN111199729B (en) * 2018-11-19 2023-09-26 阿里巴巴集团控股有限公司 Voiceprint recognition method and voiceprint recognition device
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium
CN112037815A (en) * 2020-08-28 2020-12-04 中移(杭州)信息技术有限公司 Audio fingerprint extraction method, server and storage medium
CN112435673A (en) * 2020-12-15 2021-03-02 北京声智科技有限公司 Model training method and electronic terminal
CN112435673B (en) * 2020-12-15 2024-05-14 北京声智科技有限公司 Model training method and electronic terminal
CN115602177A (en) * 2022-10-12 2023-01-13 中国电信股份有限公司 Voiceprint recognition method and device, and computer-readable storage medium
WO2024077588A1 (en) * 2022-10-14 2024-04-18 Qualcomm Incorporated Voice-based user authentication
CN115641852A (en) * 2022-10-18 2023-01-24 中国电信股份有限公司 Voiceprint recognition method and device, electronic equipment and computer readable storage medium
CN115550075A (en) * 2022-12-01 2022-12-30 中网道科技集团股份有限公司 Anti-counterfeiting processing method and device for public welfare activity data of community correction object
CN115550075B (en) * 2022-12-01 2023-05-09 中网道科技集团股份有限公司 Anti-counterfeiting processing method and equipment for community correction object public welfare activity data

Also Published As

Publication number Publication date
CN106098068B (en) 2019-07-16
CN106098068A (en) 2016-11-09

Similar Documents

Publication Publication Date Title
WO2017215558A1 (en) Voiceprint recognition method and device
US11727942B2 (en) Age compensation in biometric systems using time-interval, gender and age
KR102239129B1 (en) End-to-end speaker recognition using deep neural network
RU2738325C2 (en) Method and device for authenticating an individual
Dey et al. Speech biometric based attendance system
CN103475490B (en) Identity authentication method and device
Das et al. Development of multi-level speech based person authentication system
WO2017113658A1 (en) Artificial intelligence-based method and device for voiceprint authentication
US11348590B2 (en) Methods and devices for registering voiceprint and for authenticating voiceprint
WO2017114307A1 (en) Voiceprint authentication method capable of preventing recording attack, server, terminal, and system
WO2019019256A1 (en) Electronic apparatus, identity verification method and system, and computer-readable storage medium
Mansour et al. Voice recognition using dynamic time warping and mel-frequency cepstral coefficients algorithms
WO2017162053A1 (en) Identity authentication method and device
Korshunov et al. Impact of score fusion on voice biometrics and presentation attack detection in cross-database evaluations
US10909991B2 (en) System for text-dependent speaker recognition and method thereof
EP2879130A1 (en) Methods and systems for splitting a digital signal
Saquib et al. A survey on automatic speaker recognition systems
CN110111798B (en) Method, terminal and computer readable storage medium for identifying speaker
CN107346568A (en) Authentication method and device for an access control system
JP2007133414A (en) Speech discrimination capability estimation method and apparatus, and speaker authentication registration and evaluation method and apparatus
CN110379433A (en) Authentication method, apparatus, computer device and storage medium
Shirvanian et al. Quantifying the breakability of voice assistants
JP7259981B2 (en) Speaker authentication system, method and program
JP2016166927A (en) Parameter learning device, speaker recognition device, parameter learning method, speaker recognition method, and program
Georgescu et al. GMM-UBM modeling for speaker recognition on a Romanian large speech corpora

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17812669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17812669

Country of ref document: EP

Kind code of ref document: A1