RU2422920C2

RU2422920C2 - Pass phrase based speaker authentication method

Info

Publication number: RU2422920C2
Application number: RU2009106368/09A
Authority: RU
Inventors: Евгений Львович Столов (RU)
Priority date: 2009-02-24
Filing date: 2009-02-24
Publication date: 2011-06-27
Also published as: RU2009106368A

Abstract

FIELD: information technology.

SUBSTANCE: the input speech signal of a speaker is compared segment by segment with stored reference parameters of reference phrases uttered by previously known speakers. To this end, parametric descriptions of successive segments of the input speech signal are compared with parametric descriptions of successive segments of the reference selected for comparison, after which the speaker is authenticated. The parametric description used is a transition matrix built as follows: a sequence of singular points is constructed, each selected by comparing a sample in the segment with its neighborhood, which is determined through generalized linear prediction coefficients and a threshold T. The sequence of singular points is then aggregated into blocks of length L. A transition matrix similar to that of a Markov chain is constructed from the number of singular points in each block, the resulting matrix is compared with the reference matrix to a given accuracy ε, and a decision is made on whether the speaker is authenticated.

EFFECT: high reliability of speaker recognition when using a pass phrase with a limited length.

1 dwg

Description

The invention relates to speech analysis, in particular to systems that restrict unauthorized access to premises or information resources. The technical result is higher reliability of speaker recognition when a passphrase of limited length is used. This result is achieved by locating, within the audio segment, intervals that contain singular points identified by a generalized linear prediction procedure; the parametric description of the audio segment is a statistical transition matrix over the sequence of intervals containing singular points, and matrices are compared using a standard metric on the space of matrices.

The claimed method relates to speech analysis, in particular to systems that restrict unauthorized access to premises or to information resources.

Methods and devices are known for text-independent speaker recognition from a speech segment, based on estimating statistical parameters of the segment [1].

This approach requires an audio segment about a minute long for tuning and analysis, so it cannot be applied to passphrase authentication, where the phrase lasts about 2-3 seconds.

The most popular methods estimate the parameters of a model based on a mixture of Gaussian distributions, for example [2].

This method can recognize a speaker from an arbitrary phrase, but it also requires audio segments about half a minute long.

A speaker identification method based on linear prediction coefficients is known; its theory is presented in [3]. These coefficients are calculated according to the formula

[formula not reproduced in the source]

The drawback of this method is the poor stability of the prediction coefficients when the audio segment is short.

A speaker identification method is known that shares the largest number of essential features and the achieved technical result with the claimed solution, and it is taken as the prototype. It identifies the speaker from the way the passphrase is pronounced, by dividing the audio segment into separate zones and analyzing various parameters computed over these zones [4]. The resulting information is processed by statistical methods. The decision is made by estimating the probability that the computed parameter vector occurs under the adopted statistical model, taking into account the lengths of the confidence intervals.

The drawback of the known method is that the partitioning into zones is tied to the procedure for calculating the pitch (fundamental tone), and pitch estimated from a short phrase is highly variable. Thus, a drawback of all the known methods is that they do not take into account how sequences of phonemes are pronounced in a given context, here represented by the passphrase. The known methods rely on harmonic analysis, which assumes that the analyzed portion of the audio segment is stationary, and this leads to errors when short segments are analyzed.

The objective of the invention is to create a method that takes into account how a sequence of individual phonemes is pronounced in the context of one and the same passphrase, and that is based on parameter estimates which do not depend on the microphone gain and are robust to variations in the length of the audio segment corresponding to the passphrase.

The problem is solved by identifying singular points in the audio segment and by processing the distribution of those points. By a singular point of the audio segment the applicant means a sample that differs strongly from its neighborhood. Unlike in the linear prediction method, the deviation of each sample from its neighborhood is estimated as the difference between that sample and a linear approximation built from both the samples preceding it and the samples following it.

The claimed technical solution is implemented on a computer with an audio input device and a program that implements the claimed method of identifying singular points and the method of describing their distribution.

The essence of the claimed technical solution is that the passphrase-based speaker authentication method includes segment-by-segment comparison of the speaker's input speech signal with previously stored reference parameters of reference phrases uttered by previously known speakers. To this end, parametric descriptions of successive segments of the input speech signal are compared with parametric descriptions of successive segments of the reference selected for comparison, followed by authentication of the speaker. The parametric description used is a transition matrix constructed according to the following rule: a sequence of singular points is built, each identified by comparing a sample in the segment with its neighborhood, which is determined by generalized linear prediction coefficients and a threshold T; the sequence of singular points is then aggregated into blocks of length L; a transition matrix, similar to the transition matrix of a Markov chain, is built from the number of singular points in each block; the resulting matrix is compared with the reference matrix to a given accuracy ε; and a decision is made on whether the speaker is authenticated.

The flowchart of the claimed method is shown in the drawing. It consists of four blocks connected in series, numbered 1, 2, 3 and 4, which implement the claimed method.

Block 1 receives the audio segment as input. This block calculates the generalized linear prediction coefficients of formula (1)

[formula (1) is not reproduced in the source]

using the formulas given below, and the standard deviation σ using the standard formula. In formula (1), the sample x_n is approximated by a linear combination of the p samples before x_n and the p samples after x_n. To find the coefficients a_k, b_k, the following notation is introduced:

[notation formulas are not reproduced in the source]

In this notation, finding the coefficients in (1) reduces to solving the system of equations

[system of equations is not reproduced in the source]

The derivation of these formulas is given in [5].
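As an illustration of block 1, the following Python sketch estimates coefficients a_k, b_k by an ordinary least-squares fit of each sample to its p preceding and p following samples. The system of equations used in the patent is not reproduced in this text, so the least-squares formulation, the pairing of a_k with preceding samples and b_k with following samples, the function name generalized_lp_coefficients, the use of NumPy, and taking σ as the standard deviation of the samples in the segment are all assumptions made for this example.

```python
import numpy as np


def generalized_lp_coefficients(x, p):
    """Fit x[n] ~ sum_k a[k-1]*x[n-k] + sum_k b[k-1]*x[n+k], k = 1..p.

    Assumption: coefficients are found by ordinary least squares over all
    samples that have p neighbors on each side.
    """
    x = np.asarray(x, dtype=float)
    centers = np.arange(p, len(x) - p)
    # Columns 0..p-1 hold x[n-1], ..., x[n-p]; columns p..2p-1 hold x[n+1], ..., x[n+p].
    past = np.stack([x[centers - k] for k in range(1, p + 1)], axis=1)
    future = np.stack([x[centers + k] for k in range(1, p + 1)], axis=1)
    design = np.hstack([past, future])
    coeffs, *_ = np.linalg.lstsq(design, x[centers], rcond=None)
    a, b = coeffs[:p], coeffs[p:]
    sigma = float(np.std(x))  # assumption: sigma is the spread of the raw samples
    return a, b, sigma
```

For a linear ramp and p = 1, each sample is exactly the mean of its two neighbors, so the fit should return a_1 = b_1 = 0.5.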

Block 2 determines whether the center of an interval of length 2p+1 is a singular point. The block receives the audio segment, the standard deviation σ, the generalized linear prediction coefficients a_k, b_k, k = 1, …, p, and the threshold T (chosen from the existing experimental database of Kazan State University, KSU). For the center of each interval, the block checks whether inequality (2) holds:

[inequality (2) is not reproduced in the source]

If inequality (2) holds, the center of the interval is declared a singular point. Because the formula is homogeneous, whether inequality (2) holds does not depend on the microphone gain. The output of the block is a sequence z_n of ones and zeros, where z_n = 1 if x_n is a singular point of the corresponding interval of length 2p+1 and z_n = 0 otherwise.
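The test performed by block 2 might be sketched as follows. Inequality (2) is not reproduced in this text, so the example assumes it has the form |x_n - prediction| > T·σ, which is homogeneous in the signal amplitude and therefore consistent with the stated independence from microphone gain. The function name mark_singular_points and the pairing of a_k with preceding samples and b_k with following samples are also assumptions.

```python
import numpy as np


def mark_singular_points(x, a, b, sigma, T):
    """Return z with z[n] = 1 where x[n] deviates strongly from its neighborhood.

    Assumed form of the test: |x[n] - prediction| > T * sigma, where the
    prediction combines p samples before and p samples after x[n].
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = len(a)
    z = np.zeros(len(x), dtype=int)
    for n in range(p, len(x) - p):
        past = x[n - p:n][::-1]       # x[n-1], ..., x[n-p]
        future = x[n + 1:n + p + 1]   # x[n+1], ..., x[n+p]
        prediction = a @ past + b @ future
        if abs(x[n] - prediction) > T * sigma:
            z[n] = 1
    return z
```

Under this assumed form, multiplying the whole signal by a constant gain scales both the deviation and σ by the same factor, so the resulting sequence z_n is unchanged.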

Block 3 receives the sequence {z_n} generated by block 2 and the parameter L (chosen from the existing experimental database of KSU). Block 3 aggregates the values {z_n} by choosing a natural number L and passing to the sequence

[formula not reproduced in the source]

By definition, the elements of the sequence s_N take values in the interval [0, L]. This sequence is fed to the input of block 4.
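A minimal sketch of the aggregation in block 3 is given below. The aggregation formula is not reproduced in this text, so the example assumes that s_N is the number of singular points in the N-th of consecutive, non-overlapping blocks of length L, which agrees with the stated range [0, L]. The function name aggregate_blocks and the choice to drop an incomplete trailing block are assumptions.

```python
import numpy as np


def aggregate_blocks(z, L):
    """Count singular points in consecutive non-overlapping blocks of length L."""
    z = np.asarray(z, dtype=int)
    n_blocks = len(z) // L  # assumption: an incomplete trailing block is dropped
    return z[:n_blocks * L].reshape(n_blocks, L).sum(axis=1)
```

For example, the sequence 1, 0, 1, 1, 1, 0, 0, 0, 1 with L = 3 splits into the blocks (1, 0, 1), (1, 1, 0) and (0, 0, 1), giving the counts 2, 2 and 1.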

Block 4 performs statistical processing of the sequence {s_N} and compares the result with the reference (the speaker's reference) using the parameter ε (chosen from the existing experimental database of KSU). For this purpose a matrix Q of size (L+1)×(L+1) is built, similar to the transition matrix of a Markov chain. Let q_i, i = 0, 1, …, L, denote the number of elements of the sequence {s_N} equal to i. The element Q[i/j] of the matrix Q, in row i and column j, is calculated by the formula

Q[i/j] = t_ij / q_i

Here t_ij is the number of pairs in the sequence {s_N} with s_N = i, s_{N+1} = j. By construction, the matrix Q is stochastic. Its elements are estimates of the probabilities of transition from one group of singular points to another, which describes the features of the distribution of singular points characteristic of the given speaker when pronouncing the passphrase. The computed matrix Q is then compared with the reference matrix. The comparison is based on the ordinary distance d between matrices, calculated by the formula

[the formula and the definition of its terms are not reproduced in the source]

If d < ε, the authentication is accepted as correct; otherwise access to the resource is denied.
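The processing in block 4 can be illustrated with the sketch below. The matrix is built from the formula Q[i/j] = t_ij / q_i given above. The distance formula is not reproduced in this text, so the example assumes the Frobenius (Euclidean) distance between matrices. The function names transition_matrix and authenticate, and leaving rows with q_i = 0 as zeros, are likewise assumptions.

```python
import numpy as np


def transition_matrix(s, L):
    """Build the (L+1) x (L+1) matrix with entries t_ij / q_i from block counts s."""
    s = np.asarray(s, dtype=int)
    t = np.zeros((L + 1, L + 1))
    for i, j in zip(s[:-1], s[1:]):
        t[i, j] += 1                      # t_ij: pairs with s_N = i, s_{N+1} = j
    q = np.bincount(s, minlength=L + 1)   # q_i: number of elements equal to i
    Q = np.zeros_like(t)
    seen = q > 0                          # assumption: unseen values keep zero rows
    Q[seen] = t[seen] / q[seen][:, None]
    return Q


def authenticate(Q, Q_ref, eps):
    """Accept when the matrices are closer than eps (Frobenius norm, assumed)."""
    d = np.linalg.norm(np.asarray(Q) - np.asarray(Q_ref))
    return bool(d < eps)
```

With q_i counted over the whole sequence, as in the formula above, the row for the value of the last element sums to slightly less than one, because that element has no successor. For long sequences the effect is small.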

The claimed technical solution meets the "novelty" criterion for inventions: the applicant's search did not reveal any technical solutions possessing the combination of claimed features that achieves the stated goal, namely a passphrase-based speaker authentication method that takes into account how a sequence of individual phonemes is pronounced in the context of one and the same passphrase and is based on parameter estimates that do not depend on the microphone gain and are robust to variations in the length of the audio segment corresponding to the passphrase.

The claimed technical solution meets the "inventive step" criterion for inventions, because the technical results obtained are not obvious to a person skilled in the art. The applicant has solved a pressing, long-standing problem that remained unsolved up to the filing date of this application, namely the need for a reliable method of passphrase-based speaker authentication. The authors solved it by creating a fundamentally new method that consists in identifying singular points in the audio segment and processing the distribution of those points, where a singular point of the audio segment means a sample that differs strongly from its neighborhood. Unlike in the linear prediction method, the deviation of each sample from its neighborhood is estimated as the difference between that sample and a linear approximation built from both the samples preceding it and the samples following it. Thus the claimed solution does not follow in an obvious way from the prior art, which is further evidence that it meets the "inventive step" criterion.

The claimed technical solution has been implemented under laboratory conditions at Kazan State University and can be implemented at any specialized enterprise using standard equipment, which shows that it meets the "industrial applicability" criterion for inventions.

Sources of information taken into account

1. RF patent 2107950.

2. US patent 6411930.

3. A. Oppenheim, R. Schafer. Discrete-time signal processing. Prentice Hall, 1989.

4. RF patent 2230375.

5. E. L. Stolov. Voice password processing algorithm // Исследования по информатике [Computer Science Research], No. 11, "Отечество", Kazan, 2007, pp. 103-108.

Claims

A method for authenticating a speaker by a passphrase, comprising segment-by-segment comparison of the speaker's input speech signal with previously stored reference parameters of reference phrases uttered by previously known speakers, for which parametric descriptions of successive segments of the input speech signal are compared with parametric descriptions of successive segments of the reference selected for comparison, followed by authentication of the speaker, characterized in that the parametric description used is a transition matrix constructed according to the following rule: a sequence of singular points is built, each identified by comparing a sample in the segment with its neighborhood, determined by generalized linear prediction coefficients and a threshold T; the sequence of singular points is then aggregated into blocks of length L; a transition matrix, similar to the transition matrix of a Markov chain, is built from the number of singular points in each block; the resulting matrix is compared with the reference matrix to a given accuracy ε; and a decision is made on whether the speaker is authenticated.