
WO1999059136A1 - Channel estimation system and method for use in automatic speaker verification systems - Google Patents

Channel estimation system and method for use in automatic speaker verification systems

Info

Publication number
WO1999059136A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
speech
estimate
enrollment
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1999/010038
Other languages
English (en)
Inventor
Richard J. Mammone
Rajesh Balchandran
Andy A. Garcia
Vidhya Ramanujam
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
T-NETIX Inc
Original Assignee
T-NETIX Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by T-NETIX Inc
Priority to AU38897/99A
Publication of WO1999059136A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L17/04 Training, enrolment or model building
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • the invention is directed to improved systems and methods of channel estimation
  • ASV automatic speaker verification
  • the invention relates to the fields of digital speech processing and speaker recognition.
  • the voice of a person whose identity is undergoing verification is compared with a known voice.
  • One type of voice recognition system is a text-dependent automatic speaker verification (ASV) system.
  • the text-dependent ASV system requires that the user speak a specific password or phrase (the "password"). This password is determined by the system or by the user during enrollment. However, in most text-dependent ASV systems, the password is constrained to be within a fixed vocabulary, such as a limited number of numerical digits.
  • HMM Hidden Markov Models
  • ANN Artificial Neural Network
  • Modeling at the subword level expands the versatility of the system.
  • channel estimation and channel normalization, both referred to as "channel estimation" unless separately identified
  • CMS Cepstral Mean Subtraction
  • CMS also undesirably extracts substantial amounts of the desired speech
  • pole filtering approximation is one method that attempts to overcome this. While pole filtering improves on CMS, it still leaves substantial speech information in the channel estimate.
  • the voice print system of the present invention builds and improves upon existing ASV systems.
  • the voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords.
  • Channel estimation and normalization removes the nonuniform effects of different channels.
  • Channel normalization is able to remove the characteristics of the test channel
  • the Curve-Fitting and Clean Speech methods may be used separately, together, and in combination with Pole Filtering.
  • the inventive channel estimation methods can be used with any desired voice print system.
  • All ASV systems include at least two components, an enrollment component and a testing component.
  • the enrollment component is used to store information concerning a known user's voice.
  • the system of the present invention includes inventive enrollment and testing components, as well as a bootstrap component.
  • the bootstrap component is used to generate data which assists the enrollment component to model the user's voice.
  • Each of these components comprises the channel estimation and normalization techniques of the present invention.
  • An enrollment component is used to characterize a known user's voice and store the characteristics in a database, so that this information is available for future comparisons.
  • the system of the present invention utilizes an improved enrollment process. During enrollment, the characteristics of the enrollment channel are estimated and stored in a database.
  • the database may be indexed by identification information, such as by the user's name or credit card number.
  • the Clean Speech method separately estimates the speech information in the recording.
  • the enrollment channel can be more accurately recalled from the database.
  • Feature extraction is then performed to extract features of the user's voice, such as cepstral coefficients.
  • a reference template may be generated from the extracted features.
  • Segmentation divides the voice sample into a number of subwords.
  • the present invention uses subword modeling and may use any of the known techniques, but preferably uses a discriminant training based hierarchical classifier called a Neural Tree Network (NTN).
  • the NTN is a hierarchical classifier that combines the properties of decision trees and neural networks.
  • the system also utilizes the principles of multiple classifier fusion and data resampling.
  • the additional classifier used herein is the Gaussian Mixture Model (GMM) classifier.
  • a fusion function, which is set and then stored in the database, is used to weigh the scores of the classifiers.
  • the threshold value is stored in the database.
  • enrollment produces a voice print database
  • an index such as the user's name or credit card number
  • the test component is the component which performs the verification. During testing or verification, the system first accepts "test speech" and index information from a user
  • the next step is to perform channel normalization or channel adaptation.
  • Channel normalization is performed if the enrollment channel was also normalized. The characteristics of the test channel are normalized to remove the effects of the test channel from the test voice signal.
  • the channel normalization may be performed with the Curve-Fitting or Clean Speech methods.
  • channel adaptation is performed by removing from the test sample the effects of the test channel; the test sample is then filtered through the recalled enrollment channel.
  • the present invention also improves on the channel adaption in the testing component.
  • the performance of ASV systems can be significantly degraded by background noise and sounds.
  • the invention uses a key word/key phrase spotting approach to address this.
  • the multiple classifiers of the enrollment component are used to "score" the subwords.
  • Bootstrapping is used to generate a pool of speech data representative of the speech of nonspeakers or "antispeakers." This data is used during enrollment to train the discriminant training-based classifiers. Bootstrapping involves obtaining voice samples from antispeakers.
  • Figure 1A is a diagram of an enrollment component of the present invention.
  • Figure 1B shows pseudo-code for creating a filter to perform the channel estimation.
  • Figure 1C shows pseudo-code for inverting the filter of Figure 1B.
  • Figure 1D shows a flow diagram for performing the Curve-Fit channel estimation.
  • Figure 1E shows a chart of an actual channel and a channel obtained from a cepstral mean.
  • Figure 1F shows a chart of an actual channel and a channel obtained from Curve-Fitting.
  • Figure 1G shows a chart of an inverse channel and an inverse channel obtained from Curve-Fitting.
  • Figure 1H shows a flow diagram for performing Clean Speech channel normalization.
  • Figure 2 is a diagram of a testing component of the present invention.
  • Figures 3A and 3B are flow diagrams of a channel adaptation module, shown in Figure 2, of the present invention.
  • Figure 4 is a diagram of a bootstrapping component, used to generate antispeaker data.
  • the preferred system used with the present invention includes an enrollment component, a testing component, and a bootstrap component.
  • the enrollment component uses antispeaker data to generate and store information concerning a user's voice.
  • the information concerning the user's voice is compared to the voice undergoing verification
  • the bootstrap component is used to provide initial antispeaker data for use by the enrollment component, such that the enrollment component may properly perform its function of generating data concerning the user's voice.
  • the enrollment component is used to store information (using supervised learning) concerning a known user's voice.
  • the enrollment component also stores information concerning the channel on which the user provides the speech, the "enrollment channel."
  • Figure 1A shows the enrollment component 10.
  • the first step 20 is to obtain enrollment speech (the password) and to obtain 26 an index, such as the user's name or credit card number.
  • the enrollment speech may be obtained via a receiver, telephone or microphone.
  • a speech encoding method, such as the ITU G.711 standard μ-law or A-law, may be used.
  • a sampling rate of 8000 Hz is used.
  • the speech may be obtained in digital format, such as from an ISDN line.
  • a telephony board is used to handle Telco signaling protocol.
  • CPU Intel Pentium platform general purpose computer processing unit
  • an additional embodiment could be the Dialogic Antares card.
  • Preprocessing 30 may include:
  • Silence removal using energy and zero-crossing statistics is primarily based on finding a short interval which is guaranteed to be background silence (generally found in the first few milliseconds of the utterance, before the speaker actually starts speaking). Thresholds are set using the silence region statistics in order to discriminate between speech and silence frames.
  • Silence removal based on an energy histogram.
  • a histogram of frame energies is generated.
  • a threshold energy value is determined based on the assumption that the biggest peak in the histogram at the lower energy region shall correspond to the background silence frame energies. This threshold energy value is used to perform speech versus silence discrimination.
  • DC Bias removal to remove DC bias introduced by analog-to-digital hardware or other components.
  • the mean value of the signal is computed over the entire voice sample and then subtracted from the voice samples (a sketch of both preprocessing steps follows below).
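
The two preprocessing steps above lend themselves to a short illustration. The following Python sketch is not from the patent (the patent gives no code for preprocessing); the frame length, histogram resolution, and 3 dB margin are assumptions chosen for the example.

    import numpy as np

    def preprocess(signal, frame_len=256):
        """Illustrative DC bias removal and histogram-based silence removal."""
        # DC bias removal: subtract the mean computed over the entire sample.
        signal = signal - np.mean(signal)

        # Frame the signal and compute per-frame energies.
        n_frames = len(signal) // frame_len
        frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
        energies = np.sum(frames ** 2, axis=1)

        # Histogram of (log) frame energies; the biggest peak in the lower
        # energy region is assumed to be background silence.
        log_e = 10.0 * np.log10(energies + 1e-12)
        hist, edges = np.histogram(log_e, bins=50)
        silence_bin = int(np.argmax(hist[:25]))
        threshold = edges[silence_bin + 1] + 3.0  # 3 dB margin (assumption)

        # Keep only frames classified as speech.
        return frames[log_e > threshold].reshape(-1)
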
  • the preprocessing is preferably performed in software on the CPU described above.
  • channel estimation 40 is performed. This procedure estimates the characteristics of the enrollment channel and stores them in the voice print database 115.
  • the voice print database 115 may be RAM, ROM, EPROM, EEPROM, hard disk, CD ROM, writeable CD ROM, minidisk, file server, or other storage device.
  • the channel estimation is preferably performed by the processing unit defined above, an Intel Pentium platform general purpose computer processing unit (CPU) of at least 100 MHz having about 10 MB associated RAM, or by a Dialogic Antares card. The present invention, however, is not limited to these preferred embodiments.
  • a speech signal with frequency spectrum S(ω) is distorted by a transmission channel with frequency response H(ω), so that the received spectrum is Y(ω) = S(ω)H(ω).
  • Cepstrum is defined as the inverse Fourier transform of the logarithm of the short-time spectral magnitude.
  • Time-invariant convolutional distortion H(ω) can be eliminated by Cepstral Mean Subtraction (CMS), since convolution becomes addition in the cepstral domain: C_Y(n) = C_S(n) + C_H(n).
  • E[.] represents the expected value.
  • the channel cepstrum is a constant additive component in the above equation.
  • if the expected value of the speech cepstrum is assumed to be zero, E[C_Y(n)] can be assumed to represent the channel cepstrum only, that is E[C_Y(n)] ≈ C_H(n).
  • CMS may be conducted on the cepstral features obtained for the voice signal to remove the channel effects; a minimal sketch follows below.
  • the cepstral mean, however, may include information other than the estimate of the time-invariant convolutional distortion, such as the coarse spectral distribution of the speech itself.
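
As a concrete illustration of the CMS step just described, the sketch below subtracts the long-term cepstral mean from per-frame cepstra. It uses a plain FFT-based real cepstrum for brevity, whereas the patent prefers LP-derived and pole-filtered cepstra, so this is a sketch of the subtraction only, not of the preferred feature pipeline.

    import numpy as np

    def real_cepstra(frames):
        """Per-frame real cepstrum: inverse FFT of the log short-time
        spectral magnitude."""
        spectra = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
        return np.fft.irfft(np.log(np.abs(spectra) + 1e-12), axis=1)

    def cepstral_mean_subtraction(frames):
        """C_Y(n) = C_S(n) + C_H(n): subtracting the mean over all frames
        removes C_H(n) if the speech cepstrum averages to zero."""
        cepstra = real_cepstra(frames)
        channel_estimate = cepstra.mean(axis=0)  # approximates E[C_Y(n)]
        return cepstra - channel_estimate, channel_estimate
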
  • the LP-derived cepstrum is the weighted combination of LP poles; in pole filtering, the narrow band-width LP poles are selectively deflated by broadening their bandwidth, and the cepstrum is recomputed from the modified poles.
  • PFCC pole filtered cepstral coefficients
  • LPCC LP-derived cepstral coefficients
  • the PFCC mean may be used to create a channel filter 52.
  • the filter may be inverted 54, and speech passed through the inverted filter 56.
  • the preprocessed speech during enrollment may be inverse-filtered through the enrollment channel estimate; likewise, test speech on the testing channel may be inverse-filtered through the test channel estimate (a frequency-domain sketch follows below).
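
One simple way to realize this inverse filtering, given a channel estimate expressed as a cepstrum, is in the frequency domain: the Fourier transform of the channel cepstrum gives the channel's log-magnitude response, and dividing the speech spectrum by the exponentiated response removes the channel. The zero-phase, whole-signal treatment below is an assumption for illustration; the patent's actual filter construction is given only as pseudo-code in Figures 1B and 1C.

    import numpy as np

    def inverse_filter(signal, channel_cepstrum, n_fft=512):
        """Remove a channel given by its cepstrum (zero-phase sketch)."""
        # The channel log-magnitude response is the transform of its cepstrum.
        log_mag = np.fft.rfft(channel_cepstrum, n=n_fft).real
        channel_mag = np.exp(log_mag)  # |H(w)|

        # Divide the signal spectrum by |H(w)|, floored to avoid blow-up.
        n = max(n_fft, len(signal))
        spectrum = np.fft.rfft(signal, n=n)
        h = np.interp(np.linspace(0.0, 1.0, len(spectrum)),
                      np.linspace(0.0, 1.0, len(channel_mag)), channel_mag)
        return np.fft.irfft(spectrum / np.maximum(h, 1e-3), n=n)[:len(signal)]
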
  • Pole filtering attempts to account for this, but Pole filtering still leaves a lot of speech information in the channel estimate.
  • the Curve-Fitting method overcomes the limitations of the Cepstral Mean Subtraction and Pole filtering methods.
  • the Curve-Fitting method extracts channel related information from the cepstral mean, but not any speech information; using this channel information, the Curve-Fitting method can create an inverse filter.
  • Figure 1E illustrates a comparison between an actual channel and a channel derived from the cepstral mean subtraction method. As seen in Figure 1E, the channel obtained from the cepstral mean contains a substantial amount of unwanted speech information, especially in the pass band of the channel.
  • the lowest frequency spectral peak is detected 62, and the regions around this detected lowest frequency spectral peak are scanned 64 to find the points at which the response rolls off. A passband is then modeled 72 as a straight line at zero dB between these points; the resulting Curve-Fitting method channel estimate is illustrated in Figure 1F (a sketch of this procedure follows below). It is noted that other curve models may also be used.
  • the Curve-Fitting method enrollment channel estimate can be stored in the voice print database 115, as described above, for recall and use as a filter in the testing component, as described below. The channel estimate may also be converted to an inverse filter to inverse filter the channel corrupted speech.
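
A minimal sketch of the Curve-Fitting idea described above: find the lowest-frequency peak of the channel's log-magnitude estimate, scan outward for the roll-off points, and flatten the response between them to 0 dB. The 3 dB roll-off criterion and the simple peak picking are illustrative assumptions, not values from the patent.

    import numpy as np

    def curve_fit_channel(log_mag_db, rolloff_db=3.0):
        """Flatten the passband of a channel estimate given in dB."""
        # Detect the lowest frequency spectral peak.
        peaks = [i for i in range(1, len(log_mag_db) - 1)
                 if log_mag_db[i - 1] <= log_mag_db[i] >= log_mag_db[i + 1]]
        lo_peak = peaks[0]

        # Scan both sides of the peak for the points where the response
        # falls rolloff_db below the peak level: the passband edges.
        level = log_mag_db[lo_peak] - rolloff_db
        left = lo_peak
        while left > 0 and log_mag_db[left] > level:
            left -= 1
        right = lo_peak
        while right < len(log_mag_db) - 1 and log_mag_db[right] > level:
            right += 1

        # Model the passband as a straight line at zero dB between the
        # edges, keeping the estimated skirts outside the passband.
        fitted = log_mag_db.copy()
        fitted[left:right + 1] = 0.0
        return fitted
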
  • the Curve-Fitting channel estimation method and module can be further improved by combining or fusing the Curve-Fitting channel estimate with channel estimates from other methods, where C_PF(n) is the channel estimate obtained using Pole filtering and C_CF(n) is the channel estimate obtained using Curve-Fitting.
  • the Clean Speech method improves the channel estimation 40; the module extracts channel related information by separately estimating the speech information in the recording. Likewise, the Clean Speech method can use this channel information to create an inverse filter.
  • the "clean" recording can be made during the same session in
  • the recording of the clean speech is done before or after the corrupted
  • a high quality microphone will have minimal channel distortion.
  • a wide bandwidth microphone has a flat frequency response between 20 Hz and 20 kHz.
  • the preferred microphone is a Sennheiser® microphone from Sennheiser, Inc., which would be connected to a preferred Pentium® based computer.
  • the Clean Speech method assumes that the cepstral mean of the "clean" recording will be representative of the speech information in the recording.
  • the cepstral mean for the "clean" recording can be represented as the speech cepstrum alone, while the cepstral coefficients of channel corrupted speech can be represented as shown above, C_Y(n) = C_S(n) + C_H(n). The channel estimate is then E[C_H(n)] = E[C_Y(n)] − E[C_S-CLEAN(n)], assuming that the same person is speaking the same text in the "clean" and channel corrupted recordings, i.e., E[C_S(n)] ≈ E[C_S-CLEAN(n)].
  • the channel normalization process shown above is done in the cepstrum domain.
  • the same processing can be done in the time waveform domain.
  • the channel estimate E[C_H(n)] calculated under the Clean Speech method can be used to normalize the speech or stored in the voice print database 115 (a cepstral-domain sketch follows below).
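
In the cepstral domain the Clean Speech estimate reduces to a difference of means, as the equations above show. The sketch below assumes the real_cepstra() helper from the CMS sketch earlier and that both recordings are framed the same way.

    def clean_speech_channel_estimate(corrupted_frames, clean_frames):
        """E[C_H(n)] = E[C_Y(n)] - E[C_S-CLEAN(n)], assuming the same
        person speaks the same text in both recordings."""
        mean_corrupted = real_cepstra(corrupted_frames).mean(axis=0)
        mean_clean = real_cepstra(clean_frames).mean(axis=0)
        return mean_corrupted - mean_clean

    def clean_speech_normalize(corrupted_frames, clean_frames):
        """Subtract only the channel component from each cepstral vector."""
        channel = clean_speech_channel_estimate(corrupted_frames, clean_frames)
        return real_cepstra(corrupted_frames) - channel
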
  • feature extraction 50 is performed on the processed speech.
  • Feature extraction may occur after (as shown) or simultaneously with the step of channel estimation 40 (in parallel computing embodiments).
  • the detail of the feature extraction is
  • This template is stored 60 in the voice print database 115. Following storage of the template 60, the speech is segmented 70 into subwords.
  • the preferred technique for subword generation 70 is automatic blind speech segmentation.
  • each sub-word is then modeled 80, 90, preferably with a neural tree network (NTN) 80 and a Gaussian mixture model (GMM) 90.
  • NTN neural tree network
  • GMM Gaussian mixture model
  • a leave-one-out data resampling scheme 100 is used. Data resampling 100 is performed by creating multiple subsets of the training data, each of which is created by leaving out one enrollment sample.
  • Figure 1A shows N models for the NTN classifier 80 and N models for the GMM classifier 90. For model #1 of the NTN classifier, an enrollment sample, such as the 1st sample, is left out of the training data (see the sketch below).
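
The leave-one-out scheme is straightforward to express in code. In the sketch below, train_classifier is a placeholder for NTN or GMM training; with N enrollment samples it yields N models, each trained on the N−1 samples that exclude one.

    def leave_one_out_models(enroll_samples, antispeaker_data, train_classifier):
        """Build N models; model #i is trained without enrollment sample i,
        so it can later be scored on the sample it never saw."""
        models = []
        for i in range(len(enroll_samples)):
            training = enroll_samples[:i] + enroll_samples[i + 1:]
            models.append(train_classifier(training, antispeaker_data))
        return models
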
  • no linguistic labels are attached to the subword data available in the antispeaker database 110.
  • RAM random access memory
  • ROM read-only memory
  • EPROM erasable programmable read-only memory
  • EEPROM electrically erasable programmable read-only memory
  • hard disk e.g., hard disk
  • CD ROM e.g., CD ROM
  • file server e.g., a file server
  • the subword data from the speaker being trained is labeled as enrollment speaker data. Because there is no linguistic labelling information in the antispeaker database 110, the entire antispeaker database is searched for "close" subwords.
  • the mean vector and covariance matrix of the subword segments obtained from subword generation are used to find the "close" subwords.
  • module 120 searches the antispeaker database 110 to find the "close" subwords of antispeaker data.
  • the anti-speaker data in the antispeaker database 110 is either manually created, or created by the bootstrapping component described below.
  • classifier models 80, 90 are trained by comparing antispeaker data with N−1 samples of enrollment data.
  • Both modules 80, 90 can determine a score for each spectral feature
  • the scores from the classifiers can be combined, or "fused," by a classifier fusion module 130 to obtain a composite score for each subword.
  • the scores of the neural tree network 80 and the Gaussian mixture model 90 are fused 130 using a linear opinion pool, as described below. However, other fusion functions may also be used.
  • a scoring algorithm 145, 150 is used for each of the NTN and GMM models.
  • Transitional (or durational) probabilities between the subwords can also be used while scoring.
  • the preferred embodiment is (b) subword-average scoring.
  • the result of scoring provides a GMM score and an NTN score
  • a classifier fusion module 130 using the linear opinion pool method combines the NTN score and the GMM score.
  • Use of the linear opinion pool is referred to as a data fusion function, because the data from each classifier is "fused," or combined.
  • the combined score is S(a) = Σ a_i s_i(a), where S(a) is the probability of the combined system, a_i are weights, s_i(a) are the individual classifier scores, and n is the number of classifiers.
  • variable a is set as a constant (although it may be dynamically adapted as discussed below), and functions to provide more influence on one classifier method as compared to the other.
  • if the first classifier s_1 performed better, its weight would be made greater than 0.5; in operation, the previous value is only incremented or decremented by a small amount, ε (see the sketch below).
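
For the two classifiers used here, the linear opinion pool and the incremental weight update look as follows. The update rule (nudge a toward whichever classifier scored better) and the value of eps are assumptions for illustration.

    def linear_opinion_pool(s1, s2, alpha):
        """Fused score S = alpha*s1 + (1 - alpha)*s2 for the NTN and GMM."""
        return alpha * s1 + (1.0 - alpha) * s2

    def adapt_alpha(alpha, ntn_was_better, eps=0.01):
        """Increment or decrement alpha by a small eps, keeping it in [0, 1]."""
        alpha += eps if ntn_was_better else -eps
        return min(1.0, max(0.0, alpha))
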
  • a threshold value 140 is output
  • the threshold value output 140 is compared to a final score during testing.
  • Figure 2 shows a general outline of the testing component 150, which has many modules in common with the enrollment component 10.
  • the testing component 150 is used to determine whether test speech received from a user matches the voice print stored during enrollment.
  • test speech and index information 160 is supplied to the test component.
  • the index information is used to recall subword/segmentation information and the threshold value from the voice print database 115.
  • the index information may be any nonvoice data which identifies the user, such as the user's name, credit card number, etc.
  • After obtaining the test speech and index information, the test speech is preprocessed 170.
  • Preprocessing 170 may be performed as previously described in the enrollment component; the same preprocessing is applied to the test speech as was performed during enrollment.
  • Channel adaptation 180 adapts the system to the particular enrollment and test channels.
  • Channel adaptation 180 includes processing under both the procedures of Figures 3A and 3B.
  • the enrollment channel is estimated 40 during the enrollment component 10, also shown in Figures 3A and 3B at 300.
  • the enrollment channel estimate is also stored 310 in the voice print database 115 during the enrollment component.
  • the enrollment channel may be estimated and stored using the preferred embodiments of the present invention previously discussed with respect to Figure 1A.
  • the test channel is estimated 320 during the testing component.
  • the test channel may be estimated by generating a filter using the procedures described above.
  • the test speech is inverse filtered through the test channel 330; to achieve this, the test speech is passed through an inverse of the test channel filter. The enrollment channel is then added to the test speech by filtering the test speech through the enrollment channel filter: the saved enrollment filter is recalled 340 and the test speech is filtered through it.
  • the procedure of Figure 3A stores the enrollment data with the enrollment channel characteristics intact.
  • in the procedure of Figure 3B, the enrollment speech is filtered through an inverse of the enrollment channel filter 360; that is, the enrollment speech is inverse filtered to remove the effects of the enrollment channel. The enrollment speech may be normalized using the channel estimation methods previously described.
  • the test channel is estimated 370, and an inverse filter constructed using the procedures described above; the test speech is then filtered through the inverse filter 380 (a sketch of both procedures follows below).
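
Putting the two procedures together, the sketch below reuses the hypothetical inverse_filter() from the earlier sketch and adds its counterpart apply_channel(); both are assumptions standing in for the pseudo-code of Figures 1B and 1C.

    import numpy as np

    def apply_channel(signal, channel_cepstrum, n_fft=512):
        """Impose a stored channel on a signal (zero-phase sketch)."""
        log_mag = np.fft.rfft(channel_cepstrum, n=n_fft).real
        mag = np.exp(log_mag)
        n = max(n_fft, len(signal))
        spectrum = np.fft.rfft(signal, n=n)
        h = np.interp(np.linspace(0.0, 1.0, len(spectrum)),
                      np.linspace(0.0, 1.0, len(mag)), mag)
        return np.fft.irfft(spectrum * h, n=n)[:len(signal)]

    def adapt_figure_3a(test_speech, test_channel, enrollment_channel):
        # Remove the test channel, then add back the stored enrollment
        # channel, so the test speech "looks" recorded during enrollment.
        return apply_channel(inverse_filter(test_speech, test_channel),
                             enrollment_channel)

    def adapt_figure_3b(test_speech, test_channel):
        # Here the enrollment speech was already normalized, so the test
        # speech is simply normalized by its own channel estimate.
        return inverse_filter(test_speech, test_channel)
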
  • in this manner, the system adapts to account for the channel distortion on the enrollment channel and on the test channel. It has been found that the channel print carries the characteristics of the particular cellular handset of which the speaker is an authorized user, and therefore creates an effective "phone print." If verification is attempted from a different handset, an authorized user's request for service may be denied due to the phone print mismatch.
  • Channel adaptation 180 provides a solution to this problem. It first removes the phone and channel print of the test environment; channel adaptation can then add the phone and channel print of the training environment to the speech, so that it looks as if the verification speech is recorded over the enrollment channel.
  • Channel adaptation 180 in this manner can still be advantageous in cellular fraud prevention.
  • the channels can be estimated using techniques such as pole-filtered cepstrum and cepstral mean, as well as an FFT based periodogram of the speech signal. Pole-filtered cepstrum, as shown in Figure 1B, is the preferred method.
  • feature extraction 190 is performed after preprocessing. Feature extraction 190 may occur immediately after channel adaptation 180, or may occur in parallel with channel adaptation 180 in a multiple processor embodiment.
  • subword generation 210 in the testing component is performed based on the subwords/segment model computed in the enrollment phase 10.
  • the GMM modeling 90 is used in the test component subword generation 210 to "force align" the test phrase into segments corresponding to the previously formed subwords.
  • using the subword GMMs as reference models, Viterbi or dynamic programming (DP) alignment is performed.
  • DP Dynamic programming
  • the normalized subword duration (stored during enrollment) is used as a constraint for force alignment since it provides stability to the alignment.
  • scoring 240, 250 using the techniques of the enrollment component is performed on the subwords, using the NTN and GMM classifiers 220, 230.
  • An NTN scoring algorithm 240 and a GMM scoring algorithm 250 are used, as in the enrollment component.
  • the classifier fusion module 260 outputs a "final score" 270.
  • the "final score” 270 is then compared 280 to the threshold value 140. If the
  • the user is verified. If the "final score" 270 is less than the threshold value 140 then the user is not verified or permitted to complete the transaction requiring verification.
  • the present invention also employs a number of additional adaptations, in addition to channel adaptation 180.
  • the multiple classifier system uses a classifier fusion module 260.
  • the enrollment may not be optimal for the testing in that every single classifier may have its own bias; the fusion function changes accordingly in order to achieve the optimal results for fusion. Also, one classifier may perform better than the other.
  • Fusion adaptation uses predetermined rules to adjust the fusion weights.
  • a fusion adaptation module 290 is connected to the classifier fusion module 260; the fusion adaptation module 290 changes the constant, a, in the linear opinion pool.
  • s_1 is the score of the first classifier and s_2 is the score of the second classifier.
  • the fusion adaptation module 290 dynamically changes a to weigh either the NTN (s_1) or the GMM (s_2) score more heavily.
  • Threshold adaptation adapts the threshold value in response to prior final scores.
  • Threshold adaptation module 295 is shown in Figure 2. The detail of the threshold adaptation module 295 is herein incorporated by reference from VOICE PRINT SYSTEM AND
  • Model adaptation adapts the classifier models to subsequent successful verifications.
  • Fusion adaptation 290, model adaptation 540, and threshold adaptation 600 all may affect the verification decision. Model adaptation is more dramatic than threshold adaptation or fusion adaptation, which both make incremental changes.
  • the voiceprint database 115 may or may not be coresident with the antispeaker database 110.
  • Voice print data stored in the voice print database may include: enrollment channel estimates, classifier models, the threshold value, normalized segment durations, and/or other intermediate scores or authorization results used for adaptation.
  • the antispeaker database 110 must initially be filled with antispeaker data; the initial antispeaker data may be generated via artificial simulation or collected by the bootstrapping component.
  • FIG. 4 shows a bootstrapping component 700.
  • antispeaker speech is obtained by the bootstrapping component 700.
  • the antispeaker speech may be phrases from any number of speakers.
  • the speech then undergoes feature extraction 770.
  • the feature extraction may occur as previously described.
  • the antispeaker speech undergoes sub-word generation 750, using the techniques previously described with respect to Figure 1A.
  • the preferable method of sub-word generation is automatic blind speech segmentation, discussed above.
  • the bootstrapping component initializes the database with antispeaker data which is used during enrollment.
  • the present invention provides for an accurate and reliable automatic speaker verification system.
  • Adaptation schemes adapt the ASV to changes in success/failures and to changes in the user by using channel adaptation 180, model adaptation 540, fusion adaptation 290, and threshold adaptation 600.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The voice print system of the invention is a subword-based, text-dependent automatic speaker verification (ASV) system that is not constrained to any particular vocabulary or language. One component of the preferred ASV system is a channel estimation and normalization component capable of removing the characteristics of the test channel (150), and/or an enrollment channel component (90) that can increase accuracy. The preferred methods and systems of the invention, called "Curve-Fitting" (62, 64, 66) and "Clean Speech" (82, 86, 88, 90, 92), whether implemented separately, together, or in combination with "Pole Filtering" (42, 44, 46), substantially improve existing channel estimation and normalization methods. Unlike Cepstral Mean Subtraction, both the Curve-Fitting (62, 64, 66) and Clean Speech (82, 86, 88, 90, 92) methods and systems extract only the relevant channel information from the cepstral mean, to the exclusion of any speech information.
PCT/US1999/010038 1998-05-08 1999-05-07 Channel estimation system and method for use in automatic speaker verification systems Ceased WO1999059136A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU38897/99A AU3889799A (en) 1998-05-08 1999-05-07 Channel estimation system and method for use in automatic speaker verification systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US8482798P 1998-05-08 1998-05-08
US60/084,827 1998-05-08

Publications (1)

Publication Number Publication Date
WO1999059136A1 (fr)

Family

ID=22187467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/010038 Ceased WO1999059136A1 (fr) Channel estimation system and method for use in automatic speaker verification systems

Country Status (2)

Country Link
AU (1) AU3889799A (fr)
WO (1) WO1999059136A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6687672B2 (en) 2002-03-15 2004-02-03 Matsushita Electric Industrial Co., Ltd. Methods and apparatus for blind channel estimation based upon speech correlation structure
US7953216B2 (en) 2007-05-04 2011-05-31 3V Technologies Incorporated Systems and methods for RFID-based access management of electronic devices
CN107705791A (zh) * 2016-08-08 2018-02-16 中国电信股份有限公司 Voiceprint-recognition-based incoming call identity confirmation method and apparatus, and voiceprint recognition system
CN113516987A (zh) * 2021-07-16 2021-10-19 科大讯飞股份有限公司 Speaker recognition method and apparatus, storage medium, and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5839103A (en) * 1995-06-07 1998-11-17 Rutgers, The State University Of New Jersey Speaker verification system using decision fusion logic
US5864810A (en) * 1995-01-20 1999-01-26 Sri International Method and apparatus for speech recognition adapted to an individual speaker
US5913192A (en) * 1997-08-22 1999-06-15 At&T Corp Speaker identification with user-selected password phrases

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864810A (en) * 1995-01-20 1999-01-26 Sri International Method and apparatus for speech recognition adapted to an individual speaker
US5839103A (en) * 1995-06-07 1998-11-17 Rutgers, The State University Of New Jersey Speaker verification system using decision fusion logic
US5913192A (en) * 1997-08-22 1999-06-15 At&T Corp Speaker identification with user-selected password phrases

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NAIK D.: "Pole-filtered cepstral subtraction", 1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 1, May 1995 (1995-05-01), pages 157 - 160, XP000657954 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6687672B2 (en) 2002-03-15 2004-02-03 Matsushita Electric Industrial Co., Ltd. Methods and apparatus for blind channel estimation based upon speech correlation structure
EP1485909A4 (fr) * 2002-03-15 2005-11-30 Matsushita Electric Industrial Co Ltd Procedes et appareils pour une estimation aveugle de canal sur la base d'une structure de correlation de parole
US7953216B2 (en) 2007-05-04 2011-05-31 3V Technologies Incorporated Systems and methods for RFID-based access management of electronic devices
US9443361B2 (en) 2007-05-04 2016-09-13 John D. Profanchik Systems and methods for RFID-based access management of electronic devices
US9971918B2 (en) 2007-05-04 2018-05-15 John D. Profanchik, Sr. Systems and methods for RFID-based access management of electronic devices
US10671821B2 (en) 2007-05-04 2020-06-02 John D. Profanchik, Sr. Systems and methods for RFID-based access management of electronic devices
CN107705791A (zh) * 2016-08-08 2018-02-16 中国电信股份有限公司 Voiceprint-recognition-based incoming call identity confirmation method and apparatus, and voiceprint recognition system
CN113516987A (zh) * 2021-07-16 2021-10-19 科大讯飞股份有限公司 Speaker recognition method and apparatus, storage medium, and device
CN113516987B (zh) * 2021-07-16 2024-04-12 科大讯飞股份有限公司 Speaker recognition method and apparatus, storage medium, and device

Also Published As

Publication number Publication date
AU3889799A (en) 1999-11-29

Similar Documents

Publication Publication Date Title
US6760701B2 (en) Subword-based speaker verification using multiple-classifier fusion, with channel, fusion, model and threshold adaptation
Furui Recent advances in speaker recognition
Reynolds et al. Robust text-independent speaker identification using Gaussian mixture speaker models
US6480825B1 (en) System and method for detecting a recorded voice
EP0822539B1 (fr) Two-stage selection of the normalization group for a speaker verification system
EP0501631B1 (fr) Temporal decorrelation method for robust speaker verification
US5950157A (en) Method for establishing handset-dependent normalizing models for speaker recognition
US6038528A (en) Robust speech processing with affine transform replicated data
EP1159737B9 (fr) Speaker recognition
US20040236573A1 (en) Speaker recognition systems
EP1027700A1 (fr) Systeme d'adaptation de modele et procede de verification de locuteur
AU2002311452A1 (en) Speaker recognition system
JPH11507443A (ja) Speaker verification system
EP0892388A1 (fr) Method and device for speaker authentication by information verification using forced coding
Malayath et al. Data-driven temporal filters and alternatives to GMM in speaker verification
Ozaydin Design of a text independent speaker recognition system
JP4696418B2 (ja) 情報検出装置及び方法
Furui Speaker recognition
WO1999059136A1 (fr) Channel estimation system and method for use in automatic speaker verification systems
Gutman et al. Speaker verification using phoneme-adapted gaussian mixture models
Yu et al. Speaker verification from coded telephone speech using stochastic feature transformation and handset identification
Luettin Speaker Verification Experiments on the M2VTS Database
Saeidi et al. Study of model parameters effects in adapted Gaussian mixture models based text independent speaker verification
Rosenberg Sadaoki Furui
Morin et al. A voice-centric multimodal user authentication system for fast and convenient physical access control

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase