WO1999059136A1 - Channel estimation system and method for use in automatic speaker verification systems - Google Patents
Channel estimation system and method for use in automatic speaker verification systems
- Publication number
- WO1999059136A1 (PCT/US1999/010038)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel
- speech
- estimate
- enrollment
- component
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/20—Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
Definitions
- The invention is directed to improved systems and methods of channel estimation for use in automatic speaker verification (ASV) systems.
- ASV: automatic speaker verification
- the invention relates to the fields of digital speech processing and speaker recognition.
- An ASV system compares the voice of a user whose identity is undergoing verification with a known voice.
- One type of voice recognition system is a text-dependent automatic speaker verification system. The text-dependent ASV system requires that the user speak a specific password or phrase (the "password"). This password is determined by the system or by the user during enrollment. However, in most text-dependent ASV systems, the password is constrained to be within a fixed vocabulary, such as a limited number of numerical digits.
- HMM: Hidden Markov Models
- ANN: Artificial Neural Network
- Modeling at the subword level expands the versatility of the system.
- Channel estimation and channel normalization are both referred to as "channel estimation" unless separately identified.
- CMS: Cepstral Mean Subtraction
- CMS also undesirably extracts substantial amounts of the desired speech.
- The pole filtering approximation is one method that attempts to overcome this.
- The voice print system of the present invention builds and improves upon existing ASV systems.
- The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords.
- Channel estimation and normalization removes the nonuniform effects of different channels.
- Channel normalization is able to remove the characteristics of the test channel from the test voice signal.
- The Curve-Fitting and Clean Speech methods may be used separately, together, and in combination with other channel estimation methods.
- The improved voice print system can be any desired voice print system using the inventive channel estimation methods.
- All ASV systems include at least two components, an enrollment component and a testing component.
- The enrollment component is used to store information concerning a user's voice.
- The system of the present invention includes inventive enrollment and testing components.
- the bootstrap component is used to generate data which assists the enrollment component to model the user's voice.
- Each of these components comprises the channel estimation and normalization techniques of the present invention.
- An enrollment component is used to characterize a known user's voice and store the characteristics in a database, so that this information is available for future comparisons.
- The system of the present invention utilizes an improved enrollment process. During enrollment, the characteristics of the enrollment channel are estimated and stored in a database.
- The database may be indexed by identification information, such as by the user's name or credit card number.
- The Clean Speech method separately estimates the speech information in the recording.
- The enrollment channel can be more accurately recalled from the database.
- Feature extraction is then performed to extract features of the user's voice, such as cepstral features.
- A reference template may be generated from the extracted features.
- Segmentation divides the voice sample into a number of subwords.
- The present invention uses subword modeling and may use any of the known techniques, but preferably uses a discriminant training based hierarchical classifier called a Neural Tree Network (NTN).
- The NTN is a hierarchical classifier that combines the properties of decision trees and feed-forward neural networks.
- the system also utilizes the principles of multiple classifier fusion and data resampling.
- The additional classifier used herein is the Gaussian Mixture Model (GMM) classifier.
- A fusion function, which is set and then stored in the database, is used to weigh the scores of the classifiers.
- The threshold value is stored in the database.
- enrollment produces a voice print database
- an index such as the user's name or credit card number
- The test component performs the verification. During testing or verification, the system first accepts "test speech" and index information from a user.
- the next step is to perform channel normalization or channel adaptation.
- Channel normalization is performed if the enrollment channel was also normalized. The characteristics of the test channel are normalized to remove the effects of the test channel from the test voice signal.
- The channel normalization may be performed with the Curve-Fitting or Clean Speech methods.
- Channel adaptation is performed by removing from the test sample the characteristics of the test channel; the test sample is then filtered through the recalled enrollment channel.
- The present invention also improves on the channel adaptation in the testing component.
- the performance of ASV systems can be significantly degraded by background noise and sounds.
- The invention uses a key word/key phrase spotting approach to address this.
- The multiple classifiers of the enrollment component are used to "score" the subwords.
- Bootstrapping is used to generate a pool of speech data representative of the speech of nonspeakers or "antispeakers." This data is used during enrollment to train the discriminant training-based classifiers. Bootstrapping involves obtaining voice samples from antispeakers.
- Figure 1A is a diagram of an enrollment component of the present invention.
- Figure 1B shows pseudo-code for creating a filter to perform the channel estimation.
- Figure 1C shows pseudo-code for inverting the filter of Figure 1B.
- Figure 1D shows a flow diagram for performing the Curve-Fit channel estimation.
- Figure 1E shows a chart of an actual channel and a channel obtained from a cepstral mean.
- Figure 1F shows a chart of an actual channel and a channel obtained from Curve-Fitting.
- Figure 1G shows a chart of an inverse channel and an inverse channel obtained from Curve-Fitting.
- Figure 1H shows a flow diagram for performing Clean Speech channel normalization.
- Figure 2 is a diagram of a testing component of the present invention.
- Figures 3A and 3B are flow diagrams of a channel adaptation module, shown in Figure 2, of the present invention.
- Figure 4 is a diagram of a bootstrapping component, used to generate antispeaker data
- The preferred system used with the present invention includes an enrollment component, a testing component, and a bootstrap component.
- The enrollment component uses antispeaker data to generate and store information concerning a user's voice.
- the information concerning the user's voice is compared to the voice undergoing verification
- The bootstrap component is used to provide initial antispeaker data for use by the enrollment component, such that the enrollment component may properly perform its function of generating data concerning the user's voice.
- The enrollment component is used to store information (using supervised learning) concerning the user's voice.
- The enrollment component also stores information concerning the channel on which the user provides the speech, the "enrollment channel."
- Figure 1A shows the enrollment component 10.
- The first step 20 is to obtain enrollment speech (the password) and to obtain 26 an index, such as the user's name or credit card number.
- The enrollment speech may be obtained via a receiver, telephone, or other input device.
- A speech encoding method such as the ITU G.711 standard (μ-law and A-law) may be used.
- a sampling rate of 8000 Hz is used.
- The speech may be obtained in digital format, such as from an ISDN line.
- a telephony board is used to handle Telco signaling protocol.
- CPU: Intel Pentium platform general purpose computer processing unit
- an additional embodiment could be the Dialogic Antares card.
- Preprocessing 30 may include the following steps; a short code sketch follows the list.
- Silence removal using energy and zero-crossing statistics is primarily based on finding a short interval which is guaranteed to be background silence (generally found a few milliseconds at the beginning of the utterance, before the speaker actually begins speaking). Thresholds are set using the silence region statistics in order to discriminate speech and silence frames.
- Silence removal based on an energy histogram.
- a histogram of frame energies is generated.
- a threshold energy value is determined based on the assumption that the biggest peak in the histogram at the lower energy region shall correspond to the background silence frame energies. This threshold energy value is used to perform speech versus silence discrimination.
- DC Bias removal to remove DC bias introduced by analog-to-digital hardware or other components.
- the mean value of the signal is computed over the entire voice sample and then is subtracted from the voice samples.
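The following Python sketch illustrates the two preprocessing steps described above: energy-histogram silence removal and DC bias removal. The frame length, histogram resolution, and threshold offset are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def remove_dc_bias(signal):
    # Subtract the mean computed over the entire voice sample.
    return signal - signal.mean()

def remove_silence(signal, frame_len=160, offset_db=10.0):
    """Energy-histogram silence removal: the tallest peak in the lower
    energy region of the frame-energy histogram is taken as background
    silence, and frames near that energy level are dropped."""
    usable = len(signal) - len(signal) % frame_len
    frames = signal[:usable].reshape(-1, frame_len)
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    counts, edges = np.histogram(energy_db, bins=50)
    lower_half = counts[: len(counts) // 2]      # low-energy region
    silence_bin = int(np.argmax(lower_half))     # biggest low-energy peak
    threshold = edges[silence_bin + 1] + offset_db
    return frames[energy_db > threshold].reshape(-1)
```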
- The preprocessing is preferably performed in software.
- After preprocessing, channel estimation 40 is performed.
- The characteristics of the enrollment channel are stored in the voice print database 115.
- The voice print database 115 may be RAM, ROM, EPROM, EEPROM, hard disk, CD ROM (compact disc ROM), writeable CD ROM, minidisk, file server, or other storage device.
- The processing unit is defined above as an Intel Pentium platform general purpose computer processing unit (CPU) of at least 100 MHz having about 10 MB associated RAM memory.
- The present invention, however, is not limited to these preferred embodiments.
- A speech signal with frequency spectrum S(ω) is distorted by a transmission channel with frequency response H(ω), producing Y(ω) = S(ω)H(ω).
- Cepstrum is defined as the inverse Fourier transform of the logarithm of the short-time spectral magnitude; in the cepstral domain the convolution becomes additive: C_Y(n) = C_S(n) + C_H(n).
- Time-invariant convolutional distortion H(ω) can be eliminated by Cepstral Mean Subtraction (CMS): C_S_hat(n) = C_Y(n) - E[C_Y(n)], where E[·] represents the expected value.
- The channel cepstrum is a constant additive component in the above equation, so E[C_Y(n)] = E[C_S(n)] + C_H(n).
- If the expected value of the speech cepstrum is assumed to be zero, E[C_Y(n)] can be assumed to represent the channel cepstrum only, that is: E[C_Y(n)] ≈ C_H(n).
- CMS may be conducted on the cepstral features obtained for the voice signal to normalize the channel. However, the cepstral mean may include information other than the estimate of the time-invariant convolutional distortion, such as the coarse spectral distribution of the speech itself. A brief code sketch of CMS follows.
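The following Python sketch expresses CMS exactly as in the relations above; the (frames x coefficients) array layout is an assumption for illustration.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """CMS: C_S_hat(n) = C_Y(n) - E[C_Y(n)].

    ceps is a (num_frames, num_coeffs) array of cepstral vectors; the
    time-invariant channel cepstrum is an additive constant per frame,
    so subtracting the utterance mean removes it (along with some of
    the speech's own coarse spectral shape, as noted above).
    """
    return ceps - ceps.mean(axis=0, keepdims=True)
```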
- In pole filtering, the cepstrum is viewed as a weighted combination of LP poles.
- The narrow band-width LP poles were selectively deflated by broadening their bandwidth, and the resulting pole filtered cepstral coefficients (PFCC) were used in place of the LP-derived cepstral coefficients (LPCC).
- PFCC: pole filtered cepstral coefficients
- LPCC: LP-derived cepstral coefficients
- The PFCC mean may be used to create a filter representing the channel.
- the filter may be inverted 54, and speech passed through the inverted filter 56.
- The preprocessed speech during enrollment may be inverse-filtered through the estimated enrollment channel, and the test speech likewise through the estimated test channel.
- Pole filtering attempts to account for this, but pole filtering still leaves a substantial amount of speech information in the channel estimate (see the sketch below).
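Here is a Python sketch of the pole filtering idea: compute LP coefficients, broaden the bandwidth of narrow-bandwidth poles by clamping their radius, then derive cepstral coefficients from the modified poles. The LPC order, the 0.9 radius limit, and the autocorrelation-method LPC are illustrative assumptions; the patent's own procedure is the pseudo-code of Figure 1B.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    # Autocorrelation-method LPC via the Toeplitz normal equations.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 + sum(alpha_k z^-k)

def pole_filtered_cepstrum(frame, order=12, n_cep=12, max_radius=0.9):
    """Broaden narrow-bandwidth LP poles (those near the unit circle)
    before converting to cepstra: a sketch of pole filtering (PFCC)."""
    a = lpc(frame, order)
    poles = np.roots(a)
    radius = np.abs(poles)
    # Narrow bandwidth <=> pole radius near 1; clamp it to max_radius.
    poles = np.where(radius > max_radius, poles / radius * max_radius, poles)
    alpha = np.real(np.poly(poles))  # pole-filtered A(z) coefficients
    # Standard LP-to-cepstrum recursion on the modified coefficients.
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = -alpha[n] if n <= order else 0.0
        for k in range(1, n):
            if n - k <= order:
                acc -= (k / n) * c[k - 1] * alpha[n - k]
        c[n - 1] = acc
    return c
```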
- The Curve-Fitting method overcomes the limitations of Cepstral Mean Subtraction and the Pole filtering approximation.
- The Curve-Fitting method extracts channel related information from the cepstral mean, but not any speech information. Using this channel information, the Curve-Fitting method can create an inverse filter to remove the channel from the speech.
- Figure 1E illustrates a comparison between an actual channel and a channel derived from the cepstral mean subtraction method. As seen in Figure 1E, the channel obtained from the cepstral mean contains a substantial amount of unwanted speech information, especially in the pass band of the channel.
- Under the Curve-Fitting method, the lowest frequency spectral peak is detected, and the frequencies around this detected lowest frequency spectral peak are scanned 64 to find the points at which the spectrum rolls off.
- A passband is then modeled 72 as a straight line at zero dB between the detected roll-off points.
- A comparison between an actual channel and the Curve-Fitting method channel estimate is illustrated in Figure 1F. It is noted that other curve-fitting techniques may also be used. A rough code sketch of this procedure follows.
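The Python sketch below follows those steps under stated assumptions: the log spectrum is recovered from the cepstral mean via an FFT, the lowest-frequency peak is located, the roll-off points are found by scanning outward, and the passband between them is flattened to zero dB. The FFT size, roll-off threshold, and peak-picking rule are assumptions, not values from the patent.

```python
import numpy as np

def curve_fit_channel(cepstral_mean, n_fft=512, rolloff_db=-6.0):
    """Keep only the band-edge behaviour of the cepstral-mean spectrum
    and flatten the passband, discarding the speech detail (a sketch)."""
    # Log-magnitude spectrum implied by the cepstral mean.
    log_mag = np.real(np.fft.rfft(cepstral_mean, n_fft))
    log_mag -= log_mag.max()  # reference the passband to 0 dB
    # Detect the lowest-frequency spectral peak (first local maximum).
    interior = (log_mag[1:-1] > log_mag[:-2]) & (log_mag[1:-1] >= log_mag[2:])
    peaks = np.flatnonzero(interior) + 1
    lo_peak = int(peaks[0]) if peaks.size else int(np.argmax(log_mag))
    # Scan outward from the peak for the points where the response falls
    # below the roll-off threshold; these bound the passband.
    below = log_mag < rolloff_db
    lo_edge = lo_peak
    while lo_edge > 0 and not below[lo_edge - 1]:
        lo_edge -= 1
    hi_edge = lo_peak
    while hi_edge < len(log_mag) - 1 and not below[hi_edge + 1]:
        hi_edge += 1
    # Model the passband as a straight line at zero dB; keep the skirts.
    estimate = log_mag.copy()
    estimate[lo_edge:hi_edge + 1] = 0.0
    return estimate  # log-magnitude channel estimate
```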
- The Curve-Fitting method enrollment channel estimate can be stored in the voice print database 115, as described above, for recall and use as a filter in the testing component, as described below. The channel estimate may be converted to an inverse filter to inverse filter the channel corrupted speech.
- The Curve-Fitting channel estimation method and module can be further improved by combining or fusing the channel estimate with channel estimation information from other methods, where C_PF(n) is the channel estimate obtained using Pole filtering and C_CF(n) is the channel estimate obtained using Curve-Fitting.
- The Clean Speech method improves the channel estimation 40. The Clean Speech module extracts channel related information by separately estimating the speech information in the recording. Likewise, the Clean Speech method can use this channel information to create an inverse filter.
- the "clean" recording can be made during the same session in
- the recording of the clean speech is done before or after the corrupted
- a high quality microphone will have minimal channel distortion.
- a wide bandwidth microphone has a flat frequency response between 20 Hz and 20 kHz.
- A preferred wide bandwidth microphone is a Sennheiser® microphone from Sennheiser, Inc., which would be connected to a preferred Pentium® based computer with a sound card.
- the Clean Speech method assumes that the cepstral mean of the "clean" recording will be representative of the speech information in the recording.
- The cepstral mean for the "clean" recording can be represented as E[C_S,CLEAN(n)].
- The cepstral coefficients of channel corrupted speech can be represented as shown above, E[C_Y(n)] = E[C_S(n)] + E[C_H(n)]. Assuming that the same person is speaking the same text in both the "clean" and channel corrupted recordings, i.e., E[C_S(n)] ≈ E[C_S,CLEAN(n)], the channel estimate becomes E[C_H(n)] = E[C_Y(n)] - E[C_S,CLEAN(n)].
- the channel normalization process shown above is done in the cepstrum domain.
- the same processing can be done in the time waveform domain.
- The channel estimate E[C_H(n)] calculated under the Clean Speech method can be stored and used to construct an inverse filter, as with the other methods; a short sketch follows.
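In code, the Clean Speech channel estimate reduces to a difference of cepstral means, per the relation above; a minimal sketch, assuming (frames x coefficients) arrays:

```python
import numpy as np

def clean_speech_channel_estimate(corrupted_ceps, clean_ceps):
    # E[C_H(n)] = E[C_Y(n)] - E[C_S,CLEAN(n)]: the clean recording's
    # cepstral mean stands in for the speech content, so the difference
    # of means isolates the channel.
    return corrupted_ceps.mean(axis=0) - clean_ceps.mean(axis=0)

def normalize(ceps, channel_estimate):
    # Remove the estimated channel from every cepstral frame.
    return ceps - channel_estimate
```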
- Feature extraction 50 is performed on the processed speech.
- Feature extraction may occur after (as shown) or simultaneously with the step of channel estimation 40 (in parallel computing embodiments).
- The detail of the feature extraction is herein incorporated by reference.
- This template is stored 60 in the voice print database 115. Following storage of the template 60, the speech is segmented 70 into subwords.
- The preferred technique for subword generation 70 is automatic blind speech segmentation.
- Each sub-word is then modeled 80, 90, preferably with a first classifier, a neural tree network (NTN) 80, and a second classifier, a Gaussian mixture model (GMM) 90.
- NTN: neural tree network
- GMM: Gaussian mixture model
- A leave-one-out data resampling scheme 100 is used. Data resampling 100 is performed by creating multiple subsets of the training data, each of which is created by leaving one enrollment sample out.
- Figure 1A shows N models for the NTN classifier 80 and N models for the GMM classifier 90. For model #1 of the NTN classifier, an enrollment sample, such as the 1st sample, is left out of the training (see the sketch below).
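A minimal sketch of the leave-one-out resampling: with N enrollment samples, N training subsets are built, each omitting one sample, and one model per classifier is trained on each subset. The sample names are hypothetical.

```python
def leave_one_out_subsets(samples):
    # Subset i omits sample i; model #i is trained on subsets[i], and the
    # held-out samples[i] can be used for validation/scoring.
    return [samples[:i] + samples[i + 1:] for i in range(len(samples))]

# e.g. four repetitions of the password yield 4 NTN and 4 GMM models
subsets = leave_one_out_subsets(["rep1", "rep2", "rep3", "rep4"])
```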
- There is no linguistic information with which to label the subword data available in the antispeaker database 110.
- RAM: random access memory
- ROM: read-only memory
- EPROM: erasable programmable read-only memory
- EEPROM: electrically erasable programmable read-only memory
- The subword data from the speaker being trained is labeled as enrollment speaker data. Because there is no linguistic labelling information in the antispeaker database 110, the entire database is searched to find similar subwords.
- the mean vector and covariance matrix of the subword segments obtained from subword generation are used to find the "close" subwords.
- Module 120 searches the antispeaker database 110 to find the "close" subwords of antispeaker data.
- The anti-speaker data in the antispeaker database 110 is either manually created, or created using the bootstrapping component described below.
- Classifier models 80, 90 are trained by comparing antispeaker data with N-1 samples of the enrollment speech.
- Both modules 80, 90 can determine a score for each spectral feature vector.
- The scores can be combined, or "fused," by a classifier fusion module 130 to obtain a composite score for each subword. The scores of the neural tree network 80 and the Gaussian mixture model 90 are fused 130 using a linear opinion pool, as described below.
- a scoring algorithm 145, 150 is used for each of the NTN and GMM models.
- Transitional (or durational) probabilities between the subwords can also be used while scoring.
- the preferred embodiment is (b) subword-average scoring.
- the result of scoring provides a GMM score and an NTN score
- A classifier fusion module 130 using the linear opinion pool method combines the NTN score and the GMM score.
- Use of the linear opinion pool is referred to as a data fusion function, because the data from each classifier is "fused," or combined.
- The linear opinion pool is computed as S(α) = Σ_{i=1}^{n} α_i s_i(α), where S(α) is the probability of the combined system, α_i are weights, s_i(α) are the individual classifier scores, and n is the number of classifiers.
- The variable α is set as a constant (although it may be dynamically adapted as discussed below), and functions to provide more influence on one classifier method as compared to the other. If the first classifier s_1 were more important, its weight would be made greater than 0.5, or its previous value is only incremented or decremented by a small amount, ε. A sketch follows.
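A minimal sketch of the two-classifier linear opinion pool, with weights (α, 1 - α); the example scores are hypothetical.

```python
def linear_opinion_pool(scores, weights):
    # S = sum_i alpha_i * s_i, a weighted sum of classifier scores.
    return sum(w * s for w, s in zip(weights, scores))

alpha = 0.5  # equal influence for both classifiers
ntn_score, gmm_score = 0.82, 0.74  # hypothetical subword scores
final = linear_opinion_pool([ntn_score, gmm_score], [alpha, 1.0 - alpha])
```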
- A threshold value 140 is output and stored.
- During testing, the threshold value output 140 is compared to a final score, as described below.
- Figure 2 shows a general outline of the testing component 150, which has many elements in common with the enrollment component.
- The testing component 150 is used to determine whether test speech received from a user matches the stored voice print of the claimed identity.
- test speech and index information 160 is supplied to the test component.
- The index information is used to recall subword/segmentation information and the threshold value from the voice print database.
- The index information may be any nonvoice data which identifies the user, such as the user's name, credit card number, etc.
- After obtaining the test speech and index information, the test speech is preprocessed 170.
- Preprocessing 170 may be performed as previously described in the enrollment component; the same preprocessing is performed on the test speech as was performed during enrollment.
- Channel adaptation 180 adapts the system to the particular enrollment and test channels.
- Channel adaptation 180 includes processing under both of the procedures shown in Figures 3A and 3B.
- The enrollment channel is estimated 40 during the enrollment component 10, also shown in Figures 3A and 3B at 300.
- the enrollment channel estimate is also stored 310 in the voice print database 115 during the enrollment component.
- The enrollment channel may be estimated and stored using the preferred embodiments of the present invention previously discussed.
- The test channel is estimated 320 during the testing component. The test channel may be estimated by generating a filter using the procedures of Figure 1B.
- In the procedure of Figure 3A, the test speech is inverse filtered through the test channel 330; to achieve this, the test speech is passed through an inverse of the test channel filter. The enrollment channel is then added to the test speech by filtering the test speech through the enrollment channel: the saved enrollment filter is recalled 340 and the test speech is filtered through it.
- The procedure of Figure 3A thus stores the enrollment data with the enrollment channel effects intact.
- In the procedure of Figure 3B, the enrollment speech is filtered through an inverse of the enrollment channel filter 360; that is, the enrollment speech is inverse filtered to remove the enrollment channel. The enrollment speech may be normalized using the methods previously described.
- The test channel is estimated 370, and an inverse filter is constructed using the procedures previously described. The test speech is then filtered through the inverse filter 380. A code sketch of the Figure 3A style adaptation appears below.
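The sketch below illustrates the Figure 3A path in Python, assuming the two channels are modeled as FIR filters and that the test-channel filter is minimum phase so it can be inverted as an all-pole filter; these modeling choices are assumptions, not the patent's pseudo-code. The Figure 3B path instead inverse filters both the enrollment and test speech without re-imposing a channel.

```python
from scipy.signal import lfilter

def channel_adapt(test_speech, test_channel_fir, enroll_channel_fir):
    # Strip the estimated test channel (inverse filtering): the FIR
    # channel becomes the denominator of an all-pole inverse filter.
    no_test_channel = lfilter([1.0], test_channel_fir, test_speech)
    # Re-impose the recalled enrollment channel so the test speech looks
    # as if it were recorded over the enrollment channel.
    return lfilter(enroll_channel_fir, [1.0], no_test_channel)
```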
- The system adapts to account for the channel distortion on the enrollment channel and on the test channel. It has been found that the channel print carries the characteristics of the particular cellular handset of which the speaker is an authorized user, and therefore creates a mismatch when a different handset or channel is used: an authorized user's request for service may be denied due to the phone print mismatch.
- Channel adaptation 180 provides a solution to this problem. It first removes the phone and channel print of the test environment from the test speech; channel adaptation can then add the phone and channel print of the training environment to the speech, so that it looks as if the verification speech was recorded in the training environment.
- Channel adaptation 180 in this manner can still be advantageous in cellular fraud prevention.
- The channels can be estimated using techniques such as the pole-filtered cepstrum, the cepstrum mean, as well as an FFT based periodogram of the speech signal. The pole-filtered cepstrum, as shown in Figure 1B, is the preferred method.
- Feature extraction 190 is performed after preprocessing. Feature extraction 190 may occur immediately after channel adaptation 180, or may occur in parallel with channel adaptation 180 in a multiple processor embodiment.
- subword generation 210 in the testing component is performed based on the subwords/segment model computed in the enrollment phase 10.
- The GMM modeling 90 is used in the test component subword generation 210 to "force align" the test phrase into segments corresponding to the previously formed subwords.
- Using the subword GMMs as reference models, Viterbi or dynamic programming (DP) alignment is performed.
- DP: Dynamic programming
- The normalized subword duration (stored during enrollment) is used as a constraint for force alignment since it provides stability to the alignment.
- Scoring 240, 250 is performed on the subwords using the NTN and GMM classifiers 220, 230, with the techniques of the enrollment component. An NTN scoring algorithm 240 and a GMM scoring algorithm 250 are used, as previously described.
- The classifier fusion module 260 outputs a "final score" 270.
- the "final score” 270 is then compared 280 to the threshold value 140. If the
- the user is verified. If the "final score" 270 is less than the threshold value 140 then the user is not verified or permitted to complete the transaction requiring verification.
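The accept/reject comparison 280 is a single inequality; tie-breaking at exactly the threshold is an assumption.

```python
def verify(final_score, threshold):
    # Accept when the fused "final score" meets the stored threshold.
    return final_score >= threshold

decision = "verified" if verify(final_score=0.81, threshold=0.75) else "rejected"
```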
- The present invention employs a number of additional adaptations beyond channel adaptation 180.
- The multiple classifier system uses a classifier fusion module. The fusion function set at enrollment may not be optimal for the testing, in that every single classifier may have its own strengths and weaknesses, and one classifier may perform better than the other. The fusion function changes accordingly in order to achieve the optimal results for fusion.
- Fusion adaptation uses predetermined rules to adjust the fusion function.
- A fusion adaptation module 290 is connected to the classifier fusion module 260. The fusion adaptation module 290 changes the constant, α, in the linear opinion pool.
- s_1 is the score of the first classifier and s_2 is the score of the second classifier.
- The fusion adaptation module 290 dynamically changes α to weigh either the NTN (s_1) or the GMM (s_2) score more heavily; a sketch follows.
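A sketch of ε-step fusion adaptation: nudge α by a small amount toward whichever classifier is currently performing better. The specific update rule and the ε value are illustrative assumptions; the text above states only that α moves by a small amount ε.

```python
def adapt_alpha(alpha, ntn_score, gmm_score, eps=0.01):
    # Increment or decrement alpha by eps, clamped to [0, 1], to give more
    # influence to the classifier with the stronger recent score.
    if ntn_score > gmm_score:
        return min(1.0, alpha + eps)
    if gmm_score > ntn_score:
        return max(0.0, alpha - eps)
    return alpha
```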
- Threshold adaptation adapts the threshold value in response to prior final scores.
- The threshold adaptation module 295 is shown in Figure 2. The detail of the threshold adaptation module 295 is herein incorporated by reference from VOICE PRINT SYSTEM AND ...
- Model adaptation adapts the classifier models to subsequent successful verifications.
- Fusion adaptation 290, model adaptation, and threshold adaptation 600 all may affect the verification result.
- Model adaptation is more dramatic than threshold adaptation or fusion adaptation, which both make incremental changes.
- The voiceprint database 115 may or may not be coresident with the antispeaker database 110.
- Voice print data stored in the voice print database may include: enrollment channel estimates, the threshold value, normalized segment durations, and/or other intermediate scores or authorization results used for adaptation.
- The antispeaker database 110 must initially be filled with antispeaker data.
- The initial antispeaker data may be generated via artificial simulation.
- FIG. 4 shows a bootstrapping component 700.
- the bootstrar ⁇ ing component 700 is a bootstrapping component 700.
- The antispeaker speech may be phrases from any number of speakers.
- the speech then undergoes feature extraction 770.
- The feature extraction may occur as previously described.
- The antispeaker speech undergoes sub-word generation 750, using the techniques previously described with respect to Figure 1A.
- The preferable method of sub-word generation is automatic blind speech segmentation, discussed above.
- The bootstrapping component initializes the database with antispeaker data which the enrollment component uses to train the classifiers.
- The present invention provides for an accurate and reliable automatic speaker verification system.
- Adaptation schemes adapt the ASV to changes in success/failures and to changes in the user by using channel adaptation 180, model adaptation 540, fusion adaptation 290, and threshold adaptation 600.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU38897/99A AU3889799A (en) | 1998-05-08 | 1999-05-07 | Channel estimation system and method for use in automatic speaker verification systems |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US8482798P | 1998-05-08 | 1998-05-08 | |
| US60/084,827 | 1998-05-08 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO1999059136A1 true WO1999059136A1 (fr) | 1999-11-18 |
Family
ID=22187467
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US1999/010038 Ceased WO1999059136A1 (fr) | 1999-05-07 | Channel estimation system and method for use in automatic speaker verification systems |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU3889799A (fr) |
| WO (1) | WO1999059136A1 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6687672B2 (en) | 2002-03-15 | 2004-02-03 | Matsushita Electric Industrial Co., Ltd. | Methods and apparatus for blind channel estimation based upon speech correlation structure |
| US7953216B2 (en) | 2007-05-04 | 2011-05-31 | 3V Technologies Incorporated | Systems and methods for RFID-based access management of electronic devices |
| CN107705791A (zh) * | 2016-08-08 | 2018-02-16 | 中国电信股份有限公司 | Incoming call identity confirmation method and apparatus based on voiceprint recognition, and voiceprint recognition system |
| CN113516987A (zh) * | 2021-07-16 | 2021-10-19 | 科大讯飞股份有限公司 | Speaker recognition method and apparatus, storage medium, and device |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5839103A (en) * | 1995-06-07 | 1998-11-17 | Rutgers, The State University Of New Jersey | Speaker verification system using decision fusion logic |
| US5864810A (en) * | 1995-01-20 | 1999-01-26 | Sri International | Method and apparatus for speech recognition adapted to an individual speaker |
| US5913192A (en) * | 1997-08-22 | 1999-06-15 | At&T Corp | Speaker identification with user-selected password phrases |
- 1999-05-07: WO application PCT/US1999/010038 filed as WO1999059136A1, status Ceased
- 1999-05-07: AU application AU38897/99A filed as AU3889799A, status Abandoned
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5864810A (en) * | 1995-01-20 | 1999-01-26 | Sri International | Method and apparatus for speech recognition adapted to an individual speaker |
| US5839103A (en) * | 1995-06-07 | 1998-11-17 | Rutgers, The State University Of New Jersey | Speaker verification system using decision fusion logic |
| US5913192A (en) * | 1997-08-22 | 1999-06-15 | At&T Corp | Speaker identification with user-selected password phrases |
Non-Patent Citations (1)
| Title |
|---|
| NAIK D.: "Pole-filtered cepstral subtraction", 1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 1, May 1995 (1995-05-01), pages 157 - 160, XP000657954 * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6687672B2 (en) | 2002-03-15 | 2004-02-03 | Matsushita Electric Industrial Co., Ltd. | Methods and apparatus for blind channel estimation based upon speech correlation structure |
| EP1485909A4 (fr) | 2002-03-15 | 2005-11-30 | Methods and apparatus for blind channel estimation based upon speech correlation structure |
| US7953216B2 (en) | 2007-05-04 | 2011-05-31 | 3V Technologies Incorporated | Systems and methods for RFID-based access management of electronic devices |
| US9443361B2 (en) | 2007-05-04 | 2016-09-13 | John D. Profanchik | Systems and methods for RFID-based access management of electronic devices |
| US9971918B2 (en) | 2007-05-04 | 2018-05-15 | John D. Profanchik, Sr. | Systems and methods for RFID-based access management of electronic devices |
| US10671821B2 (en) | 2007-05-04 | 2020-06-02 | John D. Profanchik, Sr. | Systems and methods for RFID-based access management of electronic devices |
| CN107705791A (zh) * | 2016-08-08 | 2018-02-16 | 中国电信股份有限公司 | Incoming call identity confirmation method and apparatus based on voiceprint recognition, and voiceprint recognition system |
| CN113516987A (zh) * | 2021-07-16 | 2021-10-19 | 科大讯飞股份有限公司 | Speaker recognition method and apparatus, storage medium, and device |
| CN113516987B (zh) * | 2021-07-16 | 2024-04-12 | 科大讯飞股份有限公司 | Speaker recognition method and apparatus, storage medium, and device |
Also Published As
| Publication number | Publication date |
|---|---|
| AU3889799A (en) | 1999-11-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6760701B2 (en) | Subword-based speaker verification using multiple-classifier fusion, with channel, fusion, model and threshold adaptation | |
| Furui | Recent advances in speaker recognition | |
| Reynolds et al. | Robust text-independent speaker identification using Gaussian mixture speaker models | |
| US6480825B1 (en) | System and method for detecting a recorded voice | |
| EP0822539B1 (fr) | Two-stage selection of the normalization group for a speaker verification system | |
| EP0501631B1 (fr) | Temporal decorrelation method for robust speaker verification | |
| US5950157A (en) | Method for establishing handset-dependent normalizing models for speaker recognition | |
| US6038528A (en) | Robust speech processing with affine transform replicated data | |
| EP1159737B9 (fr) | Speaker recognition | |
| US20040236573A1 (en) | Speaker recognition systems | |
| EP1027700A1 (fr) | Systeme d'adaptation de modele et procede de verification de locuteur | |
| AU2002311452A1 (en) | Speaker recognition system | |
| JPH11507443A (ja) | Speaker verification system | |
| EP0892388A1 (fr) | Method and apparatus for speaker authentication by verifying information using forced coding | |
| Malayath et al. | Data-driven temporal filters and alternatives to GMM in speaker verification | |
| Ozaydin | Design of a text independent speaker recognition system | |
| JP4696418B2 (ja) | Information detection apparatus and method | |
| Furui | Speaker recognition | |
| WO1999059136A1 (fr) | Channel estimation system and method for use in automatic speaker verification systems | |
| Gutman et al. | Speaker verification using phoneme-adapted gaussian mixture models | |
| Yu et al. | Speaker verification from coded telephone speech using stochastic feature transformation and handset identification | |
| Luettin | Speaker Verification Experiments on the M2VTS Database | |
| Saeidi et al. | Study of model parameters effects in adapted Gaussian mixture models based text independent speaker verification | |
| Rosenberg | Sadaoki Furui | |
| Morin et al. | A voice-centric multimodal user authentication system for fast and convenient physical access control |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| NENP | Non-entry into the national phase |
Ref country code: KR |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| 122 | Ep: pct application non-entry in european phase |