EP1406244A2 - Voice activity detection based on unsupervised clustering - Google Patents
- Publication number
- EP1406244A2 (application EP03102639A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- classes
- speech
- class
- language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Definitions
- A voice activity detector (VAD) is a device that distinguishes between speech including background noise ("speech") and background noise alone ("non-speech").
- The input of a VAD can be, for example, a voice signal recorded by the microphone of a communication terminal.
- the signal is made up of the user's voice and background noise (e.g. street noise).
- the signal consists solely of the background noise.
- The output of a voice activity detector adds to the input signal the information whether it contains speech or not.
- VAD Voice over IP
- VADs are either configured on the basis of heuristics or trained during a training phase.
- The input signal used is an audio signal that has been preprocessed in a suitable manner.
- Property extraction yields property vectors of different sizes, depending on the number of properties used.
- The simplest but still widely used heuristic is to assess a signal against a specific, fixed energy threshold. If the signal energy exceeds the threshold, "speech" is assumed, otherwise "non-speech".
- Another example is determining the zero-crossing rate of the autocorrelation function of the speech signal and applying a corresponding threshold to decide whether a speech signal is present or not.
- VADs that are trained during a training phase include statistical VADs and neural networks. For this purpose, they are trained with data for which it is known when speech and when noise occurs, that is, pre-recorded data that has, for example, been labeled manually. Examples of methods that can be used to decide in this way whether a speech signal is present or not can be found, for example, in Stadermann, J.: "Speech/pause detection in automatic speech recognition", University of Duisburg, diploma thesis, 1999, pages 28-36.
- VADs especially for wireless communication
- The object of the invention is to enable a more precise distinction between speech and non-speech. Emphasis should also be placed on automatic adaptability to different noise situations, speakers, or languages.
- The plurality is preferably greater than or equal to 10, in particular greater than or equal to 64. Depending on the class into which the signal is classified, it is then decided whether the signal is a speech signal or not.
- Speech signals that are recognized as such i.e. after the Voice activity detection
- Two or more classes can also be provided for which it is decided that the signal is not speech if it is classified into them.
- The classes can be arranged in clusters, so that similar classes are adjacent or combined in groups.
- The classes are formed automatically in a training phase by an unsupervised, self-organizing clustering method, in particular on the basis of test signals.
- A neural network is preferably used here, in particular a Kohonen network with the network architecture of a self-organizing map.
- The device described can be used in biometric speech recognition during enrollment, so that the voice of the person being enrolled is recorded as the reference, and not more or less large portions of the background noise. Otherwise, a person who has a similar noise environment during verification may be authenticated by the system.
- A method for detecting whether a speech signal is present or not can be structured analogously to the device described. This also applies to its preferred embodiments.
- A program product for a data processing system, containing code sections with which one of the described methods can be carried out on the data processing system, can be realized by implementing the method in a programming language and translating it into code executable by the data processing system. The code sections are stored for this purpose. A program product is understood here to mean the program as a tradable product. It can be in any form, for example on paper, on a computer-readable data medium, or distributed over a network.
- This first phase is illustrated with reference to FIG. 1.
- An audio database 1 with audio signals can be seen there. These are fed to a preprocessing stage 2. This preprocessing is preferably the same as that used for the later speech recognition, which saves a second preprocessing step.
- The preprocessing 2 extracts from the audio signals of the audio database 1 property vectors 3, in which properties of the audio signals are specified.
- These property vectors 3 are fed to the input neurons of a neural network 4.
- The neural network 4 is a Kohonen network with the network architecture of a self-organizing map (SOM: Self-Organizing Map). It has the property that a local neighborhood relationship exists between the individual neurons, so that the reference vectors representing the individual classes are spatially ordered after training.
- The neural network is trained on the basis of a database which contains, for example, speech and noise in equal proportions.
- The result of the classifier training is a class representation 5.
- The association phase covers the assignment of each individual class of the classifier 4, in the form of the neural network, to one of the two classes "speech" and "non-speech".
- For this purpose, the classifier 4 itself is now operated in classification mode, that is, it outputs the associated class 6 for each property vector 3. This is shown in Figure 2.
- The association unit 7, by contrast, is operated in training mode, that is, it learns from the labeled audio signals 8 the assignment of each of the classifier classes to "speech" or "non-speech". It is determined how many test signals that are "speech" or "non-speech" have been assigned to each class. Depending on this result, each class is declared in an association step to be a speech class or a non-speech class. The result is the class assignment 9 of the VAD.
- The results obtained are further improved by using an averaging filter to eliminate individual outliers.
- The first line, labeled "Real", gives the actual classification. Here "Noise" stands for "non-speech" and "Speech" for "speech".
- The method is independent of the language and/or content of the spoken text.
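The fixed energy threshold heuristic mentioned in the definitions can be sketched as follows. This is a minimal illustration, not code from the patent: the function names `frame_energy` and `energy_vad`, the frame length, and the threshold value are all assumptions chosen for the example.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)


def energy_vad(frames, threshold):
    """Label a frame 'speech' if its energy exceeds the fixed threshold."""
    return ["speech" if frame_energy(f) > threshold else "non-speech"
            for f in frames]


# Three hypothetical frames: quiet, loud, quiet.
frames = [
    [0.01, -0.02, 0.01, 0.0],
    [0.5, -0.6, 0.4, -0.5],
    [0.02, 0.01, -0.01, 0.0],
]
labels = energy_vad(frames, threshold=0.01)
print(labels)
```

A fixed threshold like this cannot adapt to a changing noise level, which is the kind of limitation the trained approach described above is meant to address.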
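The two phases described above (unsupervised training of the classifier, then association of each classifier class with "speech" or "non-speech" by counting labeled test signals) might look roughly like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the map is one-dimensional with 10 units, the initialization, learning-rate and neighborhood schedules are simple choices made for this example, the two-component property vectors are synthetic, and all function names are hypothetical.

```python
import random


def nearest_unit(units, x):
    """Index of the reference vector closest to x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))


def train_som(vectors, n_units=10, epochs=30, seed=0):
    """Phase 1: unsupervised training of a 1-D self-organizing map (no labels)."""
    rng = random.Random(seed)
    # Assumption: spread the initial reference vectors evenly between the
    # component-wise minimum and maximum of the training data.
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    units = [[a + (b - a) * i / (n_units - 1) for a, b in zip(lo, hi)]
             for i in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                  # decays from 1 towards 0
        rate = 0.5 * frac                            # learning rate
        radius = max(1, round(n_units / 2 * frac))   # neighborhood radius
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            winner = nearest_unit(units, x)
            first, last = max(0, winner - radius), min(n_units, winner + radius + 1)
            for i in range(first, last):
                # Neighbors of the winner are pulled along, but less strongly.
                influence = 1.0 - abs(i - winner) / (radius + 1)
                units[i] = [u + rate * influence * (v - u)
                            for u, v in zip(units[i], x)]
    return units


def associate(units, labeled):
    """Phase 2: majority vote per classifier class over labeled vectors."""
    votes = [[0, 0] for _ in units]  # [non-speech count, speech count]
    for x, is_speech in labeled:
        votes[nearest_unit(units, x)][int(is_speech)] += 1
    # Assumption: classes without a speech majority count as non-speech.
    return ["speech" if s > n else "non-speech" for n, s in votes]


def classify(units, assignment, x):
    """VAD decision for one property vector."""
    return assignment[nearest_unit(units, x)]


# Synthetic property vectors: noise and speech in equal proportions.
rng = random.Random(1)
noise = [[0.1 + rng.uniform(-0.1, 0.1), 0.1 + rng.uniform(-0.1, 0.1)]
         for _ in range(30)]
speech = [[0.9 + rng.uniform(-0.1, 0.1), 0.8 + rng.uniform(-0.1, 0.1)]
          for _ in range(30)]

units = train_som(noise + speech)  # labels are not used here
labeled = [(x, False) for x in noise] + [(x, True) for x in speech]
assignment = associate(units, labeled)
print(classify(units, assignment, speech[0]), classify(units, assignment, noise[0]))
```

Because several map units can end up declared as non-speech, the sketch also reflects the point that two or more classes may be provided for non-speech signals.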
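The outlier removal by an averaging filter can be illustrated with a moving average over the per-frame decisions. The window length of 3, the re-thresholding at 0.5, and the name `smooth_decisions` are assumptions made for this sketch; the text above does not specify them.

```python
def smooth_decisions(decisions, window=3):
    """Moving-average filter over 0/1 frame decisions, re-thresholded at 0.5."""
    half = window // 2
    smoothed = []
    for i in range(len(decisions)):
        # The window is truncated at the start and end of the sequence.
        segment = decisions[max(0, i - half): i + half + 1]
        smoothed.append(1 if sum(segment) / len(segment) > 0.5 else 0)
    return smoothed


# 1 = speech, 0 = non-speech; positions 2 and 7 are isolated outliers.
raw = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
smoothed = smooth_decisions(raw)
print(smoothed)
```

A longer window suppresses longer bursts of misclassified frames, at the cost of blurring the boundaries between speech and non-speech segments.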
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
- Telephonic Communication Services (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE10245107 | 2002-09-27 | | |
| DE2002145107 DE10245107B4 (de) | 2002-09-27 | 2002-09-27 | Voice activity detection based on unsupervised trained clustering methods |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| EP1406244A2 (fr) | 2004-04-07 |
| EP1406244A3 (fr) | 2005-01-12 |
| EP1406244B1 (fr) | 2006-10-11 |
Family
ID=31984148
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20030102639 Expired - Lifetime EP1406244B1 (fr) | 2002-09-27 | 2003-08-25 | Voice activity detection based on unsupervised clustering |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP1406244B1 (fr) |
| DE (2) | DE10245107B4 (fr) |
| ES (1) | ES2269917T3 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2007128530A3 (fr) * | 2006-05-05 | 2008-03-20 | Giesecke & Devrient Gmbh | Method and device for personalizing cards |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11502863B2 (en) * | 2020-05-18 | 2022-11-15 | Avaya Management L.P. | Automatic correction of erroneous audio setting |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4802221A (en) * | 1986-07-21 | 1989-01-31 | Ncr Corporation | Digital system and method for compressing speech signals for storage and transmission |
| JP2643593B2 (ja) * | 1989-11-28 | 1997-08-20 | 日本電気株式会社 | Voice/modem signal discrimination circuit |
| JP3088171B2 (ja) * | 1991-02-12 | 2000-09-18 | 三菱電機株式会社 | Self-organizing pattern classification system and classification method |
| DE4442613C2 (de) * | 1994-11-30 | 1998-12-10 | Deutsche Telekom Mobil | System for determining network quality in communication networks from the end-user and operator perspective, in particular mobile radio networks |
| IT1281001B1 (it) * | 1995-10-27 | 1998-02-11 | Cselt Centro Studi Lab Telecom | Method and apparatus for encoding, manipulating and decoding audio signals. |
| US5737716A (en) * | 1995-12-26 | 1998-04-07 | Motorola | Method and apparatus for encoding speech using neural network technology for speech classification |
| US6564198B1 (en) * | 2000-02-16 | 2003-05-13 | Hrl Laboratories, Llc | Fuzzy expert system for interpretable rule extraction from neural networks |
2002
- 2002-09-27: DE application DE2002145107, patent DE10245107B4 (de), not active (Expired - Fee Related)
2003
- 2003-08-25: EP application EP20030102639, patent EP1406244B1 (fr), not active (Expired - Lifetime)
- 2003-08-25: DE application DE50305333T, patent DE50305333D1 (de), not active (Expired - Lifetime)
- 2003-08-25: ES application ES03102639T, patent ES2269917T3 (es), not active (Expired - Lifetime)
Also Published As
| Publication number | Publication date |
|---|---|
| DE50305333D1 (de) | 2006-11-23 |
| ES2269917T3 (es) | 2007-04-01 |
| EP1406244B1 (fr) | 2006-10-11 |
| DE10245107B4 (de) | 2006-01-26 |
| EP1406244A3 (fr) | 2005-01-12 |
| DE10245107A1 (de) | 2004-04-08 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| DE69432570T2 (de) | Speech recognition | |
| DE69031284T2 (de) | Method and device for speech recognition | |
| DE69329855T2 (de) | Method for recognizing alphanumeric strings spoken over a telephone network | |
| DE69722980T2 (de) | Recording of speech data with segments from acoustically different environments | |
| DE60108373T2 (de) | Method for detecting emotions in speech signals using speaker identification | |
| DE69924596T2 (de) | Selection of acoustic models by means of speaker verification | |
| DE69814104T2 (de) | Text segmentation and topic identification | |
| DE69131689T2 (de) | Simultaneous speaker-independent speech recognition and speaker verification over a telephone network | |
| DE69707876T2 (de) | Method and apparatus for dynamically adjusted training for speech recognition | |
| DE60023517T2 (de) | Classification of sound sources | |
| DE60111329T2 (de) | Adaptation of the phonetic context to improve speech recognition | |
| DE60128270T2 (de) | Method and system for generating speaker recognition data, and method and system for speaker recognition | |
| CN111524527A (zh) | Speaker separation method and apparatus, electronic device, and storage medium | |
| EP0964390A2 (fr) | Device for verifying signals | |
| DE112018007847B4 (de) | Information processing device, information processing method, and program | |
| DE2422028A1 (de) | Circuit arrangement for identifying a formant frequency in a spoken word | |
| DE69813597T2 (de) | Pattern recognition using multiple reference models | |
| DE3750365T2 (de) | Speaker identification | |
| DE102019205543A1 (de) | Method for classifying temporally successive digital audio data | |
| CN116758911A (zh) | Campus violence monitoring method and system based on speech signal processing | |
| EP3847646B1 (fr) | Audio processing apparatus and method for audio scene classification | |
| EP1406244B1 (fr) | Voice activity detection based on unsupervised clustering | |
| DE19705471C2 (de) | Method and circuit arrangement for speech recognition and voice control of devices | |
| DE10209324C1 (de) | Automatic detection of speaker changes in speaker-adaptive speech recognition systems | |
| EP0817167B1 (fr) | Speech recognition method and device for carrying out the method | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
| | AX | Request for extension of the european patent | Extension state: AL LT LV MK |
| | PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
| | AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
| | AX | Request for extension of the european patent | Extension state: AL LT LV MK |
| | 17P | Request for examination filed | Effective date: 20050711 |
| | AKX | Designation fees paid | Designated state(s): DE ES FR GB |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| | GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| | AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE ES FR GB |
| | REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D; Free format text: NOT ENGLISH |
| | GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) | Effective date: 20061011 |
| | REF | Corresponds to: | Ref document number: 50305333; Country of ref document: DE; Date of ref document: 20061123; Kind code of ref document: P |
| | REG | Reference to a national code | Ref country code: ES; Ref legal event code: FG2A; Ref document number: 2269917; Country of ref document: ES; Kind code of ref document: T3 |
| | ET | Fr: translation filed | |
| | PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| | 26N | No opposition filed | Effective date: 20070712 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: ES; Payment date: 20130925; Year of fee payment: 11 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20130814; Year of fee payment: 11. Ref country code: FR; Payment date: 20130814; Year of fee payment: 11 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20141020; Year of fee payment: 12 |
| | GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20140825 |
| | REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST; Effective date: 20150430 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20140825 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20140901 |
| | REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 50305333; Country of ref document: DE |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: ES; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20140826. Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20160301 |