
EP1406244A2 - Détection d'activité vocale basée sur l'agrégation non-supervisée - Google Patents


Info

Publication number
EP1406244A2
EP1406244A2 (application EP03102639A)
Authority
EP
European Patent Office
Prior art keywords
signal
classes
speech
class
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP03102639A
Other languages
German (de)
English (en)
Other versions
EP1406244B1 (fr)
EP1406244A3 (fr)
Inventor
Dr. Stephan Grashey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Siemens Corp
Original Assignee
Siemens AG
Siemens Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG, Siemens Corp filed Critical Siemens AG
Publication of EP1406244A2 publication Critical patent/EP1406244A2/fr
Publication of EP1406244A3 publication Critical patent/EP1406244A3/fr
Application granted granted Critical
Publication of EP1406244B1 publication Critical patent/EP1406244B1/fr
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 — Detection of presence or absence of voice signals

Definitions

  • A Voice Activity Detector (VAD) is a device that distinguishes between speech with background noise ("speech") and the background noise alone ("non-speech").
  • The input of a VAD can be, for example, a voice signal recorded by the microphone of a communication terminal.
  • While the user is speaking, the signal is made up of the voice and background noise (e.g. street noise).
  • While the user is not speaking, the signal consists solely of the background noise.
  • The output of a voice activity detector adds to the input signal the information whether it contains speech or not.
  • VADs are either configured on the basis of heuristics or trained in the course of a training phase.
  • The input signal is an audio signal that has been preprocessed in a suitable manner.
  • A feature extraction yields feature vectors of different sizes, depending on the number of features used.
  • The simplest but still widely used heuristic is to assess the signal against a specific, fixed energy threshold: if the signal energy exceeds the threshold, "speech" is assumed, otherwise "non-speech" (a minimal sketch of such heuristics is given after this list).
  • Another example is determining the zero-crossing rate of the autocorrelation function of the speech signal and applying a corresponding threshold to decide whether a speech signal is present or not.
  • VADs that are trained during a training phase include statistical VADs and neural networks. For this purpose they are trained with data in which it is known when speech and when noise occurs, i.e. pre-recorded data that are, for example, manually labeled. Examples of methods that can decide in this way whether a speech signal is present or not are given in Stadermann, J.: "Speech/pause detection in automatic speech recognition", University of Duisburg, diploma thesis, 1999, pages 28-36.
  • The object of the invention is to enable a more precise distinction between speech and non-speech. Automatic adaptability to different noise situations, speakers or languages is also important.
  • The plurality of classes is preferably greater than or equal to 10, in particular greater than or equal to 64. Depending on the class to which the signal is assigned, it is then decided whether the signal is a speech signal or not.
  • Speech signals that are recognized as such, i.e. after the voice activity detection.
  • Two or more classes can also be provided for which it is decided that the signal is not speech if it is assigned to them.
  • The classes can be clustered, so that similar classes are adjacent or combined into groups.
  • The classes are formed automatically in a training phase by a clustering method that is trained in an unsupervised, self-organizing manner, in particular on the basis of test signals.
  • A neural network is preferably used here, in particular a Kohonen network with the network architecture of a self-organizing map (see the SOM training sketch after this list).
  • The described device can be used, for example, in biometric speech recognition during enrollment, so that only the voice of the enrolling person is recorded as a reference and not more or less large parts of the background noise. Otherwise a person who has a similar noise environment during verification might be authenticated by the system.
  • A method for detecting whether a speech signal is present or not can be constructed analogously to the described device. This also applies to its preferred configurations.
  • A program product for a data processing system that contains code sections with which one of the described methods can be carried out on the data processing system can be produced by implementing the method in a programming language and translating it into code executable by the data processing system; the code sections are stored for this purpose. A program product is understood to mean the program as a tradable product. It can be in any form, for example on paper, on a computer-readable data medium, or distributed over a network.
  • This first phase is illustrated with reference to FIG. 1.
  • There, an audio database 1 containing audio signals can be seen. These are fed to a preprocessing 2. This preprocessing is preferably the same as that used for the later speech recognition, which saves a second preprocessing.
  • The preprocessing 2 extracts feature vectors 3 from the audio signals of the audio database 1, in which properties of the audio signals are specified.
  • These feature vectors 3 are fed to the input neurons of a neural network 4.
  • The neural network 4 is a Kohonen network with the network architecture of a self-organizing map (SOM). It has the property that a local neighborhood relationship exists between the individual neurons, so that the reference vectors representing the individual classes are spatially ordered after training.
  • The neural network is trained on the basis of a database which, for example, contains speech and noise with equal frequency.
  • The result of the classifier training is a class representation 5.
  • In the association phase, each individual class of the classifier 4, in the form of the neural network, is assigned to one of the two classes speech and non-speech.
  • For this purpose, the classifier 4 itself is now operated in classification mode, that is, for each feature vector 3 it outputs the associated class 6. This is shown in FIG. 2.
  • The association unit 7, by contrast, is operated in training mode, that is, on the basis of the labeled audio signals 8 it learns the assignment of each of the classifier classes to "speech" or "non-speech". It is determined how many of the test signals assigned to each class are "speech" or "non-speech". Depending on this result, each class is declared in an association step to be a speech class or a non-speech class (see the association sketch after this list). The result is the class assignment 9 of the VAD.
  • The results obtained are further improved by using an averaging filter to eliminate individual outliers.
  • The first line, labeled "Real", gives the actual classification; here "Noise" stands for "non-speech" and "Speech" for "speech".
  • The method is independent of the language and/or content of the spoken text.
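
The fixed-threshold heuristics mentioned in the list above (signal energy and zero-crossing rate of the autocorrelation) can be illustrated with the following minimal Python sketch. Frame length, hop size and the threshold values are assumptions for illustration only and are not taken from the patent.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (assumed framing parameters)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])

def energy_threshold_vad(signal, threshold=1e-3):
    """Simplest heuristic: a frame is 'speech' if its mean energy exceeds a fixed threshold."""
    frames = frame_signal(signal)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold  # True = speech, False = non-speech

def zero_crossing_rate(x):
    """Fraction of sign changes in x."""
    signs = np.sign(x)
    signs[signs == 0] = 1
    return float(np.mean(signs[1:] != signs[:-1]))

def autocorr_zcr_vad(signal, zcr_threshold=0.1):
    """Second heuristic from the text: threshold the zero-crossing rate of each
    frame's autocorrelation function (a low rate suggests a periodic, voiced frame)."""
    frames = frame_signal(signal)
    decisions = []
    for frame in frames:
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        decisions.append(zero_crossing_rate(ac) < zcr_threshold)
    return np.array(decisions)
```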
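
The unsupervised formation of the classes by a Kohonen network with a self-organizing map architecture, as described above, could look roughly like the following sketch. The 8×8 grid (64 classes, matching the "greater than or equal to 64" preference), the learning-rate schedule and the Gaussian neighbourhood are common SOM defaults assumed here; the patent does not prescribe them.

```python
import numpy as np

def train_som(features, grid=(8, 8), epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a minimal self-organizing map on the feature vectors.

    Each grid node is one class represented by a reference vector; because
    updates are shared with neighbouring nodes, similar classes end up
    spatially adjacent, as described in the text.
    """
    rng = np.random.default_rng(seed)
    n_nodes = grid[0] * grid[1]
    dim = features.shape[1]
    # Initialise reference vectors around the data distribution.
    weights = features.mean(axis=0) + rng.standard_normal((n_nodes, dim)) * features.std(axis=0)
    # Fixed 2-D coordinates of the nodes, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], dtype=float)

    n_steps = epochs * len(features)
    step = 0
    for _ in range(epochs):
        for x in features[rng.permutation(len(features))]:
            # Best-matching unit = class whose reference vector is closest to x.
            bmu = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))
            # Learning rate and neighbourhood radius shrink over training.
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.5
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights  # the class representation: one reference vector per class

def classify(weights, x):
    """Classification mode: return the index of the class a feature vector falls into."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))
```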
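
Finally, the association phase and the outlier smoothing could be sketched as follows. The per-class majority vote and the sliding median over the frame decisions are one plausible reading of the description (which speaks of an averaging filter), not the exact published implementation; `classify` is the helper from the previous sketch, repeated here so the snippet is self-contained.

```python
import numpy as np

def classify(weights, x):
    """Nearest reference vector = class index (as in the previous sketch)."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))

def associate_classes(weights, labeled_features, labels):
    """Association phase: count, per SOM class, how many labeled test vectors
    (1 = speech, 0 = non-speech) it received, and declare it a speech class
    if the speech votes dominate. Classes that receive no votes default to
    non-speech here, which is an assumption."""
    n_classes = len(weights)
    speech_votes = np.zeros(n_classes)
    total_votes = np.zeros(n_classes)
    for x, y in zip(labeled_features, labels):
        c = classify(weights, x)
        total_votes[c] += 1
        speech_votes[c] += y
    is_speech_class = np.zeros(n_classes, dtype=bool)
    seen = total_votes > 0
    is_speech_class[seen] = speech_votes[seen] / total_votes[seen] >= 0.5
    return is_speech_class  # the class assignment of the VAD

def vad_decide(weights, is_speech_class, feature_frames, smooth=5):
    """Run-time VAD: map each frame to its class, look up speech/non-speech,
    then remove isolated outliers with a sliding median over `smooth` frames."""
    raw = np.array([is_speech_class[classify(weights, x)] for x in feature_frames], dtype=int)
    half = smooth // 2
    padded = np.pad(raw, half, mode="edge")
    smoothed = np.array([np.median(padded[i:i + smooth]) for i in range(len(raw))])
    return smoothed >= 0.5
```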

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)
  • Telephonic Communication Services (AREA)
EP20030102639 2002-09-27 2003-08-25 Détection d'activité vocale basée sur l'agrégation non-supervisée Expired - Lifetime EP1406244B1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE10245107 2002-09-27
DE2002145107 DE10245107B4 (de) 2002-09-27 2002-09-27 Voice Activity Detection auf Basis von unüberwacht trainierten Clusterverfahren

Publications (3)

Publication Number Publication Date
EP1406244A2 true EP1406244A2 (fr) 2004-04-07
EP1406244A3 EP1406244A3 (fr) 2005-01-12
EP1406244B1 EP1406244B1 (fr) 2006-10-11

Family

ID=31984148

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20030102639 Expired - Lifetime EP1406244B1 (fr) 2002-09-27 2003-08-25 Détection d'activité vocale basée sur l'agrégation non-supervisée

Country Status (3)

Country Link
EP (1) EP1406244B1 (fr)
DE (2) DE10245107B4 (fr)
ES (1) ES2269917T3 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007128530A3 (fr) * 2006-05-05 2008-03-20 Giesecke & Devrient Gmbh Procédé et dispositif pour personnaliser des cartes

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11502863B2 (en) * 2020-05-18 2022-11-15 Avaya Management L.P. Automatic correction of erroneous audio setting

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4802221A (en) * 1986-07-21 1989-01-31 Ncr Corporation Digital system and method for compressing speech signals for storage and transmission
JP2643593B2 (ja) * 1989-11-28 1997-08-20 日本電気株式会社 音声・モデム信号識別回路
JP3088171B2 (ja) * 1991-02-12 2000-09-18 三菱電機株式会社 自己組織型パタ−ン分類システム及び分類方法
DE4442613C2 (de) * 1994-11-30 1998-12-10 Deutsche Telekom Mobil System zum Ermitteln der Netzgüte in Nachrichtennetzen aus Endnutzer- und Betreibersicht, insbesondere Mobilfunknetzen
IT1281001B1 (it) * 1995-10-27 1998-02-11 Cselt Centro Studi Lab Telecom Procedimento e apparecchiatura per codificare, manipolare e decodificare segnali audio.
US5737716A (en) * 1995-12-26 1998-04-07 Motorola Method and apparatus for encoding speech using neural network technology for speech classification
US6564198B1 (en) * 2000-02-16 2003-05-13 Hrl Laboratories, Llc Fuzzy expert system for interpretable rule extraction from neural networks


Also Published As

Publication number Publication date
DE50305333D1 (de) 2006-11-23
ES2269917T3 (es) 2007-04-01
EP1406244B1 (fr) 2006-10-11
DE10245107B4 (de) 2006-01-26
EP1406244A3 (fr) 2005-01-12
DE10245107A1 (de) 2004-04-08

Similar Documents

Publication Publication Date Title
DE69432570T2 (de) Spracherkennung
DE69031284T2 (de) Verfahren und Einrichtung zur Spracherkennung
DE69329855T2 (de) Methode zur erkennung alphanumerischer zeichenketten, die über ein telefonnetz gesprochen werden
DE69722980T2 (de) Aufzeichnung von Sprachdaten mit Segmenten von akustisch verschiedenen Umgebungen
DE60108373T2 (de) Verfahren zur Detektion von Emotionen in Sprachsignalen unter Verwendung von Sprecheridentifikation
DE69924596T2 (de) Auswahl akustischer Modelle mittels Sprecherverifizierung
DE69814104T2 (de) Aufteilung von texten und identifizierung von themen
DE69131689T2 (de) Gleichzeitige sprecherunabhängige sprachererkennung und sprecherverifikation über einen fernsprechnetz
DE69707876T2 (de) Verfahren und vorrichtung fuer dynamisch eingestelltes training zur spracherkennung
DE60023517T2 (de) Klassifizierung von schallquellen
DE60111329T2 (de) Anpassung des phonetischen Kontextes zur Verbesserung der Spracherkennung
DE60128270T2 (de) Verfahren und System zur Erzeugung von Sprechererkennungsdaten, und Verfahren und System zur Sprechererkennung
CN111524527A (zh) 话者分离方法、装置、电子设备和存储介质
EP0964390A2 (fr) Dispositif pour la vérification de signaux
DE112018007847B4 (de) Informationsverarbeitungsvorrichtung, informationsverarbeitungsverfahren und programm
DE2422028A1 (de) Schaltungsanordnung zur identifizierung einer formantfrequenz in einem gesprochenen wort
DE69813597T2 (de) Mustererkennung, die mehrere referenzmodelle verwendet
DE3750365T2 (de) Sprecheridentifizierung.
DE102019205543A1 (de) Verfahren zum Klassifizieren zeitlich aufeinanderfolgender digitaler Audiodaten
CN116758911A (zh) 一种基于语音信号处理的校园暴力监测方法及系统
EP3847646B1 (fr) Appareil de traitement audio et procédé de classification de scène audio
EP1406244B1 (fr) Détection d'activité vocale basée sur l'agrégation non-supervisée
DE19705471C2 (de) Verfahren und Schaltungsanordnung zur Spracherkennung und zur Sprachsteuerung von Vorrichtungen
DE10209324C1 (de) Automatische Detektion von Sprecherwechseln in sprecheradaptiven Spracherkennungssystemen
EP0817167B1 (fr) Procédé de reconnaissance de la parole et dispositif de mise en oeuvre du procédé

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

17P Request for examination filed

Effective date: 20050711

AKX Designation fees paid

Designated state(s): DE ES FR GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE ES FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20061011

REF Corresponds to:

Ref document number: 50305333

Country of ref document: DE

Date of ref document: 20061123

Kind code of ref document: P

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2269917

Country of ref document: ES

Kind code of ref document: T3

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20070712

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20130925

Year of fee payment: 11

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20130814

Year of fee payment: 11

Ref country code: FR

Payment date: 20130814

Year of fee payment: 11

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20141020

Year of fee payment: 12

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20140825

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20150430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140825

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140901

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 50305333

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140826

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160301