RS20060611A - Technique and system for automatic gain control (agc) using microphone array - Google Patents

Technique and system for automatic gain control (agc) using microphone array

Info

Publication number
RS20060611A
Authority
RS
Serbia
Prior art keywords
agc
signal
power
estimation
input signal
Prior art date
Application number
RSP-2006/0611A
Other languages
Serbian (sr)
Inventor
Zoran Šarić
Slobodan Jovičić
Vladimir Kovačević
Nikola Teslić
Ištvan Pap
Original Assignee
Micronasnit,
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Micronasnit
Priority to RSP-2006/0611A (patent RS49857B)
Publication of RS20060611A
Publication of RS49857B

Landscapes

  • Control Of Amplification And Gain Control (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a procedure and system for automatic gain control (AGC) of a speech signal recorded by a microphone array under specific hands-free teleconference conditions, in an acoustic environment with high levels of various interferences such as acoustic echo, interfering noise, reverberation and other speakers, and with varying positions of the speaker relative to the microphone array. The AGC procedure consists of: estimating the trajectory of the peak power of the speech signal; estimating the powers of the acoustic interferences and their contribution to determining the gain coefficient; determining the downward compression characteristic; determining the AGC gain coefficient; and forming the output speech signal with minimal distortion. The system has one input and one output for the current speech signal, several inputs carrying the signals used to estimate the acoustic interferences, and three inputs for three constants that define the operating range of the AGC: the nominal output signal power, the dynamics of change of the AGC system, and the maximal downward slope of the compression characteristic.

[Figure: AGC block diagram. Recoverable block labels: AEC, SD, VF, Diffuse Noise Power Estimation, Echo Power Estimation, Slope Estimation, Gain Coefficient, Peak Power Estimation, Momentary Power Estimation.]

Description

PROCEDURE AND SYSTEM FOR AUTOMATIC GAIN CONTROL (AGC) BASED ON MICROPHONE ARRAY READINGS

TECHNICAL FIELD TO WHICH THE INVENTION RELATES

The invention belongs to the field of acoustic signal processing, and more specifically to methods for automatic gain control (AGC) of a speech signal recorded with a microphone array under specific "hands-free" teleconference communication conditions.

TECHNICAL PROBLEM

Hands-free, full-duplex speech communication systems are used in many applications, such as video-telephone systems, teleconferencing systems, speakerphones in rooms or cars, and human-computer communication by voice. "Hands-free" speech communication means that the speaker is located in an acoustic environment at some distance from the interface elements of the system, namely the microphones and loudspeakers. These conditions give rise to several technical problems that must be solved to keep the quality of communication at an acceptable level. One of them is keeping the level of the transmitted speech signal approximately constant, regardless of the type of speaker or the speaker's distance from the microphone.

Three facts are central to this problem. First, the voice intensity of different speakers depends on their vocal capacity. This is a natural individual characteristic of every person, but it can also be acquired, depending on social or habitual factors. For example, a speaker from a rural area usually speaks at a higher intensity than a speaker from an urban area. Second, when "hands-free" communication systems are used, the speaker may be at different distances from the microphone. Sound intensity is known to decrease with the square of the distance, which in this case introduces considerable variation in the level of the speech signal at the microphone output. Third, especially in teleconference applications, the speaker is often moving and changes location relative to the microphone, which causes the intensity of the speech signal at the microphone output to vary.
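To give a sense of scale for the distance effect described above, the following minimal sketch computes the level change implied by the inverse-square law. It assumes an ideal point source in a free field, which a reverberant room only approximates; the function name and the example distances are illustrative and do not come from the patent.

```python
import math

def level_change_db(r_ref_m: float, r_m: float) -> float:
    """Level change (dB) when moving from distance r_ref_m to r_m.

    Assumption: ideal point source in a free field, so intensity ~ 1/r^2.
    """
    if r_ref_m <= 0 or r_m <= 0:
        raise ValueError("distances must be positive")
    # 10*log10((r_ref/r)^2) simplifies to 20*log10(r_ref/r).
    return 20.0 * math.log10(r_ref_m / r_m)

# Illustrative distances only: a speaker stepping back from 0.5 m to 4 m.
for r in (0.5, 1.0, 2.0, 4.0):
    print(f"{r:3.1f} m: {level_change_db(0.5, r):+6.1f} dB relative to 0.5 m")
```

Under this idealization each doubling of distance costs about 6 dB, so a speaker moving within a few meters of the array can change the input level by well over 10 dB.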

In addition to these technical problems, voice intensity also varies when ambient noise, reverberation or other speakers are present in the room, as is typical of the "cocktail-party" effect, which causes the current speaker to raise the level of their voice (the so-called Lombard effect). All of these technical problems are addressed by a technical solution known as "automatic gain control" (AGC). AGC adjusts the gain applied to the current speech signal so that the signal level at its output remains approximately constant. For a speech signal, this control is performed in both the amplitude and the time domain.

In teleconferencing applications, AGC as a technical solution must solve two additional problems: (a) discrimination between the speech signal and ambient interference, where the interference must include reverberation signals and competing speakers, and (b) the speed with which the AGC reacts to the appearance or disappearance of speech in the analyzed signal, which requires the operation of the AGC to be well matched to the nature of the speech signal.

The solution to all of these problems should aim to ensure maximum intelligibility, naturalness and listening comfort of the speech signal in the complex teleconference conditions in which the AGC solution is applied.

STATE OF THE ART

There are two general techniques for implementing AGC. In the first, the input signal is used to generate a gain coefficient, which is multiplied by the input signal to produce an output signal of approximately constant power. This is called the "feed-forward" technique. The second is the "feedback" technique, in which the output signal is used to generate the gain coefficient: the output is compared with a reference signal (which defines the desired output level), and the resulting error signal determines the gain coefficient.
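The difference between the two techniques can be illustrated with a short sample-by-sample sketch. This is a generic textbook-style illustration, not the method of this patent or of the patents cited in the prior art; the smoothing constant `alpha`, the step size `mu`, the gain limits and the target power are all assumed values chosen for the example.

```python
import numpy as np

def agc_feed_forward(x, target_power=0.01, alpha=0.99, max_gain=100.0):
    """Feed-forward: the gain is derived from a power estimate of the INPUT."""
    y = np.empty_like(x, dtype=float)
    p = target_power  # running estimate of input power (assumed initial value)
    for n, s in enumerate(x):
        p = alpha * p + (1.0 - alpha) * s * s          # smooth the input power
        g = min(np.sqrt(target_power / max(p, 1e-12)), max_gain)
        y[n] = g * s
    return y

def agc_feedback(x, target_power=0.01, mu=0.01, min_gain=1e-3, max_gain=100.0):
    """Feedback: the gain is adapted from the error between OUTPUT power
    and the reference (target) power."""
    y = np.empty_like(x, dtype=float)
    g = 1.0
    for n, s in enumerate(x):
        y[n] = g * s
        # Normalized error, clipped so one loud sample cannot collapse the gain.
        err = np.clip(1.0 - y[n] ** 2 / target_power, -1.0, 1.0)
        g = float(np.clip(g * np.exp(mu * err), min_gain, max_gain))
    return y

# Usage: a steady tone that is far louder than the target level.
n = np.arange(4000)
x = 0.5 * np.sin(2 * np.pi * 0.05 * n)
for agc in (agc_feed_forward, agc_feedback):
    print(agc.__name__, float(np.mean(agc(x)[-1000:] ** 2)))
```

Both loops settle near the target power for a stationary input. The feed-forward form reacts as fast as its power estimator allows, while the feedback form corrects its own gain errors at the cost of loop dynamics that must be tuned.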

Conventional AGC systems based on either of these two techniques control the level of the speech signal by compressing its amplitude, using nonlinear estimation of the speech signal power together with elements of a voice activity detector (VAD) or a so-called squelch system. For example, U.S. patent 4,947,133, filed January 19, 1988, entitled "Method and apparatus for automatic signal level adjustment", uses the feed-forward technique for adaptive compression of the speech signal. U.S. patent 5,854,845, filed February 21, 1996, entitled "Method and circuit for voice automatic gain control", provides a method for automatic gain control that minimizes distortion at speech-pause boundaries. U.S. patent 6,959,082, filed July 10, 2001, entitled "Method and system for automatic gain control with adaptive table lookup", uses both the feed-forward and the feedback technique, with the gain determined from a table on the basis of the estimated input signal power.

Somewhat different requirements for AGC arise under the specific conditions of "hands-free", full-duplex teleconference communication. High-quality speech capture in the presence of acoustic interference and room reverberation is a complex problem. When the spectrum of the useful speech signal overlaps with the spectra of the interferences present, single-channel signal processing cannot significantly improve the quality of the speech signal, so multi-microphone methods (microphone arrays) are used instead. The advantage of microphone arrays over single-channel processing is their ability to adapt their spatial reception characteristic (directivity pattern) to the current spatial arrangement of the selected speaker and the interferences. In doing so they achieve maximum suppression of the interferences while emphasizing the selected speaker. The basic problems encountered in applying microphone arrays are the following (M.S. Brandstein, D.B. Ward (Eds.), Microphone Arrays: Signal Processing Techniques and Applications, Springer, Berlin, 2001; Y. Huang, J. Benesty, Audio Signal Processing for Next-Generation Multimedia Communication Systems, Kluwer Academic Publishers, 2004): the exact location of the selected speaker is unknown, the number and spatial arrangement of the interferences are unknown, the useful source and the interferences undergo multiple reflections from the walls of the room, and both the sources of acoustic interference and the selected speaker are non-stationary.

When a microphone array is used in teleconferencing systems that operate in full duplex, the number of problems increases. The biggest problem is acoustic echo, followed by double-talk (speech activity in both directions of communication) and the possible onset of system instability, that is, acoustic feedback (howling). AGC operation is considerably more complicated under these conditions (K. Kobayashi, Y. Haneda, K. Furuya, A. Kataoka, "A hands-free unit with adaptive microphone array for directional AGC", 2005 International Workshop on Acoustic Echo and Noise Control, Eindhoven, The Netherlands, September 12-15, 2005).

The integrated AGC solution presented in this patent combines the positive features of some conventional solutions with new solutions for the multi-microphone environment and for the requirements imposed by full-duplex "hands-free" teleconferencing, ensuring high-quality speech communication.

DISCLOSURE OF THE ESSENCE OF THE INVENTION

The subject of this invention is a method and system for automatic gain control (AGC) of a speech signal recorded with a microphone array in a complex acoustic environment and under specific "hands-free" teleconference communication conditions, with the aim of ensuring the quality and intelligibility of the speech signal.

The essence of the invention lies in the specific processing of the speech signal recorded in the acoustic environment of the room that contains the system and the speaker. A microphone array of N microphones is used to record the speaker, who is located at some distance (up to several meters). The microphone array records all signals in the room: the useful signal, which is the direct wave travelling from the speaker to the microphones, and interference signals, which can be of many kinds. The interference signals are: acoustic echo, that is, the direct sound wave from the loudspeakers that reproduce the voice of the interlocutor at the far end of the communication channel; direct waves from one or more noise sources or other sources of interference in the room; and all reflected waves (room echo) originating from all sound sources, including the speaker, which arise from the reverberation of the room. It should be emphasized that the sound sources in the room can be stationary or, as is most often the case, non-stationary, both in their characteristics and in their location in the room (moving sound sources). The aim of the invention is to preserve the quality and intelligibility of the current speaker's speech signal despite strong ambient interference, the various possible positions of the speaker relative to the microphone array and, consequently, large variations in the intensity of the speech signal.

A specific aspect of the invention is that all signal processing is carried out in the frequency domain, which offers advantages in processing speed and in the number of arithmetic operations; this is very important for real-time implementation.

A further feature of the invention is the nonlinear amplitude compression characteristic. The slope of this characteristic is determined in a complex way from the trajectory of the peak power of the speech signal. The analysis covers the trend and the convexity of the trajectory, and a relative comparison of these parameters with the current peak power of the speech signal. The properties of the speech signal are thus taken fully into account, and the compression characteristic is optimized for this type of signal.

The determination of the gain coefficient is a further feature of the invention. The coefficient is determined from the slope of the compression characteristic, from the peak power of the input speech signal, and from the mean powers of the interferences that occur in a hands-free speech communication system based on a microphone array.
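How a gain can follow from a compression slope and a peak power estimate can be sketched as below. This is only a generic straight-line compressor in the dB domain, written under stated assumptions; it is not the patent's formula, which is not given in this section, and it does not model the interference powers. The parameter names `nominal_db`, `slope` and `max_gain_db` are hypothetical stand-ins for the nominal output level, compression slope and maximum gain mentioned in the text.

```python
def agc_gain_db(peak_power_db: float,
                nominal_db: float = -20.0,
                slope: float = 0.2,
                max_gain_db: float = 30.0) -> float:
    """Gain (dB) placing the input peak power on a straight compression line.

    Assumed model: out_dB = nominal_dB + slope * (in_dB - nominal_dB), where
    slope = 0 pins the output to the nominal level and slope = 1 applies
    no level control at all.
    """
    if not 0.0 <= slope <= 1.0:
        raise ValueError("slope must lie in [0, 1]")
    out_db = nominal_db + slope * (peak_power_db - nominal_db)
    gain_db = out_db - peak_power_db
    # Cap the boost so that very weak input (e.g. noise only) is not blown up.
    return min(gain_db, max_gain_db)

# Illustrative values only.
for p_in in (-60.0, -40.0, -20.0, -10.0):
    g = agc_gain_db(p_in)
    print(f"peak {p_in:6.1f} dB -> gain {g:+5.1f} dB "
          f"(amplitude factor {10 ** (g / 20):.2f})")
```

In this sketch the maximum-gain cap plays the protective role that, in the invention, is shared with the interference power estimates: it limits how far weak input is amplified.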

A further feature of the invention is the adaptive determination of the gain coefficient based on detected information about pauses in the speech signal, the presence of residual echo in the system, and the presence of a competing speaker or acoustic interference. This information allows the AGC to operate correctly for different input signal content and improves the quality of the output signal.
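One simple way such detection information can steer the gain is to hold the gain whenever the frame is not dominated by the desired speaker. The sketch below shows that idea only; the flag names, the `FrameInfo` type and the smoothing step are hypothetical, and the patent's actual adaptation rule is not described in this section.

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    """Hypothetical per-frame detector outputs."""
    speech_pause: bool       # no active speech in this frame
    residual_echo: bool      # echo left over after echo cancellation
    interferer_active: bool  # competing speaker or other acoustic interference

def update_gain_db(prev_gain_db: float, target_gain_db: float,
                   info: FrameInfo, step: float = 0.1) -> float:
    """Move the gain toward its target only on clean speech frames."""
    if info.speech_pause or info.residual_echo or info.interferer_active:
        # Hold the gain so noise, echo or a competing talker is not amplified.
        return prev_gain_db
    return prev_gain_db + step * (target_gain_db - prev_gain_db)

# Usage: the gain advances on a clean speech frame and is held during a pause.
g = update_gain_db(10.0, 20.0, FrameInfo(False, False, False))
print(g)
g = update_gain_db(g, 20.0, FrameInfo(True, False, False))
print(g)
```

Holding rather than resetting the gain avoids audible pumping at speech-pause boundaries, which is one of the distortions the prior-art discussion singles out.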

A particular feature of the invention is a set of parameters that allows the AGC operating conditions to be chosen to suit the requirements of the application. These parameters define the nominal output signal level, the maximum slope of the compression characteristic and the maximum AGC gain. This makes the solution flexible.

The inventive step lies in the improvement of each of these features, and especially in the fact that information about ambient interference and about current speech activity is introduced, indirectly, into the algorithm that estimates the gain coefficient.

These and other aspects, features and benefits of the invention will become more apparent from the detailed description of the invention, the patent claims and the accompanying drawings.

BRIEF DESCRIPTION OF THE FIGURES AND DRAWINGS

Figure 1 shows the ambient conditions in which the system for hands-free video-telephone communication using a microphone array is applied.

Figure 2 shows the block diagram of the audio signal processing system within the system for hands-free video-telephone communication, and the position of the AGC module. The basic blocks of this system are: the acoustic echo cancellation block (AEC), the block that forms the directivity pattern of the microphone array (SD-BF), the block that locates the speaker in space (DOA), the noise reduction block (NR) and the automatic gain control block (AGC).

Figure 3 shows how the transfer characteristic of the AGC is formed.

Figure 4 shows a block diagram of the basic structure of the automatic gain control (AGC) system.

Figure 5 shows the flow of operations in calculating the gain factor A_agc.

Figure 6 shows the interface points and signals of the AGC system.

DETAILED DESCRIPTION OF THE INVENTION

This invention describes a system and procedure for automatic gain control (AGC) of a speech signal in a hands-free speech communication system that uses a microphone array. To explain the need for and importance of AGC in such systems, Figure 1 shows the ambient conditions in which hands-free speech communication takes place, while Figure 2 shows the block diagram of the audio signal processing system that contains the speech signal AGC module.

Figure 1 schematically shows the ambient conditions in which the system for hands-free video-telephone communication using a microphone array is applied. Room 101 contains the hands-free video-telephone communication system, the speaker 111 and a noise source 114, which is usual for any acoustic environment. Through the loudspeakers 102 of a stereo audio system, the speaker 111 listens to the incoming speech signal 104 of the interlocutor at the far end, usually as a mono signal; another audio signal may also be played through the stereo audio system. The sound in room 101 is recorded by the microphone array 103, which consists of N microphones. After complex processing of the microphone signals in block 107, the speech signal of the speaker 111 is transmitted via block 108 to the remote interlocutor as a mono signal.

The ambient conditions for speech communication in room 101 are very complex. During hands-free video-telephone communication there are at least three sound sources in room 101: the stereo loudspeakers 102, which reproduce the speech of the remote interlocutor and any other audio signal, the speaker 111, and at least one noise source 114. The room may contain several noise sources: computer noise, air-conditioning noise, street noise entering through the windows, noise from neighboring rooms, building vibrations, or another speaker, several speakers, a music source, and so on. The acoustic picture in the room is therefore very complex. The microphone array 103, acting as a sensor system, records all sounds in the room: the direct sound waves from each source and also all reflections from the walls of the room and from other objects in it. For example, a direct wave 109 and many reflected waves, of which only one (110) is shown in Figure 1, arrive at the microphone array 103 from the loudspeakers 102; from the speaker 111 come a direct wave 112 and, among others, two reflected waves 113a and 113b; and from the noise source 114 come a direct wave 115 and, among others, a reflected wave 116.

Of all the sounds picked up by the microphone array, only the direct wave 112 from the speaker 111 is the useful signal; all the others are interference. The acoustic echo 109 coming from the loudspeaker 102 is usually the strongest interference. All the other reflections together make up the reverberation of the room. The task of the audio signal processing block 107 is to suppress the acoustic echo, to separate the useful signal 112 from all other interference, to suppress the reverberation, and to suppress the direct signals of the interference sources, of which there may be more than one. A further task of block 107 is to adaptively track the non-stationarity of the acoustic scene, whether the speaker moves, occupies different positions in the room from one call to the next, or the noise sources move, are non-stationary or change their characteristics. From the far-end party's point of view, the outgoing speech signal 108 should be stable and independent of this acoustic variability, which ensures good-quality and pleasant speech communication.

Figure 2 shows a block diagram of the audio signal processing system within the hands-free video-telephone communication system, marked 107 in Figure 1. The microphone signals 103, M1 to MN, and the stereo loudspeaker signals 102, Zv-L and Zv-D (left and right loudspeaker), are fed through the input interface 201 into block 202, the N-channel acoustic echo canceller - AEC (Acoustic Echo Cancelling), which suppresses the acoustic echo 109, Figure 1, in the recorded speech signal 112, Figure 1. The input interface 201 converts the signals to the frequency domain using the discrete Fourier transform (DFT), and all further processing is done in the frequency domain, one block of input samples at a time. The outputs of block 202 are the microphone signals with the acoustic echo suppressed, s_AEC1 to s_AECN. These signals are fed into block 203, the superdirective beamformer - SD-BF (Superdirective Beamformer), which shapes the directivity pattern of the microphone array 103, and into block 204, the azimuth estimator - DOA (Direction Of Arrival), which locates the current speaker in the horizontal plane and provides the azimuth angle θ_a. This angle is used to steer the directivity pattern of the microphone array towards the current speaker.

The signal at the output of block 203 contains the useful speech signal and an interference component made up of the residual left after acoustic echo suppression, the attenuated ambient noise and the attenuated reverberation. This signal enters block 205, the noise suppressor - NR (Noise Reduction), where the interference is suppressed further. The suppression is adaptive because the interference is non-stationary.

The final signal processing block in the hands-free speech communication system is block 206, automatic gain control - AGC, which is the subject of this invention. This block uses several pieces of information from the whole system that are needed to identify the conditions the speech signal may be in and to correct its amplitude accordingly. In this way the transmitted speech signal can be kept at approximately the same level regardless of the distance of the current speaker from the microphone array, and its quality at the far end of the communication channel is improved. Through the output interface 207, the estimated near-end speech signal ŝ is sent over the communication channel to the far-end party.

It is important to note that the operation of AGC under the complex ambient conditions of a microphone array, in multi-channel systems and in hands-free speech communication, differs considerably from the conventional use of AGC in single-channel systems and applications.

Figure 3 shows how the transfer characteristic of the AGC system is formed. The diagrams are given in the form L_out = f(L_in). For a system without AGC and with unity gain, the transfer characteristic would be 301, K1, which simply passes the input to the output. The basic task of the AGC system is to keep the speech signal at its output at the same optimal power level 302, L_opt; in the ideal case of full compression, every input level would be brought to the level L_opt.

The transfer characteristic K2, which fully compresses the input dynamics, is not suitable when noise is present in the system, because the noise is then amplified as well and brought to the same level as the speech signal. A compromise is to replace the transfer characteristic 303, K2, with the characteristic 304, K3, which reduces (compresses) the input dynamics according to a given compression constant α. In a decibel diagram, the characteristic 304, K3, is described by a linear function:

and where: P_in is the signal power at the compressor input, P_out is the signal power at the compressor output, P_nom is the nominal output power to which the output signal is to be set, and P_0 is the reference power against which the levels are measured.
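The linear decibel characteristic described above can be sketched as follows. Relation (1) is not legible in this text, so the form L_out = α·L_in + (1 - α)·L_opt is an assumption inferred from relation (4); the function names are hypothetical.

```python
import math


def level_db(power, p_ref=1.0):
    """Level in dB of a power relative to the reference power P_0."""
    return 10.0 * math.log10(power / p_ref)


def compressed_level_db(l_in, l_opt, alpha):
    """Assumed linear characteristic in dB: alpha*L_in + (1 - alpha)*L_opt."""
    return alpha * l_in + (1.0 - alpha) * l_opt


l_opt = level_db(1.0)  # nominal power P_nom equal to P_0, i.e. 0 dB
# alpha = 1 reproduces K1 (pass-through), alpha = 0 reproduces K2 (full
# compression), and an intermediate alpha gives a K3-like characteristic.
for alpha, name in [(1.0, "K1"), (0.0, "K2"), (0.5, "K3")]:
    levels = [compressed_level_db(l, l_opt, alpha) for l in (-40.0, -20.0, 0.0, 20.0)]
    print(name, levels)
```

Under this assumption the compression constant α is simply the slope of the line in the decibel diagram, and every characteristic passes through the point (L_opt, L_opt).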

When the input level is L_opt, the output level is also L_opt, and the linear gain A_agc that achieves this should be 1. If:

then, substituting (2) and (3) into (1), the gain A_agc satisfies:

$$10\log\left(A_{agc}^{2}\,P_{in}/P_0\right) = 10\,\alpha\log\left(P_{in}/P_0\right) + (1-\alpha)\,10\log\left(P_{nom}/P_0\right) \qquad (4)$$

that is:
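Solving (4) for the gain gives the closed form below. It is a reconstruction obtained by exponentiating both sides of (4), not the original typesetting of relation (5), although it does show the behaviour attributed to (5): the gain grows without bound as P_in tends to zero when 0 < α < 1.

```latex
A_{agc}^{2}\,\frac{P_{in}}{P_0}
  = \left(\frac{P_{in}}{P_0}\right)^{\alpha}
    \left(\frac{P_{nom}}{P_0}\right)^{1-\alpha}
\quad\Longrightarrow\quad
A_{agc} = \left(\frac{P_{nom}}{P_{in}}\right)^{\frac{1-\alpha}{2}}
```

The reference power P_0 cancels, so the gain depends only on the ratio P_nom / P_in and on α.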

Relation (5) reveals the following property of the gain A_agc: when the input signal power P_in tends to zero, with 0 < α < 1, the gain A_agc tends to infinity. This property is undesirable, because very weak signals, which are most often noise, are then amplified greatly and needlessly. Relation (5) is therefore modified by adding the term δ·P_nom to the denominator, which gives:

For extremely weak signals, relation (6) limits the gain to the value:

Relation (6) also modifies the transfer characteristic 304, K3, into the characteristic 305, K4 (Figure 4). Three zones can be distinguished on the characteristic K4. Zone 1 applies when the input signal level is much lower than the value δ·P_nom. The gain A_agc is then constant, according to relation (7). The input signal should clearly not lie in this zone, because the gain there is constant and the AGC has no effect. Zone 2 is the optimal operating region: for weak signals the gain is limited to A_agcmax, while around L_opt the compression of the signal dynamics defined by the constant α is applied. In zone 3 the degree of compression of the input signal is constant and is defined by the compression constant α.
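The effect of the limiting term can be illustrated numerically. Relations (6) and (7) are not legible in this text, so the sketch assumes the form A_agc = (P_nom / (P_in + δ·P_nom))^((1 - α)/2), which follows from adding δ·P_nom to the denominator of the gain derived from (4); the function names are hypothetical.

```python
def agc_gain(p_in, p_nom=1.0, alpha=0.5, delta=0.001):
    """Assumed relation (6): (P_nom / (P_in + delta*P_nom)) ** ((1 - alpha) / 2)."""
    return (p_nom / (p_in + delta * p_nom)) ** ((1.0 - alpha) / 2.0)


def agc_gain_max(alpha=0.5, delta=0.001):
    """Limit of agc_gain as P_in tends to zero under the same assumption."""
    return (1.0 / delta) ** ((1.0 - alpha) / 2.0)


# Zone 1: P_in far below delta*P_nom, the gain saturates at its cap.
# Zone 2: P_in around P_nom, the gain is close to 1.
# Zone 3: P_in far above P_nom, the gain keeps falling with slope set by alpha.
for p_in in (1e-9, 1e-3, 1.0, 1e3):
    print(f"P_in={p_in:g}  A_agc={agc_gain(p_in):.4f}  cap={agc_gain_max():.4f}")
```

Under this assumption the cap depends on α; for α = 0 and δ = 0.001 it evaluates to about 31.6, which matches the largest gain the description quotes for that δ.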

To bring the input signal into one of zones 1, 2 or 3, it should be amplified ahead of the AGC block by a fixed gain, which moves the characteristic 301, K1, Figure 3, to the position 306, K'1.

Figure 4 shows the block diagram of the AGC system. Its task is: (1) to amplify weak speech signals and attenuate overly strong ones according to a predefined dynamics compression characteristic; (2) to reduce the gain in those parts of the input signal that contain only echo, stationary noise or a competing, interfering talker, so that this interference is attenuated sufficiently; and (3) to attenuate those parts of the input signal in which the useful speech signal and interference are present simultaneously, while preserving speech intelligibility.

The input signal s_in, which is equal to the signal s_NR from block 205 in Figure 2, is multiplied in block 401 by the gain A_agc from block 402, which yields the output signal s_out = s_AGC. According to relation (6), the gain A_agc is determined by four parameters: P_nom is a fixed parameter that defines the desired level of the output signal s_out = s_AGC; δ is a parameter that limits the gain applied to the input signal s_in = s_NR to the value A_agcmax (for example, for δ = 0.001 the maximum possible gain is A_agcmax = 31.6, equation (7)); P_in is the signal power at the compressor input; and α is the compression constant that defines the slope of the compression characteristic (Figure 3). Relation (6) is further modified in two ways. First, the notation is changed so that the constant α = slope, which is a composite function of the peak power of the useful speech signal P_d. Second, P_in is replaced by the input power estimate P̂_in, which covers the useful signal of the current speaker and all interference signals. This second substitution makes A_agc functionally dependent on the interference signals, which has proved very useful for preserving the quality of the output speech signal. Relation (6) now becomes:
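The multiplication in block 401 and the dependence of the gain on interference can be sketched as follows. Relation (8) is not legible in this text, so the gain below assumes the form (P_nom / (P̂_in + δ·P_nom))^((1 - slope)/2); the function names and the sample values are hypothetical.

```python
def agc_gain(p_in_hat, slope, p_nom=1.0, delta=0.001):
    """Assumed relation (8), with the estimated input power and adaptive slope."""
    return (p_nom / (p_in_hat + delta * p_nom)) ** ((1.0 - slope) / 2.0)


def apply_agc(block, p_in_hat, slope, p_nom=1.0, delta=0.001):
    """Block 401: scale every DFT bin of the block s_in = s_NR by one gain."""
    gain = agc_gain(p_in_hat, slope, p_nom, delta)
    return [gain * s for s in block]


s_nr = [0.1 + 0.0j, 0.0 + 0.05j, -0.02 + 0.0j]  # illustrative DFT bins
speech_only = apply_agc(s_nr, p_in_hat=0.01, slope=0.5)
with_interference = apply_agc(s_nr, p_in_hat=0.01 + 0.05, slope=0.5)
print(abs(speech_only[0]), abs(with_interference[0]))
```

Adding interference power to the estimate raises the denominator, so the same speech block receives less gain than it would in a clean room.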

The slope of the compression characteristic, slope, is estimated in block 403, and the input power P̂_in is estimated in block 404. Both blocks require an estimate of the peak power of the useful speech signal, P_d, which is determined as follows. First, block 405 estimates the instantaneous power of the current speaker from the selected direction of the microphone array, according to Figure 2, as follows:

where C'_f = [c_1f c_2f ... c_Nf] are the weighting coefficients of the superdirective spatial filter SD-BF, X = [s_AEC1 s_AEC2 ... s_AECN]', and f_min and f_max are the minimum and maximum frequencies in the DFT spectrum of the analysed speech signal, respectively. Then block 406 estimates the peak power of the signal from the selected direction, P_d, as follows:

where n is the index of the current signal processing block and c is a constant with a value close to 1.
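The two steps performed in blocks 405 and 406 can be sketched as follows. Relations (9) and (10) are not legible in this text, so both functions are assumptions: the power is taken as the sum over the bins f_min to f_max of the squared magnitude of the beamformer output C'_f·X, and the peak tracker follows the behaviour stated in claim 12 (adopt the instantaneous value on a rise, apply a first-order recursion on a fall). The function names are hypothetical.

```python
def direction_power(weights, spectra, f_min, f_max):
    """Assumed relation (9): sum over bins of |C'_f X_f|^2.

    weights[f][i] is the beamformer weight c_if and spectra[f][i] is the
    AEC output s_AECi in DFT bin f, both complex.
    """
    total = 0.0
    for f in range(f_min, f_max + 1):
        # Plain (unconjugated) product, following the C'_f notation.
        y = sum(c * x for c, x in zip(weights[f], spectra[f]))
        total += abs(y) ** 2
    return total


def track_peak(p_now, p_peak_prev, c=0.99):
    """Assumed relation (10): jump up at once, decay slowly with constant c."""
    if p_now >= p_peak_prev:
        return p_now
    return c * p_peak_prev + (1.0 - c) * p_now


peak = 0.0
for p in (0.2, 1.0, 0.1, 0.1, 0.1):  # illustrative block powers
    peak = track_peak(p, peak, c=0.9)
    print(round(peak, 4))
```

With c close to 1 the tracker holds the level reached during speech across short pauses instead of following every dip in the instantaneous power.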

Block 404, Figure 4, receives as its first input the estimate of the instantaneous power of the current speaker, P_d (relation (9)). Its second input is the estimate of the power of the unsuppressed echo, P_echo, which is determined as follows:

where y is the estimated echo signal obtained in block 202, Figure 2, and α_echo is the echo suppression constant.

The third input to block 404 is a rough estimate of the diffuse noise power, P_n. This power is obtained by processing the output signals s_AEC1 to s_AECN of block 202 (the AEC block in Figure 2) and the input signal s_in = s_NR of the AGC block. The signals s_AEC1 to s_AECN do not contain the acoustic echo, which has been suppressed in block 202; they contain the current speech signal and all other interference. If the estimated power of the current speaker, P_d, is subtracted from the averaged power of the signals s_AEC1 to s_AECN, what remains is a rough estimate of the power of all other interference in the recorded environment, primarily diffuse noise. This can be expressed by the following relation:

where P̄ is the averaged power at the output of the AEC block, determined as follows:

The output of block 404 is therefore the power estimate P̂_in, which contains, in addition to the peak power of the speech signal, the estimated powers of the interference in the recording environment, according to the relation:
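The three power terms that enter block 404 can be sketched as follows. Relations (11) to (14) are not legible in this text, so every formula here is an assumption based on the verbal description: the echo power is the echo suppression constant times the power of the estimated echo y, the noise power is the averaged AEC output power minus the speaker power, and the three terms are simply summed. Flooring the noise power at zero is an added safeguard, and the function names are hypothetical.

```python
def echo_power(echo_spectrum, a_echo):
    """Unsuppressed echo power: a_echo times the power of the estimated echo y."""
    return a_echo * sum(abs(y) ** 2 for y in echo_spectrum)


def mean_aec_power(aec_spectra):
    """Average over microphones of the power of s_AEC1 .. s_AECN."""
    per_mic = [sum(abs(s) ** 2 for s in mic) for mic in aec_spectra]
    return sum(per_mic) / len(per_mic)


def diffuse_noise_power(p_mean, p_d):
    """Rough noise power: averaged AEC power minus speaker power, floored at 0."""
    return max(p_mean - p_d, 0.0)


def input_power_estimate(p_peak, p_echo, p_noise):
    """Assumed combination of the three terms: a plain sum."""
    return p_peak + p_echo + p_noise


p_echo = echo_power([1 + 0j, 0 + 1j], a_echo=0.1)
p_noise = diffuse_noise_power(mean_aec_power([[1 + 0j], [0 + 2j]]), p_d=1.0)
print(input_power_estimate(1.0, p_echo, p_noise))
```

Whatever the exact weighting in the original relations, the point carried by the description is that the estimate grows with residual echo and diffuse noise, which in turn lowers the gain.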

Block 403 estimates the slope of the compression characteristic and computes the quantity slope. The computation procedure is shown in Figure 5.

Two parameters are computed by analysing the trajectory of the peak power of the speech signal: the growth trend, in block 501, and the convexity of the power trajectory, in block 502, in order to reach a "soft decision" on the presence of speech in the input signal, according to the relations:

convexity = max{convexity, 0}    (17)

The composite variable gamma(n) is then computed in block 503 by the following procedure:

Finally, the quantity slope, which represents the degree of compression of the signal dynamics and enters relation (8), is computed in block 505 from the previously computed quantity gamma and the quantity slope_max, which is the specified maximum value of the slope.

The quantity slope is computed by the relation:

The limiting values of the quantity slope are:

This determines all the quantities in equation (8) needed to compute the gain A_agc.

The role of the quantity P̂_in in computing A_agc, equation (8), is now apparent. The higher the power of the residual echo or of the diffuse ambient noise, which cannot be cancelled but only reduced, the lower the gain that the AGC applies compared with the interference-free case. On the other hand, if the input signal s_in = s_NR contains no speech but only the noise left after noise reduction in the NR block (block 205, Figure 2), then P̂_in tends to its minimum value, as do all the quantities in (15) to (21), which reduces A_agc to its minimum value, that is, A_agc → 1.

This invention describes a method for processing acoustic and speech signals in a hands-free speech communication system operating in full duplex, concerned with automatic control of the output level of the speech signal. What distinguishes the invention is its use in systems based on microphone arrays, its implementation entirely in the frequency domain, and its applicability to other communication systems such as video-telephone systems, teleconferencing systems, speakerphones in rooms or cars, voice-based human-computer communication, etc.

The methods and techniques for processing acoustic and speech signals in this invention are independent of the number of microphones in the microphone array, Figure 2, and are controlled by a number of parameters that allow the solution to be optimized for different applications.

The methods and techniques for processing acoustic and speech signals in this invention can be implemented in various ways. For example, they can be implemented in hardware, in software, or in a combination of the two. A hardware implementation may use application-specific integrated circuits (ASIC), digital signal processors (DSP), programmable logic devices (PLD or FPGA) and other electronic circuits designed to perform the functions described in this invention. In a software implementation, the program code can be stored in memory units and executed by processors such as a PC, PDA, DSP, etc.

Figure 6 shows the interfaces of the AGC system as a standalone hardware/software solution 600. In addition to the input 601 and the output 602, for the input speech signal and the processed output speech signal, this AGC solution has three control inputs 603, 604 and 605, through which the parameters slope_max, P_nom and δ, respectively, are supplied; these define the optimal operation of the AGC system. The AGC system also has (N + 1) inputs that interface it to a system based on a microphone array of N microphones, intended for hands-free speech communication under the specific conditions of teleconferencing.
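In a software implementation, the three control inputs could be grouped in a small configuration object. The sketch below is one possible arrangement, not part of the described system: the class name, field names and the validity ranges checked here are assumptions (the description only states 0 < α < 1 for the compression constant).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgcControls:
    """The three constants supplied on control inputs 603, 604 and 605."""
    slope_max: float  # input 603: maximum slope of the compression characteristic
    p_nom: float      # input 604: nominal output power
    delta: float      # input 605: constant that caps the maximum gain

    def __post_init__(self):
        # Assumed ranges, used only to reject obviously unusable settings.
        if not 0.0 <= self.slope_max <= 1.0:
            raise ValueError("slope_max is expected to lie in [0, 1]")
        if self.p_nom <= 0.0:
            raise ValueError("p_nom must be positive")
        if self.delta <= 0.0:
            raise ValueError("delta must be positive")


controls = AgcControls(slope_max=0.8, p_nom=1.0, delta=0.001)
print(controls)
```

Keeping the constants immutable reflects their role as settings chosen per application, while the adaptive quantities are recomputed for every block.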

The details of the invention described here enable any person skilled in the art to implement its generic principles in other hands-free speech communication systems without departing from the scope of the invention.

Claims (18)

1. A method for automatic gain control (AGC) based on microphone array readings, characterized in that it comprises: adaptive determination of the gain coefficient, which directly drives the automatic gain control of the output signal and keeps its level within defined limits; adaptive estimation of the slope of the compression characteristic used to compress the input signal; adaptive estimation of the peak power of the input signal, which is the basic input parameter for estimating the slope of the compression characteristic and the gain coefficient; and adaptive estimation of the powers of the interference signals based on the microphone array readings in a hands-free speech communication system.
2. The method according to claim 1, characterized in that all audio signals are processed entirely in the frequency domain.
3. The method according to claim 1, characterized in that the gain coefficient of the input signal is determined adaptively as a function of the power of the input speech signal and the powers of the interference signals.
4. The method according to claim 3, characterized in that the gain coefficient of the input signal is controlled by two constants: the nominal output power to which the output signal is to be set, and the coefficient that limits the maximum gain of the AGC system.
5. The method according to claim 3, characterized in that the gain coefficient of the input signal is determined from an adaptive estimate of the slope parameter of the compression characteristic used to compress the input signal.
6. The method according to claim 1, characterized in that the slope coefficient of the compression characteristic is estimated from: two predefined constants that bound the operating zone of the AGC and whose choice allows the AGC function to be optimized for different applications; and an adaptive estimate of the peak power of the input signal, which directly determines the current value of the gain coefficient.
7. The method according to claim 6, characterized in that the slope coefficient of the compression characteristic is estimated from two predefined constants: the nominal output power to which the output signal is to be set, and the coefficient that defines the maximum permitted slope of the compression characteristic.
8. The method according to claim 6, characterized in that the slope coefficient of the compression characteristic is estimated adaptively from the adaptive estimate of the peak power of the input signal and from its trajectory over time.
9. The method according to claim 8, characterized in that the slope coefficient of the compression characteristic is determined from the growth trend of the trajectory of the peak power of the input signal, computed adaptively over several previous analysis blocks of the peak power of the input signal.
10. The method according to claim 8, characterized in that the slope coefficient of the compression characteristic is determined from the convexity of the trajectory of the peak power of the input signal, computed adaptively over several previous analysis blocks of the peak power of the input signal.
11. The method according to claim 8, characterized in that the slope coefficient of the compression characteristic is determined from a "soft decision" on the presence of speech in the input signal, using the growth trend and the convexity of the trajectory of the peak power of the input signal.
12.
The method according to claim 1, characterized by the fact that the estimation of the peak power of the input signal is determined adaptively in such a way that when the power increases, the current power estimation is adopted, while when the power decreases, recursion with a certain time constant is used. 13. Postupak prema zahtevu 1karakterisan time,što se vrši estimacija snage nepotisnutog eha, na bazi estimacije snage eho signala u delu sistema slobodne govorne komunikacije koji vrši potiskivanje eha i primene procenjenog faktora potiskivanja eha do ulaza u AGC, a koja je potrebna za određivanje koeficijenta pojačanja AGC.13. The procedure according to claim 1, characterized by the estimation of the strength of the unsuppressed echo, based on the estimation of the strength of the echo signal in the part of the free speech communication system that performs echo suppression and the application of the estimated echo suppression factor to the input to the AGC, which is required to determine the AGC amplification coefficient. 14. Postupak prema zahtevu 1 karakterisan time, što se vrši gruba procena snage difuznog šuma, u akustičkom ambijentu u kome mikrofonski niz snima, na bazi razlike estimirane srednje snage mikrofonskih signala, u kojima je izvršeno potiskivanje eha, i estimirane trenutne snage aktuelnog govornog signala, a koja je potrebna za određivanje koeficijenta pojačanja AGC.14. The method according to claim 1, characterized by the fact that a rough estimate of the power of the diffuse noise is made, in the acoustic environment in which the microphone array records, based on the difference between the estimated mean power of the microphone signals, in which echo suppression was performed, and the estimated current power of the current speech signal, which is needed to determine the AGC amplification coefficient. 15. 
Sistem za automatsku regulaciju pojačanja (AGC) na osnovu očitavanja mikrofonskog niza karakterisan time, što sadrži: blok za adaptivno određivanje koeficijenta pojačanja, koji neposredno upravlja automatskom kontrolom pojačanja izlaznog signala i održava njegov nivo u definisanim granicama; blok za adaptivnu estimaciju nagiba karakteristike kompresije pomoću koje se komprimuje ulazni signal; blok za adaptivnu estimaciju vršne snage ulaznog signala, koja je osnovni ulazni parametar za estimaciju nagiba karakteristike kompresije i koeficijenta pojačanja; blok za adaptivnu estimaciju snaga signala smetnji na bazi očitavanja mikrofonskog niza u sistemu slobodne govorne komunikacije.15. A system for automatic gain control (AGC) based on the reading of the microphone array, characterized by: a block for adaptive determination of the gain coefficient, which directly controls automatic control of the amplification of the output signal and maintains its level within defined limits; block for adaptive estimation of the slope of the compression characteristics by means of which compresses the input signal; block for adaptive estimation of the peak power of the input signal, which is the basic input parameter for estimating the slope of the compression characteristic and the amplification coefficient; block for adaptive estimation of interference signal strengths based on readings of the microphone array in the system of free speech communication. 16. Sistem prema zahtevu 15 karakterisan time, što je njegovo funkcionisanje optimizirano za primenu u sistemima slobodne govorne komunikacije na bazi mikrofonskih nizova u kojima je uključeno potiskivanje akustičkog eha (AEC) i lociranje govornika u prostoru na bazi usmerene karakteristike (BF) mikrofonskog niza, pri čemu broj mikrofona u nizu nije ograničavajući faktor.16. 
The system according to claim 15, characterized by the fact that its functioning is optimized for use in free speech communication systems based on microphone arrays in which acoustic echo suppression (AEC) and locating the speaker in space based on the directional characteristic (BF) of the microphone array are included, whereby the number of microphones in the array is not a limiting factor. 17. Sistem prema zahtevu 15 karakterisan time, što sadrži jedan ulaz i jedan izlaz za aktuelni govorni signal i više ulaza na koje se dovode signali iz sistemima slobodne govorne komunikacije a koji omogućavaju estimaciju svih akustičkih smetnji u ambijentu snimanja mikrofonskog niza.17. The system according to claim 15, characterized by the fact that it contains one input and one output for the current speech signal and several inputs to which signals from free speech communication systems are fed and which enable the estimation of all acoustic disturbances in the environment of recording the microphone array. 18. Sistem prema zahtevu 15 karakterisan time, što sadrži tri ulaza na kojima se postavljaju tri konstante koje defmišu polje aktivnog funkcionisanja AGC, a to su: nominalna snaga izlaznog signala, dinamika promene pojačanja AGC sistema i maksimalni nagib karakteristike kompresije.18. The system according to claim 15, characterized by the fact that it contains three inputs where three constants are set that define the field of active operation of the AGC, namely: the nominal power of the output signal, the dynamics of the gain change of the AGC system and the maximum slope of the compression characteristic.
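The peak-power estimation of claim 12 and the two constants of claim 4 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the `decay` parameter standing in for the time constant, and the gain formula (square root of nominal power over peak power, capped at `max_gain`) are assumptions made here for illustration, and the sketch ignores the interference terms and the compression slope that the claims also use.

```python
import math


def track_peak_power(block_powers, decay=0.9):
    """Peak-power trajectory in the spirit of claim 12 (hypothetical helper).

    Rising power: the instantaneous block power is adopted directly.
    Falling power: a first-order recursion pulls the estimate down slowly.
    `decay` (0..1) is an assumed stand-in for the time constant.
    """
    peak = 0.0
    trajectory = []
    for p in block_powers:
        if p >= peak:
            peak = p  # power is rising: follow it immediately
        else:
            peak = decay * peak + (1.0 - decay) * p  # power is falling: recursive decay
        trajectory.append(peak)
    return trajectory


def limited_gain(peak_power, nominal_power, max_gain):
    """Amplitude gain that aims at `nominal_power`, capped at `max_gain`.

    Assumed formula for illustration only; the claims name the two
    constants but do not give the expression.
    """
    if peak_power <= 0.0:
        return max_gain  # no measurable signal: fall back to the cap
    return min(math.sqrt(nominal_power / peak_power), max_gain)
```

For example, `track_peak_power([1.0, 4.0, 0.0, 0.0], decay=0.5)` jumps to 4.0 on the second block and then halves on each silent block, which is the fast-attack, slow-release behaviour the claim describes.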
RSP-2006/0611A 2006-11-03 2006-11-03 PROCEDURE AND SYSTEM FOR AUTOMATIC AMPLIFIER CONTROL (AGC) BASED ON MICROPHONE LINE READING RS49857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
RSP-2006/0611A RS49857B (en) 2006-11-03 2006-11-03 PROCEDURE AND SYSTEM FOR AUTOMATIC AMPLIFIER CONTROL (AGC) BASED ON MICROPHONE LINE READING

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
RSP-2006/0611A RS49857B (en) 2006-11-03 2006-11-03 PROCEDURE AND SYSTEM FOR AUTOMATIC AMPLIFIER CONTROL (AGC) BASED ON MICROPHONE LINE READING

Publications (2)

Publication Number Publication Date
RS20060611A (en) 2007-06-04
RS49857B (en) 2008-08-07

Family

ID=43646403

Family Applications (1)

Application Number Title Priority Date Filing Date
RSP-2006/0611A RS49857B (en) 2006-11-03 2006-11-03 PROCEDURE AND SYSTEM FOR AUTOMATIC AMPLIFIER CONTROL (AGC) BASED ON MICROPHONE LINE READING

Country Status (1)

Country Link
RS (1) RS49857B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10923132B2 (en) 2016-02-19 2021-02-16 Dolby Laboratories Licensing Corporation Diffusivity based sound processing method and apparatus

Also Published As

Publication number Publication date
RS49857B (en) 2008-08-07

Similar Documents

Publication Publication Date Title
KR102352928B1 (en) Dual microphone voice processing for headsets with variable microphone array orientation
US10079026B1 (en) Spatially-controlled noise reduction for headsets with variable microphone array orientation
WO2008041878A2 (en) System and procedure of hands free speech communication using a microphone array
US9443532B2 (en) Noise reduction using direction-of-arrival information
US10250975B1 (en) Adaptive directional audio enhancement and selection
US7724891B2 (en) Method to reduce acoustic coupling in audio conferencing systems
US9392353B2 (en) Headset interview mode
KR20200009035A (en) Correlation Based Near Field Detector
KR20080059147A (en) Robust separation of speech signals in a noisy environment
GB2491173A (en) Setting gain applied to an audio signal based on direction of arrival (DOA) information
JP3332143B2 (en) Sound pickup method and device
US20140254825A1 (en) Feedback canceling system and method
EP4156711B1 (en) Audio device with dual beamforming
RS20060611A (en) Technique and system for automatic gain control (agc) using microphone array
US20230101635A1 (en) Audio device with distractor attenuator
CN212013003U (en) Pickup and sound amplification system
EP4156183B1 (en) Audio device with a plurality of attenuators
US12200448B2 (en) Audio device with microphone sensitivity compensator
JP3332144B2 (en) Target sound source area detection method and apparatus
CN116390005A (en) Wireless multi-microphone hearing aid method, hearing aid, and computer-readable storage medium
Goldin Long Range Noise Canceling Microphone
RS49859B (en) SYSTEM AND PROCEDURE FOR LOCATION OF SPEAKERS BY MICROPHONE