
USRE50627E1 - Wired and wireless microphone arrays - Google Patents

Wired and wireless microphone arrays

Info

Publication number
USRE50627E1
USRE50627E1 (application US16/402,088)
Authority
US
United States
Prior art keywords
signal
audio
microphone
speech
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/402,088
Inventor
Jacob G. Appelbaum
Paul Wilkinson Dent
Leonid Krasny
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Technology Development Inc
Original Assignee
Advanced Technology Development Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Technology Development Inc
Priority to US16/402,088
Application granted
Publication of USRE50627E1
Legal status: Active (adjusted expiration)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/002 Damping circuit arrangements for transducers, e.g. motional feedback circuits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00 Monitoring arrangements; Testing arrangements
    • H04R29/004 Monitoring arrangements; Testing arrangements for microphones
    • H04R29/005 Microphone arrays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00 Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/40 Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
    • H04R2201/401 2D or 3D arrays of transducers

Definitions

  • FIG. 7 illustrates multiple microphones, wherein a currently active speaker is indicated by activation of a Push-To-Talk (PTT) pressel.
  • PTT Push-To-Talk
  • the 64 kb/s CVSD speech (or other form of digitally encoded speech) received via Bluetooth from microphone 1 (110) is transcoded if necessary to provide a first PCM audio signal, while the audio signal from microphone 2 (130) of mobile phone (120) is encoded to a second PCM audio signal.
  • the two PCM audio signals are then jointly processed by digital signal processing in mobile phone (120), using algorithms to be described, in order to enhance the ratio of the wanted audio signal to background noise.
  • One basic principle that can be used for signal-to-noise-ratio enhancement is to divide each audio source signal into its constituent narrowband spectral components, such that the channel through which each spectral component is received may be described by a simple attenuation and phase factor, that is by a complex number. Noise arriving from different locations than the wanted signal has different attenuation and phase factors, so that it is possible to find complex multiplicative combining factors for weighted combining of the two source signals such as to favor the wanted signal and disfavor the noise.
  • the optimum combining factors may thus be chosen independently for each frequency component of the wanted signal.
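The per-frequency combining principle can be illustrated for a single frequency bin. In the sketch below the complex channel gains are arbitrary example values, and the null-steering weight rule (pass the wanted channel with unit gain, null the noise channel) is one simple choice for two microphones and one noise source, not necessarily the optimization the patent uses:

```python
import numpy as np

# One narrowband bin: the wanted source reaches the two microphones through
# complex gains h, a noise source through complex gains g (all illustrative).
def null_steering_weights(h, g):
    """Complex weights w with w^H h = 1 (wanted signal preserved) and
    w^H g = 0 (noise from the interfering direction cancelled)."""
    h = np.asarray(h, dtype=complex)
    g = np.asarray(g, dtype=complex)
    p = h - (np.vdot(g, h) / np.vdot(g, g)) * g   # part of h orthogonal to g
    return p / np.vdot(p, h)                      # normalise so that w^H h = 1

h = np.array([1.0 + 0.2j, 0.7 - 0.5j])   # wanted-source attenuation/phase per mic
g = np.array([0.6 - 0.1j, 0.9 + 0.3j])   # noise-source attenuation/phase per mic
w = null_steering_weights(h, g)

s, nz = 1.5 + 0.5j, 2.0 - 1.0j           # bin values of wanted signal and noise
x = s * h + nz * g                        # what the two microphones observe
y = np.vdot(w, x)                         # weighted combination: recovers s
```

In a full system these weights would be computed independently for every bin, since h and g differ per frequency.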
  • FIG. 2 illustrates receipt of a signal S from speaker (200) at a first microphone (210) via a channel with impulse response h1(t).
  • the received signal is thus S convolved with h1(t), written S*h1(t).
  • To this is added a first noise signal n1.
  • the speaker's voice S is received via a second microphone (220) through a second channel h2(t), with additive noise n2.
  • n1 is the result of receiving n through a 3rd channel h3(t) while n2 is the result of receiving n through a 4th channel h4(t).
  • Convolution can be replaced by polynomial multiplication when dealing with sampled signals, leading to the matrix equation
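The equivalence of convolution and polynomial multiplication for sampled signals can be checked directly; a minimal numpy sketch with made-up coefficients:

```python
import numpy as np

# For sampled signals, convolving a signal with an impulse response is the
# same coefficient arithmetic as multiplying the two polynomials whose
# coefficients are the sample values (multiplication in the z-domain).
s = np.array([1.0, 2.0, 3.0])    # example signal samples: s(z) = 1 + 2z + 3z^2
h = np.array([0.5, -0.25])       # example impulse response h(z)

conv = np.convolve(s, h)         # time-domain convolution
poly = np.polymul(s, h)          # polynomial product of the coefficient lists
# conv and poly are identical coefficient sequences
```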
  • IIR factors that represent unstable, exponentially rising impulse responses become stable factors if applied to the signal using time-reverse processing, that is the audio samples are processed in time reversed order by accepting a delayed output so that future samples are used to correct the current sample.
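Time-reverse processing can be demonstrated on a toy channel; the coefficient a below is an arbitrary example with |a| > 1, so the causal inverse recursion would be unstable:

```python
import numpy as np

# Channel: x[n] = s[n] - a*s[n-1] with |a| > 1. The causal inverse
# y[n] = x[n] + a*y[n-1] multiplies errors by a at every step and diverges.
# Running the inverse recursion backwards through the block instead,
# anchored at the final sample, divides by a at every step and is stable.
rng = np.random.default_rng(0)
a = 2.0
s = rng.standard_normal(32)            # original samples (example data)

x = np.convolve(s, [1.0, -a])          # channel output, length len(s) + 1

est = np.zeros_like(s)
future = 0.0                           # s[n] one step ahead (zero past the end)
for n in range(len(x) - 1, 0, -1):     # process samples in reverse time order
    est[n - 1] = (future - x[n]) / a   # s[n-1] = (s[n] - x[n]) / a
    future = est[n - 1]
# est now equals s (up to rounding), at the cost of a one-block output delay
```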
  • More information on inverting matrices of impulse response polynomials may be found in U.S. Pat. No. 6,996,380 to Dent, filed Jul. 26, 2001, which is hereby incorporated by reference herein.
  • the channel polynomials h1(z), …, h4(z) must be determined.
  • this method is only useful when the number of independent noise sources is relatively small, and lower than the number of microphone signals being jointly processed.
  • When the noise has a more diffuse character, other methods to be described are more appropriate.
  • FIG. 3 illustrates a situation comprising more than two microphones.
  • a number of collaborating speakers, for example co-workers in a noisy factory, each have wireless headsets 300(a), 300(b), etc., as well as potentially a belt-clipped unit that can itself have an inbuilt microphone.
  • the number of microphone signals available for joint signal processing can be as many as two times the number of collaborators.
  • the signal processing may use fewer than the total number of signals available for joint processing.
  • the belt-worn unit 310(a) may process signals only from headset 1 (300(a)) and microphone 2 (320(a)) to cancel noise prior to transmission to the other collaborators' belt-worn wireless units such as unit 310(b).
  • unit 310(b) can now further process the signal received from the first collaborator jointly with audio signals received from his local microphone 320(b) and the microphone of headset 300(b) to further reduce noise that was correlated with the remaining noise from the first collaborator.
  • FIG. 7 depicts, for example, an aircraft having a pilot and co-pilot, each equipped with a headset 700(a), 700(b) comprising earphones and microphone. Press-to-talk is generally used in such situations to prevent leaving a microphone in the “live” state which, in the prior art, would amplify ambient noise and feed it through to all crew headsets, causing annoyance.
  • a microphone may be left in the active state collecting audio signals 722(a), 722(b) without necessarily passing those signals directly through to crew headsets 700(a), 700(b).
  • the audio signals 722(a), 722(b) from these microphones are processed together with the audio signal from the principal microphone, which in this example would be the microphone associated with an activated press-to-talk switch 710(a), 710(b), as indicated by the associated pressel switch signals 724(a), 724(b), in order to enhance the signal to noise ratio of the wanted signal from the principal microphone.
  • the microphone and its associated microphone amplifier are left in the active state whether the pressel switch 710(a), 710(b) is activated or not; the output however not being simply passed through to the headsets 700(a), 700(b) or communications system, but rather being jointly processed with the signal designated to be the wanted signal.
  • a signal may for example be designated to be the wanted signal by determining which pressel switch or switches 710(a), 710(b) are pressed, their associated microphones then being designated to be the principal microphones and the persons pressing the associated pressel switches 710(a), 710(b) are assumed to be desirous of being heard.
  • the audio signals 722(a), 722(b) of the active speakers desirous of being heard are passed from the microphones designated as the principal microphones to the signal processing unit 720 where those signals are now processed jointly with signals 722(a), 722(b) from other microphones that, according to the invention, are placed in an active state whether their associated pressel switches 710(a), 710(b) are depressed or not.
  • the second implementation is characterized in general by jointly processing the output of one or more microphones that are associated with a wanted speaker or audio source together with the output of one or more microphones normally associated with a different speaker or audio source.
  • the term “normally associated with” reflects the meaning that that microphone is so positioned as to favor the audio source that would be heard best from that position, whether or not an audio source is present and active at that position at any particular instant.
  • a microphone attached to the personal headset of a particular person is associated with that person and not normally associated with a different person.
  • the microphone normally associated with one person or location can be useful to enhance the signal to noise ratio of the signal from the principal microphone, which is the microphone associated with the current active speaker, audio source, or location.
  • the audio signals from all four microphones could be transmitted using a two-channel duplex link between the two collaborators, whose belt-worn units (320(a) and 320(b) respectively) would jointly process all four signals in order to enhance the ratio of the other speaker's voice to background noise.
  • the audio signals from the one or two microphones each of a multiplicity of collaborators could be transmitted to a central radio base station nearby in the same location, which would jointly process all signals to enhance the signal to noise ratio for each speaker and then return the processed signal of the speaker deemed to be currently active to all parties via a return radio link.
  • a radio set would differ considerably from the prior art, as it may be transmitting audio from its associated microphone substantially all the time, whether the pressel switch was pressed or not, the state of the pressel switch, if one is provided, being signaled independently over the radio channel to indicate that the speaker is desirous of being heard.
  • Upon the receiving system detecting via the signaling that a pressel switch has been activated, the receiving system designates the microphone of the remote unit with the activated pressel switch to be a principal microphone, and passes an indication to the signal processing to jointly process all received microphone signals in order to reduce the noise on the audio signal received from the principal microphone.
  • VAD Voice Activity Detection
  • a conference comprises a panel of speakers on stage, whose voices may be picked up by a number of fixed microphones as well as individual wireless “lapel mikes”, and in addition one or more members of the audience may have lapel mikes or be passed a roaming microphone to ask questions.
  • all microphone signals are conveyed by wire or wirelessly to central processing unit 420 which processes the signals jointly in order to enhance the signal to background noise ratio of any desired speaker.
  • A further example of scenarios amenable to the current invention is shown in FIG. 5.
  • a number of participants in a teleconference are sitting around a speakerphone in a conference room.
  • Each may have a laptop with an audio headset, and the laptops may be networked to a central server, either by cable or by WiFi.
  • Bluetooth headsets convey audio to and from the laptop and the laptop passes the audio on via the network to a server.
  • the Bluetooth headsets communicate audio directly to a multiple-Bluetooth-equipped speakerphone.
  • a headset wired into a laptop uses the laptop's built-in Bluetooth or WiFi to convey audio to the speakerphone, equipped likewise.
  • the speakerphone may also comprise a number of fixed microphones that are arranged around the conference table.
  • the speakerphone may receive all microphone signals, either by wire, Bluetooth, WiFi or by a wired (Ethernet) connection to a server, or any combination of the above, and process the signals jointly.
  • the speakerphone may just convey the outputs of its microphones to a server which also receives the signals from the participants' microphones, and the joint processing may be carried out by software in the server, the server returning the noise-reduced signals to the speakerphone and/or the participants.
  • a single user having a single laptop may be making a call or participating in a conference.
  • the laptop may run the Skype program, a well-known program allowing a computer to place Voice-over-IP (VoIP) calls over the Internet.
  • VoIP Voice-over-IP
  • the laptop or computer's own microphone may be supplemented by a Bluetooth headset, the audio from both being jointly pre-processed in the computer by a software program configured according to the invention in order to enhance the speech to background noise ratio in noisy environments.
  • a duplex teleconference can be considered to comprise two separate, interconnected systems, either or both of which can employ a separate instance of the current invention.
  • speech activity detection can be used to determine the principal active speaker as opposed to reliance upon a press-to-talk switch.
  • the noise reduction can be applied without waiting for a decision from the activity detector. Noise reduction can be applied with the assumption that a given speaker is active simultaneously for every hypothesis of which speaker is active to obtain noise-reduced signals for all speakers ready and waiting to be selected for broadcast.
  • In the example of aircraft or tank crew, a hard selection mechanism determined by press-to-talk switch states was described.
  • The use of press-to-talk switches provides the simplest method of source selection.
  • Other methods of source identification can be used. For example, when all potential sources are pre-separated, available, and waiting for selection as just described, a soft-selection mechanism can be employed, where the gain for a speaker deemed to have become the principally active speaker is ramped up from zero over a period of, for example, 50 milliseconds, and the gain for a speaker deemed to have become inactive is ramped down over a similar period, in order to avoid the unpleasant clicks of a hard selection.
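The soft-selection crossfade just described can be sketched as follows; the sample rate and block size are illustrative assumptions, and only the 50 ms ramp comes from the text:

```python
import numpy as np

FS = 8000                        # sample rate (assumed for illustration)
RAMP_SAMPLES = int(0.050 * FS)   # 50 ms ramp, as suggested in the text
STEP = 1.0 / RAMP_SAMPLES        # per-sample gain change

def soft_select_block(sources, active, gains):
    """Mix pre-separated, noise-reduced sources with ramped gains.

    sources: (n_sources, block) array, one row per speaker
    active : per-source flags (pressel pressed / speaker deemed active)
    gains  : per-source gains entering this block
    Returns the mixed block and the gains leaving it.
    """
    n, block = sources.shape
    out = np.zeros(block)
    new_gains = np.array(gains, dtype=float)
    for i in range(n):
        target = 1.0 if active[i] else 0.0
        # slew the gain one STEP per sample toward its target, clipped to [0, 1]
        direction = np.sign(target - new_gains[i])
        traj = np.clip(new_gains[i] + direction * STEP * np.arange(1, block + 1), 0.0, 1.0)
        out += traj * sources[i]
        new_gains[i] = traj[-1]
    return out, new_gains

# Speaker 0 becomes active while speaker 1 fades out; the complementary
# ramps avoid the click of a hard switch.
src = np.ones((2, RAMP_SAMPLES))
mixed, g = soft_select_block(src, [True, False], np.array([0.0, 1.0]))
```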
  • the determination of a speaker becoming active or inactive can be made on the relative strength of the signals, or change thereof.
  • Other techniques known in the art as voice activity detection (VAD) can be used to discriminate sources that contain wanted speech from sources that contain non-speech sounds.
  • VAD voice activity detection
  • U.S. Pat. No. 6,381,570 describes using adaptive energy thresholds for discriminating between speech and noise.
  • US patent application publication nos. 2010/0057453 and 2009/0076814 describe the performance of more complex feature extraction to make a speech/no-speech decision.
  • the fact that the spectrum of speech switches regularly between voiced and unvoiced sounds may be used as a feature to discriminate speech from background noise.
  • hysteresis and time delays can be employed to ensure that, once selected, a speaker remains selected for at least a period of the order of one or two seconds before being ramped off if no further activity is detected meantime.
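An energy-threshold VAD with hysteresis and a hold time, in the spirit of the techniques above, might look as follows; all thresholds, the frame size, and the roughly one-second hold are illustrative assumptions, not values from the cited patents:

```python
import numpy as np

FRAME = 160                  # 20 ms frames at an assumed 8 kHz sample rate
HOLD_FRAMES = 50             # ~1 s of hold before the speaker is ramped off

def vad_stream(frames, on_thresh=4.0, off_thresh=2.0):
    """Yield one speech/no-speech flag per frame of samples."""
    noise_floor = None
    active, quiet = False, 0
    for f in frames:
        e = float(np.mean(np.asarray(f) ** 2))   # frame energy
        if noise_floor is None:
            noise_floor = e                      # bootstrap the noise estimate
        if active:
            # hysteresis: a lower threshold plus a hold time to switch off
            if e < off_thresh * noise_floor:
                quiet += 1
                if quiet >= HOLD_FRAMES:
                    active = False
            else:
                quiet = 0
        else:
            if e > on_thresh * noise_floor:
                active, quiet = True, 0
            else:
                # track the noise floor only while no speech is detected
                noise_floor = 0.9 * noise_floor + 0.1 * e
        yield active

# ten quiet frames, then five loud ones: the detector fires on frame 10 and
# the hold keeps the speaker selected through the following quiet frames
frames = [np.full(FRAME, 0.01)] * 10 + [np.ones(FRAME)] * 5 + [np.full(FRAME, 0.01)] * 10
flags = list(vad_stream(frames))
```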
  • the microphone positions are arbitrary relative to each other.
  • Many prior art array processing algorithms, while assuming arbitrary positions for the noise and signal sources, are nevertheless designed for arrays having fixed relative microphone positions.
  • the current invention is designed for a microphone array where the elements of the array are placed arbitrarily, and whose positions may even be changing.
  • the input signals observed at the outputs of the microphones are represented by u1(n), u2(n), etc.; i.e., ui(n) is output sample n of the i-th microphone.
  • the algorithm first decomposes each signal u1(n), u2(n), etc. into a set of narrowband constituent components using a windowed FFT. Overlapping blocks of signal are processed, and the overlapping window functions add to unity to ensure each sample is given equal gain in the final output.
  • it can be a smoothed Hanning window:
  • w(n) = \begin{cases} \sin^2\big(\pi n/(N_0 - N_1)\big), & n \in [0,\, (N_0 - N_1)/2 - 1] \\ 1, & n \in [(N_0 - N_1)/2,\, (N_0 + N_1)/2 - 1] \\ \sin^2\big(\pi (n - N_0 + 1)/(N_0 - N_1)\big), & n \in [(N_0 + N_1)/2,\, N_0 - 1] \end{cases} \quad (2)
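The smoothed Hanning window described above can be constructed and its overlap-add behavior checked numerically; N0 and N1 below are illustrative values, not taken from the patent, and the frame advance of (N0 + N1)/2 samples is chosen so successive sin^2 tapers overlap:

```python
import numpy as np

N0, N1 = 512, 256                  # total length and flat-top length (illustrative)
T = (N0 - N1) // 2                 # length of each sin^2 taper
n = np.arange(N0)

w = np.ones(N0)                    # flat unity section in the middle
w[:T] = np.sin(np.pi * n[:T] / (N0 - N1)) ** 2                      # rising taper
w[N0 - T:] = np.sin(np.pi * (n[N0 - T:] - N0 + 1) / (N0 - N1)) ** 2 # falling taper

hop = (N0 + N1) // 2               # frame advance so tapers of adjacent frames overlap
total = np.zeros(4 * hop + N0)
for start in range(0, 4 * hop + 1, hop):
    total[start:start + N0] += w   # overlap-add the window itself

# In the steady-state region every sample receives (approximately) unit
# total gain, so overlap-add reconstruction gives each sample equal weight.
middle = total[N0:4 * hop]
```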
  • the FFT is described by:
  • the VAD operations are:
  • the frequency responses for microphones 1 and 2 are calculated by means of:
  • the output signal is then calculated from:
  • H_w(k) = \max\big\{ H_{w0},\; 1 - \hat{\Phi}_N(k,q) / \hat{\Phi}_{SN}(k,q) \big\} \quad (15)
  • time domain output samples are computed from:
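The analysis, per-bin weighting, and time-domain reconstruction steps referenced above can be sketched end to end. The patent's specific window and gain equations are not reproduced here; a generic square-root Hann STFT at 50% overlap (a common choice, assumed for illustration) stands in, with the per-bin suppression gains left as a plug-in function:

```python
import numpy as np

N, HOP = 512, 256
win = np.sqrt(np.hanning(N + 1)[:N])   # periodic sqrt-Hann: win**2 overlap-adds to 1

def stft_process(x, gain_fn):
    """Analyse x in overlapping windowed blocks, apply per-bin gains,
    and reconstruct time-domain output samples by weighted overlap-add."""
    n_frames = (len(x) - N) // HOP + 1
    y = np.zeros(len(x))
    for f in range(n_frames):
        seg = x[f * HOP:f * HOP + N] * win        # windowed analysis block
        spec = np.fft.rfft(seg)                   # narrowband components
        spec = spec * gain_fn(spec)               # per-bin suppression gains
        y[f * HOP:f * HOP + N] += np.fft.irfft(spec) * win   # overlap-add synthesis
    return y

x = np.random.default_rng(1).standard_normal(4 * N)
y = stft_process(x, lambda spec: 1.0)             # unit gains: chain is transparent
err = np.max(np.abs(y[N:-N] - x[N:-N]))           # ~0 away from the block edges
```

With unit gains the chain reconstructs the input exactly in the steady-state region, which is the property that lets the per-bin gains be the only source of modification.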
  • The VAD described in Equation (4) is modified in a straightforward way, by indexing the summation over all N microphones.
  • Eq.(4) is modified as:
  • Matrix \hat{K}_{ip}^{-1}(k,q) in Eq. (19) is an estimate of the inverse noise spatial correlation matrix at the q-th frame.
  • For the case of N microphones, instead of an estimation of the noise spatial correlation matrix as in Equation (9), a direct estimation of the inverse noise spatial correlation matrix \hat{K}_{ip}^{-1}(k,q) based on the RLS algorithm is used, which is modified for processing in the frequency domain according to equation (20) below:
  • the Array Processing Output in the Frequency Domain (Equation (12)) is modified in a straightforward way, by indexing the summation over all microphones.
  • Equation (12) is modified to obtain equation (22) below:
  • the field u(t, r_i) is a superposition of the signals from M sound sources and background noise.
  • the Fourier transform U(\omega, r_i) of the field u(t, r_i) received by the i-th array element has the form:
  • constraint (25) represents the degree of degradation of the desired signals and permits the combination of various frequency bins at the space-time processing output with a priori desired amplitude and phase distortion.
  • H(\omega, r_i) = \arg\min \big\{ g_{out}^{N}(\omega) \big\} \quad (28), subject to the constraint (25), where
  • the optimization problem (31)-(32) may be solved by using M Lagrange coefficients W_m(\omega) to adjoin the constraints (31) to a new goal functional
  • the algorithm (42) describes the multichannel system which consists of M spatial channels \{U_1(\omega), \ldots, U_M(\omega)\}.
  • the frequency responses H(\omega; r_i, R_m) of the filters at each of these channels are matched with the spatial structure of the signal from the m-th user and the background noise and satisfy the system of equations (37).
  • the array processing in the m-th spatial channel is optimized to detect the signal from the m-th user against the background noise.
  • the output voltages of the M spatial channels are accumulated with the weighting functions \{W_1(\omega), \ldots, W_M(\omega)\}, which satisfy the system of equations (38).
  • the frequency responses of the filters H(\omega; r_i, R_1) at the first channel are matched with the spatial coordinates R_1 of the desired signal source, and the frequency responses of the filters H(\omega; r_i, R_2) at the second channel are matched with the spatial coordinates R_2 of the second signal source.
  • the signal U_2(\omega) is weighted by a function \Phi_{12}(\omega)/\Phi_{22}(\omega) and subtracted from the signal U_1(\omega). This algorithm separates the signals from two sources and produces the output signal U_{out}(\omega), where the signal from the second source is completely suppressed.
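The two-source subtraction just described can be checked with synthetic data. In the sketch below the leakage coefficient and the single-bin model are illustrative assumptions, and the cross- and auto-spectral densities are estimated by simple frame averaging rather than by whatever estimator the patent prescribes:

```python
import numpy as np

# One frequency bin over many frames: channel 1 is matched to source A but
# leaks some of source B; channel 2 contains only source B. Scaling channel
# 2 by phi12/phi22 and subtracting removes the source-B component.
rng = np.random.default_rng(2)
frames = 4096
s_a = rng.standard_normal(frames) + 1j * rng.standard_normal(frames)  # wanted source
s_b = rng.standard_normal(frames) + 1j * rng.standard_normal(frames)  # second source

c = 0.8 - 0.3j               # leakage of source B into channel 1 (illustrative)
u1 = s_a + c * s_b           # first spatial channel, matched to source A
u2 = s_b                     # second spatial channel, matched to source B

phi12 = np.mean(u1 * np.conj(u2))   # estimated cross-spectral density
phi22 = np.mean(np.abs(u2) ** 2)    # estimated auto-spectral density of u2
w = phi12 / phi22                   # approaches c as the estimate improves

u_out = u1 - w * u2                 # source B is suppressed in the output
```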

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)

Abstract

An acoustic noise canceling microphone arrangement and processor that uses a principal microphone and other microphones that may be incidentally or deliberately located in the vicinity of the principal microphone in order to derive an audio signal of enhanced signal-to-background noise ratio. In one implementation, the principal and incidental microphones comprise the microphone built into a mobile phone and the microphone built into a Bluetooth headset.

Description

Priority for the subject matter herein is claimed from U.S. Provisional Patent Application No. 61/690,019 filed 18 Jun. 2012.
BACKGROUND
The present invention relates to improving the signal to acoustic background noise ratio for voice or other audio signals picked up by acoustic transducers.
Noise-canceling microphones are a known type of prior art transducer used to improve signal to background noise ratio. The prior art noise canceling microphone operates by pressure difference, wherein the wanted source, for example the mouth of a human speaker, is much closer to the microphone than more distant noise sources, and therefore the acoustic pressure difference from the front to the back of the microphone is small for the distant sources but large for the nearby source. Therefore a microphone which operates on the pressure difference between front and back can discriminate in favor of nearby sources. Two microphones, one at the front and one at the back may be used, with their outputs being subtracted.
One disadvantage of the prior art noise canceling microphone is that it requires very close proximity (e.g. 1″) to the wanted source. Another disadvantage is that the distance from front to back of the microphone, which may be 1″ for example, causes phase shifts at higher frequencies that result in loss of discrimination at frequencies above 1 kHz.
As an improvement over the noise canceling microphone, the prior art contains examples of using arrays of microphones, the outputs of which are digitized to feed separately into a digital signal processor which can combine the signals using more complex algorithms. For example, U.S. Pat. No. 6,738,481 to present inventor Krasny et al and filed Jan. 10, 2001 describes such a system, which in one implementation divides the audio frequency range into many narrow sub-bands and performs optimum noise reduction for each sub-band.
The dilemma with arrays of microphones in the prior art, however, is that one of the following is usually true: (a) to avoid the clutter of multiple microphone cables, the microphones are located close together; however, if the microphones have a spacing less than half an acoustic wavelength (6″ at 1 kHz) the effectiveness of the array processing is reduced, and even just two microphones spaced 6″ apart implies a large device, larger, for example, than a modern mobile phone; or (b) if widely spaced microphones are used, then the clutter and unreliability of extra cables becomes a nuisance.
Thus there is a need for methods and devices that overcome the main disadvantages of the prior art outlined above: the need either for extra microphones or for a multitude of extra cables.
SUMMARY
A noise reduction system is provided which uses incidental microphones that are often present in particular applications, but which, in the prior art, are not normally activated at the same time as a principal microphone, or which, if left in an active state, do not in the prior art provide signals that are jointly processed with the signals from a principal microphone. According to the invention, such incidental microphones are activated to provide signals that are processed jointly with signals from one or more principal microphones to effect noise reduction, thereby making better use of existing resources such as microphones and their signal connections to processing resources.
In a first implementation, an array of at least two microphones provides signals to a digital signal processing unit, which performs adaptive noise cancellation, at least one of the microphones providing its output signal to the signal processing unit using a short-range wireless link. The short-range wireless link may be an optical or infra-red link; a radio link using for example a Bluetooth® (a short-range, ad-hoc, wireless network protocol and communication standard) or other suitable radio device; an inductive loop magnetic method with or without a frequency translation; an electrostatic method with or without frequency translation, or an ultrasonic link (frequency translation implied). Preferably, the wireless link digitizes the audio signal from its associated microphone or microphones using a high-quality analog-to-digital encoding technique, and transmits the signal digitally using error correction coding if necessary to assure unimpaired reception at the signal processor.
The signal processor digitizes the signals from any analog microphone sources not already digitized and then jointly processes the digital audio signals using algorithms to enhance the ratio of wanted signals to unwanted signals.
In some applications, the wanted signal may be a single signal, while the noise may comprise a multitude of unwanted acoustic sources. In other applications to be described, there may be multiple wanted signal sources, that may or may not be active at the same time, as well as multiple unwanted noise sources.
In an exemplary first implementation, the invention comprises a mobile phone having an own, internal microphone and used in conjunction with a Bluetooth headset, the signals from the Bluetooth headset being processed jointly with the signals from the mobile phone's own internal microphone to enhance the ratio of the wanted speaker's voice to background noise without introducing additional microphones or cables.
In another exemplary first implementation, participants in the same room and in audio conference with participants at another location are equipped with Bluetooth or similar wireless microphones, the signals from which are received at a signal processor and jointly processed with signals from any other microphones to enhance the signal to background noise ratio for at least one speaker's voice.
Other similar situations arise where multiple microphones exist but where joint processing was not previously considered in the prior art, and these can constitute a second implementation of the invention. For example, in an aircraft, the pilot, co-pilot and potentially other crew members already have microphones, and thus by jointly processing the outputs of the pilot's and co-pilot's microphones a reduction in noise can be obtained without the encumbrance of additional microphones or leads. Other applications of this second implementation can be envisaged: for example, army tanks that have multiple crew members already equipped with microphones; operations in a noisy work environment where co-workers are equipped with duplex headsets for communication; film crews having cameras equipped with boom mikes as well as the crews themselves being equipped with two-way headsets; conferences in which on-stage speakers have individual microphones and audience participants have additional microphones; and so on. The invention may be employed to enhance signal quality in such scenarios by jointly processing the signals from the multiplicity of microphones that already exist for such applications.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a two-microphone situation comprising a mobile phone and a Bluetooth headset.
FIG. 2 illustrates the sampling of a speaker's voice by two microphones connected to a joint processing unit.
FIG. 3 illustrates a multiple microphone situation when multiple parties collaborate in close proximity.
FIG. 4 illustrates multiple microphones available at a conference having individual microphones for on-stage and off-stage participants as well as fixed microphones.
FIG. 5 illustrates multiple microphones available during a teleconference using a speakerphone and individual wireless headsets.
FIG. 6 is a flow diagram of a method of improving the signal to noise ratio of an audio signal.
FIG. 7 illustrates multiple microphones, wherein a currently active speaker is indicated by activation of a Push-To-Talk (PTT) pressel.
DETAILED DESCRIPTION
During wireless telephone communications the speech signal is often corrupted by environmental noise, which degrades the performance of speech coding or speech recognition algorithms. It is essential to reduce the noise level without distorting the original speech signal.
One conventional approach to solve this problem is a single-microphone noise reduction technique, which utilizes differences in the spectral characteristics of the speech signal and the background noise. It is hampered by the fact that in many situations the speech and the noise tend to have similar spectral distributions. Under these conditions, the single-microphone noise reduction technique will not yield substantial improvement in speech intelligibility. Another approach tried in the prior art was the use of microphone arrays, which however encounter the disadvantages described above in the background section.
Wireless headsets are often used with mobile phones when both hands are needed for other functions, such as driving. Such headsets are self-contained, comprising an earphone, a microphone, a short-range radio link using the Bluetooth standard, and a battery. In the prior art, the mobile phone takes the audio input for transmission either from its internal microphone or from the signal received from the Bluetooth headset. By contrast, in the situation illustrated in FIG. 1, a mobile phone (120) according to the current invention receives an audio signal both from microphone 1 (110) of the Bluetooth headset (100) via the Bluetooth short range radio link and from microphone 2 (130) of mobile phone (120) and jointly processes both signals in the audio processing section of mobile terminal (120) in order to enhance the ratio of the wanted audio signal from the speaker to unwanted background noise, thereby improving communication intelligibility in noisy environments without additional microphones or cables. The mobile phone and Bluetooth headset are merely exemplary and not restrictive. For example, another implementation in the same category would be the use of a laptop having its own microphone and having a wireless connection to another microphone, such as a Bluetooth headset.
Most mobile phones and laptops of today are already equipped with Bluetooth short range radio links. Bluetooth digitizes all signals and may transmit voice or data or both. Voice is typically converted to 64 kilobits per second continuously variable slope delta modulation, also known as CVSD for short, and Bluetooth can support 64-kilobit transmission in both directions simultaneously for duplex telephone voice. Upon reception, the 64 kb/s CVSD is first transcoded to 16-bit PCM at 8 or 16 kilosamples per second, and then may be further transcoded by a lower-bitrate speech encoder for transmission over a digital cellular channel, or else converted to an analog waveform to drive a local speaker or earpiece.
According to this first implementation of the invention, the 64 kb/s CVSD speech (or other form of digitally encoded speech) received via Bluetooth from microphone 1 (110) is transcoded if necessary to provide a first PCM audio signal, while the audio signal from microphone 2 (130) of mobile phone (120) is encoded to a second PCM audio signal. The two PCM audio signals are then jointly processed by digital signal processing in mobile phone (120), using algorithms to be described, in order to enhance the ratio of the wanted audio signal to background noise.
One basic principle that can be used for signal-to-noise-ratio enhancement is to divide each audio source signal into its constituent narrowband spectral components, such that the channel through which each spectral component is received may be described by a simple attenuation and phase factor, that is by a complex number. Noise arriving from different locations than the wanted signal has different attenuation and phase factors, so that it is possible to find complex multiplicative combining factors for weighted combining of the two source signals such as to favor the wanted signal and disfavor the noise. The optimum combining factors may thus be chosen independently for each frequency component of the wanted signal.
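For illustration, this per-frequency weighted combining can be sketched in Python for a single frequency bin; the channel factors, noise factors, and bin samples below are illustrative assumptions rather than values taken from the text:

```python
import numpy as np

# Hypothetical single-bin example: each acoustic channel is one complex factor.
h = np.array([1.0 + 0.0j, 0.8 - 0.3j])   # signal channel factors (assumed)
a = np.array([0.5 + 0.2j, 1.1 + 0.0j])   # noise channel factors (assumed)

rng = np.random.default_rng(0)
s = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)  # wanted bin samples
n = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)  # noise bin samples

u = np.outer(h, s) + np.outer(a, n)       # the two microphone observations

# Combining weights that null the noise direction: w = [a2, -a1] (up to scale).
w = np.array([a[1], -a[0]])
y = w @ u                                  # weighted combination of both mics

# The noise contribution cancels; only a scaled copy of s remains.
gain = w @ h
assert np.allclose(y, gain * s)
```

Because the factors differ from bin to bin, such weights would be chosen independently for each frequency component, as the text describes.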
It is also possible to perform noise cancellation or reduction by time domain processing. FIG. 2 illustrates receipt of a signal S from speaker (200) at a first microphone (210) via a channel with impulse response h1(t). The received signal is thus S convolved with h1(t), written S*h1(t). To this is added a first noise signal n1. Likewise the speaker's voice S is received via a second microphone (220) through a second channel h2(t), with additive noise n2. If there is a single noise source causing n1 and n2, then n1 is the result of receiving n through a 3rd channel h3(t) while n2 is the result of receiving n through a 4th channel h4(t). Convolution can be replaced by polynomial multiplication when dealing with sampled signals, leading to the matrix equation
  [ u1(z) ]   [ h1(z)  h3(z) ] [ s ]
  [ u2(z) ] = [ h2(z)  h4(z) ] [ n ]        Equation A
The above matrix of polynomials may be inverted by the usual matrix inversion formula Adjoint/Determinant to completely eliminate the noise, giving:
S=[h4(z)·u1(z)−h3(z)·u2(z)]/[h1(z)·h4(z)−h2(z)·h3(z)]  Equation B
The numerator in equation B is simply a Finite Impulse Response (FIR) filter, which is always stable. The denominator represents an Infinite Impulse Response (IIR) filter, which may not be stable. However, omission of the IIR denominator is simply equivalent to passing the speech signal through an FIR filter with the same coefficients, and just alters the frequency response of the speech in a way that is no different from other acoustic effects of the environment. If desired, stable IIR factors that represent rapidly decaying impulse responses can be left in the denominator of the right hand side of equation B. Also, IIR factors that represent unstable, exponentially rising impulse responses become stable factors if applied to the signal using time-reverse processing; that is, the audio samples are processed in time-reversed order by accepting a delayed output so that future samples are used to correct the current sample. More information on inverting matrices of impulse response polynomials may be found in U.S. Pat. No. 6,996,380 to Dent, filed Jul. 26, 2001, which is hereby incorporated by reference herein.
In order to perform the matrix inversion described above, the channel polynomials h1(z) . . . h4(z) must be determined. However, this method is only useful when the number of independent noise sources is relatively small, and lower than the number of microphone signals being jointly processed. When the noise has a more diffuse character, other methods to be described are more appropriate.
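A numerical sketch of this time-domain cancellation, under the two-microphone, single-noise-source model of FIG. 2: forming the adjoint combination h4*u1 − h3*u2 (with * denoting convolution) removes the noise exactly, leaving the speech filtered by the determinant FIR. The impulse responses and signal lengths are illustrative assumptions:

```python
import numpy as np

# Assumed short FIR channel impulse responses h1..h4 (illustrative values).
h1, h2, h3, h4 = [np.array(c) for c in
                  ([1.0, 0.4], [0.3, 0.2], [0.6, -0.1], [1.0, 0.25])]

rng = np.random.default_rng(1)
s = rng.standard_normal(500)   # stand-in for the wanted speech samples
n = rng.standard_normal(500)   # stand-in for the single noise source

# Microphone observations per the FIG. 2 model: u1 = h1*s + h3*n, etc.
u1 = np.convolve(h1, s) + np.convolve(h3, n)
u2 = np.convolve(h2, s) + np.convolve(h4, n)

y = np.convolve(h4, u1) - np.convolve(h3, u2)    # adjoint (FIR) combination
det = np.convolve(h1, h4) - np.convolve(h2, h3)  # determinant FIR filter
assert np.allclose(y, np.convolve(det, s))       # noise is gone entirely
```

Omitting the determinant denominator, as the text notes, simply leaves the speech colored by this stable FIR response.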
FIG. 3 illustrates a situation comprising more than two microphones. A number of collaborating speakers, for example co-workers in a noisy factory, each have a wireless headset 300(a), 300(b), etc., as well as, potentially, a belt-worn unit that can itself have an inbuilt microphone. Thus the number of microphone signals available for joint signal processing can be as many as twice the number of collaborators. Depending on the system configuration, the signal processing may have fewer than the total number of signals available for joint processing. For example, if no central station or base station is involved, the belt-worn unit 310(a) may process signals only from headset 1 (300(a)) and microphone 2 (320(a)) to cancel noise prior to transmission to the other collaborators' belt-worn wireless units such as unit 310(b). However, unit 310(b) can now further process the signal received from the first collaborator jointly with audio signals received from its local microphone 320(b) and the microphone of headset 300(b) to further reduce noise that is correlated with the noise remaining in the first collaborator's signal.
One difference between the current invention and prior equipment is that microphones associated with other than the current speaker may remain in an active state in order to enhance noise suppression. Consider, for example, FIG. 7, which depicts an aircraft having a pilot and co-pilot, each equipped with a headset 700(a), 700(b) comprising earphones and a microphone. Press-to-talk is generally used in such situations to prevent leaving a microphone in the “live” state which, in the prior art, would amplify ambient noise and feed it through to all crew headsets, causing annoyance. However, it may be realized that a microphone may be left in the active state collecting audio signals 722(a), 722(b) without necessarily passing those signals directly through to crew headsets 700(a), 700(b). Thus, according to the invention, the audio signals 722(a), 722(b) are processed together with the audio signal from the principal microphone, which in this example would be the microphone associated with an activated press-to-talk switch 710(a), 710(b), as indicated by associated pressel switch signals 724(a), 724(b), in order to enhance the signal to noise ratio of the wanted signal from the principal microphone. In a second implementation of the invention, therefore, the microphone and its associated microphone amplifier are left in the active state whether the pressel switch 710(a), 710(b) is activated or not; the output, however, is not simply passed through to the headsets 700(a), 700(b) or communications system, but rather is jointly processed with the signal designated to be the wanted signal. A signal may for example be designated to be the wanted signal by determining which pressel switch or switches 710(a), 710(b) are pressed; their associated microphones are then designated to be the principal microphones, and the persons pressing the associated pressel switches 710(a), 710(b) are assumed to be desirous of being heard.
The audio signals 722(a), 722(b) of the active speakers desirous of being heard are passed from the microphones designated as the principal microphones to the signal processing unit 720 where those signals are now processed jointly with signals 722(a), 722(b) from other microphones that, according to the invention, are placed in an active state whether their associated pressel switches 710(a), 710(b) are depressed or not. After joint processing to suppress background noise corrupting the wanted signal, the noise-reduced signal 726 is then routed to crew earphones or other communications equipment such as ground-to-air radio. Similar situations arise in combat vehicles such as army tanks for example. An army tank may have several crew members, including commander, gunner, loader and driver, each equipped with a press-to-talk headset. In the prior art, no microphone output was provided unless the associated pressel switch was operated. In the current invention, all microphones are made electrically available all the time, the operation of a pressel switch merely indicating which speaker is desirous of being heard. The output of the associated microphone is then jointly processed with the output of at least one other microphone to enhance signal-to-noise ratio before passing the signal 726 on to the headset earpieces through intercom equipment or to radio equipment.
Thus the second implementation is characterized in general by jointly processing the output of one or more microphones that are associated with a wanted speaker or audio source together with the output of one or more microphones normally associated with a different speaker or audio source. The term “normally associated with” reflects that the microphone is so positioned as to favor the audio source that would be heard best from that position, whether or not an audio source is present and active at that position at any particular instant. Clearly, a microphone attached to the personal headset of a particular person is associated with that person and not normally associated with a different person. Nevertheless, according to the invention, the microphone normally associated with one person or location can be useful to enhance the signal-to-noise ratio of the signal from the principal microphone, which is the microphone associated with the current active speaker, audio source, or location.
In another system configuration, in the case of two collaborators each having a main and an auxiliary microphone, the audio signals from all four microphones could be transmitted using a two-channel duplex link between the two collaborators, whose belt-worn units 310(a) and 310(b), respectively, would jointly process all four signals in order to enhance the ratio of the other speaker's voice to background noise.
In yet another system configuration, in order to reduce the complexity and power consumption of the belt-worn units, the audio signals from the one or two microphones of each of a multiplicity of collaborators could be transmitted to a central radio base station nearby in the same location, which would jointly process all signals to enhance the signal to noise ratio for each speaker and then return the processed signal of the speaker deemed to be currently active to all parties via a return radio link. Such a radio set would differ considerably from the prior art, as it may be transmitting audio from its associated microphone substantially all the time, whether the pressel switch was pressed or not, the state of the pressel switch, if one is provided, being signaled independently over the radio channel to indicate that the speaker is desirous of being heard. Upon the receiving system detecting via the signaling that a pressel switch has been activated, the receiving system designates the microphone of the remote unit with the activated pressel switch to be a principal microphone, and passes an indication to the signal processing to jointly process all received microphone signals in order to reduce the noise on the audio signal received from the principal microphone. It may be realized that Voice Activity Detection (VAD) may be provided in lieu of a pressel switch for hands-free operation of the remote unit.
A similar scenario to that just described is shown in FIG. 4 . A conference comprises a panel of speakers on stage, whose voices may be picked up by a number of fixed microphones as well as individual wireless “lapel mikes”, and in addition one or more members of the audience may have lapel mikes or be passed a roaming microphone to ask questions. Thus, just as in the scenario postulated in FIG. 3 , there are a number of microphone signals available to be jointly processed. In FIG. 4 , all microphone signals are conveyed by wire or wirelessly to central processing unit 420 which processes the signals jointly in order to enhance the signal to background noise ratio of any desired speaker.
In any of the implementations heretofore described, the joint processing may insert a number of samples of additional delay in any digitized audio stream to roughly align all audio sources in time, compensating for the different delays of the different methods of transporting the signals from each microphone to the common processing unit.
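One way such delay alignment might be sketched is by cross-correlating two digitized streams and padding the earlier one; the 37-sample transport delay and white-noise test signal below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.standard_normal(1000)
delay = 37                                        # assumed transport delay
a = s                                             # stream arriving early
b = np.concatenate([np.zeros(delay), s])[:1000]   # same source, delayed

# Estimate the relative delay from the cross-correlation peak.
corr = np.correlate(b, a, mode="full")            # lags -(len-1)..+(len-1)
lag = int(np.argmax(corr)) - (len(a) - 1)
assert lag == delay

# Insert 'lag' samples of delay into the early stream to align the two.
aligned_a = np.concatenate([np.zeros(lag), a])[:len(a)]
assert np.allclose(aligned_a[lag:], b[lag:])
```

After this rough alignment, both streams present the same source sample at (approximately) the same index, as the joint processing requires.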
A further example of scenarios amenable to the current invention is shown in FIG. 5. A number of participants in a teleconference are sitting around a speakerphone in a conference room. Each may have a laptop with an audio headset, and the laptops may be networked to a central server, either by cable or by WiFi. In one situation, Bluetooth headsets convey audio to and from the laptop and the laptop passes the audio on via the network to a server. In an alternative scenario, the Bluetooth headsets communicate audio directly to a multiple-Bluetooth-equipped speakerphone. In yet another scenario, a headset wired into a laptop uses the laptop's built-in Bluetooth or WiFi to convey audio to the speakerphone, equipped likewise. The speakerphone may also comprise a number of fixed microphones that are arranged around the conference table. The speakerphone may receive all microphone signals, either by wire, by Bluetooth, by WiFi, or by a wired (Ethernet) connection to a server, or any combination of the above, and process the signals jointly. Alternatively, the speakerphone may just convey the outputs of its microphones to a server which also receives the signals from the participants' microphones, and the joint processing may be carried out by software in the server, the server returning the noise-reduced signals to the speakerphone and/or the participants.
In a degenerate case, a single user having a single laptop may be making a call or participating in a conference. For example, the laptop may run the Skype program, a well-known program allowing a computer to place Voice-over-IP (VoIP) calls over the Internet. To implement the invention in this case, the laptop or computer's own microphone may be supplemented by a Bluetooth headset, the audio from both being jointly pre-processed in the computer by a software program configured according to the invention in order to enhance the speech to background noise ratio in noisy environments.
Ultimately, the noise-reduced signal of one or more speakers deemed to be the principal active speakers is conveyed in particular to the remote parties to the teleconference. A duplex teleconference can be considered to comprise two separate, interconnected systems, either or both of which can employ a separate instance of the current invention.
In any of the above situations where multiple potential speakers exist, speech activity detection can be used to determine the principal active speaker, as opposed to reliance upon a press-to-talk switch. However, the noise reduction can be applied without waiting for a decision from the activity detector: noise reduction can be applied simultaneously for every hypothesis of which speaker is active, to obtain noise-reduced signals for all speakers, ready and waiting to be selected for broadcast.
In the example of aircraft or tank crew, a hard selection mechanism determined by press-to-talk switch states was described. The use of press-to-talk switches provides the simplest method of source selection. However, other methods of source identification can be used. For example, when all potential sources are pre-separated, available and waiting for selection as just described, a soft-selection mechanism can be employed, where the gain for a speaker deemed to have become the principally active speaker is ramped up from zero over a period of 50 milliseconds, for example, and the gain for a speaker deemed to have become inactive is ramped down over a similar period, in order to avoid the unpleasant clicks of a hard selection. The determination of a speaker becoming active or inactive can be made on the relative strength of the signals, or a change thereof. Other techniques known in the art as voice activity detection (VAD) can be used to discriminate sources that contain wanted speech from sources that contain non-speech sounds.
For example, U.S. Pat. No. 6,381,570 describes using adaptive energy thresholds for discriminating between speech and noise, while US patent application publication nos. 2010/0057453 and 2009/0076814 describe the performance of more complex feature extraction to make a speech/no-speech decision. The fact that the spectrum of speech switches regularly between voiced and unvoiced sounds may be used as a feature to discriminate speech from background noise. Moreover, hysteresis and time delays can be employed to ensure that, once selected, a speaker remains selected for at least a period of the order of one or two seconds before being ramped off if no further activity is detected in the meantime.
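The soft-selection ramping described above might be sketched as follows; the sample rate and raised-cosine ramp shape are illustrative assumptions:

```python
import numpy as np

fs = 8000                          # assumed sample rate (samples per second)
ramp_len = int(0.050 * fs)         # 50 ms ramp, per the text
ramp_up = 0.5 * (1.0 - np.cos(np.pi * np.arange(ramp_len) / ramp_len))

def apply_selection(signal, start):
    """Gate 'signal' on at sample index 'start' with a smooth 50 ms ramp."""
    gain = np.zeros(len(signal))
    gain[start:start + ramp_len] = ramp_up[:max(0, len(signal) - start)]
    gain[start + ramp_len:] = 1.0
    return gain * signal

x = np.ones(2000)                  # stand-in for a noise-reduced speaker signal
y = apply_selection(x, 400)
assert y[0] == 0.0 and y[-1] == 1.0
assert np.all(np.diff(y[400:400 + ramp_len]) >= 0)   # monotone, click-free ramp
```

Ramping a newly inactive speaker down would use the mirror image of the same ramp.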
In one embodiment, a simple source identification technique may be used when at least one of the microphones has access to a sampled signal with significantly higher signal to noise ratio than the other microphones. In that case, identification of the principal microphone is made based on relative energy, after compensation for any gain differences that may be learned in a set-up phase. Such a situation arises in the scenario where a mobile phone has access both to the microphone on the phone and to a Bluetooth headset. In this case, the Bluetooth headset is situated close to the speaker's mouth and has higher signal-to-noise ratio for the wanted speech signal, while the microphone on the phone has better access to the noise environment.
One characteristic of all the scenarios mentioned above in both the summary and the description is that the microphone positions are arbitrary relative to each other. Many prior art array processing algorithms, while assuming arbitrary positions for the noise and signal sources, are nevertheless designed for arrays having fixed relative microphone positions. In contrast to that prior art, the current invention is designed for a microphone array whose elements are placed arbitrarily, and whose relative positions may even be changing.
Yet another distinction of the invention is that, in a general, multiple-user case, the noise-reduction processor may have access, via Bluetooth, to multiple remote microphones, and can select to connect via Bluetooth any remote microphone to pair with the local microphone, depending on which remote microphone has best access to the noise desired to be suppressed. The Bluetooth standard, for example, describes procedures for pairing devices. The ability to pair two microphones in an ad-hoc manner may thus be used to suppress noise in the environment during recording of an acoustic signal, or while transmitting it using a communication device. A processor may thus pair remote microphones with local microphones in an ad-hoc manner for best effect. For example, two unrelated mobile phone users may be waiting in a noisy environment such as an airport. One mobile phone user places or receives a call, and simultaneously activates its Bluetooth to perform “service discovery”, in order to identify another, nearby mobile phone that is willing to collaborate in noise reduction. The mobile phone engaged in a telephone call may then receive audio via Bluetooth from the other, collaborating mobile phone's microphone as well as its own built-in microphone, and jointly process the two signals in order to suppress background noise.
All of the implementations of the invention are characterized by the joint processing of signals from a principal microphone, which is a microphone normally associated with the currently active speaker, with signals from a microphone not normally associated with or used in the prior art for the currently active speaker, which may herein be referred to in general as an incidental microphone. The incidental microphone is located remotely from said principal microphone by several acoustic wavelengths at a mid-band audio frequency. The microphone in a mobile phone is an incidental microphone in the case where a Bluetooth headset is being used, as in that case, the mobile phone's own microphone is not in the prior art used for the speaker.
A more detailed description of the adaptive noise reduction algorithm now follows.
The input signals observed at the outputs of the microphones are represented by u1(n), u2(n), etc.; i.e., ui(n) is output sample n of the i-th microphone. The algorithm first decomposes each signal u1(n), u2(n), etc. into a set of narrowband constituent components using a windowed FFT. Overlapping blocks of signals are processed, and the overlapping portions of the windowing function sum to unity to ensure each sample is given equal gain in the final output. The frequency domain filtering technique is thus applied on a frame-block basis. In a mobile telephone, each frame typically contains N1=160 samples. The representation of the spectrum is effectively improved because the overlap increases the FFT length. The FFT size used is N0=256 points. Therefore, the N1 samples of frame q are overlapped with the last (N0−N1) samples of the previous frame (q−1). As a result, frame q of microphone i has sampled signal
ui(n,q)≡ui(q·N1−N0+n),  (1)
where n=[0,N0−1] and i=[1,2].
The signals (1) are windowed using a suitable windowing function w(n).
For example, it can be a smoothed Hanning window:
w(n) = sin^2(πn/(N0−N1)),           n=[0,(N0−N1)/2−1]
     = 1,                           n=[(N0−N1)/2,(N0+N1)/2−1]
     = sin^2(π(n−N0+1)/(N0−N1)),    n=[(N0+N1)/2,N0−1].  (2)
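The piecewise window of Equation (2) can be sketched directly, using the frame sizes N0=256 and N1=160 given in the text:

```python
import numpy as np

N0, N1 = 256, 160          # FFT size and new samples per frame, per the text
T = (N0 - N1) // 2         # 48-sample taper on each side

n = np.arange(N0, dtype=float)
w = np.ones(N0)
w[:T] = np.sin(np.pi * n[:T] / (N0 - N1)) ** 2                    # rising taper
w[(N0 + N1)//2:] = np.sin(np.pi * (n[(N0 + N1)//2:] - N0 + 1)
                          / (N0 - N1)) ** 2                        # falling taper

assert w[0] == 0.0                        # window starts at zero
assert np.all(w[T:(N0 + N1)//2] == 1.0)   # flat unity mid-section
assert w[-1] == 0.0                       # decays back to zero at the end
```

The flat mid-section passes the N1 new samples of each frame at unit gain, with smooth sin^2 tapers at the frame edges.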
The FFT is described by:
For k=[0,N0−1] and i=[1,2] calculate

Ui(k,q) = Σ_{n=0..N0−1} w(n)·ui(n,q)·exp(−j2πkn/N0).  (3)
Voice activity detection (VAD) is used to distinguish between noise with speech present and noise without speech present. If the VAD output voltage UVAD(q) for frame q exceeds some threshold Tr (UVAD(q)>Tr), the VAD makes a decision that a speech signal is present at the q-th frame. Otherwise, if UVAD(q) does not exceed some threshold Tr0 (UVAD(q)≤Tr0), the VAD makes a decision that a speech signal is absent.
The VAD operations are:
(i) Beamforming in the frequency domain:
    • For k=[0,N0−1] calculate

Y(k,q) = (1/2)·Σ_{i=1..2} Ui(k,q).  (4)
(ii) Estimation of the noise power spectral density (PSD) at the output of the beamformer (4):
Φ̂N(k,q) = m·Φ̂N(k,q−1) + (1−m)·|Y(k,q)|^2  (5)
where m=[0.9,0.95] is a convergence factor.
(iii) VAD output:
UVAD(q) = (2/(N0+2))·Σ_{k=0..N0/2} |Y(k,q)|^2 / Φ̂N(k,q).  (6)
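The VAD statistic of Equations (4)-(6) can be sketched for two microphones as follows; the random test spectra and the short training loop are illustrative assumptions, while N0=256 and the convergence factor m=0.9 follow the text:

```python
import numpy as np

N0, m = 256, 0.9

def vad_statistic(U, psd_noise):
    """U: (2, N0) mic spectra for one frame; returns (U_VAD, updated noise PSD)."""
    Y = 0.5 * (U[0] + U[1])                          # Eq. (4): beamformed average
    psd = m * psd_noise + (1 - m) * np.abs(Y) ** 2   # Eq. (5): smoothed noise PSD
    k = np.arange(N0 // 2 + 1)
    u_vad = (2.0 / (N0 + 2)) * np.sum(np.abs(Y[k]) ** 2 / psd[k])  # Eq. (6)
    return u_vad, psd

rng = np.random.default_rng(2)
psd = np.ones(N0)
for _ in range(50):          # let the PSD estimate adapt to unit-level noise
    noise = rng.standard_normal((2, N0)) + 1j * rng.standard_normal((2, N0))
    u_quiet, psd = vad_statistic(noise, psd)

# A much louder (speech-like) frame scores well above the noise-only frames.
loud = 10.0 * (rng.standard_normal((2, N0)) + 1j * rng.standard_normal((2, N0)))
u_loud, _ = vad_statistic(loud, psd)
assert u_loud > u_quiet
```

Comparing u_vad against the thresholds Tr and Tr0 then yields the speech-present / speech-absent decisions used in Equations (7) and (9).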
A signal correlation matrix is estimated for frame q using the following equations:
For k=[0,N0−1] and i=[1,2] calculate

K̂i^S(k,q) = K̂i^S(k,q−1) + Ui(k,q)·U1*(k,q),  if UVAD(q) > Tr
K̂i^S(k,q) = K̂i^S(k,q−1),                      if UVAD(q) ≤ Tr.  (7)
One can see from Eq. (7) that if the VAD detects speech (UVAD(q)>Tr) at frame q, the signal correlation matrix is updated. Otherwise, if UVAD(q)≤Tr, the estimation of the signal correlation matrix is switched off.
The Green's function for frame q is estimated by the following:
For k=[0,N0−1] calculate

Ĝi(k,q) = K̂i^S(k,q) / K̂1^S(k,q).  (8)
The Noise Spatial Correlation Matrix for frame q is estimated as follows:
    • For k=[0,N0−1], i=[1,2], and p=[1,2] calculate

K̂ip(k,q) = m·K̂ip(k,q−1) + Ui(k,q)·Up*(k,q),  if UVAD(q) ≤ Tr0
K̂ip(k,q) = K̂ip(k,q−1),                        if UVAD(q) > Tr0.  (9)
    • The initial matrix for Eq. (9) can be chosen as
      K̂ip(k,0) = a·δip,
where a is a small constant (a=[0.0001,0.001]).
One can see from Eq. (9) that if the VAD does not detect speech, i.e. UVAD(q)≤Tr0 at frame q, the noise correlation matrix is updated. Otherwise, if UVAD(q)>Tr0, the estimation of the noise correlation matrix is switched off.
The frequency responses for microphones 1 and 2 are calculated by means of:
    • For k=[0,N0/2] calculate

      H1(k,q) = K̂22(k,q) − K̂12(k,q)·Ĝ2(k,q)  (10)
      H2(k,q) = K̂11(k,q)·Ĝ2(k,q) − K̂21(k,q)  (11)
The output signal, still in the frequency domain, is then calculated from:
    • For k=[0,N0/2] calculate

Xq(k) = [Σ_{i=1..2} Ui(k,q)·Hi*(k,q)] / [Σ_{i=1..2} Ĝi(k,q)·Hi*(k,q)].  (12)
    • For k=[N0/2+1,N0−1] calculate

      Xq(k) = [Xq(N0−k)]*.  (13)
After array processing, a PSD is calculated as follows:
Φ̂SN(k,q) = m·Φ̂SN(k,q−1) + (1−m)·|Xq(k)|^2,  if UVAD(q) > Tr
Φ̂SN(k,q) = Φ̂SN(k,q−1),                       if UVAD(q) ≤ Tr.  (14)
The following Wiener filter is also used:
Hw(k) = max{ Hw0, 1 − Φ̂N(k,q)/Φ̂SN(k,q) },  (15)
where Hw0=0.315 is a “floor” constant for the Wiener filter, and
Φ̂N(k,q) = 1 / Σ_{i=1..2} Ĝi(k,q)·Hi*(k,q).  (16)
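The floored Wiener gain of Equation (15) can be sketched as follows; the floor Hw0=0.315 follows the text, while the per-bin PSD test values are illustrative:

```python
import numpy as np

Hw0 = 0.315   # "floor" constant from the text

def wiener_gain(psd_noise, psd_signal_plus_noise):
    """Per-bin gain 1 - noise/(signal+noise) PSD ratio, floored at Hw0."""
    return np.maximum(Hw0, 1.0 - psd_noise / psd_signal_plus_noise)

psd_n = np.array([1.0, 1.0, 1.0, 1.0])
psd_sn = np.array([10.0, 2.0, 1.0, 0.5])   # strong-SNR ... negative-SNR bins
g = wiener_gain(psd_n, psd_sn)
assert np.allclose(g, [0.9, 0.5, 0.315, 0.315])
```

The floor prevents the gain from collapsing to zero (or going negative) in bins dominated by noise, which would otherwise produce audible musical-noise artifacts.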
Finally, the time domain output samples are computed from:
For n=[0,N0−1] calculate the inverse FFT as:

UOUT(n) = Σ_{k=0..N0−1} Xq(k)·Hw(k)·exp(j2πkn/N0).  (17)
To generalize the algorithm to jointly process more than two microphone signals, the algorithm is modified in the following ways:
The VAD beamformer described in Equation 4 is modified in a straightforward way, by indexing the summation over all N microphones. Thus, Eq. (4) is modified as:
Y(k,q) = (1/N)·Σ_{i=1..N} Ui(k,q).  (18)
For the case of N microphones, the frequency response of the filter at the i-th microphone is calculated as equation (19) below:
Hi(k,q) = Σ_{p=1..N} K̂ip^−1(k,q)·Ĝp(k,q)  (19)
Matrix K̂ip^−1(k,q) in Eq. (19) is an estimate of the inverse noise spatial correlation matrix at the q-th frame.
For the case of N microphones, instead of an estimation of the noise spatial correlation matrix as in Equation 9, a direct estimation of the inverse noise spatial correlation matrix K̂ip^−1(k,q) based on the RLS algorithm is used, modified for processing in the frequency domain according to equation (20) below:
K̂ip^−1(k,q) = (1/m)·{ K̂ip^−1(k,q−1) − Di(k,q)·Dp*(k,q) / [m + Σ_{i=1..N} Di(k,q)·Ui*(k,q)] }  (20)
The coefficients Di(k,q) in Eq.(20) are calculated from an estimate of the inverse noise spatial correlation matrix in the previous frame (q−1) and are given by equation 21 below:
Di(k,q) = Σ_{p=1..N} K̂ip^−1(k,q−1)·Up(k,q).  (21)
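A sketch of this recursive inverse-correlation update for a single frequency bin, checked against an explicit matrix inverse; the microphone count N=3 is illustrative, while the forgetting factor m and the initialization constant a are chosen within the ranges given in the text:

```python
import numpy as np

N, m, a = 3, 0.95, 0.001
rng = np.random.default_rng(3)

K = a * np.eye(N, dtype=complex)   # directly accumulated correlation matrix
P = np.linalg.inv(K)               # its inverse, maintained recursively (Eq. 20)

for _ in range(30):
    # Stand-in noise-frame spectra at this bin for the N microphones.
    U = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    D = P @ U                                                  # Eq. (21)
    P = (P - np.outer(D, D.conj()) / (m + U.conj() @ D)) / m   # Eq. (20)
    K = m * K + np.outer(U, U.conj())                          # reference update

# The recursion tracks the explicit inverse of the forgetting-factor update.
assert np.allclose(P, np.linalg.inv(K))
```

This is the matrix-inversion-lemma (Sherman-Morrison) form of RLS: each rank-one update of the correlation matrix is applied directly to its inverse, avoiding an explicit N×N inversion per bin per frame.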
For the case of N microphones, the Array Processing Output in Frequency Domain (Equation 12) is modified in a straightforward way, by indexing the summation over all microphones. Thus, Eq. (12) is modified to obtain equation (22) below:
Xq(k) = [ Σ_{i=1}^{N} Ui(k, q)·Hi*(k, q) ] / [ Σ_{i=1}^{N} Ĝi(k, q)·Hi*(k, q) ],  (22)
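For one frequency bin, Eqs. (19) and (22) amount to a matrix-vector product followed by a normalized weighted sum. An illustrative sketch (names are ours):

```python
import numpy as np

def array_output_bin(K_inv, G, U):
    """Per-bin filter responses and combined output for N microphones
    (sketch of Eqs. 19 and 22).

    K_inv : (N, N) inverse noise spatial correlation estimate
    G     : (N,) estimated Green's-function (propagation) vector
    U     : (N,) microphone spectra at this bin
    """
    H = K_inv @ G                          # Eq. (19): H_i = sum_p K^-1_ip G_p
    # Eq. (22): weighted sum of mic spectra, normalized by the same weights
    # applied to G (np.vdot conjugates its first argument, giving sum U_i H_i*)
    X = np.vdot(H, U) / np.vdot(H, G)
    return H, X
```

The normalization in Eq. (22) is what keeps the desired-speech component at unit gain while the noise is attenuated by the inverse-correlation weighting.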
The antenna array processing algorithm can be described by the following equation in a frequency domain:
UOUT(ω) = Σ_{i=1}^{N} U(ω, ri)·H*(ω, ri),  (23)
where UOUT(ω) and U(ω, ri) are respectively the Fourier transform of the antenna processor output and of the field u(t, ri) observed at the output of the i-th antenna element with the spatial coordinates ri, and H(ω, ri) is the frequency response of the filter at the i-th antenna element.
We assume that the field u(t, ri) is a superposition of the signals from M sound sources and background noise. When a mixture of the signals and background noise is incident on the receiving antenna array, the Fourier transform U(ω, ri) of the field u(t, ri) received by the i-th array element has the form:
U(ω, ri) = Σ_{m=1}^{M} Sm(ω)·G(ω, ri, Rm) + N(ω, ri),  (24)
where Sm(ω) is the spectrum of the signal from the m-th sound source, G(ω, ri, Rm) is the Green's function, which describes the propagation channel from the m-th sound source with the spatial coordinates Rm to the i-th antenna element, and N(ω, ri) is the Fourier transform of the noise field.
Based on this model, the problem is to synthesize a noise reduction space-time processing algorithm, the output of which gives the optimal estimates of the signals from the desired users.
We consider this optimization problem as one of minimizing the output noise spectral density subject to the equality constraint
Sout(ω) = Σ_{m=1}^{M} Bm(ω)·Sm(ω)  (25)
where Sout(ω) is the spectrum of the signal after array processing, and
B1(ω), . . . , BM(ω) are some arbitrary functions. The choice of these functions depends on our goal. For example, if we want to keep clear speech from all M users, the functions B1(ω), . . . , BM(ω) are chosen as
Bi(ω) ≡ 1, i ∈ [1, M].  (26)
If the signal from some k-th sound source is unwanted and we would like to suppress its signal, the functions B1(ω), . . . , BM(ω) are chosen as
Bi(ω) = 1, if i ≠ k, i ∈ [1, M]; Bi(ω) = 0, if i = k.  (27)
It is clear that the constraint (25) represents the degree of degradation of the desired signals and permits the combination of various frequency bins at the space-time processing output with a priori desired amplitude and phase distortion.
According to our approach the optimal weighting functions H (ω, ri) are obtained as a solution of the variation problem
H(ω, ri) = arg{ min gNout(ω) }  (28)
subject to the constraint (25), where
gNout(ω) = Σ_{i=1}^{N} Σ_{k=1}^{N} gN(ω; ri, rk)·H*(ω, ri)·H(ω, rk)  (29)
is the noise spectral density after array processing (23), and gN(ω; ri, rk) is the spatial correlation function of the noise field N(ω, ri).
It follows from Eq.(23) and Eq.(25) that the spectrum of the output signal has the form
Sout(ω) = Σ_{m=1}^{M} Sm(ω) Σ_{i=1}^{N} G(ω, ri, Rm)·H*(ω, ri)  (30)
Thus the constraint (25) must be equivalent to the M linear constraints:
Σ_{i=1}^{N} G(ω, ri, Rm)·H*(ω, ri) = Bm(ω),  m = [1, M].  (31)
Therefore, the optimal weighting functions H(ω, ri) in the algorithm (23) are obtained as a solution of the variation problem
H(ω, ri) = arg{ min Σ_{i=1}^{N} Σ_{k=1}^{N} gN(ω; ri, rk)·H*(ω, ri)·H(ω, rk) }  (32)
subject to the M constraints (31).
The optimization problem (31)-(32) may be solved by using M Lagrange coefficients Wm (ω) to adjoin the constraints (31) to a new goal functional
J(H) = Σ_{i=1}^{N} Σ_{k=1}^{N} gN(ω; ri, rk)·H*(ω, ri)·H(ω, rk) − Σ_{m=1}^{M} Wm(ω)·[ Σ_{i=1}^{N} G(ω, ri, Rm)·H*(ω, ri) − Bm(ω) ]  (33)
Minimization of this functional gives the following equations for H(ω, ri):
Σ_{k=1}^{N} gN(ω; ri, rk)·H(ω, rk) = Σ_{m=1}^{M} Wm(ω)·G(ω; ri, Rm)  (34)
The solution of this system of equations can thus be presented in the form
H(ω, ri) = Σ_{m=1}^{M} Wm(ω)·H(ω; ri, Rm),  (35)
where the functions H(ω; ri, Rm) satisfy the following system of equations
Σ_{k=1}^{N} gN(ω; ri, rk)·H(ω; rk, Rm) = G(ω, ri, Rm)  (36)
To obtain the unknown Lagrange coefficients Wm(ω) in Eq. (35), we substitute Eq. (35) into Eq. (31). As a result, we get
Σ_{k=1}^{M} Wk(ω) Σ_{i=1}^{N} G(ω; ri, Rm)·H*(ω, ri, Rk) = Bm(ω)  (37)
from which it can be seen that the Lagrange coefficients Wm (ω) satisfy the following system of equations:
Σ_{k=1}^{M} Ψmk(ω)·Wk(ω) = Bm(ω)  (38)
where
Ψmk(ω) = Σ_{i=1}^{N} G(ω, ri, Rm)·H*(ω, ri, Rk).  (39)
If there is just one user in the system then M=1 and from Eq.(38) we get:
W(ω) = B1(ω) / Σ_{i=1}^{N} G(ω; ri, R1)·H*(ω, ri, R1)  (40)
Substitution of this equation into Eq. (35) gives the optimal functions:
H(ω, ri) = B1(ω)·H(ω; ri, R1) / Σ_{i=1}^{N} G(ω, ri, R1)·H*(ω; ri, R1)  (41)
which was already obtained and thus disclosed in the above-mentioned '481 patent to present inventor Krasny et al., and which is hereby incorporated by reference herein.
Substituting Eq.(35) into Eq.(23) we get the optimal space-time noise reduction algorithm as
Uout(ω) = Σ_{m=1}^{M} Wm(ω)·Um(ω),  (42)
where
Um(ω) = Σ_{i=1}^{N} U(ω; ri)·H*(ω, ri, Rm)  (43)
The algorithm (42) describes the multichannel system which consists of M spatial channels {U1(ω), . . . , UM(ω)}. The frequency responses H(ω; ri, Rm) of the filters in each of these channels are matched with the spatial structure of the signal from the m-th user and the background noise and satisfy the system of equations (36). One can see that the array processing in the m-th spatial channel is optimized to detect the signal from the m-th user against the background noise. The output voltages of the M spatial channels are accumulated with the weighting functions {W1(ω), . . . , WM(ω)}, which satisfy the system of equations (38).
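The structure just described (M matched spatial channels, then Lagrange weights solved from Eq. (38)) can be sketched per frequency as follows. Names are ours, and the matched filters and Green's-function vectors are assumed given:

```python
import numpy as np

def multichannel_output(U, H_ch, G, B):
    """Sketch of the M-channel optimal combiner (Eqs. 38-39, 42-43)
    at one frequency.

    U    : (N,) element spectra U(w, r_i)
    H_ch : (N, M) matched filters H(w; r_i, R_m), one column per source
    G    : (N, M) Green's-function vectors, one column per source
    B    : (M,) constraint functions B_m(w)
    """
    # Eq. (43): M matched spatial channels, U_m = sum_i U_i H*(r_i, R_m)
    U_m = H_ch.conj().T @ U
    # Eq. (39): Psi_mk = sum_i G(r_i, R_m) H*(r_i, R_k)
    Psi = G.T @ H_ch.conj()
    # Eq. (38): solve Psi W = B for the Lagrange weights
    W = np.linalg.solve(Psi, B)
    # Eq. (42): accumulate the spatial channels with the weights
    return np.dot(W, U_m)
```

Choosing B per Eq. (26) passes all users through; choosing B per Eq. (27) nulls the k-th source.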
An interesting interpretation of the optimal algorithm is to present the solution of the system (38) in the form
Wm(ω) = Σ_{k=1}^{M} Ψ⁻¹mk(ω)·Bk(ω),  (44)
where Ψ⁻¹mk(ω) denotes the elements of the matrix Ψ⁻¹(ω), which is the inverse of the matrix Ψ(ω) with elements Ψmk(ω). Substituting Eq. (44) into Eq. (42) we get
Uout(ω) = Σ_{k=1}^{M} Bk(ω)·{ Σ_{m=1}^{M} Ψ⁻¹mk(ω) Σ_{i=1}^{N} U(ω, ri)·H*(ω, ri, Rm) }.  (45)
One can see that
Ŝk(ω) = Σ_{m=1}^{M} Ψ⁻¹mk(ω) Σ_{i=1}^{N} U(ω, ri)·H*(ω, ri, Rm)  (46)
is the ML estimate of the signal spectrum Sk(ω) from the k-th user.
Therefore, the optimal algorithm estimates the signal spectra from all users and accumulates these estimates with the constraint functions Bk(ω), i.e.
Uout(ω) = Σ_{k=1}^{M} Bk(ω)·Ŝk(ω).  (47)
As an example, let us assume that there are two sound sources and we would like to keep the signal from the desired sound source and suppress the signal from the second source. In this case we choose M=2. Therefore, the system consists of two spatial channels
U1(ω) = Σ_{i=1}^{N} U(ω, ri)·H*(ω; ri, R1) and U2(ω) = Σ_{i=1}^{N} U(ω, ri)·H*(ω; ri, R2)
The frequency responses of the filters H(ω; ri, R1) in the first channel are matched with the spatial coordinates R1 of the desired signal source, and the frequency responses of the filters H(ω; ri, R2) in the second channel are matched with the spatial coordinates R2 of the second signal source.
The functions B1(ω) and B2(ω) are chosen according to equations
B1(ω)=1,B2(ω)=0.  (48)
In this case, the weighting functions W1(ω) and W2(ω) are described by the equations
W1(ω) = B1(ω)·Ψ22(ω)/D(ω),  W2(ω) = B1(ω)·Ψ12(ω)/D(ω),
where D(ω) = Ψ11(ω)·Ψ22(ω) − |Ψ12(ω)|²
Therefore, the optimal algorithm has the form
Uout(ω) = B1(ω)·Ψ22(ω)/D(ω)·{ U1(ω) − U2(ω)·Ψ12(ω)/Ψ22(ω) }  (49)
According to Eq. (49), the optimal array processing uses two spatial channels: a signal channel U1(ω) representing the received speech signal from the desired signal source 715, and the compensation channel U2(ω) representing the signal from the second source.
The signal U2(ω) is weighted by a function Ψ12(ω)/Ψ22(ω) and subtracted from the signal U1(ω). This algorithm separates the signals from the two sources and produces the output signal Uout(ω), in which the signal from the second source is completely suppressed.
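At a single frequency bin, Eq. (49) reduces to a weighted subtraction of the compensation channel from the signal channel. A minimal sketch (names ours; the Ψ terms are the channel cross-powers of Eq. (39)):

```python
def two_source_output(U1, U2, Psi11, Psi12, Psi22, B1=1.0):
    """Two-channel separation per Eq. (49): keep source 1, suppress
    source 2, evaluated at one frequency bin.

    U1, U2            : signal and compensation channel outputs
    Psi11, Psi12, Psi22: elements of the Psi matrix from Eq. (39)
    B1                : constraint function for the desired source
    """
    # D is the determinant of the 2x2 Psi matrix
    D = Psi11 * Psi22 - abs(Psi12) ** 2
    # Eq. (49): subtract the weighted compensation channel, then scale
    return B1 * Psi22 / D * (U1 - U2 * Psi12 / Psi22)
```

When the two sources are spatially uncorrelated at the array (Ψ12 = 0), the subtraction vanishes and only the scaling of the signal channel remains.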
Thus it has been described above how many situations in which multiple microphones exist do not, in the prior art, benefit from the potential of multiple-microphone array processing, and thus may be improved by using the invention with the above-described adaptive signal processing.
A person of ordinary skill in the art, based on the above teachings, may recognize additional scenarios in which acoustic transducers exist that are not today being employed for joint processing, and using the teachings herein can improve the performance in those scenarios by connecting the transducers in such a way that the above-described noise reduction algorithms can be employed to advantage.

Claims (41)

We claim:
1. A system and apparatus for dynamically improving the ratio of a wanted speech signal from a principal speaker to random background noise that is not known or characterized a priori, comprising:
a principal microphone configured to be worn by said principal speaker and configured to produce a first principal audio signal containing a first sampling of the wanted speech plus unwanted background noise that has not previously been measured or characterized;
at least one incidental microphone located remotely from said principal microphone by several acoustic wavelengths at a mid-band audio frequency, and each said at least one incidental microphone configured to produce a second respective incidental audio signal containing a second sampling of at least said unwanted background noise that has not previously been measured or characterized; and
a signal processor configured to dynamically jointly process said principal audio and at least one said incidental audio signal, without reference to noise profiles or filters constructed in advance, by
receiving and processing said principal audio signal to determine a first set of individual spectral components at a set of predetermined frequencies;
receiving and processing the at least one said incidental audio signal to determine at least one additional set of individual spectral components at said set of predetermined frequencies; and
dynamically combining corresponding spectral components from said first set and the at least one said additional set to obtain a combined set of spectral components in which unwanted background noise components are reduced compared to wanted speech components; and
generating an output audio waveform solely from the combined set of spectral components, without filtering or suppressing noise by reference to a predetermined noise profile, in which the ratio of the wanted speech to unwanted noise is greater than the corresponding ratio for either the principal or any of the at least one said incidental audio signals alone.
2. The system and apparatus of claim 1 in which said principal microphone is part of a Bluetooth wireless headset and said at least one incidental microphone is part of a Bluetooth-equipped communication device in wireless communication with said Bluetooth headset.
3. The system and apparatus of claim 1, further comprising additional microphones producing additional audio signals containing different samplings of said wanted speech signal and unwanted background noise, and wherein the signal processor is configured to receive all of the principal, at least one said incidental, and said additional audio signals, and to dynamically jointly process also said additional audio signals, wherein, in the output audio waveform generated, the ratio of said wanted speech to unwanted background noise is greater than the corresponding ratio for any of the audio signals alone.
4. The system and apparatus of claim 1, further comprising additional microphones producing additional audio signals containing different samplings of said wanted speech signal and unwanted background noise, and wherein the signal processor is configured to receive all of the principal, at least one said incidental, and said additional audio signals and to dynamically jointly process the principal and at least one incidental audio signals jointly with a selected one of said additional audio signals, wherein the ratio of said wanted speech to unwanted background noise in the output audio waveform generated is greater than the corresponding ratio for any of the audio signals alone.
5. The system and apparatus of claim 1 in which said joint processing comprises time-domain to spectral domain conversion for separating said principal and at least one said incidental audio signals into said spectral components, combining for performing weighted combining of corresponding said spectral components to produce said combined set of spectral domain signal components, and spectral domain to time domain conversion to convert said combined set of spectral domain signal components to generate said output audio waveform.
6. A system and apparatus for dynamically enhancing speech communications between three or more persons in the presence of random acoustic background noise that is not known or characterized a priori, comprising:
a plurality of at least three microphones arranged such that at least one microphone is associated with each said person, wherein each microphone outputs an audio signal containing speech if the associated person is speaking, or acoustic background noise alone that has not previously been measured or characterized; and
a signal processor configured to dynamically process jointly an audio signal from a principal microphone containing speech along with two or more other said audio signals containing acoustic background noise in order to derive a derived output signal solely from the audio signals and without reference to noise profiles or filters constructed in advance, in which a ratio of the speech signal from the principal microphone to unwanted acoustic background noise is greater than a corresponding ratio for any one of the audio signal containing speech or the two or more said audio signals containing acoustic background noise alone;
wherein the joint processing of the audio signal from the principal microphone containing speech and the two or more other said audio signals containing acoustic background noise includes, on a frame-block basis:
estimating an inverse noise correlation matrix without reliance on stored statistics;
for each audio signal,
distinguishing whether speech is present,
updating the inverse noise spatial correlation matrix only if speech is present, and
calculating a frequency response from the updated inverse noise spatial correlation matrix if speech is present, or from a non-updated inverse noise spatial correlation matrix if speech is not present;
dynamically jointly processing the frequency response for the audio signal containing speech and two or more audio signals containing acoustic background noise to derive an output signal in the frequency domain solely from the audio signals and without reference to noise profiles or filters constructed in advance; and
converting the derived output signal to a time domain.
7. The system and apparatus of claim 6 in which an audio output signal of at least one of said plurality of at least three microphones is conveyed to said signal processor by a wireless link using any of a Bluetooth radio frequency link; a WiFi radio frequency link; a modulated infrared link; an analog frequency-modulated link; a digital wireless link; a modulated visible light link; an inductively-coupled link and an electrostatically-coupled link.
8. The system and apparatus of claim 6 configured for a lecture hall environment in which said three or more persons comprises a first group of speakers on stage and a second group of speakers in an audience, and said plurality of at least three microphones comprises any combination of wireless microphones, lapel microphones, wireless headsets, fixed microphones, and roaming microphones.
9. The system and apparatus of claim 6 configured for use on a flight deck of an aircraft, in which said plurality of at least three microphones comprises the headsets provided for at least two crew members.
10. A system and apparatus for improving a speech quality of conference calls using a telephone network, comprising:
a first conference phone installed at a first location and configured to serve a first group containing at least one intermittent speaker;
at least one second conference phone installed at a second location and configured to serve a second group containing at least one second intermittent speaker, the first and at least one second conference phone being in mutual communication via a telephone network;
at least three microphones at at least one of said first or at least one second location configured to produce corresponding audio output signals containing respective samplings of a wanted speech signal and unwanted background noise;
a signal processor configured to receive said audio output signals from said at least three microphones and to dynamically jointly process the at least two audio output signals to derive therefrom, solely from the audio output signals and without reference to noise profiles or filters constructed in advance, a derived output signal in which a ratio of the wanted speech signal to unwanted background noise is greater than a corresponding ratio for the audio output signal from any one alone of said at least three microphones, said derived output signal from the signal processor being transmitted via said telephone network from a location of the at least three microphones to all other locations participating in the mutual communication via the telephone network;
wherein the joint processing of the audio output signal from the principal microphone and one or more other said audio output signals containing respective samplings of a wanted speech signal and unwanted background noise includes, on a frame-block basis:
estimating a signal correlation matrix without reliance on stored statistics;
estimating an inverse noise spatial correlation matrix;
for each audio signal,
distinguishing between noise with speech present and noise without speech present,
updating the signal correlation matrix only if speech is present, and
calculating a frequency response from the updated signal correlation matrix and the inverse noise spatial correlation matrix if speech is present, or from a non-updated signal correlation matrix and the inverse noise spatial correlation matrix if speech is not present;
dynamically jointly processing the frequency responses for each audio signal to derive an output signal in a frequency domain solely from the audio signals and without reference to noise profiles or filters constructed in advance; and
converting the derived output signal to a time domain.
11. The system and apparatus of claim 10 in which said at least three microphones comprises any of one or more microphones associated with said first conference phone or said at least one second conference phone and connected thereto; any headset or lapel microphones worn by any person; any microphone contained by or connected to a laptop computer by wire or wireless means and any other fixed or hand-held microphones.
12. The system and apparatus of claim 10 in which said signal processor is located within said first conference phone or said at least one second conference phone, and the first conference phone or said at least one second conference phone is configured to receive the audio signals from said at least three microphones using any of a wired connection; a wireless connection, or a connection to a server that forwards audio signals received at the server from any microphone.
13. The system and apparatus of claim 10 in which said signal processor is implemented in software on a server, the server being configured to receive audio signals from said at least three microphones and to derive said derived output signal.
14. A method for improving a signal to noise ratio of an audio signal received from a microphone associated with a principal active speaker, comprising the steps of:
providing a plurality of microphones;
associating at least one microphone of the plurality of microphones with each of a number of potential speakers;
determining the microphone of the plurality of microphones that is normally associated with the principal active speaker from among the number of potential speakers;
activating or maintaining in an active state at least one other microphone of the plurality of microphones that is normally associated with a speaker from among the number of potential speakers other than the principal active speaker; and
jointly processing, in a frequency domain and without performing beamforming, in a digital signal processor, the audio signals received from the microphone of the plurality of microphones that is normally associated with the principal active speaker together with audio signals received from said at least one other microphone of the plurality of microphones in order to derive a processed signal in which a ratio of a wanted speech signal from the principal active speaker to background noise is greater than from any one microphone of the plurality of microphones alone.
15. The method of claim 14 in which the step of determining the microphone of the plurality of microphones that is associated with the principal active speaker is based on a state of a press-to-talk switch associated with the microphone.
16. The method of claim 14 in which the step of determining the microphone of the plurality of microphones that is associated with the principal active speaker is based on an indication from a Voice Activity Detector associated with the microphone.
17. The method of claim 14 wherein jointly processing the audio signals received from the microphone of the plurality of microphones that is associated with the principal active speaker and said at least one other microphone of the plurality of microphones comprises:
decomposing all such audio signals into a set of narrowband constituent components using a windowed Fast Fourier Transform;
processing overlapping blocks of signals, wherein an overlap of a windowing function adds to unity, and applying frequency domain filtering on a frame-block basis;
estimating a signal correlation matrix and an inverse noise spatial correlation matrix, based on a recursive least squares algorithm modified for processing in a frequency domain, for each frame;
using voice activity detection on each audio signal to distinguish between noise with speech present and noise without speech present;
for each audio signal in each frame, updating the signal correlation matrix only if speech is present, and updating the inverse noise spatial correlation matrix only if speech is not present;
calculating Green's function for each frame from the updated signal correlation matrix if speech is present, or from a non-updated signal correlation matrix if speech is not present;
calculating a frequency response for each audio signal from the calculated Green's function and the updated inverse noise correlation matrix if speech is not present, or from a non-updated inverse noise correlation matrix if speech is present;
calculating an output signal in the frequency domain from the Green's function and frequency responses; and
converting the output signal to a time domain using an inverse Fast Fourier Transform.
18. The method of claim 17, wherein the noise spatial correlation matrix is calculated using a recursive least squares algorithm modified for processing in the frequency domain.
19. The method of claim 17, further comprising calculating a power spectral density of the output signal if speech is detected, prior to the converting using the inverse Fast Fourier Transform.
20. A Press-To-Talk (PTT) communication system comprising:
at least two communication terminals, each terminal including a pressel switch used by an operator of the terminal to indicate active speech; and
a signal processor operative to
continuously receive the state of the pressel switch from each terminal;
continuously receive an audio signal from each terminal, regardless of the state of the pressel switch;
determine, from the states of all pressel switches, a currently active speaker;
jointly process audio signals from the currently active speaker's terminal and at least one other terminal to derive an output audio signal in which the ratio of speech by the currently active speaker to background noise is greater than such ratio derived from any one terminal alone; and
output the derived output audio signal to at least one terminal.
21. The system and apparatus of claim 1 wherein dynamically jointly processing said principal audio signal and the at least one said incidental audio signal, without reference to noise profiles or filters constructed in advance, further comprises processing the audio signals under a constraint that a spectrum of the wanted speech is substantially unchanged.
22. The system and apparatus of claim 10 wherein the joint processing of the audio output signal from the principal microphone and one or more other said audio output signals containing respective samplings of a wanted speech signal and unwanted background noise comprises joint processing under a constraint that a spectrum of the wanted speech signal is substantially unchanged.
23. The method of claim 14 wherein jointly processing the audio signals received from the microphone associated with the principal active speaker and said at least one other microphone comprises jointly processing the audio signals under a constraint that a spectrum of the wanted speech signal from the principal active speaker is substantially unchanged.
24. The system of claim 1 in which said principal microphone is part of a Bluetooth wireless device and said at least one incidental microphone is part of a Bluetooth-equipped communication device in wireless communication with said Bluetooth device.
25. A system for dynamically improving a ratio of a wanted audio signal to unwanted noise that is not known or characterized a priori, comprising:
a principal microphone configured to produce a principal audio signal containing a sampling of the wanted audio signal and unwanted noise that has not previously been measured or characterized;
at least one incidental microphone located remotely from said principal microphone by several audio wavelengths at a mid-band audio frequency and configured to produce at least one corresponding incidental audio signal containing a sampling of said unwanted noise that has not previously been measured or characterized; and
a signal processor configured to dynamically jointly process said principal audio signal and said at least one incidental audio signal, without reference to profiles or filters constructed in advance with respect to said unwanted noise,
by receiving and processing said principal audio signal to determine a first set of individual spectral components at a set of predetermined frequencies;
receiving and processing said at least one incidental audio signal to determine one or more additional sets of individual spectral components at said set of predetermined frequencies; and
dynamically combining corresponding spectral components from said first set and one or more of said additional sets to obtain a combined set of spectral components in which unwanted noise components are reduced compared to wanted audio signal components; and
generating an output audio waveform solely from the combined set of spectral components, without filtering or suppressing unwanted noise by reference to a predetermined noise profile, in which the ratio of the wanted audio signal to the unwanted noise is greater than a corresponding ratio for either said principal microphone or any of said at least one incidental microphone alone.
26. The system of claim 1 in which said principal microphone is part of a first wireless communications device and said at least one incidental microphone is a part of a second wireless communications device in wireless communication with said first wireless communications device.
27. The system of claim 1 wherein dynamically combining corresponding spectral components from said first set and the at least one said additional set comprises combining the corresponding spectral components by minimizing said background noise components under a nonlinear equality constraint leaving characteristics of the wanted speech components nominally undisturbed or having an a priori desired amplitude and phase distortion.
28. The system of claim 1 wherein positions of said principal microphone and said at least one incidental microphone are arbitrary relative to each other, other than said at least one incidental microphone being located remotely from said principal microphone by several acoustic wavelengths at a mid-band audio frequency.
29. The system of claim 1 wherein a position of at least one of said at least one incidental microphones is changing relative to said principal microphone.
30. The system of claim 6 wherein positions of the plurality of at least three microphones are arbitrary relative to each other.
31. The system of claim 6 wherein a position of at least one of the plurality of at least three microphones is changing relative to another of the plurality of at least three microphones.
32. The method of claim 14 wherein jointly processing audio signals received from the microphone of the plurality of microphones that is normally associated with the principal active speaker together with audio signals received from said at least one other microphone of the plurality of microphones comprises jointly processing the audio signals subject to a nonlinear equality constraint that represents a predetermined degree of degradation of the speech.
33. The method of claim 14 wherein jointly processing audio signals received from the microphone associated with the principal active speaker and said at least one other microphone comprises jointly processing the audio signals to minimize the background noise under a nonlinear equality constraint leaving the characteristics of the wanted speech signal nominally undisturbed or having a predetermined amplitude and phase distortion representing a degree of degradation of the wanted speech signal.
34. The system of claim 25 in which said principal microphone is part of a first wireless communications device and said at least one incidental microphone is a part of a second wireless communications device in wireless communication with said first wireless communications device.
35. The system of claim 25, further comprising additional microphones producing additional audio signals containing different samplings of said wanted audio signal and unwanted background noise, and wherein the signal processor is configured to receive all of the principal, at least one incidental, and said additional audio signals, and to dynamically jointly process also said additional audio signals, wherein, in the output audio waveform generated, the ratio of said wanted audio signal to unwanted background noise is greater than the corresponding ratio for any of the audio signals alone.
36. The system of claim 25, further comprising additional microphones producing additional audio signals containing different samplings of said wanted speech signal and unwanted background noise, and wherein the signal processor is configured to receive all of the principal, at least one incidental, and additional audio signals and to dynamically jointly process the principal and at least one incidental audio signals jointly with a selected one of the additional audio signals, wherein the ratio of said wanted audio signal to unwanted background noise in the output audio waveform generated is greater than the corresponding ratio for any of the audio signals alone.
37. The system of claim 25 in which said joint processing comprises time-domain to spectral domain conversion for separating said principal and at least one said incidental audio signal into said spectral components, spectral combining for performing weighted combining of corresponding said spectral components to produce said combined set of spectral components, and spectral domain to time domain conversion to convert said combined set of spectral components to generate said output audio waveform.
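The pipeline recited in this claim — time-domain to spectral-domain conversion, weighted combining of corresponding spectral components, then spectral-domain to time-domain conversion — can be sketched generically. This is a hedged illustration only: the Hann windowing, hop size, and magnitude-squared weights below are assumptions for demonstration, not the patented combining rule.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Time-domain to spectral-domain conversion: windowed FFT frames."""
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    return np.array([np.fft.rfft(win * x[i * hop:i * hop + frame]) for i in range(n)])

def istft(frames, frame=256, hop=128, length=None):
    """Spectral-domain to time-domain conversion via normalized overlap-add."""
    win = np.hanning(frame)
    out = np.zeros(hop * (len(frames) - 1) + frame)
    norm = np.zeros_like(out)
    for i, F in enumerate(frames):
        out[i * hop:i * hop + frame] += win * np.fft.irfft(F, frame)
        norm[i * hop:i * hop + frame] += win ** 2
    out /= np.maximum(norm, 1e-12)
    return out[:length] if length else out

def weighted_combine(x_principal, x_incidental):
    """Weighted per-bin combining of two microphone spectra.
    The magnitude-squared weights are purely illustrative."""
    A = stft(x_principal)
    B = stft(x_incidental)
    wa, wb = np.abs(A) ** 2, np.abs(B) ** 2
    combined = (wa * A + wb * B) / np.maximum(wa + wb, 1e-12)
    return istft(combined, length=len(x_principal))
```

With identical inputs the combiner reduces to an identity, which makes the round trip through the spectral domain easy to verify.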
38. The system of claim 25 wherein dynamically combining corresponding spectral components from said first set and one or more of said additional sets comprises combining the corresponding spectral components by minimizing said background noise under a nonlinear equality constraint leaving characteristics of the wanted audio signal nominally undisturbed or having a predetermined amplitude and phase distortion representing a degree of degradation of the wanted audio signal.
39. The system of claim 25 wherein positions of said principal microphone and said at least one incidental microphone are arbitrary relative to each other, other than said at least one incidental microphone being located remotely from said principal microphone by several acoustic wavelengths at a mid-band audio frequency.
40. The system of claim 25 wherein a position of at least one of said at least one incidental microphones is changing relative to said principal microphone.
41. A system for dynamically enhancing speech communications between two people in the presence of random acoustic background noise that is not known or characterized a priori, comprising:
two microphones arranged such that one microphone is associated with each said person, wherein each microphone outputs an audio signal containing speech if the associated person is speaking, or an audio signal containing acoustic background noise alone that has not previously been measured or characterized; and
a signal processor configured to dynamically process jointly an audio signal containing speech along with one audio signal containing acoustic background noise in order to derive an output signal, without reference to noise profiles or filters constructed in advance, in which a ratio of speech to acoustic background noise is greater than a corresponding ratio for the audio signal containing speech alone, or for the one audio signal containing acoustic background noise alone;
wherein the joint processing of the audio signal containing speech and the one audio signal containing acoustic background noise includes, on a frame-block basis:
estimating a noise spatial correlation matrix without reliance on stored statistics;
for each audio signal,
distinguishing whether speech is present,
updating the noise spatial correlation matrix only if speech is present, and
calculating a frequency response from the updated noise spatial correlation matrix if speech is present, or from a non-updated spatial correlation matrix if speech is not present;
dynamically jointly processing the frequency responses for the audio signal containing speech and the one audio signal containing acoustic background noise to derive an output signal in a frequency domain without reference to noise profiles or filters constructed in advance; and
converting the derived output signal to a time domain.
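The frame-block flow recited in claim 41 — per-frame speech detection, gated updating of the noise spatial correlation matrix, per-bin weight derivation, frequency-domain combining, and conversion back to the time domain — can be sketched as follows. Everything here is a hedged, generic illustration: the energy-based VAD, the recursive 0.9/0.1 update, the steering vector, and the MVDR-style weight formula are assumptions, not the claimed implementation.

```python
import numpy as np

def frame_energy_vad(frame, threshold=1e-3):
    """Crude energy-based voice-activity decision (illustrative only)."""
    return float(np.mean(frame ** 2)) > threshold

def process_frames(speech_mic, noise_mic, frame=256, vad_threshold=1e-3):
    """Per frame: decide whether speech is present; if so, update the noise
    spatial correlation matrix and re-derive per-bin combining weights;
    otherwise reuse the previous (non-updated) weights."""
    n_bins = frame // 2 + 1
    R = np.tile(np.eye(2, dtype=complex), (n_bins, 1, 1))  # start well-conditioned
    w = np.zeros((n_bins, 2), dtype=complex)
    w[:, 0] = 1.0                       # initially pass the principal microphone
    d = np.array([1.0, 0.0])            # assumed wanted-signal steering vector
    out = []
    for i in range(len(speech_mic) // frame):
        s = speech_mic[i * frame:(i + 1) * frame]
        v = noise_mic[i * frame:(i + 1) * frame]
        X = np.stack([np.fft.rfft(s), np.fft.rfft(v)], axis=1)  # per-bin observations
        if frame_energy_vad(s, vad_threshold):
            for k in range(n_bins):
                x = X[k][:, None]
                # Recursive spatial-correlation update; a practical system would
                # exclude the wanted-speech component from this estimate.
                R[k] = 0.9 * R[k] + 0.1 * (x @ x.conj().T)
                Ri = np.linalg.inv(R[k] + 1e-9 * np.eye(2))
                w[k] = (Ri @ d) / (d.conj() @ Ri @ d)   # MVDR-style closed form
        Y = np.einsum('kc,kc->k', w.conj(), X)  # weighted per-bin combining
        out.append(np.fft.irfft(Y, frame))      # derived output back in time domain
    return np.concatenate(out)
```

The gating mirrors the claim: weights are recomputed from the updated matrix only when speech is detected, and the most recent (non-updated) weights are applied otherwise.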
US16/402,088 2012-06-18 2019-05-02 Wired and wireless microphone arrays Active 2034-01-23 USRE50627E1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/402,088 USRE50627E1 (en) 2012-06-18 2019-05-02 Wired and wireless microphone arrays

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261690019P 2012-06-18 2012-06-18
US13/908,178 US9641933B2 (en) 2012-06-18 2013-06-03 Wired and wireless microphone arrays
US16/402,088 USRE50627E1 (en) 2012-06-18 2019-05-02 Wired and wireless microphone arrays

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/908,178 Reissue US9641933B2 (en) 2012-06-18 2013-06-03 Wired and wireless microphone arrays

Publications (1)

Publication Number Publication Date
USRE50627E1 true USRE50627E1 (en) 2025-10-07

Family

ID=51985127

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/908,178 Ceased US9641933B2 (en) 2012-06-18 2013-06-03 Wired and wireless microphone arrays
US16/402,088 Active 2034-01-23 USRE50627E1 (en) 2012-06-18 2019-05-02 Wired and wireless microphone arrays

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/908,178 Ceased US9641933B2 (en) 2012-06-18 2013-06-03 Wired and wireless microphone arrays

Country Status (1)

Country Link
US (2) US9641933B2 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9257132B2 (en) * 2013-07-16 2016-02-09 Texas Instruments Incorporated Dominant speech extraction in the presence of diffused and directional noise sources
US9613611B2 (en) * 2014-02-24 2017-04-04 Fatih Mehmet Ozluturk Method and apparatus for noise cancellation in a wireless mobile device using an external headset
US9510094B2 (en) * 2014-04-09 2016-11-29 Apple Inc. Noise estimation in a mobile device using an external acoustic microphone signal
US9672841B2 (en) * 2015-06-30 2017-06-06 Zte Corporation Voice activity detection method and method used for voice activity detection and apparatus thereof
TWI783917B (en) * 2015-11-18 2022-11-21 美商艾孚諾亞公司 Speakerphone system or speakerphone accessory with on-cable microphone
KR102502601B1 (en) * 2015-11-27 2023-02-23 삼성전자주식회사 Electronic device and controlling voice signal method
CN107748657B (en) * 2017-10-19 2021-12-21 广东小天才科技有限公司 Microphone-based interaction method and microphone
KR102088216B1 (en) * 2018-10-31 2020-03-12 김정근 Method and device for reducing crosstalk in automatic speech translation system
GB2597009B (en) * 2019-05-22 2023-01-25 Solos Tech Limited Microphone configurations for eyewear devices, systems, apparatuses, and methods
US10735887B1 (en) * 2019-09-19 2020-08-04 Wave Sciences, LLC Spatial audio array processing system and method
US10993088B1 (en) 2020-06-11 2021-04-27 H.M. Electronics, Inc. Systems and methods for using role-based voice communication channels in quick-service restaurants
US11452073B2 (en) 2020-08-13 2022-09-20 H.M. Electronics, Inc. Systems and methods for automatically assigning voice communication channels to employees in quick service restaurants
US11356561B2 (en) 2020-09-22 2022-06-07 H.M. Electronics, Inc. Systems and methods for providing headset voice control to employees in quick-service restaurants
KR102848942B1 (en) 2021-02-01 2025-08-21 삼성전자주식회사 Method for processing audio data and electronic device supporting the same
CN119905100B (en) * 2025-04-02 2025-06-27 大象声科(深圳)科技有限公司 Cross-equipment multi-microphone collaborative processing voice enhancement method, device, terminal and medium

Patent Citations (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4008376A (en) * 1975-10-17 1977-02-15 Bell Telephone Laboratories, Incorporated Loudspeaking teleconferencing circuit
US4149032A (en) * 1978-05-04 1979-04-10 Industrial Research Products, Inc. Priority mixer control
US4449238A (en) * 1982-03-25 1984-05-15 Bell Telephone Laboratories, Incorporated Voice-actuated switching system
US4658425A (en) * 1985-04-19 1987-04-14 Shure Brothers, Inc. Microphone actuation control system suitable for teleconference systems
US5404397A (en) * 1992-04-16 1995-04-04 U.S. Phillips Corporation Conference system with automatic speaker detection and speaker unit
US5561737A (en) * 1994-05-09 1996-10-01 Lucent Technologies Inc. Voice actuated switching system
JPH08116353A (en) * 1994-10-14 1996-05-07 Ricoh Co Ltd Teleconference terminal device
US5715319A (en) * 1996-05-30 1998-02-03 Picturetel Corporation Method and apparatus for steerable and endfire superdirective microphone arrays with reduced analog-to-digital converter and computational requirements
US6690793B1 (en) * 1999-05-17 2004-02-10 Mitel Corporation Click free mute switch circuit for telephones
US20050207567A1 (en) * 2000-09-12 2005-09-22 Forgent Networks, Inc. Communications system and method utilizing centralized signal processing
US20020126856A1 (en) * 2001-01-10 2002-09-12 Leonid Krasny Noise reduction apparatus and method
US20020141601A1 (en) * 2001-02-21 2002-10-03 Finn Brian M. DVE system with normalized selection
US20030027600A1 (en) * 2001-05-09 2003-02-06 Leonid Krasny Microphone antenna array using voice activity detection
US20040192362A1 (en) * 2002-03-27 2004-09-30 Michael Vicari Method and apparatus for providing a wireless aircraft interphone system
US20030206640A1 (en) 2002-05-02 2003-11-06 Malvar Henrique S. Microphone array signal enhancement
US20040013252A1 (en) * 2002-07-18 2004-01-22 General Instrument Corporation Method and apparatus for improving listener differentiation of talkers during a conference call
US20040058674A1 (en) 2002-09-19 2004-03-25 Nortel Networks Limited Multi-homing and multi-hosting of wireless audio subsystems
US20040131201A1 (en) * 2003-01-08 2004-07-08 Hundal Sukhdeep S. Multiple wireless microphone speakerphone system and method
US20090318202A1 (en) * 2003-01-15 2009-12-24 Gn Netcom A/S Hearing device
US20040213419A1 (en) * 2003-04-25 2004-10-28 Microsoft Corporation Noise reduction systems and methods for voice applications
US20050060142A1 (en) * 2003-09-12 2005-03-17 Erik Visser Separation of target acoustic signals in a multi-transducer arrangement
US7496387B2 (en) * 2003-09-25 2009-02-24 Vocollect, Inc. Wireless headset for use in speech recognition environment
US20050141541A1 (en) * 2003-12-29 2005-06-30 Renaud Cuny Method and system for controlling a real-time communications service
US20070149246A1 (en) * 2004-01-09 2007-06-28 Revolabs, Inc. Wireless multi-user audio system
US20060084504A1 (en) * 2004-04-30 2006-04-20 Chan Andy K Wireless communication systems
US20050254640A1 (en) * 2004-05-11 2005-11-17 Kazuhiro Ohki Sound pickup apparatus and echo cancellation processing method
US20050286698A1 (en) * 2004-06-02 2005-12-29 Bathurst Tracy A Multi-pod conference systems
US20060013416A1 (en) * 2004-06-30 2006-01-19 Polycom, Inc. Stereo microphone processing for teleconferencing
US20080201138A1 (en) * 2004-07-22 2008-08-21 Softmax, Inc. Headset for Separation of Speech Signals in a Noisy Environment
US20060045063A1 (en) * 2004-08-26 2006-03-02 Stanford Thomas H Communication system and method
US20060135085A1 (en) * 2004-12-22 2006-06-22 Broadcom Corporation Wireless telephone with uni-directional and omni-directional microphones
US20070021958A1 (en) * 2005-07-22 2007-01-25 Erik Visser Robust separation of speech signals in a noisy environment
US20070046540A1 (en) * 2005-08-26 2007-03-01 Step Communications Corporation, A Nevada Corporation Beam former using phase difference enhancement
US20070082615A1 (en) * 2005-10-12 2007-04-12 Siukai Mak Method and system for audio signal processing for bluetooth wireless headsets using a hardware accelerator
US20070237341A1 (en) * 2006-04-05 2007-10-11 Creative Technology Ltd Frequency domain noise attenuation utilizing two transducers
US8848901B2 (en) * 2006-04-11 2014-09-30 Avaya, Inc. Speech canceler-enhancer system for use in call-center applications
US20070274540A1 (en) * 2006-05-11 2007-11-29 Global Ip Solutions Inc Audio mixing
US7706821B2 (en) * 2006-06-20 2010-04-27 Alon Konchitsky Noise reduction system and method suitable for hands free communication devices
US20080152167A1 (en) * 2006-12-22 2008-06-26 Step Communications Corporation Near-field vector signal enhancement
US20080159507A1 (en) * 2006-12-27 2008-07-03 Nokia Corporation Distributed teleconference multichannel architecture, system, method, and computer program product
US7983428B2 (en) * 2007-05-09 2011-07-19 Motorola Mobility, Inc. Noise reduction on wireless headset input via dual channel calibration within mobile phone
US20100228545A1 (en) * 2007-08-07 2010-09-09 Hironori Ito Voice mixing device, noise suppression method and program therefor
US8428661B2 (en) * 2007-10-30 2013-04-23 Broadcom Corporation Speech intelligibility in telephones with multiple microphones
US20100266139A1 (en) * 2007-12-10 2010-10-21 Shinichi Yuzuriha Sound collecting device, sound collecting method, sound collecting program, and integrated circuit
US20090220065A1 (en) * 2008-03-03 2009-09-03 Sudhir Raman Ahuja Method and apparatus for active speaker selection using microphone arrays and speaker recognition
US20090238377A1 (en) * 2008-03-18 2009-09-24 Qualcomm Incorporated Speech enhancement using multiple microphones on multiple devices
US20110019836A1 (en) * 2008-03-27 2011-01-27 Yamaha Corporation Sound processing apparatus
US20090299742A1 (en) * 2008-05-29 2009-12-03 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for spectral contrast enhancement
US20090323925A1 (en) * 2008-06-26 2009-12-31 Embarq Holdings Company, Llc System and Method for Telephone Based Noise Cancellation
US20130010657A1 (en) * 2008-09-25 2013-01-10 Sonetics Corporation Vehicle crew communications system
US20100151787A1 (en) * 2008-12-17 2010-06-17 Motorola, Inc. Acoustic suppression using ancillary rf link
US20120093344A1 (en) * 2009-04-09 2012-04-19 Ntnu Technology Transfer As Optimal modal beamformer for sensor arrays
US20110070926A1 (en) * 2009-09-22 2011-03-24 Parrot Optimized method of filtering non-steady noise picked up by a multi-microphone audio device, in particular a "hands-free" telephone device for a motor vehicle
US20110091029A1 (en) * 2009-10-20 2011-04-21 Broadcom Corporation Distributed multi-party conferencing system
US20110096942A1 (en) * 2009-10-23 2011-04-28 Broadcom Corporation Noise suppression system and method
US20120275621A1 (en) * 2009-12-22 2012-11-01 Mh Acoustics,Llc Surface-Mounted Microphone Arrays on Flexible Printed Circuit Boards
US20120051548A1 (en) * 2010-02-18 2012-03-01 Qualcomm Incorporated Microphone array subset selection for robust noise reduction
US20120184337A1 (en) * 2010-07-15 2012-07-19 Burnett Gregory C Wireless conference call telephone
US8774875B1 (en) * 2010-10-20 2014-07-08 Sprint Communications Company L.P. Spatial separation-enabled noise reduction
US20130325458A1 (en) * 2010-11-29 2013-12-05 Markus Buck Dynamic microphone signal mixer
US20140193009A1 (en) * 2010-12-06 2014-07-10 The Board Of Regents Of The University Of Texas System Method and system for enhancing the intelligibility of sounds relative to background noise
US20120183154A1 (en) * 2011-01-19 2012-07-19 Broadcom Corporation Use of sensors for noise suppression in a mobile communication device
US8606249B1 (en) * 2011-03-07 2013-12-10 Audience, Inc. Methods and systems for enhancing audio quality during teleconferencing
US20120258772A1 (en) * 2011-04-05 2012-10-11 Research In Motion Limited Mobile wireless communications device with proximity based transmitted power control and related methods
US20120322511A1 (en) * 2011-06-20 2012-12-20 Parrot De-noising method for multi-microphone audio equipment, in particular for a "hands-free" telephony system
US8924206B2 (en) * 2011-11-04 2014-12-30 Htc Corporation Electrical apparatus and voice signals receiving method thereof

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"Audio frequency." Academic Press Dictionary of Science and Technology, edited by Christopher G. Morris, Elsevier Science & Technology, 4th edition, 1992. Credo Reference, https://search.credoreference.com/content/entry/apdst/audio_frequency/0?institutionId=743. Accessed Aug. 23, 2022 (Year: 2022). *
"Echo canceller." Hargrave's Communications Dictionary, Wiley, Frank Hargrave, Wiley, 1st edition, 2001. Credo Reference, https://search.credoreference.com/content/entry/hargravecomms/echo_canceller/0?institutionId=743. Accessed Aug. 22, 2022 (Year: 2002). *
"Unwanted." Merriam-Webster.com Dictionary, Merriam-Webster, https://www.merriam-webster.com/dictionary/unwanted. Accessed Oct. 7, 2020. (Year: 2020). *
"Wanted." Merriam-Webster.com Dictionary, Merriam-Webster, https://www.merriam-webster.com/dictionary/unwanted. Accessed Oct. 7, 2020. (Year: 2020). *
Hidri Adel, et al., "Beamforming Techniques for Multichannel audio Signal Separation," International Journal of Digital Content Technology & Its Applications, Nov. 2012, vol. 6, Issue 20, pp. 659-668.
Noise. (1999). In S . . . Amos, & R. S. Amos, Newnes Dictionary of Electronics, Newnes (4th ed.). (Year: 2020). *

Also Published As

Publication number Publication date
US9641933B2 (en) 2017-05-02
US20140355775A1 (en) 2014-12-04

Similar Documents

Publication Publication Date Title
USRE50627E1 (en) Wired and wireless microphone arrays
US9756422B2 (en) Noise estimation in a mobile device using an external acoustic microphone signal
US10269369B2 (en) System and method of noise reduction for a mobile device
US10090001B2 (en) System and method for performing speech enhancement using a neural network-based combined symbol
US8428661B2 (en) Speech intelligibility in telephones with multiple microphones
CN112492434B (en) Hearing devices including noise reduction systems
CN1809105B (en) Dual-microphone speech enhancement method and system applicable to mini-type mobile communication devices
JP4378170B2 (en) Acoustic device, system and method based on cardioid beam with desired zero point
US8989815B2 (en) Far field noise suppression for telephony devices
EP1953735A1 (en) Voice control system and method for voice control
US20060147063A1 (en) Echo cancellation in telephones with multiple microphones
US20150256956A1 (en) Multi-microphone method for estimation of target and noise spectral variances for speech degraded by reverberation and optionally additive noise
US8798290B1 (en) Systems and methods for adaptive signal equalization
US20110181452A1 (en) Usage of Speaker Microphone for Sound Enhancement
US20140037100A1 (en) Multi-microphone noise reduction using enhanced reference noise signal
CA2574793A1 (en) Headset for separation of speech signals in a noisy environment
WO2003036614A2 (en) System and apparatus for speech communication and speech recognition
WO2006017993A1 (en) A background noise eliminate device and method for speech communication terminal
US20140335917A1 (en) Dual beamform audio echo reduction
US8923530B2 (en) Speakerphone feedback attenuation
US9589572B2 (en) Stepsize determination of adaptive filter for cancelling voice portion by combining open-loop and closed-loop approaches
US10297245B1 (en) Wind noise reduction with beamforming
US9729967B2 (en) Feedback canceling system and method
US20210297792A1 (en) Hearing devices and related methods
WO2006066618A1 (en) Local area network, communication unit and method for cancelling noise therein

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY