WO2009043066A1 - Method and device for low-latency auditory model-based single-channel speech enhancement - Google Patents
Method and device for low-latency auditory model-based single-channel speech enhancement
- Publication number
- WO2009043066A1 (PCT/AT2007/000466)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- signal
- filter
- speech
- band
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Definitions
- the present invention relates to a method for enhancing wide-band speech audio signals in the presence of background noise and, more particularly, to a noise suppression system, a noise suppression method and a noise suppression program. More specifically, the present invention relates to low-latency single-channel noise reduction using sub-band processing based on masking properties of the human auditory system.
- noise reduction methods i.e. methods aiming at processing a noisy signal with the purpose of eliminating or attenuating the level of noise and improving the signal-to-noise-ratio (SNR) without affecting the speech and its characteristics.
- SNR signal-to-noise-ratio
- noise reduction is also referred to as noise suppression or speech enhancement.
- ambient noise usually arises from fans of computers, printers, or facsimile machines, which can be considered as (long-term) stationary.
- Conversational noise emerging from (telephone) talks of colleagues sharing the office, often referred to as babble noise, contains harmonic components and is therefore much harder to attenuate by a noise reduction unit.
- spectral subtraction attempts to estimate the short time spectral amplitude (STSA) of clean speech from that of the noisy speech, i.e. the desired speech contaminated by noise, by subtracting an estimated noise signal.
- STSA short time spectral amplitude
- the estimated speech magnitude is combined with the phase of the noisy speech, based on the assumption that the human ear is insensitive against phase distortions (see C. L.
- a method to reduce musical tones which is often applied is to subtract an overestimate of the noise spectrum to reduce the fluctuations in the DFT coefficients and prevent the spectral components from going below a spectral floor (see M. Berouti et al., "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. on Acoust., Speech and Sig. Proc. (ICASSP'79), vol. 4, pp. 208-211, Washington D.C., Apr. 1979).
- This approach successfully reduces musical tones during low SNR conditions and noise only periods.
- the main disadvantage is the distortion of the speech signal during voice activity. In practice a tradeoff between speech quality level and residual noise floor level has to be found.
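The over-subtraction rule with a spectral floor described above can be sketched in a few lines. This is a minimal illustration of the general idea attributed to Berouti et al., not the method claimed in this patent; the function name `oversubtract` and the default values of `alpha` (over-subtraction factor) and `beta` (spectral floor) are assumptions chosen for the example.

```python
def oversubtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    noisy_power, noise_power: per-bin power values (same length).
    alpha: over-subtraction factor (assumed value, > 1 subtracts an overestimate).
    beta:  spectral floor as a fraction of the noise power (assumed value).
    """
    cleaned = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        difference = p_noisy - alpha * p_noise
        floor = beta * p_noise
        # Never let a bin fall below the spectral floor.
        cleaned.append(difference if difference > floor else floor)
    return cleaned


# Two bins: one dominated by speech, one containing noise only.
example = oversubtract([10.0, 1.0], [1.0, 1.0])
```

Raising `alpha` suppresses more of the fluctuating residual but distorts speech more, while raising `beta` hides musical tones under a higher residual noise floor; this is the trade-off noted above.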
- MMSE-STSA The minimum mean-square error short-time spectral amplitude estimator (MMSE-STSA, see Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time amplitude estimator," IEEE Trans. Acoust., Speech and Sig. Proc., vol. 32, no. 6, pp. 1109-1121,
- a priori SNR represents the information on the unknown spectrum magnitude gathered from previous frames and is evaluated in the decision-directed approach (DDA). As the smoothing performed by the DDA may have irregularities, low-level musical noise may occur.
- DDA decision-directed approach
- Ephraim and Van Trees propose another important method for noise reduction based on signal subspace decomposition (see Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement", in IEEE Trans. Speech and Audio Proc, vol. 3, pp.251-266, July 1995).
- the noisy signal is decomposed into a signal-plus-noise subspace and a noise subspace, where these two subspaces are orthogonal.
- the resulting linear estimator is a general Wiener filter with adjustable noise level, to control the trade-off between signal distortion and residual noise, as they cannot be minimized simultaneously.
- Skoglund and Kleijn point out the importance of the temporal masking property in connection with the excitation of voiced speech (see J. Skoglund and W. B. Kleijn, "On Time-Frequency Masking in Voiced Speech", in IEEE Trans. Speech and Audio Proc., vol. 8, no. 4, pp. 361-369, July 2000). It is shown that noise between the excitation impulses is more perceptible than noise close to the impulses, and this is especially so for low-pitch speech, for which the excitation impulses are sparsely located in time. Temporal masking is not employed by conventional noise reduction methods using frequency domain MMSE estimators. Patent WO 2006 114100 discloses a signal subspace approach taking the temporal masking properties into account.
- the aim of the present invention consists in providing a single-channel auditory-model based noise suppression method with low-latency processing of wide-band speech signals in the presence of background noise. More specifically, the present invention is based on the method of spectral subtraction using a modified decision directed approach comprising oversubtraction and an adjustable noise-level to avoid perceptible musical tones. Further, the present invention uses sub-band processing plus pre- and post-filtering to give consideration to temporal and simultaneous masking inherent to human auditory perception, in particular to minimize perceptible signal distortions during speech periods.
- GTF Gammatone filter bank
- a pre-processor which emulates the transfer behaviour of the human outer- and middle ear, is applied to the time-discrete noisy input signal (i.e. the desired speech contaminated by noise and interference).
- in each sub-band the level of the noisy signal is detected and smoothed.
- These narrow-band level detectors applied to each of the plurality of sub-bands utilize the phase of simple low-order filter sections to provide lowest signal processing delay.
- the noise level is estimated in each sub-band utilizing a heuristic approach based on recursive Minimum-Statistics.
- the instantaneous signal-to-noise-ratio (SNR) in each sub-band is estimated from the envelope of the noisy signal and the noise level estimate.
- the a priori SNR is estimated from the instantaneous SNR by applying the Ephraim-and-Malah Spectral Subtraction Rule (EMSR).
- EMSR Ephraim-and-Malah Spectral Subtraction Rule
- DDA decision directed approach
- Temporal masking based on human auditory perception is taken into account by appropriate filtering of the sub-band signals.
- These non-linear auditory post-masking filters apply recursive averaging to falling slopes of the signal level detected in each sub-band, with the following effects: (a) variances of impulsive noise are over-estimated, (b) noise suppression algorithms do not affect signal components below the temporal masking threshold, and (c) no additional signal delay is introduced to transient signals, which are important in speech perception.
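A filter that averages only falling slopes, as described above, can be sketched as a one-pole smoother with a conditional update. This is a minimal sketch under stated assumptions: the names `smoothing_coefficient` and `post_masking` are hypothetical, and the conversion from a time-constant to a recursive coefficient via `exp(-1 / (tau * rate))` is a common convention assumed here, since the conversion equation is not part of this excerpt.

```python
import math


def smoothing_coefficient(tau_s, rate_hz):
    """Assumed conversion of a time-constant (seconds) to a one-pole coefficient."""
    return math.exp(-1.0 / (tau_s * rate_hz))


def post_masking(levels, coefficient):
    """Follow rising levels immediately, average falling levels recursively."""
    smoothed = []
    state = 0.0
    for level in levels:
        if level >= state:
            # Rising slope: no averaging, hence no added delay on transients.
            state = level
        else:
            # Falling slope: recursive averaging emulates post-masking decay.
            state = coefficient * state + (1.0 - coefficient) * level
        smoothed.append(state)
    return smoothed


decay = post_masking([1.0, 0.0, 0.0], coefficient=0.5)
```

Because the rising branch bypasses the averager, onsets pass through unchanged, which corresponds to effect (c) in the list above.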
- a non-linear gain function for each sub-band is derived from the a priori SNR estimates, comprising over-subtraction of the noise signal estimates.
- the noisy signal in each sub-band is multiplied by the respective gain in order to suppress the noise signal components.
- An optimized nearly perfect reconstruction filter-bank employing a decision criterion for signed summation re-synthesizes the enhanced full-band speech signal.
- a post-processing filter is applied to the enhanced full-band signal to compensate the effect of the pre-processing filter.
- Single channel subtractive-type speech enhancement systems are efficient in reducing background noise; however, they introduce a perceptually annoying residual noise.
- properties of the auditory system are introduced in the enhancement process. This phenomenon is modeled by the calculation of a noise-masking threshold in frequency domain, below which all components are inaudible (see N. Virag, "Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System", IEEE Trans. on Speech and Audio Proc., vol. 7, no. 2, pp. 126-137, March 1999).
- filter bank implementations are especially attractive as they can be adapted to the spectral and temporal resolution of the human ear.
- the authors propose a noise suppression method based on spectral subtraction combined with Gammatone filter (GTF) banks divided into critical bands.
- GTF Gammatone filter
- the concept of critical bands, which describes the resolution of the human auditory system, leads to a nonlinearly warped frequency scale, called the Bark Scale (see J. O. Smith III and J. S. Abel, "Bark and ERB Bilinear Transforms," IEEE Trans. on Speech and Audio Proc., vol. 7, no. 6, pp. 697-708, Nov. 1999).
- the use of Gammatone filter banks outperforms the DTFT-based approaches in terms of computational complexity and overall system latency.
- the GTF approach allows implementing a low-latency analysis-synthesis scheme with low computational complexity and nearly perfect reconstruction.
- the proposed synthesis filter creates the broadband output signal by a simple summation of the sub-band signals, introducing a criterion that indicates the necessity of sign alteration before summation.
- This approach outperforms channel vocoder based approaches as proposed e.g. by McAulay and Malpass (see R. J. McAulay and M. L. Malpass, "Speech Enhancement Using a Soft-Decision Noise Suppression Filter", IEEE Trans. on Acoust., Speech and Sig.
- the method for low-latency auditory-model based single channel noise suppression and reduction works as an independent module and is intended for installation into a digital signal processing chain, wherein the software-specified algorithm is implemented on a commercially available digital signal processor (DSP), preferably a special DSP for audio applications.
- DSP digital signal processor
- FIG 1 is a schematic illustration of the single-channel sub-band speech enhancement unit of the present invention.
- FIG 2 is a schematic illustration of the non-linear calculation of the gain factor for noise suppression applied to each sub-band.
- FIG 3 and 4 show the roof-shaped MMSE-SP attenuation surface dependent on the a posteriori ($\gamma_k$) and the a priori ($\xi_k$) SNR.
- the x-axis corresponds to $\gamma_k$ and not ($\gamma_k - 1$) as in the literature.
- the dash-dotted line in Fig. 3 marks the transition between the partitions * /**"• and G 10 , the dashed line shows the power spectral subtraction contour.
- the contours of the DDA estimation are plotted in Fig. 4 upon the MMSE-SP attenuation surface. Dashed lines in Fig. 4 show the average of the dynamic relationships between $\gamma_k$ and $\xi_k$, solid lines show static relationships.
- FIG 5 and 6 are illustrations of the combined (modified) DDA and MMSE-SP estimation behaviour. Dashed lines in Fig. 5 show the average of the dynamic relationships between $\gamma_k$ and $\xi_k$, solid lines show static relationships. The two fictitious hysteresis loops of Fig. 6 match the observations from informal experiments.
- FIG 7 shows a block diagram of the overall-system.
- FIG 8 shows the over-all system comprising auditory frequency analysis and resynthesis as front- and back end, and using special low-latency and low-effort speech enhancement in between.
- a combination of an elaborate noise suppression law with a human auditory model enables high quality performance.
- FIG 9 shows an outer- and middle ear filter composed of three second order sections (SOS).
- FIG 11 shows a familiar way of level-detection. As the signal power is used, the squared amplitude is detected.
- FIG 12 shows the Low-Latency FIR level detector
- FIG 13 shows a non-linear recursive auditory post-masking filter, responding to falling slopes.
- FIG 14 shows a recursive noise level estimator using three time-constants and a counter threshold.
- the Ephraim and Malah amplitude estimator and the Ephraim and Malah decision directed a priori SNR estimate (Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, nr. 6, vol. ASSP-32, pp. 1109-1121, Dec. 1984 and Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, nr.2, vol. ASSP-33, pp. 443-445, Apr.
- a signal model is considered in which a noisy signal y[n] consists of speech x[n] and additive noise d[n], at time-index n.
- the signals x[n] and d[n] are assumed to be statistically independent Gaussian random variables. Due to certain properties of the Fourier transform, the same statistical model can be assumed for the corresponding complex short-term spectral amplitudes $\underline{X}_k[m]$ and $\underline{D}_k[m]$ in each frequency bin k, at analysis time m (underlined variables denote complex quantities here; therefore, in our notation, $\underline{X}_k[m]$ represents a complex variable).
- $X_k[m]$ shall represent the magnitude $|\underline{X}_k[m]|$
- the unknown clean speech variance is implicitly determined in the a priori SNR estimation part of the algorithm, whereas the noise variance has to be determined in advance, e.g. using Minimum Statistics (P. J. Wolfe and S. J. Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496-499, 6-8 Aug. 2001), MCRA (I. Cohen and B. Berdugo, "Speech Enhancement for non-stationary noise environments", Signal Processing, no. 11, pp. 2403-2418, Elsevier, Nov. 2001), or Harmonic Tunneling (D. Ealey, H. Kelleher, D. Pearce, "Harmonic Tunneling: Tracking Non-Stationary Noises During Speech", Proc. Eurospeech, 2001)
- in section II an overview of the combined estimation is given, and its hysteretic shape is presented. Furthermore, in section III it is shown how a slight modification can reduce unwanted estimation behaviour and enable a smoother estimation hysteresis.
- the EMSR reconstructs the magnitude of the clean speech signal $X_k[m]$ from the noisy observation $Y_k[m]$.
- the time index m may be dropped for simplicity of notation.
- the noisy phase is an optimal estimate of the clean phase.
- the reconstruction operator is a real-valued spectral weight G[m]:
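The equation that followed this statement did not survive in this excerpt. Assuming the usual form of a spectral weighting rule, in which a real-valued gain scales the noisy magnitude and the noisy phase is retained, the reconstruction would read as follows; the hat notation for the estimate is an assumption, not taken from the source.

```latex
\hat{X}_k[m] = G_k[m]\, Y_k[m],
\qquad
\underline{\hat{X}}_k[m] = G_k[m]\, \underline{Y}_k[m]
```

Since $G_k[m]$ is real-valued, the second form leaves the phase of $\underline{Y}_k[m]$ untouched, which matches the statement that the noisy phase is used as the estimate of the clean phase.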
- the DDA combines two basic SNR estimators to a new estimator of the a priori SNR $\xi_k$.
- the second estimator describes the reconstructed SNR, which is calculated after noise reduction using
- the a posteriori SNR $\gamma_k$ shows relative variations in time that are smaller than those of ($\gamma_k - 1$). (Relative variations, e.g. $10\log(\gamma_k[m]) - 10\log(\gamma_k[m-1])$, are more significant than linear variations regarding human auditory perception.)
- G provides a consistently high attenuation under low SNR conditions. Therefore, the reconstructed $SNR_{rec}$ will take more consistent values than $SNR_{inst}$ in the low SNR case.
- the DDA for estimation of the a priori SNR combines both $SNR_{inst}$ and $SNR_{rec}$:
- the specific estimation properties can be observed by inserting the suppression gain into the DDA.
- Awkward estimation behavior e.g. the "constant- ⁇ -effect”
- the discontinuities in the hysteresis loop (Fig. 4) give rise to considerations concerning a modification of the DDA and a reconsideration of time-constant and minimum a priori SNR quantities.
- the parameter p can directly control the suppression hysteresis width and musical noise suppression. Our modification enables a separate control of averaging time-constant and musical noise suppression.
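The way the decision-directed approach blends an instantaneous and a reconstructed SNR can be sketched as follows. This is the textbook form of the estimator, not the modified version with the parameter p discussed above, and it uses a Wiener weight as a stand-in for the suppression rule. The function names, the weighting `alpha`, the floor `xi_min` and the zero initial state are all assumptions made for illustration.

```python
def wiener_gain(xi):
    """Spectral weight of a Wiener filter for an a priori SNR xi."""
    return xi / (1.0 + xi)


def decision_directed(gammas, alpha=0.98, xi_min=0.01):
    """Estimate the a priori SNR from a sequence of a posteriori SNR values.

    gammas: a posteriori SNR per frame for one band (linear scale).
    alpha:  weight of the reconstructed SNR (assumed value).
    xi_min: lower bound on the a priori SNR (assumed value).
    """
    xis, gains = [], []
    snr_rec = 0.0  # assumption: nothing has been reconstructed before frame 0
    for gamma in gammas:
        snr_inst = max(gamma - 1.0, 0.0)
        xi = max(alpha * snr_rec + (1.0 - alpha) * snr_inst, xi_min)
        gain = wiener_gain(xi)
        # Reconstructed SNR of this frame, used when estimating the next one.
        snr_rec = gain * gain * gamma
        xis.append(xi)
        gains.append(gain)
    return xis, gains


xis, gains = decision_directed([3.0, 3.0], alpha=0.5, xi_min=0.0)
```

With `alpha` close to one the estimate follows the smoother reconstructed SNR, which reduces musical noise but slows the response to speech onsets.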
- the over-all system is shown in a block diagram in Fig. 7. It can be implemented as an analog or digital effect processor or as part of a software algorithm. Inside the over-all system there are several subsystems (Fig. 8):
- LD low-latency level detection
- PM auditory post-masking filter
- An outer and middle ear filter consists of three second order sections (SOS) representing the following physiological parts of the human ear (E. Zwicker, H. Fastl, "Psychoacoustics, facts and models", Springer, Berlin Heidelberg, 1999; E. Terhardt, "Akustische Kommunikation", Springer, Berlin Heidelberg, 1998):
- the latter two filters are optional, whereas the high-pass component is mandatory and reduces the influence of low-frequency noise on the noise suppressor.
- a filter structure providing an appropriate magnitude transfer function could look like Fig. 9. All three filter sections have to be second order sections to provide appropriate slopes.
- the outer filter skirts can be modelled as second order low- and high-pass shelving filters, whereas the resonance can be modelled as a parametric peak filter (P. Dutilleux, U. Zölzer, "DAFX", Wiley & Sons, 2002).
- Frequency grouping is an important effect in human loudness perception.
- the perceived loudness consists of particular loudnesses associated with individual frequency ranges.
- An auditory frequency scale can be used to model this frequency grouping effect, the units of which can be seen as the frequency resolution of human auditory loudness perception (E. Zwicker, H. Fastl, "Psychoacoustics, facts and models", Springer, Berlin Heidelberg, 1999).
- a reasonable frequency scale using a low number of frequency groups can be given by the formula of Traunmüller (E. Terhardt, "Akustische Kommunikation", Springer, Berlin Heidelberg, 19
- the bandwidths $B_k$ can be derived from $B_k = \mathfrak{B}^{-1}\{\nu_k + d\nu/2\} - \mathfrak{B}^{-1}\{\nu_k - d\nu/2\}$.
- Other Bark scales, e.g. E. Zwicker, H. Fastl, "Psychoacoustics, facts and models", Springer, Berlin Heidelberg, 1999
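The derivation of band centres and bandwidths from an auditory scale can be sketched as below. Traunmüller's formula is not reproduced in this excerpt, so the commonly cited form is assumed, without the end corrections for very low and very high Bark values. The function names, the band spacing `dz` and the choice of centres at 1 to 20 Bark are illustrative assumptions.

```python
def hz_to_bark(frequency_hz):
    """Commonly cited Traunmueller approximation (end corrections omitted)."""
    return 26.81 * frequency_hz / (1960.0 + frequency_hz) - 0.53


def bark_to_hz(bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)


def band_layout(first_bark, last_bark, dz=1.0):
    """Return (centre frequency, bandwidth) in Hz for equally spaced Bark bands."""
    bands = []
    bark = first_bark
    while bark <= last_bark + 1e-9:
        lower = bark_to_hz(bark - dz / 2.0)
        upper = bark_to_hz(bark + dz / 2.0)
        bands.append((bark_to_hz(bark), upper - lower))
        bark += dz
    return bands


# Hypothetical layout: 20 bands centred at 1, 2, ..., 20 Bark.
bands = band_layout(1.0, 20.0)
```

Each bandwidth is the difference between the inverse-mapped upper and lower band edges, so bands that are equally wide on the Bark scale become progressively wider in Hz.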
- Auditory Gammatone filters (R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerpen 1996) can be efficiently implemented in the time-domain, allowing the separation of a broadband audio signal into auditory band signals.
- the magnitude response of the Gammatone filter corresponds to the simultaneous masking properties of the human ear. When the magnitude of this filter is plotted along an auditory frequency scale, the filter shape remains the same, whatever center frequency the filter is designed to have.
- the arbitrary form representing the family of Gammatone filters of the order m is shown below, wherein k is the filterbank channel index.
- a corresponding z-transform wherein *GF denotes an arbitrary Gammatone filter (e.g. GF, APGF, OZGF, TZGF), is:
- An auditory Gammatone filterbank represents a set of overlapping Gammatone filters that divide the auditory frequency scale into equally spaced frequency bands.
- the term $g_{*GF}$ shall be adjusted so that unity gain at the center frequency of channel k can be provided.
- the system $H_{num,k}(z)$ has to be adapted suitably as shown in the following sub-sections.
- the ordinary Gammatone filter (GF; R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerpen 1996) has to be derived from the continuous-time impulse response using the Laplace- and impulse-invariance transform (A. V. Oppenheim, R. W. Schafer, J. R. Buck, "Discrete-Time Signal Processing", Prentice Hall, 1999), which determines the unknown polynomial $H_{num,k}(z)$ in the above equation (21). Due to its shape and computational cost its use is not recommended.
- the One-Zero Gammatone (OZGF) can be efficiently composed of a common "One-Zero" for all channels k before splitting up into k All-Pole Gammatone filters.
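The gain adjustment for unity gain at the center frequency can be illustrated for the all-pole case. The z-transform itself is missing from this excerpt, so the sketch assumes the filter is a cascade of identical two-pole sections with a unit numerator; the function names, the pole radius and the example frequencies are hypothetical.

```python
import cmath
import math


def all_pole_response(omega, omega_k, radius, order):
    """Unnormalised response of `order` cascaded identical two-pole sections.

    omega:   evaluation frequency in radians per sample.
    omega_k: pole angle, taken here as the channel center frequency.
    radius:  pole radius (below 1 for stability).
    """
    z_inv = cmath.exp(-1j * omega)
    section = 1.0 - 2.0 * radius * math.cos(omega_k) * z_inv + radius ** 2 * z_inv ** 2
    return 1.0 / section ** order


def unity_gain_factor(omega_k, radius, order):
    """Gain that scales the magnitude response to exactly 1 at omega_k."""
    return 1.0 / abs(all_pole_response(omega_k, omega_k, radius, order))


# Hypothetical channel: 1 kHz center at 16 kHz sample rate, third order.
omega_k = 2.0 * math.pi * 1000.0 / 16000.0
gain = unity_gain_factor(omega_k, radius=0.98, order=3)
normalised = gain * abs(all_pole_response(omega_k, omega_k, 0.98, 3))
```

Normalising every channel this way keeps the sub-band levels comparable before the per-band gains are applied.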
- Fig. 11 provides an example, which also takes the form-factor F into account
- Suitable time-constants match the auditory pre-masking time-constant, which is approximately $\tau_{avg} \approx 2$ [ms] (G. Stoll, J. G. Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, W. C. Treurniet, "PEAQ - der künftige ITU-Standard zur objektiven Messung der wahrgenommenen Audioqualität", RTM - Rundfunktechnische Mitteilungen, Die Fachzeitschrift für Hörfunk und Fernsehtechnik, 43. Jahrgang, ISSN 0035-9890 (81-120), Firma Mensing GmbH + Co. KG, Abteilung Verlag, Sept 1999).
- a consistent 90° phase shift can be brought to a broad band signal. Summing up the squares of the original and the shifted signal, squared amplitudes (i.e. signal power) remain while sinusoidal components cancel. But a causal implementation of the Hilbert transform doesn't exist. Unlike an ideal Hilbert transformer, we only need a 90° phase shift in the considered frequency range, i.e. in the corresponding auditory frequency group.
- Each of the above mentioned methods can provide a 90° phase shift at a virtually arbitrary center frequency and is therefore suitable.
- Fig. 12 provides an example for the FIR level detection method. Appropriate parameters can be found using the phase-equations for the corresponding systems, e.g. A. V. Oppenheim, R. W. Schafer, J. R. Buck, "Discrete-Time Signal Processing", Prentice Hall, 1999.
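The principle behind this kind of level detection, summing the squares of a narrow-band signal and a copy shifted by 90°, can be checked numerically. The sketch below is not the FIR detector of Fig. 12: it uses a plain quarter-period delay as the phase shifter, which is exact only at one frequency, and the function name and signal parameters are assumptions.

```python
import math


def squared_envelope(signal, quarter_period):
    """Sum the squares of the signal and a copy delayed by a quarter period.

    The delay acts as a 90 degree phase shift only at the frequency whose
    period equals 4 * quarter_period samples (a narrow-band assumption).
    """
    return [
        signal[n] ** 2 + signal[n - quarter_period] ** 2
        for n in range(quarter_period, len(signal))
    ]


# Hypothetical test tone: 1 kHz at 16 kHz sample rate, amplitude 0.5.
sample_rate = 16000
frequency = 1000
amplitude = 0.5
tone = [
    amplitude * math.sin(2.0 * math.pi * frequency * n / sample_rate)
    for n in range(64)
]
power = squared_envelope(tone, quarter_period=sample_rate // (4 * frequency))
```

Every output sample equals the squared amplitude without any ripple, so no additional low-pass smoothing (and its delay) is needed to remove the oscillating term.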
- the averaging parameter in channel k has to correspond to the human auditory post-masking time-constant at the corresponding center frequency. Therefore, we use the following equation to derive the averaging parameter:
- a parameter G can be used to scale the post-masking time-constants if useful.
- This method essentially applies three time-constants of averaging to the signal level. Falling slopes are slightly averaged, whereas during a rising input slope, the output is held constant (i.e. infinitely large time-constant) during a period of $N_w$ sampling intervals. When $N_w$ sampling intervals are exceeded, the rising signal slope is averaged by a third time-constant.
- the time-constants can be similarly converted to recursive averaging parameters as in equation (25) and (26).
- An appropriate counter threshold $N_w$ can be calculated using a continuous time interval $T_w$
- this time interval can be chosen e.g. $T_w \approx 1.5$ s.
- the falling slope time-constant can be a scaled version of the post-masking time-constants $\tau_k$, or e.g. a constant 200 [ms].
- the rising slope time-constant can be approximately 700 [ms], which corresponds to a velocity of approximately 6 [dB]/[s]. Unlike other time-constants, this one is proposed to be equal for all channels k.
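The three-regime behaviour described above (average on falling slopes, hold on rising slopes, then slow averaging once the counter threshold is exceeded) can be sketched as follows. The time values are the ones quoted in the text; the function names, the exponential coefficient conversion and the example level rate are assumptions, since the patent's own equations are not part of this excerpt.

```python
import math


def coefficient(tau_s, rate_hz):
    """Assumed conversion of a time-constant (seconds) to a one-pole coefficient."""
    return math.exp(-1.0 / (tau_s * rate_hz))


def track_noise_level(levels, rate_hz, tau_fall=0.2, tau_rise=0.7, hold_s=1.5):
    """Recursive noise level estimate with fall, hold and rise regimes."""
    a_fall = coefficient(tau_fall, rate_hz)
    a_rise = coefficient(tau_rise, rate_hz)
    hold_samples = int(round(hold_s * rate_hz))  # counter threshold
    estimate = levels[0]
    counter = 0
    track = []
    for level in levels:
        if level <= estimate:
            # Falling slope: follow downwards with light averaging.
            estimate = a_fall * estimate + (1.0 - a_fall) * level
            counter = 0
        else:
            counter += 1
            if counter > hold_samples:
                # Level stayed high longer than the hold time: rise slowly.
                estimate = a_rise * estimate + (1.0 - a_rise) * level
            # Otherwise hold the estimate (treat the rise as speech).
        track.append(estimate)
    return track


# Hypothetical level rate of 100 values per second; 1 s burst above the noise.
short_burst = track_noise_level([1.0] * 10 + [2.0] * 100, rate_hz=100.0)
```

Holding the estimate during short rises keeps speech bursts from inflating the noise estimate, while the slow rise after the hold time lets it adapt to a genuinely higher noise floor.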
- the noise variance is given by the noise estimation algorithm; m and n are time indices, $f_s$ is the system sample rate and L a down-sampling factor.
- $\gamma_k[m]$ is the a posteriori SNR
- $\xi_k[m]$ is the a priori SNR
- $G_{w,k}[m]$ is the spectral weight of a Wiener filter
- $\alpha$ is an averaging parameter, governed by an averaging time-constant $\tau_{snr,k}$, which is either approximately 2 [ms] (F. Zotter, M. Noisternig, R.
- up-sampling needs either a processing-delay or a group-delay due to the interpolation operation involved. Such a delay is approximately L samples long, using the up-sampling factor L.
- Frequency domain solutions using equivalent auditory models require delays in the range of 10 milliseconds, whereas the implementation of our system with 20 frequency bands and the third order TZGF has a mean latency of 3.5 up to 4 milliseconds.
- EMSR Ephraim and Malah suppression rule
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Circuit For Audible Band Transducer (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1004090.5A GB2465910B (en) | 2007-10-02 | 2007-10-02 | Method and device for low-latency auditory model-based single-channel speech enhancement |
| DE112007003674T DE112007003674T5 (en) | 2007-10-02 | 2007-10-02 | Method and apparatus for single-channel speech enhancement based on a latency-reduced auditory model |
| AT0956707A AT509570B1 (en) | 2007-10-02 | 2007-10-02 | Method and apparatus for single-channel speech enhancement based on a latency-reduced auditory model |
| PCT/AT2007/000466 WO2009043066A1 (en) | 2007-10-02 | 2007-10-02 | Method and device for low-latency auditory model-based single-channel speech enhancement |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/AT2007/000466 WO2009043066A1 (en) | 2007-10-02 | 2007-10-02 | Method and device for low-latency auditory model-based single-channel speech enhancement |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2009043066A1 (en) | 2009-04-09 |
Family
ID=39447761
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/AT2007/000466 Ceased WO2009043066A1 (en) | 2007-10-02 | 2007-10-02 | Method and device for low-latency auditory model-based single-channel speech enhancement |
Country Status (4)
| Country | Link |
|---|---|
| AT (1) | AT509570B1 (en) |
| DE (1) | DE112007003674T5 (en) |
| GB (1) | GB2465910B (en) |
| WO (1) | WO2009043066A1 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102157156A (en) * | 2011-03-21 | 2011-08-17 | 清华大学 | Single-channel voice enhancement method and system |
| EP2495724A1 (en) * | 2011-02-17 | 2012-09-05 | Siemens Medical Instruments Pte. Ltd. | Device and method for estimating an interference noise |
| EP2747081A1 (en) * | 2012-12-18 | 2014-06-25 | Oticon A/s | An audio processing device comprising artifact reduction |
| US9173025B2 (en) | 2012-02-08 | 2015-10-27 | Dolby Laboratories Licensing Corporation | Combined suppression of noise, echo, and out-of-location signals |
| US10141003B2 (en) | 2014-06-09 | 2018-11-27 | Dolby Laboratories Licensing Corporation | Noise level estimation |
| CN112151060A (en) * | 2020-09-25 | 2020-12-29 | 展讯通信(天津)有限公司 | Single-channel voice enhancement method and device, storage medium and terminal |
| US10939161B2 (en) | 2019-01-31 | 2021-03-02 | Vircion LLC | System and method for low-latency communication over unreliable networks |
| WO2021128670A1 (en) * | 2019-12-26 | 2021-07-01 | 紫光展锐(重庆)科技有限公司 | Noise reduction method, device, electronic apparatus and readable storage medium |
| US12160709B2 (en) | 2022-08-23 | 2024-12-03 | Sonova Ag | Systems and methods for selecting a sound processing delay scheme for a hearing device |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110580910B (en) * | 2018-06-08 | 2024-04-26 | 北京搜狗科技发展有限公司 | Audio processing method, device, equipment and readable storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2002011125A1 (en) * | 2000-07-31 | 2002-02-07 | Herterkom Gmbh | Attenuation of background noise and echoes in audio signal |
| EP1600947A2 (en) * | 2004-05-26 | 2005-11-30 | Honda Research Institute Europe GmbH | Subtractive cancellation of harmonic noise |
| WO2006114100A1 (en) * | 2005-04-26 | 2006-11-02 | Aalborg Universitet | Estimation of signal from noisy observations |
| EP1729287A1 (en) * | 1999-01-07 | 2006-12-06 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6052771A (en) | 1998-01-20 | 2000-04-18 | International Business Machines Corporation | Microprocessor with pipeline synchronization |
| DE69932626T2 (en) | 1998-11-13 | 2007-10-25 | Bitwave Pte Ltd. | SIGNAL PROCESSING DEVICE AND METHOD |
| US6377637B1 (en) * | 2000-07-12 | 2002-04-23 | Andrea Electronics Corporation | Sub-band exponential smoothing noise canceling system |
-
2007
- 2007-10-02 GB GB1004090.5A patent/GB2465910B/en not_active Expired - Fee Related
- 2007-10-02 WO PCT/AT2007/000466 patent/WO2009043066A1/en not_active Ceased
- 2007-10-02 AT AT0956707A patent/AT509570B1/en active
- 2007-10-02 DE DE112007003674T patent/DE112007003674T5/en not_active Ceased
Non-Patent Citations (5)
| Title |
|---|
| AMIR HUSSAIN ET AL: "Nonlinear Adaptive Speech Enhancement Inspired by Early Auditory Processing", NONLINEAR SPEECH MODELING AND APPLICATIONS LECTURE NOTES IN COMPUTER SCIENCE;LECTURE NOTES IN ARTIFICIAL INTELLIG ENCE;LNCS, SPRINGER-VERLAG, BE, vol. 3445, 1 January 2005 (2005-01-01), pages 291 - 316, XP019012533, ISBN: 978-3-540-27441-4 * |
| JAN SKOGLUND ET AL: "On Time-Frequency Masking in Voiced Speech", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 8, no. 4, 1 July 2000 (2000-07-01), XP011054031, ISSN: 1063-6676 * |
| JOHNSON ET AL: "Speech signal enhancement through adaptive wavelet thresholding", SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 49, no. 2, 15 February 2007 (2007-02-15), pages 123 - 133, XP005890520, ISSN: 0167-6393 * |
| KALLIRIS M G ET AL: "Broad-Band Acoustic Noise Reduction Using a Novel Frequency Depended Parametric Wiener Filter. Implementations using Filterbank, STFT and Wavelet Analysis/Synthesis Techniques.", AUDIO ENGINEERING SOCIETY (AES) CONVENTION, 12 May 2001 (2001-05-12) - 15 May 2001 (2001-05-15), Amsterdam, The Netherlands, pages 1 - 9, XP002499667 * |
| LIN L ET AL: "Speech denoising based on an auditory filterbank", SIGNAL PROCESSING, 2002 6TH INTERNATIONAL CONFERENCE ON AUG. 26-30, 2002, PISCATAWAY, NJ, USA,IEEE, vol. 1, 26 August 2002 (2002-08-26), pages 552 - 555, XP010628047, ISBN: 978-0-7803-7488-1 * |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2495724A1 (en) * | 2011-02-17 | 2012-09-05 | Siemens Medical Instruments Pte. Ltd. | Device and method for estimating an interference noise |
| US8634581B2 (en) | 2011-02-17 | 2014-01-21 | Siemens Medical Instruments Pte. Ltd. | Method and device for estimating interference noise, hearing device and hearing aid |
| CN102157156A (en) * | 2011-03-21 | 2011-08-17 | 清华大学 | Single-channel voice enhancement method and system |
| US9173025B2 (en) | 2012-02-08 | 2015-10-27 | Dolby Laboratories Licensing Corporation | Combined suppression of noise, echo, and out-of-location signals |
| EP2747081A1 (en) * | 2012-12-18 | 2014-06-25 | Oticon A/s | An audio processing device comprising artifact reduction |
| US10141003B2 (en) | 2014-06-09 | 2018-11-27 | Dolby Laboratories Licensing Corporation | Noise level estimation |
| US10939161B2 (en) | 2019-01-31 | 2021-03-02 | Vircion LLC | System and method for low-latency communication over unreliable networks |
| WO2021128670A1 (en) * | 2019-12-26 | 2021-07-01 | 紫光展锐(重庆)科技有限公司 | Noise reduction method, device, electronic apparatus and readable storage medium |
| US12260873B2 (en) | 2019-12-26 | 2025-03-25 | Unisoc (Chongqing) Technologies Co., Ltd. | Method and apparatus of noise reduction, electronic device and readable storage medium |
| CN112151060A (en) * | 2020-09-25 | 2020-12-29 | 展讯通信(天津)有限公司 | Single-channel voice enhancement method and device, storage medium and terminal |
| CN112151060B (en) * | 2020-09-25 | 2022-11-25 | 展讯通信(天津)有限公司 | Single-channel voice enhancement method and device, storage medium and terminal |
| US12160709B2 (en) | 2022-08-23 | 2024-12-03 | Sonova Ag | Systems and methods for selecting a sound processing delay scheme for a hearing device |
Also Published As
| Publication number | Publication date |
|---|---|
| DE112007003674T5 (en) | 2010-08-12 |
| GB2465910B (en) | 2012-02-15 |
| GB201004090D0 (en) | 2010-04-28 |
| AT509570A5 (en) | 2011-09-15 |
| AT509570B1 (en) | 2011-12-15 |
| GB2465910A (en) | 2010-06-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7313518B2 (en) | | Noise reduction method and device using two pass filtering |
| Martin | | Speech enhancement based on minimum mean-square error estimation and supergaussian priors |
| US8560320B2 (en) | | Speech enhancement employing a perceptual model |
| WO2009043066A1 (en) | | Method and device for low-latency auditory model-based single-channel speech enhancement |
| Soon et al. | | Speech enhancement using 2-D Fourier transform |
| JP5068653B2 (en) | | Method for processing a noisy speech signal and apparatus for performing the method |
| Abramson et al. | | Simultaneous detection and estimation approach for speech enhancement |
| Wu et al. | | Subband Kalman filtering for speech enhancement |
| Chen et al. | | Fundamentals of noise reduction |
| Mosayyebpour et al. | | Single-microphone early and late reverberation suppression in noisy speech |
| Soon et al. | | Wavelet for speech denoising |
| EP1995722B1 (en) | | Method for processing an acoustic input signal to provide an output signal with reduced noise |
| Diethorn | | Subband noise reduction methods for speech enhancement |
| Taşmaz et al. | | Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE–STSA estimation in various noise environments |
| Sunnydayal et al. | | A survey on statistical based single channel speech enhancement techniques |
| Saleem et al. | | Machine learning approach for improving the intelligibility of noisy speech |
| Li et al. | | A block-based linear MMSE noise reduction with a high temporal resolution modeling of the speech excitation |
| WO2006114100A1 (en) | | Estimation of signal from noisy observations |
| Diethorn | | Subband noise reduction methods for speech enhancement |
| Esch et al. | | Model-based speech enhancement exploiting temporal and spectral dependencies |
| Mwema et al. | | A spectral subtraction method for noise reduction in speech signals |
| Yong et al. | | Real time noise suppression in social settings comprising a mixture of non-stationary and transient noise |
| Dionelis | | On single-channel speech enhancement and on non-linear modulation-domain Kalman filtering |
| Tilp | | Single-channel noise reduction with pitch-adaptive post-filtering |
| Roy | | Single channel speech enhancement using Kalman filter |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07815133; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 1004090; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20071002 |
| | WWE | Wipo information: entry into national phase | Ref document number: 1004090.5; Country of ref document: GB |
| | WWE | Wipo information: entry into national phase | Ref document number: 1120070036745; Country of ref document: DE |
| | ENP | Entry into the national phase | Ref document number: 95672007; Country of ref document: AT; Kind code of ref document: A |
| | RET | De translation (de og part 6b) | Ref document number: 112007003674; Country of ref document: DE; Date of ref document: 20100812; Kind code of ref document: P |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 07815133; Country of ref document: EP; Kind code of ref document: A1 |