
WO1998006090A1 - Speech/audio coding using a non-linear spectral-amplitude transform - Google Patents


Info

Publication number
WO1998006090A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
audio signal
encoding
linear transform
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CA1997/000543
Other languages
English (en)
Inventor
Roch Lefebvre
Claude Laflamme
Jean-Pierre Adoul
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Universite de Sherbrooke
Original Assignee
Universite de Sherbrooke
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Universite de Sherbrooke filed Critical Universite de Sherbrooke
Priority to AU36901/97A priority Critical patent/AU3690197A/en
Publication of WO1998006090A1 publication Critical patent/WO1998006090A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Definitions

  • the present invention relates to the field of digital encoding of speech and/or audio signals, where the main objective is to maintain the highest possible sound quality at a given bit rate.
  • noise shaping: it is widely recognized that, to reduce the bit rate of speech and audio encoders without compromising quality, proper care should be given to the spectral shape of the coding noise.
  • a general rule is that the short-term spectrum of the coding noise must follow the short-term spectrum of the speech and/or audio signal. This is known as noise shaping.
  • a time-varying perceptual filter controls the noise level as a function of frequency.
  • This perceptual filter is derived from an autoregressive filter which models the formants, or spectral envelope, of the speech spectrum. Hence, the noise spectrum approximately follows the speech formants. Further perceptual improvements can be achieved by using a post-filter, which emphasizes the formant and harmonic structure of the synthesized speech signal.
  • the noise spectrum is controlled by dynamic bit allocation in the frequency domain.
  • a sophisticated hearing model is used to determine a masking threshold. Bit allocation is conducted so as to maintain the distortion below the masking threshold at any frequency (provided the encoder operates at a sufficient bit rate).
  • the resulting coding noise will be correlated to the signal spectrum, with corresponding peaks and valleys.
  • the perceptual filter and bit allocation discussed above have to be adaptive; more specifically, they are time-varying functions. This adaptation implies either that side information has to be transmitted to the decoder, or that the noise shaping function is an inherent part of the encoder.
  • An object of the present invention is therefore to overcome the above-discussed drawbacks of the prior art.
  • Another object of the present invention is to provide a speech/audio encoding method and device capable of enhancing the perceptual quality of speech and/or audio signals.
  • a further object of the present invention is to provide an encoding/decoding method and device conducting a warping of the spectral amplitude of a speech and/or audio signal prior to encoding, and an unwarping of the spectral amplitude of the speech and/or audio signal after decoding, in view of enhancing the perceptual quality of the encoded and subsequently synthesized speech and/or audio signal.
  • a method of encoding a speech and/or audio signal in view of enhancing perceptual quality comprising the steps of non-linearly transforming the speech and/or audio signal, and encoding the non-linearly transformed speech and/or audio signal to produce an encoded speech and/or audio signal.
  • the speech and/or audio signal is a time-domain speech and/or audio signal
  • the step of producing a spectrum representation comprises: breaking down the time-domain speech and/or audio signal into a succession of overlapping finite-duration speech and/or audio signal segments; and applying a first linear transform to each speech and/or audio signal segment to obtain short-term spectral components;
  • the step of non-linearly transforming the spectrum representation of the speech and/or audio signal comprises the step of: applying a non-linear transform to the short-term spectral components in order to produce warped spectral components; - the encoding step is performed in the time domain, and the method further comprises, prior to the encoding step, the steps of: applying a second linear transform to the warped spectral components to obtain a time-domain signal interval; multiplying the time-domain signal interval by a time window to produce a windowed-signal interval; and adding the successive overlapping windowed-signal intervals corresponding to the successive overlapping finite-duration speech and/or audio signal segments to obtain a pre-processed signal applied to the encoding step for encoding the non-linearly transformed spectrum representation; wherein the second linear transform is the inverse of the first linear transform.
  • A_k' is the amplitude of the k-th warped spectral component S_k'; and f_nl(A_k) is a non-linear function of A_k, wherein the non-linear function is given by f_nl(A_k) = log_b(A_k).
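As a hedged illustration of such non-linear amplitude functions, the following sketch implements a power-law warp and a logarithmic warp; the exponent, the logarithm base and the zero floor are our own illustrative choices, not values fixed by the patent text:

```python
import numpy as np

def warp_power(amp, alpha=0.5):
    """Power-law warping A_k' = A_k**alpha; an alpha < 1 compresses
    the spectral dynamic range, raising valleys relative to peaks."""
    return amp ** alpha

def warp_log(amp, base=10.0, floor=1e-12):
    """Logarithmic warping A_k' = log_b(A_k); the floor guards
    against log(0) and is our own safeguard."""
    return np.log(np.maximum(amp, floor)) / np.log(base)

amps = np.array([0.01, 1.0, 100.0])   # 80 dB of amplitude range
print(warp_power(amps))               # -> 0.1, 1.0, 10.0 (40 dB of range)
print(warp_log(amps))                 # -> -2.0, 0.0, 2.0
```

Both functions are monotonic and invertible, which is what later allows the decoder to undo the warp exactly in the absence of coding noise.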
  • a method of decoding an encoded speech and/or audio signal in view of enhancing the perceptual quality of a synthesized speech and/or audio signal, in which the speech and/or audio signal has been encoded by producing a spectrum representation of the speech and/or audio signal, non-linearly transforming the spectrum representation of the speech and/or audio signal, and encoding the non-linearly transformed spectrum representation to produce the encoded speech and/or audio signal.
  • the decoding method comprises the steps of decoding the encoded speech and/or audio signal to recover a non-linearly transformed spectrum representation of the speech and/or audio signal, non-linearly transforming the recovered non-linearly transformed spectrum representation to recover a spectrum representation of the speech and/or audio signal, and transforming the recovered spectrum representation into the synthesized speech and/or audio signal.
  • a method for decoding an encoded speech and/or audio signal in view of enhancing the perceptual quality of a synthesized speech and/or audio signal, in which the encoded speech and/or audio signal has been produced by breaking down a time-domain speech and/or audio signal into a succession of overlapping finite-duration speech and/or audio signal segments, applying a first linear transform to each of the speech and/or audio signal segments to obtain short-term spectral components, applying a first non-linear transform to the short-term spectral components in order to produce warped spectral components, and encoding the warped spectral components to produce an encoded speech and/or audio signal.
  • the decoding method comprises the steps of decoding the encoded speech and/or audio signal to recover warped signal components, applying a second non-linear transform to the recovered warped signal components to produce unwarped short-term spectral components, and applying a second linear transform to the unwarped signal components to produce the synthesized speech and/or audio signal.
  • the present invention further relates to a method of decoding an encoded speech and/or audio signal in view of enhancing the perceptual quality of a synthesized speech and/or audio signal, in which the encoded speech and/or audio signal has been produced by breaking down a time-domain speech and/or audio signal into a succession of first overlapping finite-duration speech and/or audio signal segments, applying a first linear transform to each of the first speech and/or audio signal segments to obtain first short-term spectral components, applying a first non-linear transform to the first short-term spectral components in order to produce warped spectral components, applying a second linear transform to the warped spectral components to obtain a first time-domain signal interval, multiplying the first time-domain signal interval by a first time window to produce a first windowed-signal interval, adding the successive overlapping first windowed-signal intervals corresponding to the successive first overlapping finite-duration speech and/or audio signal segments to obtain a time-domain pre-processed speech and/or audio signal, and encoding this pre-processed signal to produce the encoded speech and/or audio signal.
  • the decoding method comprises the steps of decoding the encoded speech and/or audio signal to recover a decoded time-domain pre-processed speech and/or audio signal, breaking down the decoded time-domain pre-processed speech and/or audio signal into a succession of second overlapping finite-duration speech and/or audio signal segments, applying a third linear transform to each of the second speech and/or audio signal segments to obtain second short-term spectral components, applying a second non-linear transform to the second short-term spectral components in order to produce unwarped spectral components, applying a fourth linear transform to the unwarped spectral components to obtain a second time-domain signal interval, multiplying the second time-domain signal interval by a second time window to produce a second windowed-signal interval, adding the successive overlapping second windowed-signal intervals corresponding to the successive second overlapping finite-duration speech and/or audio signal segments to obtain the synthesized speech and/or audio signal.
  • the present invention concerns a device for putting the above-defined encoding and decoding methods into practice.
  • adaptive noise shaping may be obtained with a constant function. More precisely, a constant non-linear transformation is applied to the speech/audio short-term spectrum prior to the quantizing and/or encoding per se. Since noise shaping is performed by means of a constant function, this function can be viewed as a completely separate entity from the encoder, making it possible to enhance the noise shaping capability of an existing encoder without modifying the encoder itself, or having to transmit additional side information.
  • Figure 1a, which is labelled as "prior art", is a simplified block diagram of a speech/audio encoding device using pre-processing;
  • Figure 1b, which is labelled as "prior art", is a simplified block diagram of a speech/audio decoding device using post-processing, this speech/audio decoding device corresponding to the speech/audio encoding device of Figure 1a;
  • Figure 2a is a block diagram of a speech/audio encoding device using spectral-amplitude warping as pre-processing, in which the input signal is a time- domain speech and/or audio signal;
  • Figure 2b is a block diagram of a speech/audio decoding device using spectral-amplitude unwarping as post-processing, this speech/audio decoding device corresponding to the speech/audio encoding device of Figure 2a;
  • Figure 3a is a block diagram of a more general configuration of the speech/audio encoding device of Figure 2a, still using spectral-amplitude warping as pre-processing;
  • Figure 3b is a block diagram of a more general configuration of the speech/audio decoding device of Figure 2b, still using spectral-amplitude unwarping as post-processing;
  • Figure 4a is a graph showing an example of short-term amplitude spectrum of a voiced speech segment, using a fast Fourier transform (FFT);
  • Figure 4b is a graph showing the short-term amplitude spectrum of the coding noise, corresponding to the speech segment of Figure 4a, when this speech segment is encoded with wideband speech coding standard G.722 [P. Mermelstein, "G.722, a new CCITT coding standard for digital transmission of wideband audio signals", IEEE Communications Magazine, Vol. 26, No. 1, 1988] at 48 kbits/second; and
  • Figure 4c is a graph showing the resulting short-term amplitude spectrum of the coding noise, corresponding to the speech segment of Figure 4a, when this speech segment is encoded with the same G.722 standard at 48 kbits/second, but with spectral-amplitude warping used as pre-processing and spectral-amplitude unwarping used as post-processing.
  • Noise shaping can be implemented through the two following, fundamentally different approaches: (1) a weighting measure is included in the encoder itself, which weighting measure determines the level of accuracy reached in quantizing each spectral component; or (2) the signal is pre-processed prior to encoding and correspondingly post-processed after decoding.
  • the present invention is concerned with the pre-processing approach (approach No. (2)).
  • Figure 1a is a simplified block diagram of a speech/audio encoding device 100 with a pre-processing module 102.
  • the speech/audio encoding device 100 comprises an optional module 101 for conditioning the input speech and/or audio signal 106.
  • Input signal conditioning module 101 conditions the input speech and/or audio signal 106 to account for operations such as soft saturation to prevent clipping, high-pass filtering to remove the DC component, gain control, etc.
  • signal conditioning is viewed as an entity separate from signal pre-processing.
  • the speech/audio encoding device 100 also comprises the preprocessing module 102 per se.
  • the main purpose of the pre-processing module is to improve the perceptual quality of the speech/audio encoding device 100. More specifically, module 102 modifies the speech and/or audio signal 106, conditioned by a signal conditioning module 101 or not, to emphasize the perceptually relevant features of this signal prior to encoding. This enables proper encoding of these perceptually relevant features.
  • the characteristics of the pre-processing module 102 will be fully explained in the following description.
  • the pre-processed speech and/or audio signal from module 102 is then supplied to the encoder 103.
  • the encoder 103 produces a bitstream 108 to be transmitted over a communication channel.
  • Figure 1b is a simplified block diagram of a speech/audio decoding device 107 comprising a post-processing module 105.
  • the bitstream 108 from the encoder 103 is received by a decoder 104 of the speech/audio decoding device 107.
  • Decoder 104 produces a pre-processed synthesis speech and/or audio signal 110 in response to the received bitstream 108.
  • the post-processing module 105 conducts a post-processing operation which is typically the inverse of the pre-processing operation conducted by module 102 of Figure 1a. Hence, if the output of the decoder 104 was exactly the same as the input of the encoder 103, i.e. if there were no coding noise and no channel noise, then the synthesized speech and/or audio signal 109 would be exactly the same as the input speech and/or audio signal 106 (conditioned by module 101 or not).
  • the pre-processing 102 and post-processing 105 modules have been mostly implemented by means of linear filters.
  • a drawback of this linear-filter implementation is that adaptation of the preprocessing and post-processing requires adaptation of the linear filters themselves. This implementation also requires the transmission of additional side information to the decoder 104 in order to adapt postprocessing accordingly.
  • Figure 2a is a block diagram of the speech/audio encoding device 100, in which the pre-processing module 102 is broken down into four distinct modules 202-205.
  • the input speech and/or audio signal 106 is a time-domain signal, conditioned by the module 101 or not, and consisting of a block of samples supplied at recurrent time intervals called frames.
  • This signal structure is well known to those of ordinary skill in the art and, accordingly, will not be further described in the present disclosure.
  • the speech/audio encoding device 100 comprises a module 202 for multiplying the block of samples of the input speech and/or audio signal 106, conditioned by module 101 or not, by a time window to produce a windowed signal 201. In this manner, the speech and/or audio signal 106 is broken down into a succession of overlapping finite-duration speech and/or audio signal segments.
  • the window can be a rectangular window, a Hanning or Hamming window [A.V. Oppenheim, A.S. Willsky, "Signals and Systems", Prentice-Hall Signal Processing Series, 1983], or a window having a more complex form.
  • module 203 applies a linear transform, for example a Fast Fourier Transform (FFT), to each speech and/or audio signal segment to obtain short-term spectral components [A.V. Oppenheim, A.S. Willsky, "Signals and Systems", Prentice-Hall Signal Processing Series, 1983]. It is within the scope of the present invention to use any other similar linear transform, including but not limited to the Sine, Cosine and MLT transforms, that can be represented by a set of spectral components each having an amplitude, a phase (or a sign) and a distinct index on a frequency scale.
  • S_k is the k-th short-term spectral component, and A_k is the amplitude of the spectral component S_k.
  • The function of module 204 is to apply a non-linear transformation (non-linear warping) to the spectral components S_k in order to produce so-called "warped" spectral components S_k'.
  • This operation can be summarized as follows: A_k' = f_nl(A_k), where f_nl(A_k) is a non-linear function of A_k and A_k' is the amplitude of the warped spectral component S_k'.
  • Examples of f_nl(A_k) include, but are not limited to: f_nl(A_k) = A_k^α(k), where α(k) is a constant, possibly the same for all indexes k; and f_nl(A_k) = log_b(A_k).
  • Module 205 then applies the inverse of the linear transform of module 203, i.e. the inverse Fast Fourier Transform (IFFT), to the warped spectral components S_k'. This yields a time-domain signal 206. To minimize the effects of discontinuities (frame effects), successive frames of time-domain signal 206 are added using overlap-add.
  • module 205 applies a second linear transform to the warped spectral components to obtain a time-domain signal interval, multiplies the time-domain signal interval by a time window to produce a windowed-signal interval, and adds the successive overlapping windowed-signal intervals corresponding to the above mentioned successive overlapping finite-duration speech and/or audio signal segments to obtain the time-domain pre-processed signal 206 applied to the encoder 103.
  • other methods such as overlap-discard could be used to reconstruct a continuous time-domain signal from successive frames.
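A minimal sketch of the whole pre-processing chain of modules 202-205 follows; the frame length, the 50% overlap, the square-root Hann window and the power-law warp are all illustrative assumptions (the patent fixes none of these values):

```python
import numpy as np

def saw_preprocess(x, frame_len=256, alpha=0.5):
    """Window (202), FFT (203), amplitude warp (204), inverse FFT and
    overlap-add (205). Phases are kept; only amplitudes are warped."""
    hop = frame_len // 2
    n = np.arange(frame_len)
    # sqrt of a periodic Hann window: its square sums to 1 at 50% overlap,
    # so analysis + synthesis windowing satisfies the overlap-add constraint
    win = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * n / frame_len)))
    y = np.zeros_like(x, dtype=float)
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len] * win            # module 202
        spec = np.fft.rfft(seg)                           # module 203
        amp, phase = np.abs(spec), np.angle(spec)
        warped = (amp ** alpha) * np.exp(1j * phase)      # module 204
        y[start:start + frame_len] += np.fft.irfft(warped, n=frame_len) * win  # module 205
    return y

# Sanity check: with alpha = 1 the warp is the identity, so the chain
# reconstructs the interior of the signal exactly (edges lack full overlap).
x = np.random.default_rng(0).standard_normal(1024)
assert np.allclose(saw_preprocess(x, alpha=1.0)[128:896], x[128:896])
```

The identity-warp check above is a useful way to validate any window/overlap choice before enabling the actual warp.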
  • Figure 2b is a block diagram of the speech/audio decoding device 107.
  • the decoder 208 receives the bitstream 108 from the encoder 103 and, in response to this bitstream, produces a time-domain pre-processed synthesis speech and/or audio signal 110. Since the input to the encoder 103 is a pre-processed signal, the output of the decoder 208 requires post-processing to recover a synthesized speech and/or audio signal 109 suitable for listening.
  • modules 209, 210 and 212 of Figure 2b are identical to modules 202, 203 and 205 of Figure 2a, respectively. However, it is within the scope of the present invention to use modules 209, 210 and 212 different from modules 202, 203 and 205.
  • a non-linear transform is applied by module 211 to the short-term spectral components produced by module 210 to perform an operation referred to as non-linear spectral-amplitude "unwarping", since the non-linear transform of module 211 is the inverse of the non-linear transform of module 204 (Figure 2a). Indeed, the term "unwarping" emphasizes the fact that this operation is essentially the inverse of the above described spectral-amplitude warping. More specifically, if f_nl is the non-linear function applied by module 204, then module 211 applies its inverse to the amplitudes of the short-term spectral components.
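A sketch of the corresponding unwarping, assuming the power-law and logarithmic warps used earlier as examples (parameter values are illustrative):

```python
import numpy as np

def unwarp_power(warped, alpha=0.5):
    """Inverse of A_k' = A_k**alpha, i.e. A_k = A_k'**(1/alpha)."""
    return warped ** (1.0 / alpha)

def unwarp_log(warped, base=10.0):
    """Inverse of A_k' = log_b(A_k), i.e. A_k = base**A_k'."""
    return base ** warped

# Absent coding and channel noise, unwarping restores the amplitudes exactly:
amps = np.array([0.01, 1.0, 100.0])
assert np.allclose(unwarp_power(amps ** 0.5), amps)
assert np.allclose(unwarp_log(np.log10(amps)), amps)
```

Note that the decoder needs no side information to do this: the inverse function is constant and known in advance.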
  • Figure 4a shows the amplitude E (dB) as a function of frequency (kHz) of the short-term Fourier spectrum for a voiced segment of female speech, using a Fast Fourier Transform (FFT).
  • Figure 4b shows the amplitude E(dB) of the short-term Fourier spectrum as a function of frequency (kHz) of the coding noise, corresponding to the voiced segment spectrum amplitude of Figure 4a, when this speech segment is encoded using ITU wideband speech coding standard G.722 [3] at 48 kbits/second. (ITU is the successor to CCITT).
  • the coding noise is the difference signal between the original speech and/or audio signal 106, conditioned or not by module 101, and the synthesized speech and/or audio signal 109 at the output of the speech/audio decoding device 107.
  • Figures 4a and 4b show that, between 2 kHz and 5 kHz, the noise spectrum exceeds the original speech spectrum, which results in audible distortion.
  • Figure 4c shows the resulting coding noise when pre-processing (spectral-amplitude warping) and post-processing (spectral-amplitude unwarping) are used respectively prior to encoding and after decoding, with wideband speech coding standard G.722 at 48 kbits/second.
  • the pre-processing is performed by module 102 of Figure 2a;
  • the post-processing is performed by module 105 of Figure 2b.
  • the non-linear warping operation of Module 204 is, in this particular case, as follows:
  • the noise spectrum is strongly correlated with the original speech spectrum of Figure 4a.
  • both spectra present corresponding peaks and valleys.
  • the noise spectrum of Figure 4c is much more attenuated in the low energy portions of the original speech spectrum.
  • An obvious example is in the 2-4 kHz region, where the noise level is more than 10 dB below the noise level of the G.722 encoder without pre-processing and post-processing.
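This noise-shaping effect can be reproduced with a toy uniform quantizer standing in for the encoder (the step size and warp exponent below are arbitrary illustrative values, not taken from the patent): warping before quantization shrinks the absolute noise in low-energy bins and lets it grow where the signal is strong, so the noise spectrum follows the signal spectrum.

```python
import numpy as np

def quantize(a, step=0.1):
    """Toy stand-in for the encoder: uniform scalar quantization."""
    return np.round(a / step) * step

amps = np.array([0.03, 0.9])              # a spectral valley and a peak

# direct quantization of the amplitudes
noise_direct = np.abs(quantize(amps) - amps)

# warp -> quantize -> unwarp, with A' = A**0.5
recovered = quantize(amps ** 0.5) ** 2.0
noise_warped = np.abs(recovered - amps)

assert noise_warped[0] < noise_direct[0]  # quieter noise in the valley
assert noise_warped[1] > noise_direct[1]  # noise allowed to grow at the peak
```

The asymmetry is the point: perceptually, noise under a strong spectral peak is masked, while noise above a weak valley is audible, so trading the former for the latter improves perceived quality at the same bit budget.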
  • the present invention can be generalized to the cases where the encoder 103 does not necessarily operate in the time-domain, for example to transform encoders operating directly in the frequency domain.
  • In Figure 3a, a generalized version of the speech/audio encoding device 100 of Figure 2a is presented.
  • the pre-processing operation conducted by module 102 is decomposed into three functions.
  • Modules 302, 303 and 304 are identical to modules 202, 203 and 204 of Figure 2a, respectively.
  • the input of the encoder 103 in Figure 3a is then the "warped" spectral components, instead of the time-domain signal described with reference to Figure 2a.
  • the encoder 103 has the choice either to operate in the frequency domain directly, as in the case of transform/sub-band encoders, or to apply the inverse linear transform and overlap-add functions of module 205 of Figure 2a to obtain a time-domain signal prior to encoding.
  • Figure 3b shows a modified version of the speech/audio decoding device 107 with post-processing of Figure 2b.
  • the output of the decoder 104 is then assumed to be a frequency-domain signal, i.e. a series of quantized (synthesized) spectral components, as in the case of transform/sub-band decoders.
  • Modules 308 and 309 are similar to modules 211 and 212 of Figure 2b. If the encoder 103 operates in the time domain, it is assumed that the decoding device 107 of Figure 3b includes internally modules 209 and 210 of Figure 2b to provide the spectral components required at the input of module 308 of Figure 3b.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

In a method and device for encoding a speech and/or audio signal, in view of enhancing perceptual quality, the time-domain speech and/or audio signal is broken down into a succession of overlapping finite-duration speech and/or audio signal segments; a first linear transform is applied to each of these segments to obtain short-term spectral components, to which a non-linear transform is then applied to produce warped spectral components; a second linear transform is applied to these spectral components to obtain a time-domain signal interval, which is multiplied by a time window to produce a windowed-signal interval; and the successive overlapping windowed-signal intervals, corresponding to the successive overlapping finite-duration segments of the speech and/or audio signal, are added to obtain a pre-processed signal, which is encoded to produce the encoded speech and/or audio signal. A corresponding speech/audio decoding method and device are also described.
PCT/CA1997/000543 1996-08-02 1997-07-30 Codage parole/audio a l'aide d'une transformee non lineaire a amplitude spectrale Ceased WO1998006090A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU36901/97A AU3690197A (en) 1996-08-02 1997-07-30 Speech/audio coding with non-linear spectral-amplitude transformation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US2298696P 1996-08-02 1996-08-02
US60/022,986 1996-08-02

Publications (1)

Publication Number Publication Date
WO1998006090A1 (fr) 1998-02-12

Family

ID=21812476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA1997/000543 Ceased WO1998006090A1 (fr) 1996-08-02 1997-07-30 Codage parole/audio a l'aide d'une transformee non lineaire a amplitude spectrale

Country Status (2)

Country Link
AU (1) AU3690197A (fr)
WO (1) WO1998006090A1 (fr)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002093558A1 (fr) * 2001-05-15 2002-11-21 Wavecom Dispositif et procede de traitement d'un signal audio.
FR2826492A1 (fr) * 2001-06-22 2002-12-27 Thales Sa Procede et systeme de pre et de post-traitement d'un signal audio pour la transmission sur un canal fortement perturbe
WO2003009639A1 (fr) * 2001-07-19 2003-01-30 Vast Audio Pty Ltd Enregistrement d'une scene auditive tridimensionnelle et reproduction de cette scene pour un auditeur individuel
EP1739658A1 (fr) * 2005-06-28 2007-01-03 Harman Becker Automotive Systems-Wavemakers, Inc. Extension fréquentielle de signaux harmoniques
EP1724758A3 (fr) * 1999-02-09 2007-08-01 AT&T Corp. Réduction de délai pour une combinaison de préprocesseur de parole et codeur de parole
US7546237B2 (en) 2005-12-23 2009-06-09 Qnx Software Systems (Wavemakers), Inc. Bandwidth extension of narrowband speech
US7720677B2 (en) 2005-11-03 2010-05-18 Coding Technologies Ab Time warped modified transform coding of audio signals
US7765100B2 (en) * 2005-02-05 2010-07-27 Samsung Electronics Co., Ltd. Method and apparatus for recovering line spectrum pair parameter and speech decoding apparatus using same
US7813931B2 (en) 2005-04-20 2010-10-12 QNX Software Systems, Co. System for improving speech quality and intelligibility with bandwidth compression/expansion
US7912729B2 (en) 2007-02-23 2011-03-22 Qnx Software Systems Co. High-frequency bandwidth extension in the time domain
US8086451B2 (en) 2005-04-20 2011-12-27 Qnx Software Systems Co. System for improving speech intelligibility through high frequency compression
US8249861B2 (en) 2005-04-20 2012-08-21 Qnx Software Systems Limited High frequency compression integration
WO2023056920A1 (fr) * 2021-10-05 2023-04-13 Huawei Technologies Co., Ltd. Réseau neuronal perceptron multicouche pour traitement de la parole

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1989009985A1 (fr) * 1988-04-08 1989-10-19 Massachusetts Institute Of Technology Synthese d'ondes sinusoidales selon un procede efficace de calcul pour le traitement de formes d'ondes acoustiques
US5394508A (en) * 1992-01-17 1995-02-28 Massachusetts Institute Of Technology Method and apparatus for encoding decoding and compression of audio-type data


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEFEBVRE R ET AL: "Spectral amplitude warping (SAW) for noise spectrum shaping in audio coding", 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (CAT. NO.97CB36052), 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, MUNICH, GERMANY, 21-24 APRIL 1997, ISBN 0-8186-7919-0, 1997, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC. PRESS, USA, pages 335 - 338 vol.1, XP002044323 *
PATISAUL C R ET AL: "Time-frequency resolution experiment in speech analysis and synthesis", JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, DEC. 1975, USA, vol. 58, no. 6, ISSN 0001-4966, pages 1296 - 1307, XP002044324 *
TOKUDA K ET AL: "Recursion formula for calculation of mel generalized cepstrum coefficients", TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS A, JAN. 1988, JAPAN, vol. J71A, no. 1, ISSN 0373-6091, pages 128 - 131, XP002044325 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1724758A3 (fr) * 1999-02-09 2007-08-01 AT&T Corp. Delay reduction for a combination of a speech preprocessor and speech coder
WO2002093558A1 (fr) * 2001-05-15 2002-11-21 Wavecom Device and method for processing an audio signal
FR2824978A1 (fr) * 2001-05-15 2002-11-22 Wavecom Sa Device and method for processing an audio signal
FR2826492A1 (fr) * 2001-06-22 2002-12-27 Thales Sa Method and system for pre- and post-processing of an audio signal for transmission over a highly disturbed channel
EP1271473A1 (fr) * 2001-06-22 2003-01-02 Thales Method and system for pre- and post-processing of an audio signal for transmission over a highly disturbed channel
US7561702B2 (en) 2001-06-22 2009-07-14 Thales Method and system for the pre-processing and post processing of an audio signal for transmission on a highly disturbed channel
WO2003009639A1 (fr) * 2001-07-19 2003-01-30 Vast Audio Pty Ltd Recording a three-dimensional auditory scene and reproducing it for the individual listener
US7489788B2 (en) 2001-07-19 2009-02-10 Personal Audio Pty Ltd Recording a three dimensional auditory scene and reproducing it for the individual listener
US8214203B2 (en) 2005-02-05 2012-07-03 Samsung Electronics Co., Ltd. Method and apparatus for recovering line spectrum pair parameter and speech decoding apparatus using same
US7765100B2 (en) * 2005-02-05 2010-07-27 Samsung Electronics Co., Ltd. Method and apparatus for recovering line spectrum pair parameter and speech decoding apparatus using same
US7813931B2 (en) 2005-04-20 2010-10-12 QNX Software Systems, Co. System for improving speech quality and intelligibility with bandwidth compression/expansion
US8249861B2 (en) 2005-04-20 2012-08-21 Qnx Software Systems Limited High frequency compression integration
US8086451B2 (en) 2005-04-20 2011-12-27 Qnx Software Systems Co. System for improving speech intelligibility through high frequency compression
US8219389B2 (en) 2005-04-20 2012-07-10 Qnx Software Systems Limited System for improving speech intelligibility through high frequency compression
US8311840B2 (en) 2005-06-28 2012-11-13 Qnx Software Systems Limited Frequency extension of harmonic signals
EP1739658A1 (fr) * 2005-06-28 2007-01-03 Harman Becker Automotive Systems-Wavemakers, Inc. Frequency extension of harmonic signals
US7720677B2 (en) 2005-11-03 2010-05-18 Coding Technologies Ab Time warped modified transform coding of audio signals
US8412518B2 (en) 2005-11-03 2013-04-02 Dolby International Ab Time warped modified transform coding of audio signals
US8838441B2 (en) 2005-11-03 2014-09-16 Dolby International Ab Time warped modified transform coding of audio signals
US7546237B2 (en) 2005-12-23 2009-06-09 Qnx Software Systems (Wavemakers), Inc. Bandwidth extension of narrowband speech
US8200499B2 (en) 2007-02-23 2012-06-12 Qnx Software Systems Limited High-frequency bandwidth extension in the time domain
US7912729B2 (en) 2007-02-23 2011-03-22 Qnx Software Systems Co. High-frequency bandwidth extension in the time domain
WO2023056920A1 (fr) * 2021-10-05 2023-04-13 Huawei Technologies Co., Ltd. Multilayer perceptron neural network for speech processing

Also Published As

Publication number Publication date
AU3690197A (en) 1998-02-25

Similar Documents

Publication Publication Date Title
US7529660B2 (en) Method and device for frequency-selective pitch enhancement of synthesized speech
Tribolet et al. Frequency domain coding of speech
CN101425294B (zh) Sound encoding/decoding and transmitting/receiving apparatus, encoding method, communication terminal, and base station
AU2009267529B2 (en) Apparatus and method for calculating bandwidth extension data using a spectral tilt controlling framing
EP3602549B1 (fr) Apparatus and method for post-processing an audio signal using transient location detection
KR101213840B1 (ko) Decoding apparatus and decoding method, and communication terminal apparatus and base station apparatus comprising the decoding apparatus
DE69529393T2 (de) Method for weighted noise filtering
US20080126081A1 (en) Method And Device For The Artificial Extension Of The Bandwidth Of Speech Signals
EP1328923B1 (fr) Codage ameliore de maniere perceptible de signaux sonores
US20070219785A1 (en) Speech post-processing using MDCT coefficients
US11562756B2 (en) Apparatus and method for post-processing an audio signal using prediction based shaping
US20110125507A1 (en) Method and System for Frequency Domain Postfiltering of Encoded Audio Data in a Decoder
Valin et al. Bandwidth extension of narrowband speech for low bit-rate wideband coding
AU2001284606A1 (en) Perceptually improved encoding of acoustic signals
WO1998006090A1 (fr) Speech/audio coding using a non-linear spectral amplitude transform
US6629068B1 (en) Calculating a postfilter frequency response for filtering digitally processed speech
EP1395982B1 (fr) Systeme de codage de la parole adpcm dote de filtres de trainage de phase et de detrainage de phase
Füg et al. Temporal noise shaping on MDCT subband signals for transform audio coding
Luo et al. High quality wavelet-packet based audio coder with adaptive quantization
Vijayasri et al. Implementation of a novel transformation technique to improve speech compression ratio
Raksapatcharawong A Hi-Fi Audio Coding Technique for Wireless Communication based on Wavelet Packet Transformation
Bhaskar Adaptive predictive coding with transform domain quantization using block size adaptation and high-resolution spectral modeling

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: The EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: CA

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998507412

Format of ref document f/p: F

122 Ep: PCT application non-entry in European phase