WO2024194493A1 - Joint stereo coding in complex-valued filter bank domain
- Publication number
- WO2024194493A1 (PCT/EP2024/057870)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel
- signal
- stereo
- mid
- coding mode
- Prior art date
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
Definitions
- M/S mid/side
- a mid (M) signal is formed as a combination of the L and R channel signals
- a side (S) signal may be formed as a difference between the L and R channel signals.
- the M and S signals are coded instead of the L and R signals.
- M/S stereo coding can be implemented in a time and frequency-variant manner.
- a stereo encoder can apply L/R for encoding some frequency bands of the stereo signal
- M/S coding can be used for encoding other frequency bands of the stereo signal (frequency variant).
- some encoders can switch over time between L/R and M/S coding methods (time-variant).
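- as a brief illustration of the background above, the sketch below shows a conventional per-sample L/R to M/S conversion and its inverse. It assumes the common normalization M = (L + R)/2 and S = (L − R)/2; the scale factor and all identifiers are illustrative choices, not a definition taken from this disclosure.

```c
/* Minimal sketch of conventional Mid/Side conversion (illustrative only).
 * Assumes the common normalization M = (L + R) / 2, S = (L - R) / 2. */
#include <stdio.h>

static void lr_to_ms(double l, double r, double *m, double *s) {
    *m = 0.5 * (l + r);   /* mid: scaled sum (combination) of L and R */
    *s = 0.5 * (l - r);   /* side: scaled difference of L and R       */
}

static void ms_to_lr(double m, double s, double *l, double *r) {
    *l = m + s;           /* inverse transform recovers L and R exactly */
    *r = m - s;
}

int main(void) {
    double l = 0.8, r = 0.6, m, s, l2, r2;
    lr_to_ms(l, r, &m, &s);
    ms_to_lr(m, s, &l2, &r2);
    printf("M=%.3f S=%.3f  ->  L=%.3f R=%.3f\n", m, s, l2, r2);
    return 0;
}
```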
- embodiments described herein provide systems, methods and/or devices with extensions to mid-side stereo coding applied in a complex-valued filter bank domain. For example, embodiments provide an adjusted phase alignment between a left audio channel and a right audio channel prior to the mid-side conversion, in combination with real-valued prediction of the side signal from the mid signal in the encoder. Additionally, embodiments described herein provide a novel method of phase alignment in the complex domain and transmission of the complex audio data. The inter-channel phase difference, which may be transmitted in a bitstream as metadata, is used to reconstruct the original inter-channel phase relation at the decoder.
- phase alignment operation can be applied without loss of information or risk of degradation, which is in general not the case when only real-valued data is encoded.
- different processing blocks e.g., phase alignment, mid-side conversion, and side signal prediction blocks
- phase alignment, mid-side conversion, and side signal prediction blocks are enabled or disabled based on a level-dependent psychoacoustic model and, in some instances, based on parameter side rate cost.
- Embodiments described herein improve spatial noise shaping (by preventing spatial unmasking) and improve energy compaction compared to known mid-side coding, particularly for signals with systematic inter-channel level differences and inter-channel phase shifts or time delays. Such characteristics are common for binaural signals which have been generated by filtering audio objects or channels with head-related transfer functions.
- embodiments described herein include an encoder receiving a left-channel (e.g., a left input signal) and a right-channel (e.g., a right input signal) that are binaural channels.
- Complex filter bank analysis is performed on the left-channel and the right-channel to convert the left-channel and the right-channel to a complex-valued filter bank domain.
- Converting the left-channel and the right-channel to the complex-valued filter bank domain prepares the left-channel and the right-channel for rendering by a head-tracking device, such as being processed by a head related transfer function (HRTF).
- the complex-valued filter bank domain signals may be, for example, one or more frequency bands.
- stereo analysis is performed on the complex-valued filter bank domain left-channel and the complex-valued filter bank domain right-channel.
- the stereo analysis may identify energies of the complex-valued filter bank domain left-channel and energies of the complex-valued filter bank domain right-channel. Additionally, the stereo analysis may identify energies of a potential Mid signal and a potential Side signal.
- the Mid signal represents the sum of the left-channel and the right-channel.
- the Side signal represents the difference between the left-channel and the right-channel.
- the energy of a potential residual signal is determined.
- the stereo analysis also generates stereo metadata based on the left-channel and the right-channel. For example, a covariance of the complex-valued filter bank domain left-channel and the complex-valued filter bank domain right-channel may be calculated. An inter-channel phase difference of the left-channel and the right-channel is determined based on the covariance.
- a real-valued prediction coefficient (e.g., a side prediction coefficient) is calculated based on the energies of the complex-valued filter bank domain left-channel and the complex-valued filter bank domain right-channel, as well as the energies of the potential Mid signal and the potential Side signal.
- a single stereo coding mode of a plurality of stereo coding modes is selected for signaling and encoding the left-channel and the right-channel. For example, for each possible stereo coding mode, a bit cost of signaling the left-channel and the right-channel in that stereo coding mode is determined.
- the bit cost is based on the energies of the complex-valued filter bank domain left-channel, the complex-valued filter bank domain right-channel, the possible Mid signal, the possible Side signal, and the possible residual signal. In some instances ratios of the energies of signals involved in each stereo coding mode are determined.
- the stereo coding mode is selected based on the bit cost. In some instances, the stereo coding mode with the lowest bit cost is selected. In some aspects, the left-channel and the right-channel are processed according to the selected stereo coding mode.
- when the selected stereo coding mode is a separate coding mode, a stereo processor performs signaling of the left-channel and the right-channel (or the complex-valued filter bank domain left-channel and the complex-valued filter bank domain right-channel) directly, without converting the signals to the Mid signal and the Side signal.
- the selected stereo mode is a basic Mid/Side stereo mode. In this instance, the left-channel and the right-channel are converted to the Mid signal and the Side signal. The stereo processor performs signaling of the Mid signal and the Side signal.
- the selected stereo mode is a Mid/Side stereo mode with adjusted phase alignment.
- the left-channel and the right-channel are phase-aligned based on the inter-channel phase difference. After aligning, the left-channel and the right-channel are converted to the Mid signal and the Side signal.
- the stereo processor performs signaling of the Mid signal and the Side signal, and the phase difference is encoded alongside the Mid signal and the side signal.
- the selected stereo mode is a Mid/Side stereo mode with side prediction.
- the left-channel and the right-channel are converted to the Mid signal and the Side signal.
- a residual signal is generated based on the Side signal and a prediction coefficient.
- the stereo processor performs signaling of the Mid signal and the residual signal, and the prediction coefficient is encoded alongside the Mid signal and the residual signal.
- the selected stereo mode is a Mid/Side stereo mode with both adjusted phase alignment and side prediction.
- the left-channel and the right-channel are phase-aligned based on the inter-channel phase difference.
- the left-channel and the right-channel are converted to the Mid signal and the Side signal.
- a residual signal is generated based on the Side signal and a prediction coefficient.
- the stereo processor performs signaling of the Mid signal and the residual signal, and the prediction coefficient and phase difference are encoded alongside the Mid signal and the residual signal.
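- as an illustration of the mode-dependent processing described above, the following sketch applies a selected mode to one complex-valued filter bank sample pair. It assumes, as illustrative choices rather than the codec definition, that the phase rotation is applied to the right channel only, that the Mid/Side transform uses a 1/2 normalization, and that the residual is formed as S − p·M; all identifiers are hypothetical.

```c
/* Sketch of the encoder-side stereo modes described above, applied to one
 * complex-valued filter bank sample pair. Illustrative assumptions: the phase
 * rotation is applied to the right channel only, the Mid/Side transform uses
 * a 1/2 normalization, and the residual is S - pred * M. */
#include <complex.h>
#include <stdio.h>

static void encode_pair(double complex L, double complex R,
                        int joint, int use_phase, int use_pred,
                        double ipd, double pred,
                        double complex *out0, double complex *out1) {
    if (!joint) {                       /* separate coding: signal L and R directly */
        *out0 = L;
        *out1 = R;
        return;
    }
    if (use_phase)
        R *= cexp(I * ipd);             /* align the phase of R towards L */

    double complex M = 0.5 * (L + R);   /* Mid  signal */
    double complex S = 0.5 * (L - R);   /* Side signal */

    if (use_pred)
        S -= pred * M;                  /* residual = Side - pred * Mid */

    *out0 = M;                          /* signalled channel 0 */
    *out1 = S;                          /* signalled channel 1: Side or residual */
}

int main(void) {
    double complex c0, c1;
    /* extended Mid/Side: phase alignment and side prediction both enabled */
    encode_pair(1.0 + 0.2 * I, 0.7 - 0.1 * I, 1, 1, 1, 0.3, 0.4, &c0, &c1);
    printf("ch0 = %.3f%+.3fi, ch1 = %.3f%+.3fi\n",
           creal(c0), cimag(c0), creal(c1), cimag(c1));
    return 0;
}
```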
- One example provides a method for encoding a stereo audio signal in a bitstream.
- the method includes passing a left-channel and a right-channel of the stereo audio signal to a complex-valued filter bank analysis block to responsively generate one or more frequency bands and calculating, for each of the one or more frequency bands, an energy of the left-channel, an energy of the right-channel, and a covariance of the left-channel and the right-channel.
- the method includes selecting, based on the calculated energy of the left-channel, the calculated energy of the right-channel, and the calculated covariance of the left-channel and the right-channel, a stereo coding mode in which to encode the left-channel and the right-channel.
- the method includes computing a phase difference between the left-channel and the right-channel, adjusting phase alignment between the left-channel and the right-channel based on the computed phase difference to generate an aligned left-channel and an aligned right-channel, transforming the aligned left-channel and the aligned right-channel to a Mid signal and a Side signal, generating a residual signal based on side prediction data and the Side signal, encoding the Mid signal, the residual signal, the phase difference, and the side prediction data in the bitstream, and outputting the bitstream for the selected stereo coding mode.
- Another example provides an apparatus for encoding a stereo audio signal in a bitstream, the apparatus including an electronic processor.
- the electronic processor is configured to pass a left-channel and a right-channel of the stereo audio signal to a complex-valued filter bank analysis block to responsively generate one or more frequency bands, calculate, for each of the one or more frequency bands, an energy of the left-channel, an energy of the right-channel, and a covariance of the left-channel and the right-channel, and select, based on the calculated energy of the left-channel, the calculated energy of the right-channel, and the calculated covariance of the left-channel and the right-channel, a stereo coding mode in which to encode the left-channel and the right-channel.
- the electronic processor is configured to compute a phase difference between the left-channel and the right-channel, adjust phase alignment between the left-channel and the right-channel based on the computed phase difference to generate an aligned left-channel and an aligned right-channel, transform the aligned left-channel and the aligned right-channel to a Mid signal and a Side signal, generate a residual signal based on side prediction data and the Side signal, encode the Mid signal, the residual signal, the phase difference, and the side prediction data in the bitstream, and output the bitstream for the selected stereo coding mode.
- Another example provides a method for decoding a stereo audio signal.
- the method includes receiving an encoded bitstream, decoding, from the bitstream, a replicated Mid signal, a replicated residual signal, and replicated stereo metadata, wherein the replicated stereo metadata includes a phase difference and side prediction data, converting the replicated Mid signal and the replicated residual signal to a replicated left channel and a replicated right channel using the stereo metadata, and passing the replicated left channel and the replicated right channel to a filter bank synthesis block to recreate an original left channel and an original right channel.
- Another example provides an apparatus for decoding a stereo audio signal, the apparatus including an electronic processor.
- the electronic processor is configured to receive an encoded bitstream, decode, from the bitstream, a replicated Mid signal, a replicated residual signal, and replicated stereo metadata, wherein the replicated stereo metadata includes a phase difference and side prediction data, convert the replicated Mid signal and the replicated residual signal to a replicated left channel and a replicated right channel using the replicated stereo metadata, and pass the replicated left channel and the replicated right channel to a filter bank synthesis block to recreate an original left channel and an original right channel.
- FIG.1 illustrates a block diagram of an example audio coding system in which various aspects of the present invention may be incorporated.
- FIG.2 illustrates a block diagram of an example encoder.
- FIG.3 illustrates a block diagram of an example stereo processing process.
- FIG.4A illustrates an example joint stereo process conversion.
- FIG.4B illustrates a table that provides each of the joint stereo coding types and their associated bitstream syntax elements.
- FIG.5A illustrates a block diagram of an example mode-based stereo processing unit, such as the mode-based stereo processing unit of FIG.3.
- FIG.5B illustrates a table that provides an example bitstream syntax.
- FIG.6A illustrates a graph illustrating a stereo metadata rate for an extended Mid/Side coding mode per audio frame, in accordance with various aspects of the present disclosure.
- FIGS.6B-6D illustrate examples of pseudocode.
- FIGS.7A-7B illustrate a block diagram of various example methods for encoding stereo signals, which may be performed by the encoder of FIG.2, in accordance with various aspects of the present disclosure.
- FIG.8A illustrates a block diagram of an example decoder.
- FIG.8B illustrates an example of pseudocode.
- FIG.9 illustrates a block diagram of various example methods for decoding stereo signals, which may be performed by the decoder of FIG.8A, in accordance with various aspects of the present disclosure.
- FIG.10 illustrates a graph of a PEAQ evaluation for twelve audio items.
- FIG.11A illustrates a schematic block diagram of an example device architecture that may be used to implement various aspects of the present disclosure.
- FIG.11B illustrates a schematic block diagram of an example CPU implemented in the device architecture of FIG.11A that may be used to implement various aspects of the present disclosure.
- FIG.1 illustrates a block diagram of an audio coding system in which various aspects of the present invention may be incorporated.
- the example audio coding system 100 includes an encoder 110 and a decoder 120.
- the input of the encoder 110 corresponds to a first signal path 105, while an output of the encoder 110 corresponds to a second signal path 115.
- the input of the decoder 120 corresponds to the second signal path 115, while an output of the decoder corresponds to a third signal path 125.
- the encoder 110 is configured to receive, from the first signal path 105, one or more streams of audio information representing one or more channels of audio signals.
- the encoder 110 is further configured to process the received streams of audio information and generate an encoded signal, which may be output to the second signal path 115.
- the encoded signal may be stored (e.g., captured, buffered and/or recorded), or transmitted (e.g., via a wired or wireless communication medium).
- the decoder 120 is configured to receive the encoded signal from the second signal path 115.
- the decoder 120 is further configured to process the received encoded signal and generate a decoded signal, which may be output to the third signal path 125.
- the decoded signal that is generated by the decoder 120 corresponds to a replica of the audio information previously received by the encoder 110 from the first signal path 105.
- the decoded signal may be stored (e.g., captured and/or recorded), transmitted (e.g., via a wireless or wired electronic communication medium), or output to a listening device (e.g., an audio processing device such as a receiver, speaker, soundbar, etc.).
- the audio coding system 100 may be an audio system capable of implementing an audio codec standard, such as the Immersive Voice and Audio Services (IVAS) standard.
- the encoded signal at the second signal path 115 may correspond to an IVAS bitstream.
- the terms “replica” and “replica signal” are not intended to mean that the streams of audio information are “identical”.
- the term “replica” may indicate that the streams of audio information are approximately the same as the original audio information.
- for lossless coding techniques, the decoder 120 can in principle recover from the streams a version that is identical to the original audio information.
- for lossy coding techniques, the content of the recovered replica signal is generally not identical to the content of the original stream, but it may be perceptually indistinguishable from the original content.
- the terms “replica” and “replica signal” are intended to cover both lossless and lossy encoding techniques as used herein.
- FIG.2 illustrates an example of the encoder 110.
- the encoder 110 includes a complex filter bank analysis block 205, a stereo processing block 210, an encoding block 215, and a bitstream writing block 220.
- the complex filter bank analysis block 205 is configured to receive left and right audio channels (for example, from one or more microphones).
- the left and right audio channels may be binaural channels that have a small delay between the channels and a correlation between the channels.
- the left and right audio channels may be processed by a head related transfer function (HRTF). In some instances, the left and right audio channels are Ambisonics signals.
- the complex filter bank analysis block 205 is further configured to process the left and right audio channels to generate complex-valued filter bank domain left signals and complex-valued filter bank domain right signals.
- the complex filter bank analysis block 205 is configured to output the complex-valued filter bank domain left signals and the complex-valued filter bank domain right signals to the stereo processing block 210.
- the complex-valued filter bank domain signals may include additional metadata for processing by a head tracking device.
- the complex-valued filter bank domain signals are one or more frequency bands associated with the left-channel and the right-channel.
- the complex filter bank analysis may be a perfect reconstruction, such as for a Modified Discrete Fourier Transform (MDFT) or a Modified Discrete Cosine/Sine Transform (MDCT), or a near perfect reconstruction, such as a complex modulated filter bank.
- MDFT Modified Discrete Fourier Transform
- MDCT Modified Discrete Cosine/Sine Transform
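- the specific complex filter bank is not prescribed in detail here; the sketch below, offered only as an illustration under stated assumptions, computes a sine-windowed complex DFT per time slot (an MDFT-like analysis with 50% overlap) to show the kind of complex-valued bins the stereo processing operates on. Window, transform length, and hop size are arbitrary illustrative choices.

```c
/* Rough sketch of a complex-valued analysis stage (illustrative only):
 * sine-windowed, 50%-overlapped, naive DFT producing complex bins per slot. */
#include <complex.h>
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define SLOT_LEN 64              /* samples per transform (illustrative)   */
#define HOP      (SLOT_LEN / 2)  /* 50% overlap between time slots         */

static void analyze_slot(const double *x, double complex *bins) {
    for (int k = 0; k < SLOT_LEN / 2; k++) {            /* keep positive bins */
        double complex acc = 0.0;
        for (int n = 0; n < SLOT_LEN; n++) {
            double w = sin(M_PI * (n + 0.5) / SLOT_LEN);    /* sine window */
            acc += w * x[n] * cexp(-I * 2.0 * M_PI * k * n / SLOT_LEN);
        }
        bins[k] = acc;
    }
}

int main(void) {
    double x[SLOT_LEN + HOP];
    for (int n = 0; n < SLOT_LEN + HOP; n++)
        x[n] = sin(2.0 * M_PI * 5.0 * n / SLOT_LEN);    /* toy tone at bin 5 */

    double complex slot0[SLOT_LEN / 2], slot1[SLOT_LEN / 2];
    analyze_slot(x, slot0);                             /* time slot t       */
    analyze_slot(x + HOP, slot1);                       /* time slot t + 1   */
    printf("|slot0[5]| = %.2f  |slot1[5]| = %.2f\n", cabs(slot0[5]), cabs(slot1[5]));
    return 0;
}
```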
- the stereo processing block 210 is configured to receive the complex-valued filter bank domain left signals and the complex-valued filter bank domain right signals from the complex filter bank analysis 205.
- the stereo processing block 210 is configured to perform stereo processing analysis by assembling the filter bank frequency bins that are associated with a frame into frequency bands according to an auditory frequency scale, such as an Equivalent Rectangular Bandwidth (ERB) scale or a Bark scale. For every frequency band, the energy of the left and right channel and the covariance of both channels are calculated. Additionally, the energies of the Mid and Side signals are computed per frequency band. A correlation coefficient is computed per band from the covariance.
- ERB Equivalent Rectangular Bandwidth
- a phase difference between the left and right channel may be computed and used to adjust the phase alignment.
- the covariance may be updated based on the quantized phase difference.
- the stereo processing block 210 may compute a real-valued prediction coefficient, which can be used to remove redundancy in the Side signal with respect to the Mid signal.
- the adjusted phase alignment may lead to an increase of the Mid signal energy and a decrease of the Side signal energy. Additionally, the adjusted phase alignment may ensure that the Side signal energy is always lower than the Mid signal energy, which otherwise is not guaranteed in situations where the left and right signal parts cancel out in the Mid signal for certain frequencies.
- the side signal residual energy may be computed based on the quantized prediction coefficient.
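- a minimal sketch of how the real-valued prediction coefficient and the side residual energy described above could be derived from per-band statistics, assuming the usual least-squares predictor p = Re{C_MS}/E_M and a toy uniform quantizer; the codec's actual formulas and quantizer are not reproduced here, and all identifiers are illustrative.

```c
/* Sketch: real-valued side-from-mid predictor and residual energy per band
 * (assumes a least-squares predictor; quantizer shown as plain rounding). */
#include <math.h>
#include <stdio.h>

/* p minimizing E{|S - p*M|^2} for real-valued p is Re{C_MS} / E_M. */
static double predict_coeff(double e_mid, double cov_ms_real) {
    return (e_mid > 0.0) ? cov_ms_real / e_mid : 0.0;
}

/* Residual energy after removing the predicted part, with a quantized p. */
static double residual_energy(double e_mid, double e_side,
                              double cov_ms_real, double p_quant) {
    return e_side - 2.0 * p_quant * cov_ms_real + p_quant * p_quant * e_mid;
}

int main(void) {
    double e_m = 4.0, e_s = 1.0, c_ms = 1.2;          /* toy band statistics  */
    double p  = predict_coeff(e_m, c_ms);
    double pq = round(p * 8.0) / 8.0;                 /* toy uniform quantizer */
    printf("p = %.3f, quantized p = %.3f, residual energy = %.3f\n",
           p, pq, residual_energy(e_m, e_s, c_ms, pq));
    return 0;
}
```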
- the computed signal energies, as well as computed stereo processing metadata, are output by the stereo processing block 210 to encoding block 215.
- the encoding block 215 is configured to receive the computed signal energies and the computed stereo processing metadata from the stereo processing block 210.
- the encoding block 215 is further configured to encode the computed signal energies and the stereo processing metadata as an encoded signal.
- the encoded computed signal energies and the encoded stereo processing metadata are output by the encoding block 215 to the bitstream writing block 220.
- the bitstream writing block 220 is configured to receive the encoded computed signal energies and the encoded stereo processing metadata.
- the bitstream writing block 220 is configured to convert the encoded computed signal energies and the encoded stereo processing metadata to a bitstream.
- the bitstream may be output by the bitstream writing block 220.
- Stereo processing is based on a determined stereo coding mode.
- Some example stereo coding modes, or methods include at least: a Left/Right mode, a Mid/Side mode, and an extended Mid/Side mode.
- the selected stereo coding mode for a certain frequency band may be determined based on the estimated or computed inter-channel energy difference of the signal pairs for each possible mode. For example, the coding mode decision may be based on an estimated number of bits that would be required to encode each signal pair for the corresponding coding mode.
- the estimation of the required number of bits may be based on a level-dependent psychoacoustic model that assigns more bits to louder signal parts than to quieter signal parts. Accordingly, a signal pair with the highest energy difference may require the least number of bits at a certain quality level.
- the bits needed to encode the stereo metadata may also be considered in determining the most efficient coding method.
- the same level dependent psycho-acoustical model may also be used to select the encoding method for the audio data. For the Mid/Side modes, a bit savings estimate relative to the Left/Right coding may be computed and added up to determine the total bit savings on a per frame basis.
- FIG.3 illustrates an example stereo processing process or method, such as that performed by stereo processing block 210.
- the example stereo processing block 210 includes a stereo analysis unit 305, a psychoacoustic model 310, a bit cost estimator unit 315, a mode decision unit 320, and a mode-based stereo processing unit 325.
- the stereo analysis unit 305 is configured to receive the complex-valued filter bank domain left and right signals from the complex filter bank analysis block 205.
- the stereo analysis unit 305 determines, by computation or estimation, the energies of the complex-valued filter bank domain left and right signals, the energies of the Mid signal and the Side signal, and residual energies.
- the stereo analysis unit 305 is further configured to provide the energies to the bit cost estimator unit 315.
- the stereo analysis unit 305 also determines, by computation or estimation, stereo metadata based on the complex-valued filter bank domain left and right signals.
- the stereo metadata may include, for example, one or more of a phase difference, a prediction coefficient, and/or a covariance of the complex-valued filter bank domain left and right signals, the Mid signal, and the Side signal.
- the stereo metadata is provided by the stereo analysis unit 305 to the mode-based stereo processing unit 325 and the bit cost estimator unit 315.
- the bit cost estimator unit 315 estimates the required number of bits needed to encode the signals based on the determined energies provided by the stereo analysis unit 305 and based on the psychoacoustic model 310. For example, the bit cost estimator unit 315 estimates the required number of bits needed to encode the signals for each candidate (or possible) stereo coding mode.
- the bit cost estimator unit 315 is configured to transmit the estimated required number of bits to the mode decision unit 320.
- the mode decision unit 320 is configured to receive the estimated required number of bits, and responsively determines the stereo coding mode (for example, selects one of the candidate stereo coding modes) based on the estimated required number of bits.
- the mode decision unit 320 is further configured to provide the selected (or determined) stereo coding mode to the mode-based stereo processing unit 325.
- the mode-based stereo processing unit 325 is configured to receive the selected stereo coding mode from the mode decision unit 320.
- the mode-based stereo processing unit 325 signals the processed left and right signals using, for example, two bits per frame (e.g., signal 0 and signal 1).
- the processed left and right signals may correspond to the left and right signals, Mid and Side signals, or Mid and residual signals, as described below in more detail.
- FIG.4A illustrates an example joint stereo process 400 performed by the stereo analysis unit 305.
- the stereo analysis unit 305 analyzes the complex-valued filter bank domain left and right signals received from the complex filter bank analysis block 205.
- the stereo analysis unit 305 receives the complex-valued filter bank domain left signal L and the complex-valued filter bank domain right signal R.
- the phase-rotated right signal R' is provided according to Equation 3: \( R' = R \cdot e^{j\beta} \) [Equation 3], where \( \beta \) is the phase rotation angle.
- Stereo processing parameters (that are provided as the stereo metadata) are also generated by the stereo analysis unit 305.
- the stereo processing parameters are based on energy and covariance measures for each perceptual frequency band k in an audio frame f (which is omitted in the below computations for clarity). Therefore, energy compaction estimates for multiple stereo coding types referred to by the mode decision unit 320 are computed.
- the energy for the left input signal L is provided by Equation 7: \( E_L(k) = \sum_{t=t_f}^{t_f+N-1} \sum_{b=b_k^{\mathrm{lo}}}^{b_k^{\mathrm{hi}}} L(t,b)\,L^*(t,b) \) [Equation 7], where * denotes the complex conjugate, N is the number of filter bank time slots per frame, \( t_f \) is the first filter bank time slot index in the audio frame f, and \( b_k^{\mathrm{lo}} \) and \( b_k^{\mathrm{hi}} \) correspond to the filter bank frequency bins associated with the frequency band k.
- the total number of frequency bands (K) is approximately 23, and the total number of filter bank frequency bins is greater than K (for example, 60, 64, 256, or more filter bank frequency bins).
- the energy for the right input signal R is provided by Equation 8: \( E_R(k) = \sum_{t=t_f}^{t_f+N-1} \sum_{b=b_k^{\mathrm{lo}}}^{b_k^{\mathrm{hi}}} R(t,b)\,R^*(t,b) \) [Equation 8]
- the covariance with respect to the left input signal L and the right input signal R is provided by Equation 9: \( C_{LR}(k) = \sum_{t=t_f}^{t_f+N-1} \sum_{b=b_k^{\mathrm{lo}}}^{b_k^{\mathrm{hi}}} L(t,b)\,R^*(t,b) \) [Equation 9]
- the inter-channel phase difference, which may be used to maximize the Mid signal energy, is provided by Equation 10: \( \beta(k) = \begin{cases} \angle C_{LR}(k), & \text{if } |C_{LR}(k)| > Thr \cdot \sqrt{E_L(k)\,E_R(k)} \\ 0, & \text{otherwise} \end{cases} \) [Equation 10], where the threshold Thr may be greater than or equal to 0.5, but less than 1.0.
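- the per-band statistics of Equations 7-10 can be sketched as follows, assuming complex filter bank samples L(t, b) and R(t, b) and illustrative values for the band edges, frame size, and threshold Thr; the identifiers are assumptions.

```c
/* Sketch of Equations 7-10: per-band energies, complex covariance, and the
 * inter-channel phase difference with a correlation threshold (illustrative). */
#include <complex.h>
#include <math.h>
#include <stdio.h>

#define N_SLOTS 4      /* filter bank time slots per frame (toy value) */
#define N_BINS  8      /* filter bank frequency bins       (toy value) */

static double ipd_for_band(const double complex L[N_SLOTS][N_BINS],
                           const double complex R[N_SLOTS][N_BINS],
                           int b_lo, int b_hi, double thr) {
    double e_l = 0.0, e_r = 0.0;
    double complex c_lr = 0.0;
    for (int t = 0; t < N_SLOTS; t++)
        for (int b = b_lo; b <= b_hi; b++) {
            e_l  += creal(L[t][b] * conj(L[t][b]));   /* Equation 7 */
            e_r  += creal(R[t][b] * conj(R[t][b]));   /* Equation 8 */
            c_lr += L[t][b] * conj(R[t][b]);          /* Equation 9 */
        }
    /* Equation 10 (as reconstructed): use the covariance phase only when the
     * channels are sufficiently correlated, otherwise keep the phase at zero. */
    if (cabs(c_lr) > thr * sqrt(e_l * e_r))
        return carg(c_lr);
    return 0.0;
}

int main(void) {
    double complex L[N_SLOTS][N_BINS], R[N_SLOTS][N_BINS];
    for (int t = 0; t < N_SLOTS; t++)
        for (int b = 0; b < N_BINS; b++) {
            L[t][b] = cexp(I * 0.1 * (t + b));
            R[t][b] = 0.8 * cexp(I * (0.1 * (t + b) - 0.4));  /* R lags by 0.4 rad */
        }
    printf("band IPD = %.3f rad\n", ipd_for_band(L, R, 0, N_BINS - 1, 0.5));
    return 0;
}
```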
- Table 1 shown in FIG.4B, provides each of the joint stereo coding types and their associated bitstream syntax elements.
- the type “0” corresponds to separate coding of the left and right signals, and is not a joint stereo coding type.
- the joint coding types are associated with a corresponding stereo mode MSMode.
- the related bitstream elements are provided to the mode-based stereo processing unit 325.
- a band-dependent factor is also used in these computations.
- An example of the psychoacoustic model 310 can be found in U.S. Patent Publication No.2022/0415334, “A Psychoacoustic Model for Audio Processing,” incorporated herein by reference in its entirety.
- a constant is used as an estimate of the number of bits needed to quantize a value to achieve a certain signal-to-noise ratio (SNR).
- SNR signal-to-noise ratio
- this constant is dependent on the utilized codec and may be, for example, approximately 0.3.
- the total reduction of required bits for a certain joint coding type is computed by summing the associated reduction in all bands, as provided by Equation 21: \( \Delta B_{\mathrm{total}} = \sum_{k=0}^{K-1} \Delta B(k) - B_{\mathrm{sig}} \) [Equation 21], considering the signaling cost \( B_{\mathrm{sig}} \) according to the bitstream syntax, described below with respect to Table 2.
- the joint coding type may be selected such that its bit reduction exceeds the bit reduction of another joint coding type by a certain percentage.
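- the selection logic around Equation 21 can be sketched as follows, under the assumption that per-band bit-saving estimates (for example, from the level-dependent psychoacoustic model) are already available: the savings are summed, the metadata signaling cost is subtracted, and a joint coding type is chosen only if it beats the runner-up by a margin. The costs, the margin, and all identifiers are illustrative.

```c
/* Sketch of Equation 21: total bit reduction per joint coding type =
 * sum of per-band savings minus the signaling cost, with a selection margin. */
#include <stdio.h>

#define N_BANDS 4
#define N_TYPES 3      /* e.g. M/S, M/S + phase, M/S + phase + prediction */

static double total_reduction(const double savings[N_BANDS], double signaling_cost) {
    double sum = 0.0;
    for (int k = 0; k < N_BANDS; k++)
        sum += savings[k];                 /* per-band bit savings vs. L/R */
    return sum - signaling_cost;           /* Equation 21 (reconstructed)  */
}

int main(void) {
    /* Toy per-band savings and signaling costs for three joint coding types. */
    double savings[N_TYPES][N_BANDS] = {
        { 2.0, 1.0, 0.5, 0.0 }, { 3.0, 2.0, 1.0, 0.5 }, { 3.5, 2.5, 1.0, 0.5 } };
    double cost[N_TYPES] = { 1.0, 3.0, 5.0 };
    double margin = 1.10;                  /* require 10% better than runner-up */

    int best = -1;                         /* -1: fall back to L/R coding       */
    double best_red = 0.0;
    for (int t = 0; t < N_TYPES; t++) {
        double red = total_reduction(savings[t], cost[t]);
        printf("type %d: total reduction = %.2f bits\n", t, red);
        if (red > best_red * margin) { best_red = red; best = t; }
    }
    printf("selected joint coding type: %d\n", best);
    return 0;
}
```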
- FIG.5A illustrates a block diagram of an example mode-based stereo processing unit, such as mode-based stereo processing unit 325 of FIG.3. The illustrated example may be used when a Mid/Side mode is selected.
- the example mode-based stereo processing unit 325 includes a phase-alignment block 505, a Mid/Side conversion block 510, and a side prediction block 515.
- the phase-alignment block 505 is configured to receive the left input signal L, the right input signal R, and the quantized phase difference.
- the phase-alignment block 505 performs phase alignment adjustment between the left input signal L and the right input signal R using the quantized phase difference.
- the adjustment for phase alignment may be applied to one of the L or R signals, while in other examples the adjustment for phase alignment may be applied to both the L and R signals.
- the phase-aligned left input signal L and the phase-aligned right input signal R are output by the phase-alignment block 505 to the Mid/Side conversion block 510.
- the Mid/Side conversion block 510 is configured to receive the left input signal L and the right input signal R from the phase-alignment block 505, which may be phase-aligned as discussed above.
- the Mid/Side conversion block 510 converts the left input signal L and the right input signal R to Mid signal M and side signal S (and, in some instances, residual signal S’), as previously described with respect to FIG.4A.
- the Mid signal M and the side signal S are output by the Mid/Side conversion block 510 to the side prediction block 515.
- the side prediction block 515 receives the Mid signal M and the side signal S from the Mid/Side conversion block 510 and receives the quantized prediction coefficient from the mode decision unit 320.
- the side prediction block 515 applies side prediction to the Mid signal M and the side signal S.
- the side prediction block 515 is disabled or bypassed (e.g., the quantized prediction coefficient is set to 0).
- the stereo modes may be signaled, for example, by 2 bits per frame from the mode-based stereo processing unit 325.
- Mid/Side coding versus Left/Right coding may be signaled by one bit per frequency band.
- Mid/Side coding active for all frequency bands may be indicated by a single bit per frame.
- the presence of phase difference data may be signaled by one bit per frame and the side prediction data may be signaled by one bit per frame. In this manner, the metadata size may be reduced for different types of stereo signals.
- the side prediction data may be Huffman entropy coded as differences across relevant frequency bands. If the phase difference data is present, then the phase difference data may be entropy coded. Table 2, shown in FIG.5B, provides an example bitstream syntax.
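- the frame-level signaling described above can be sketched with a simple bit writer, assuming an illustrative field order (stereo mode, an all-bands M/S flag, optional per-band M/S flags, and one presence bit each for phase data and side prediction data); this sketch is not the Table 2 syntax.

```c
/* Sketch of the per-frame stereo signaling bits described above
 * (illustrative field order; not the actual Table 2 bitstream syntax). */
#include <stdio.h>

typedef struct { unsigned char buf[64]; int bitpos; } BitWriter;

static void put_bits(BitWriter *bw, unsigned val, int nbits) {
    for (int i = nbits - 1; i >= 0; i--, bw->bitpos++)
        if ((val >> i) & 1u)
            bw->buf[bw->bitpos / 8] |= (unsigned char)(1u << (7 - bw->bitpos % 8));
}

static void write_stereo_header(BitWriter *bw, unsigned mode, int n_bands,
                                const int *ms_flag, int has_phase, int has_pred) {
    put_bits(bw, mode, 2);                          /* stereo mode: 2 bits/frame */
    int all_ms = 1;
    for (int k = 0; k < n_bands; k++) all_ms &= ms_flag[k];
    put_bits(bw, (unsigned)all_ms, 1);              /* "M/S in all bands" flag   */
    if (!all_ms)
        for (int k = 0; k < n_bands; k++)
            put_bits(bw, (unsigned)ms_flag[k], 1);  /* M/S vs. L/R per band      */
    put_bits(bw, (unsigned)has_phase, 1);           /* phase data present        */
    put_bits(bw, (unsigned)has_pred, 1);            /* side prediction present   */
}

int main(void) {
    BitWriter bw = { {0}, 0 };
    int ms_flag[4] = { 1, 1, 0, 1 };
    write_stereo_header(&bw, 3u, 4, ms_flag, 1, 1);
    printf("header uses %d bits, first byte = 0x%02X\n", bw.bitpos, bw.buf[0]);
    return 0;
}
```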
- the inter-channel phase difference data is linearly quantized.
- the scale factor used for quantization is selected such that the values π and −π are precisely represented, as shown in Equation 12.
- the second order differences of the phase symbols (e.g., the quantized inter-channel phase differences per band) across the relevant frequency bands may be computed and wrapped into a [−π, +π] range such that no jumps larger than π are required.
- the second order differences may be computed by first computing the difference between the phase symbols in bands 1 to N and the phase symbols in bands 0 to N-1, where N is the number of bands where M/S processing with phase adjustment is active (for example, see variable “numMSBands” in Table 2 of FIG.5B and in Pseudocode 1 of FIG.6B).
- the second order differences are computed as the differences between the new values in bands 2 to N and the new values in bands 1 to N−1, as shown in Pseudocode 2 of FIG.6C.
- the quantized number in the first frequency band may be unmodified
- the quantized number in the second frequency band may represent a first order difference
- the quantized numbers in the rest of the frequency bands may correspond to second order differences (i.e., differences between consecutive first order differences).
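- Pseudocode 1-3 are shown only in the figures; the sketch below re-implements the described idea under stated assumptions: a linear phase quantizer whose step size represents ±π exactly, first order differences between adjacent bands, second order differences from the third band onward, and wrapping of each symbol into the quantized 2π range. The step count and identifiers are illustrative.

```c
/* Sketch of the described phase-symbol coding: linear quantization with an
 * exact representation of +/-pi, then band-wise differencing and wrapping.
 * (Illustrative re-implementation; not the referenced Pseudocode 1-3.) */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define Q_HALF 8                 /* quantizer steps from 0 to +pi (toy value) */
#define Q_FULL (2 * Q_HALF)      /* symbols per full 2*pi turn                */

static int quantize_phase(double phi) {   /* +pi and -pi map exactly to +/-Q_HALF */
    return (int)lround(phi * (double)Q_HALF / M_PI);
}

static int wrap_sym(int s) {              /* wrap into the quantized 2*pi range */
    while (s >  Q_HALF) s -= Q_FULL;
    while (s < -Q_HALF) s += Q_FULL;
    return s;
}

/* In-place: band 0 keeps its absolute symbol, band 1 a first order difference,
 * bands 2..n-1 second order differences, all wrapped into the 2*pi range. */
static void diff_encode(int *sym, int n) {
    for (int pass = 0; pass < 2; pass++)             /* two differencing passes */
        for (int k = n - 1; k > pass; k--)
            sym[k] = wrap_sym(sym[k] - sym[k - 1]);
}

int main(void) {
    /* A constant inter-channel time delay gives a phase that grows linearly
     * with frequency band, so the second order differences become zero. */
    int sym[6];
    for (int k = 0; k < 6; k++)
        sym[k] = quantize_phase(k * M_PI / 8.0);
    diff_encode(sym, 6);
    for (int k = 0; k < 6; k++)
        printf("%d ", sym[k]);                       /* expected: 0 1 0 0 0 0 */
    printf("\n");
    return 0;
}
```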
- the wrapped phase symbols may be entropy coded, where the symbol zero may be encoded using the least number of bits.
- in some instances, the signal model has an inter-channel time delay or a constant phase shift.
- in such instances, the second order frequency differentials are substantially equal to zero and thus need only a small number of bits to be encoded. In other instances, the second order frequency differentials are substantially non-zero. Additionally, in some instances, only the first order difference is used for encoding.
- FIG.6A illustrates the stereo metadata rate for the extended Mid/Side coding mode per audio frame for an item with five samples (at 48 kHz) inter-channel time delay over a plurality of frames when encoding with either the first order differential or the second order differential.
- the x-axis provides the plurality of frames and the y-axis provides the bitrate (in kB/s) of encoding for each frame.
- when encoding the quantized phase angles for jointly coded bands, as indicated by MSFlags, the quantized phase angles may be first arranged adjacently, as shown in Pseudocode 1 of FIG.6B.
- first and second order differences are computed for the arranged quantized phase angles.
- the data is wrapped into the quantized 2π range before Huffman encoding.
- Pseudocode 2 as shown in FIG.6C provides one example of computing the first and second order differences, and presented using a C-like format.
- Pseudocode 3, as shown in FIG.6D, provides an example of wrapping of the phase symbols (variable phaseQ) into the 2π range.
- phase symbols either represent absolute phase (first frequency band), the first order difference (second frequency band), or the second order difference across frequency bands (other frequency bands).
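- the actual Huffman tables are not reproduced here; as a stand-in that shares the key property that the symbol zero costs the fewest bits, the sketch below measures the cost of wrapped phase symbols with a signed Exp-Golomb-style code. This is an illustrative substitute, not the codec's entropy coder.

```c
/* Illustrative stand-in for the entropy coder: a signed Exp-Golomb-style code
 * in which the symbol 0 needs the fewest bits (1 bit), as favored by the
 * differential phase coding described above. Not the actual Huffman tables. */
#include <stdio.h>

static unsigned zigzag(int s) {          /* map 0,-1,1,-2,2,... to 0,1,2,3,4,... */
    return (s >= 0) ? (unsigned)(2 * s) : (unsigned)(-2 * s - 1);
}

static int exp_golomb_bits(unsigned v) { /* code length of an order-0 Exp-Golomb code */
    int prefix = 0;
    for (unsigned t = v + 1; t > 1; t >>= 1) prefix++;
    return 2 * prefix + 1;
}

int main(void) {
    /* Wrapped phase symbols: absolute value, first difference, then mostly
     * zero-valued second differences (cheap to code). */
    int sym[6] = { 0, 1, 0, 0, 0, 0 };
    int total = 0;
    for (int k = 0; k < 6; k++) {
        int bits = exp_golomb_bits(zigzag(sym[k]));
        printf("symbol %+d -> %d bits\n", sym[k], bits);
        total += bits;
    }
    printf("total: %d bits\n", total);
    return 0;
}
```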
- FIGS.7A-7B illustrate a block diagram of various example methods 700 for encoding stereo signals, which may be performed by the encoder 110 of FIG.2.
- the methods 700 may be performed by a processor, which may be configured to perform methods 700 via machine-executable instructions.
- the methods 700 may be broken into various blocks or partitions, such as blocks 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, and 770.
- the various process blocks illustrated in FIG.7 provide examples of various methods disclosed herein, and it is understood that some blocks may be removed, added, combined, or modified without departing from the spirit of the present disclosure.
- processing of the various blocks may commence at block 705.
- “Passing Left-Channel and Right-Channel of Stereo Audio Signal To Filter Bank Analysis” an example method 700 may include passing a left-channel and a right-channel of a stereo audio signal to a filter bank analysis. The filter bank analysis responsively generates one or more frequency bands. Processing may continue from block 705 to block 710.
- an example method 700 may include calculating the energy of the left-channel, calculating the energy of the right-channel, and calculating the covariance of the left-channel and the right channel, as previously described with respect to Equations 7-16. Processing may continue from block 710 to block 715.
- “Determining Bit Cost of a Plurality of Stereo Coding Modes” an example method 700 may include determining a bit cost of a plurality of stereo coding modes, as previously described with respect to Equations 17-22 and Table 1. Processing may continue from block 715 to block 720.
- the method 700 includes selecting a stereo coding mode, as previously described with respect to Table 1.
- when a separated coding mode is selected at block 720, processing continues from block 720 to block 725.
- An example method 700 may include, at block 725, “Encoding Left-Channel and Right-Channel”, encoding the left-channel and the right-channel separately.
- the stereo processing block 210 passes the left input signal and the right input signal to the encoding block 215.
- when the Mid/Side coding mode is selected at block 720, processing continues from block 720 to block 730.
- An example method 700 may include, at block 730, “Converting To Mid Signal and Side Signal”, converting (for example, transforming) the left-channel and the right-channel to a Mid signal and a Side signal (for example, with the Mid/Side conversion block 510). Processing may continue from block 730 to block 735. At block 735, “Encoding Mid Signal and Side Signal,” an example method 700 may include encoding the Mid signal and the Side signal. When the extended Mid/Side coding mode is selected at block 720, processing continues from block 720 to block 740.
- An example method 700 may include, at block 740, “Adjusting Phase Alignment Based On Phase Difference,” adjusting phase alignment of the left-channel and/or the right-channel based on a calculated phase difference (for example, with the phase-alignment block 505). Processing may continue from block 740 to block 745. At block 745, “Converting to Mid Signal and Side Signal,” the method 700 includes converting the phase-aligned left-channel and right-channel to a Mid signal and a Side signal (for example, with the Mid/Side conversion block 510). Processing may continue from block 745 to block 750.
- an example method 700 may include generating a residual signal using a side prediction coefficient (for example, with the side prediction block 515). Processing may continue from block 750 to block 755.
- “Encoding Mid Signal and Residual Signal” an example method 700 may include encoding the Mid signal and the residual signal. In some instances, the phase difference and the side prediction coefficient are encoded alongside the Mid signal and the residual signal. Additionally, in some instances, only phase alignment adjustment or use of the side prediction coefficient is performed.
- FIG.8A illustrates a block diagram of an example decoder 120. The example decoder 120 reverses the encoding performed by the encoder 110.
- the example decoder 120 includes a bitstream reading block 805, a decoding block 810, an inverse stereo processing block 815, and a filter bank synthesis block 820.
- the bitstream reading block 805 receives the bitstream from the encoder 110.
- the bitstream reading block 805 processes the bitstream and provides the processed bitstream to the decoding block 810.
- the decoding block 810 is configured to receive the processed bitstream from the bitstream reading block 805.
- the decoding block 810 processes (e.g., decodes) the processed bitstream to substantially replicate the Mid signal, the Side signal, and the stereo metadata.
- the replicated Mid signal, the replicated Side signal, and the replicated stereo metadata are provided by the decoding block 810 to the inverse stereo processing block 815.
- the inverse stereo processing block 815 is configured to receive the replicated Mid signal, the replicated Side signal, and the replicated stereo metadata from the decoding block 810.
- the inverse stereo processing block 815 is configured to process the replicated Mid signal, the replicated Side signal, and the replicated stereo metadata to generate a replicated complex-valued filter bank domain left signal and a replicated complex-valued filter bank domain right signal.
- the inverse stereo processing block 815 first reconstructs the Side signal using side prediction information included in the stereo metadata. Then, the Mid/Side transform is inversed. Finally, the original phase relation for the left signal and the right signal is reinstated based on the transmitted phase data.
- phase adjustment is applied only to one channel. In other instances, phase adjustment (or alignment) is applied to both the left channel and the right channel.
- Equation 23 provides an example of decoding the jointly coded signals in matrix notation: \( \begin{bmatrix} L \\ R' \end{bmatrix} = \begin{bmatrix} 1+p & 1 \\ 1-p & -1 \end{bmatrix} \begin{bmatrix} M \\ S' \end{bmatrix} \) [Equation 23], where p is the quantized prediction coefficient, S' is the decoded residual signal, and R' is the phase-rotated right signal whose original phase is subsequently reinstated using the transmitted phase difference.
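- a sketch of the Equation 23 decoding step for one complex filter bank sample, assuming the same illustrative Mid/Side normalization and right-channel phase rotation used in the encoder sketches above; the matrix entries follow from S = S' + p·M together with the inverse Mid/Side transform, and all identifiers are assumptions.

```c
/* Sketch of the Equation 23 decoding for one complex filter bank sample:
 * undo side prediction, invert the M/S transform, then undo the phase
 * rotation of the right channel (illustrative normalization and names). */
#include <complex.h>
#include <stdio.h>

static void decode_pair(double complex M, double complex Sres,
                        double pred, double ipd,
                        double complex *L, double complex *R) {
    double complex S = Sres + pred * M;       /* undo side prediction        */
    *L = M + S;                               /* inverse of M = (L+R')/2 ... */
    double complex Rrot = M - S;              /* ... and S = (L-R')/2        */
    *R = Rrot * cexp(-I * ipd);               /* undo the phase alignment    */
}

int main(void) {
    /* Round trip using the encoder conventions assumed in the earlier sketch. */
    double complex L0 = 1.0 + 0.2 * I, R0 = 0.7 - 0.1 * I;
    double ipd = 0.3, pred = 0.4;
    double complex Rp = R0 * cexp(I * ipd);
    double complex M  = 0.5 * (L0 + Rp);
    double complex Sr = 0.5 * (L0 - Rp) - pred * M;

    double complex L, R;
    decode_pair(M, Sr, pred, ipd, &L, &R);
    printf("L = %.3f%+.3fi (orig %.3f%+.3fi)\n", creal(L), cimag(L), creal(L0), cimag(L0));
    printf("R = %.3f%+.3fi (orig %.3f%+.3fi)\n", creal(R), cimag(R), creal(R0), cimag(R0));
    return 0;
}
```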
- Decoding of the second order frequency differential coding of the transmitted phase signals is provided by Pseudocode 4 shown in FIG.8B.
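- Pseudocode 4 is shown only in FIG.8B; the sketch below illustrates the corresponding inverse of the differential phase coding from the encoder-side sketch above: cumulative sums restore first the first order differences and then the absolute phase symbols, with wrapping into the quantized 2π range. Identifiers and the step count are illustrative.

```c
/* Sketch of the inverse of the differential phase coding (cf. Pseudocode 4):
 * two cumulative-sum passes with wrapping restore the absolute phase symbols. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define Q_HALF 8
#define Q_FULL (2 * Q_HALF)

static int wrap_sym(int s) {              /* wrap into the quantized 2*pi range */
    while (s >  Q_HALF) s -= Q_FULL;
    while (s < -Q_HALF) s += Q_FULL;
    return s;
}

/* In-place: undo second order differences (bands 2..n-1), then first order
 * differences (bands 1..n-1), mirroring the encoder-side differencing. */
static void diff_decode(int *sym, int n) {
    for (int pass = 1; pass >= 0; pass--)
        for (int k = pass + 1; k < n; k++)
            sym[k] = wrap_sym(sym[k] + sym[k - 1]);
}

int main(void) {
    int sym[6] = { 0, 1, 0, 0, 0, 0 };    /* as produced by the encoder sketch */
    diff_decode(sym, 6);
    for (int k = 0; k < 6; k++)
        printf("band %d: phase = %.3f rad\n", k, sym[k] * M_PI / Q_HALF);
    return 0;
}
```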
- the inverse stereo processing block 815 provides the replicated complex-valued filter bank domain left signal and the replicated complex-valued filter bank domain right signal to the filter bank synthesis block 820.
- the filter bank synthesis block 820 is configured to receive the replicated complex-valued filter bank domain left signal and the replicated complex-valued filter bank domain right signal.
- the filter bank synthesis block 820 converts the replicated complex-valued filter bank domain left signal and the replicated complex-valued filter bank domain right signal to a replicated left signal and a replicated right signal.
- the filter bank synthesis block 820 outputs the replicated original left signal and the replicated original right signal.
- FIG.9 illustrates a block diagram of various example methods 900 for decoding stereo signals, which may be performed by the decoder 120 of FIG.8A.
- the example methods 900 may be performed by a processor, which may be configured to perform methods 900 via machine-executable instructions.
- the methods 900 may be broken into various blocks or partitions, such as blocks 905, 910, 915, and 920.
- the various process blocks illustrated in FIG. 9 provide examples of various methods disclosed herein, and it is understood that some blocks may be removed, added, combined, or modified without departing from the spirit of the present disclosure.
- stereo signals may have been previously encoded using an extended Mid/Side mode.
- the processing of the various blocks which may be described as processes, methods, steps, blocks, operations, or functions, may commence at block 905.
- an example method 900 may include receiving an encoded bitstream.
- the decoder 120 receives a bitstream from the encoder 110.
- an example method 900 may include decoding, from the bitstream, the replicated Mid signal, the replicated residual signal, and the replicated stereo metadata. Processing may continue from block 910 to block 915.
- an example method 900 may include converting the replicated Mid signal and the replicated residual signal to a replicated left channel and a replicated right channel using the replicated stereo metadata. For example, the Side signal is replicated using side prediction information included in the replicated stereo metadata.
- the replicated Mid signal and the replicated Side signal are converted to a replicated left channel and a replicated right channel.
- phase adjustment is performed using a phase difference included in the stereo metadata. Processing may continue from block 915 to block 920.
- an example method 900 may include passing the replicated left channel and the replicated right channel to a filter bank synthesis to replicate the original left channel signal and the original right channel signal.
- FIG.10 illustrates a graph of a Perceptual Evaluation of Audio Quality (PEAQ) for twelve audio items (provided along the x-axis).
- PEAQ Perceptual Evaluation of Audio Quality
- An audio item is an audio event captured by a microphone for encoding by the encoder 110.
- An Objective Difference Grade (ODG) value is provided (along the y-axis) for each audio item.
- the graph illustrates the benefits of joint stereo encoding. For example, audio quality was improved at least in audio item 5 (which had a small inter-channel time delay), as indicated by the ODG value for encoding item 5 with the extended Mid/Side mode being closer to zero than encoding item 5 with the Mid/Side mode.
- Audio item 9, which is a panned speech item shows a similar improvement in audio quality, as indicated by the ODG for encoding item 9 with the extended Mid/Side mode being closer to zero than encoding item 9 with the Mid/Side mode.
- FIG.11A illustrates a schematic block diagram of an example device architecture 1100 (e.g., an apparatus 1100) that may be used to implement various aspects of the present disclosure.
- Architecture 1100 includes but is not limited to servers and client devices, systems, and methods as described in reference to FIGS.1-10.
- the architecture 1100 includes central processing unit (CPU) 1101 which is capable of performing various processes in accordance with a program stored in, for example, read only memory (ROM) 1102 or a program loaded from, for example, storage unit 1108 to random access memory (RAM) 1103.
- the CPU 1101 may be, for example, an electronic processor 1101.
- the following components are connected to the I/O interface 1105: an input unit 1106, which may include a keyboard, a mouse, or the like; an output unit 1107, which may include a display such as a liquid crystal display (LCD) and one or more speakers; a storage unit 1108, including a hard disk or another suitable storage device; and a communication unit 1109, including a network interface card such as a network card (e.g., wired or wireless).
- input unit 1106 includes one or more microphones in different positions (depending on the host device) enabling capture of audio signals in various formats (e.g., mono, stereo, spatial, immersive, and other suitable formats).
- output unit 1107 includes systems with various numbers of speakers. Output unit 1107 (depending on the capabilities of the host device) can render audio signals in various formats (e.g., mono, stereo, immersive, binaural, and other suitable formats).
- communication unit 1109 is configured to communicate with other devices (e.g., via a network). Drive 1110 is also connected to I/O interface 1105, as required.
- Removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a flash drive or another suitable removable medium is mounted on drive 1110, so that a computer program read therefrom is installed into storage unit 1108, as required.
- although apparatus 1100 is described as including the above-described components, in real applications it is possible to add, remove, and/or replace some of these components, and all such modifications or alterations fall within the scope of the present disclosure.
- the processes described above may be implemented as computer software programs or on a computer-readable storage medium.
- FIG.11B illustrates a schematic block diagram of an example CPU 1101 implemented in the device architecture 1100 of FIG.11A that may be used to implement various aspects of the present disclosure.
- the CPU 1101 includes an electronic processor 1120 and a memory 1121.
- the electronic processor 1120 is electrically and/or communicatively connected to the memory 1121 for bidirectional communication.
- the memory 1121 stores encoding software 1122 and/or decoding software 1123.
- memory 1121 may be located internal to the electronic processor 1120, such as for an internal cache memory or some other internally located ROM, RAM, or flash memory.
- memory 1121 may be located external to the electronic processor 1120, such as in a ROM 1102, a RAM 1103, flash memory or a removable medium 1111, or another non-transitory computer readable medium that is contemplated for device architecture 1100.
- the electronic processor 1120 may implement the encoding software 1122 stored in the memory 1121 to perform, among other things, any of the methods 700 of FIGS.7A-7B.
- the electronic processor 1120 may implement the decoding software 1123 stored in the memory 1121 to perform, among other things, any of the methods 900 of FIG.9.
- various example embodiments of the present disclosure may be implemented in hardware or special purpose circuits (e.g., control circuitry), software, logic or any combination thereof.
- control circuitry e.g., CPU 1101 in combination with other components of FIG.11A
- the control circuitry may be performing the actions described in this disclosure.
- Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device (e.g., control circuitry).
- a machine-readable medium may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- the machine-readable medium may be a machine- readable signal medium or a machine-readable storage medium.
- a machine-readable medium may be non-transitory and may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- more specific examples of the machine-readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- Computer program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages.
- These computer program codes may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus that has control circuitry, such that the program codes, when executed by the processor of the computer or other programmable data processing apparatus, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
- the program code may execute entirely on a computer, partly on the computer, as a stand-alone software package, partly on the computer and partly on a remote computer or entirely on the remote computer or server or distributed over one or more remote computers and/or servers.
- Various aspects and implementations of the present disclosure may also be appreciated from the following enumerated example embodiments (EEEs), which are not claims, and which may represent systems, methods, and devices, all arranged in accordance with aspects of the present disclosure.
- EEE1. A method for encoding a stereo audio signal in a bitstream comprising: passing a left-channel and a right-channel of the stereo audio signal to a complex-valued filter bank analysis block to generate one or more frequency bands; calculating, for each of the one or more frequency bands, an energy of the left-channel, an energy of the right-channel, and a covariance of the left-channel and the right-channel; selecting, based on the calculated energy of the left-channel, the calculated energy of the right-channel, and the calculated covariance of the left-channel and the right-channel, a stereo coding mode in which to encode the left-channel and the right-channel; and when the stereo coding mode is an extended Mid/Side coding mode: computing a phase difference between the left-channel and the right-channel; adjusting phase alignment between the left-channel and the right-channel based on the computed phase difference to generate an aligned left-channel and an aligned right-channel; transforming the aligned left-channel and the aligned right-channel to a Mid signal and a Side signal; generating a residual signal based on side prediction data and the Side signal; encoding the Mid signal, the residual signal, the phase difference, and the side prediction data in the bitstream; and outputting the bitstream for the selected stereo coding mode.
- EEE2 The method according to EEE1, wherein the method includes: when the stereo coding mode is a separated coding mode: encoding the left-channel and the right-channel in the bitstream.
- EEE3 The method according to any one of EEE1 to EEE2, wherein the method includes: when the stereo coding mode is a Mid/Side coding mode: transforming the aligned left-channel and the aligned right-channel to the Mid signal and the Side signal; and encoding the Mid signal and the Side signal in the bitstream.
- EEE4 The method according to EEE1, wherein the method includes: when the stereo coding mode is a separated coding mode: encoding the left-channel and the right-channel in the bitstream.
- EEE5 The method according to any one of EEE1 to EEE3, wherein the Mid signal in the extended Mid/Side coding mode represents a sum of the left-channel and the right-channel, and wherein the Side signal in the extended Mid/Side coding mode represents a difference between the left-channel and the right-channel.
- computing the phase difference includes: comparing the calculated covariance of the left-channel and the right-channel to an energy threshold, and setting the computed phase difference to zero when the calculated covariance of the left-channel and the right-channel is less than or equal to the energy threshold.
- selecting the stereo coding mode includes: determining a bit cost associated with each of a plurality of stereo coding modes based on the calculated energy of the left-channel, the calculated energy of the right-channel, the calculated covariance of the left-channel and the right-channel, and a cost of transmitting the stereo audio signal; and selecting the stereo coding mode based on the bit cost.
- determining the bit cost associated with each of the plurality of stereo coding modes includes: determining an energy ratio of signals included in each of the plurality of stereo coding modes; and comparing the energy ratio to a threshold, wherein the bit cost indicates a bit reduction between each of the plurality of stereo coding modes compared to coding the left-channel and the right-channel, and wherein the threshold is based on the calculated energy of the left-channel and the calculated energy of the right-channel.
- EEE8 further comprising signaling the presence of the phase difference or the presence of the side prediction data using one bit per frame each.
- EEE10 The method according to any one of EEE1 to EEE9, wherein the phase difference and the side prediction data are quantized.
- EEE11. An apparatus for encoding a stereo audio signal in a bitstream comprising: an electronic processor configured to: pass a left-channel and a right-channel of the stereo audio signal to a complex-valued filter bank analysis block to generate one or more frequency bands; calculate, for each of the one or more frequency bands, an energy of the left-channel, an energy of the right-channel, and a covariance of the left-channel and the right-channel; select, based on the calculated energy of the left-channel, the calculated energy of the right-channel, and the calculated covariance of the left-channel and the right-channel, a stereo coding mode in which to encode the left-channel and the right-channel; and when the stereo coding mode is an extended Mid/Side coding mode: compute a phase difference between the left-channel and the right-channel; adjust phase alignment between the left-channel and the right-channel based on the computed phase difference to generate an aligned left-channel and an aligned right-channel; transform the aligned left-channel and the aligned right-channel to a Mid signal and a Side signal; generate a residual signal based on side prediction data and the Side signal; encode the Mid signal, the residual signal, the phase difference, and the side prediction data in the bitstream; and output the bitstream for the selected stereo coding mode.
- EEE12 The apparatus according to EEE11, wherein the electronic processor is configured to: when the stereo coding mode is a separated coding mode: encode the left-channel and the right-channel in the bitstream.
- EEE13 The apparatus according to any one of EEE11 to EEE12, wherein the electronic processor is configured to: when the stereo coding mode is a Mid/Side coding mode: transform the aligned left-channel and the aligned right-channel to the Mid signal and the Side signal; and encode the Mid signal and the Side signal in the bitstream.
- EEE14 The apparatus according to EEE11, wherein adjusting phase alignment between the left-channel and the right-channel includes adjusting a phase of the right-channel to align the left-channel and the right-channel.
- EEE15 The apparatus according to any one of EEE11 to EEE14, wherein, to compute the phase difference, the electronic processor is configured to: compare the calculated covariance of the left-channel and the right-channel to an energy threshold, and set the computed phase difference to zero when the calculated covariance of the left-channel and the right-channel is less than or equal to the energy threshold.
- EEE16 The apparatus according to any one of EEE11 to EEE15, wherein the electronic processor is configured to: determine a bit cost associated with each of a plurality of stereo coding modes based on the calculated energy of the left-channel, the calculated energy of the right-channel, the calculated covariance of the left-channel and the right-channel, and a cost of transmitting the stereo audio signal; and select the stereo coding mode based on the bit cost.
- EEE17 The apparatus according to EEE16, wherein the electronic processor is configured to: determine an energy ratio of signals included in each of the plurality of stereo coding modes; and compare the energy ratio to a threshold, wherein the bit cost indicates a bit reduction between each of the plurality of stereo coding modes compared to coding the left-channel and the right-channel, and wherein the threshold is based on the calculated energy of the left-channel and the calculated energy of the right-channel.
- EEE18 The apparatus according to any one of EEE11 to EEE17, wherein the electronic processor is configured to signal the stereo coding mode that is selected using two bits per frame.
- EEE19 The apparatus according to any one of EEE11 to EEE18, wherein the electronic processor is configured to signal the presence of the phase difference or the side prediction data using one bit per frame each.
- EEE20 The apparatus according to any one of EEE11 to EEE19, wherein the phase difference and the side prediction data are quantized.
- EEE21 The apparatus according to any one of EEE11 to EEE20, wherein the phase difference is a linearly quantized inter-channel phase difference, and wherein, to encode the phase difference, the electronic processor is configured to: compute, per frequency band, second order differences of the linearly quantized inter-channel phase difference, wrap the second order differences into a 2π range, and encode the wrapped second order differences.
- EEE22 The apparatus according to EEE20.
- EEE23 The apparatus according to any one of EEE11 to EEE21, wherein the bitstream corresponds to an IVAS bitstream.
- EEE24 A method for decoding a stereo audio signal comprising: receiving an encoded bitstream; decoding, from the bitstream, a replicated Mid signal, a replicated residual signal, and replicated stereo metadata, wherein the replicated stereo metadata includes a phase difference and side prediction data; converting the replicated Mid signal and the replicated residual signal to a replicated left channel and a replicated right channel using the replicated stereo metadata; and passing the replicated left channel and the replicated right channel to a filter bank analysis block to recreate an original left channel and an original right channel.
- EEE25 The method according to EEE24, wherein converting the replicated Mid signal and the replicated residual signal includes generating a Side signal from the replicated residual signal based on the side prediction data.
- EEE26 The method according to any one of EEE24 to EEE25, further comprising aligning the replicated left channel and the replicated right channel using the phase difference.
- EEE27 An apparatus for decoding a stereo audio signal, the apparatus comprising: an electronic processor configured to: receive an encoded bitstream; decode, from the bitstream, a replicated Mid signal, a replicated residual signal, and replicated stereo metadata, wherein the replicated stereo metadata includes a phase difference and side prediction data; convert the replicated Mid signal and the replicated residual signal to a replicated left channel and a replicated right channel using the replicated stereo metadata; and pass the replicated left channel and the replicated right channel to a filter bank analysis block to recreate an original left channel and an original right channel.
- EEE28 The apparatus according to EEE27, wherein, to convert the replicated Mid signal and the replicated residual signal, the electronic processor is configured to generate a Side signal from the replicated residual signal based on the side prediction data.
- EEE29 The apparatus according to any one of EEE27 to EEE28, wherein the electronic processor is configured to align the replicated left channel and the replicated right channel using the phase difference.
- EEE30 A method for encoding a stereo audio signal comprising: determining an advanced stereo coding mode for encoding the stereo audio signal, wherein the advanced stereo coding mode is one selected from the group consisting of a left/right coding mode, a mid/side coding mode, and an extended mid/side coding mode; signaling the advanced stereo coding mode using two bits per frame; signaling, in response to the advanced stereo coding mode being the extended mid/side coding mode, phase difference data using one bit per frame; and signaling, in response to the advanced stereo coding mode being the extended mid/side coding mode, prediction data using one bit per frame.
- EEE31 A non-transitory computer-readable storage medium recording a program of instructions that is executable by a device to perform the method according to any one of EEE1 to EEE10, EEE22, EEE24 to EEE26, or EEE30.
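
The encoder-side processing summarized in EEE11 (and in the method EEEs it mirrors) can be made concrete with a minimal per-band sketch in Python/NumPy. It assumes complex-valued band samples from some filter bank, a /2 normalization of the Mid/Side transform, a least-squares real-valued prediction gain, a placeholder energy threshold, and illustrative function names (`analyze_band`, `extended_mid_side`); none of these specifics are taken from the application text quoted above.

```python
import numpy as np

def analyze_band(l_band, r_band, energy_thresh=1e-12):
    """Per-band energies, complex covariance and inter-channel phase difference (IPD)."""
    e_l = float(np.sum(np.abs(l_band) ** 2))
    e_r = float(np.sum(np.abs(r_band) ** 2))
    cov = complex(np.sum(l_band * np.conj(r_band)))
    # With too little correlated energy, force the IPD to zero
    # (the thresholding idea described for computing the phase difference).
    ipd = 0.0 if abs(cov) <= energy_thresh else float(np.angle(cov))
    return e_l, e_r, cov, ipd

def extended_mid_side(l_band, r_band, ipd):
    """Phase-align the right channel, form Mid/Side, and predict Side from Mid."""
    r_aligned = r_band * np.exp(1j * ipd)   # rotate R so it is in phase with L
    mid = 0.5 * (l_band + r_aligned)        # "sum" signal (cf. EEE5)
    side = 0.5 * (l_band - r_aligned)       # "difference" signal (cf. EEE5)
    e_mid = float(np.sum(np.abs(mid) ** 2))
    # Real-valued least-squares gain; the residual is what would be coded.
    g = float(np.real(np.sum(side * np.conj(mid))) / e_mid) if e_mid > 0.0 else 0.0
    residual = side - g * mid
    return mid, residual, g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    l = rng.standard_normal(64) + 1j * rng.standard_normal(64)
    r = 0.8 * l * np.exp(-1j * 0.4) + 0.05 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
    e_l, e_r, cov, ipd = analyze_band(l, r)
    mid, residual, g = extended_mid_side(l, r, ipd)
    print(f"ipd={ipd:.3f}, gain={g:.3f}, residual/left energy={np.sum(np.abs(residual)**2)/e_l:.4f}")
```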
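
The mode-selection logic of the bit-cost and energy-ratio items (EEE16 and EEE17 on the apparatus side, and their method counterparts) can be approximated with a simple energy-based cost proxy. The 0.5·log2(energy) estimate, the metadata overhead constant, and the function names below are assumptions chosen for illustration only, not values from the application.

```python
import numpy as np

def mode_costs(e_l, e_r, cov, metadata_overhead_bits=1.0):
    """Rough per-band bit-cost proxies for L/R, Mid/Side and extended Mid/Side coding."""
    def bits(e):
        # Stand-in for the bits needed to code a signal of a given energy.
        return 0.5 * np.log2(max(e, 1e-12))

    e_m = 0.25 * (e_l + e_r + 2.0 * np.real(cov))   # energy of (L + R) / 2
    e_s = 0.25 * (e_l + e_r - 2.0 * np.real(cov))   # energy of (L - R) / 2
    e_m_rot = 0.25 * (e_l + e_r + 2.0 * abs(cov))   # same, after phase alignment
    e_s_rot = 0.25 * (e_l + e_r - 2.0 * abs(cov))
    return {
        "LR": bits(e_l) + bits(e_r),
        "MS": bits(e_m) + bits(e_s),
        "extended_MS": bits(e_m_rot) + bits(e_s_rot) + metadata_overhead_bits,
    }

def select_mode(e_l, e_r, cov):
    """Pick the mode with the lowest estimated cost."""
    costs = mode_costs(e_l, e_r, cov)
    return min(costs, key=costs.get), costs

if __name__ == "__main__":
    # A large phase offset between strongly correlated channels makes phase
    # alignment pay off despite the metadata overhead.
    print(select_mode(e_l=100.0, e_r=95.0, cov=90.0 * np.exp(1j * 1.2)))
```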
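
EEE21 describes coding the linearly quantized inter-channel phase difference through second-order differences wrapped into a 2π range. Below is a small sketch of that idea, assuming an illustrative quantizer step of π/8 and a [-π, π) wrapping convention; neither of those values is specified in the application text quoted here.

```python
import numpy as np

STEP = np.pi / 8  # illustrative quantizer step, not taken from the application

def wrap(x):
    """Wrap angles into the interval [-pi, pi)."""
    return (np.asarray(x) + np.pi) % (2.0 * np.pi) - np.pi

def encode_ipd(ipd_per_band):
    """Linearly quantize per-band IPDs, then take wrapped second-order differences."""
    q = np.round(np.asarray(ipd_per_band) / STEP) * STEP
    d1 = np.diff(q, prepend=0.0)           # first-order differences
    return wrap(np.diff(d1, prepend=0.0))  # second-order differences in a 2*pi range

def decode_ipd(d2):
    """Undo the two differencing stages; re-wrap and snap back to the quantizer grid."""
    q = wrap(np.cumsum(np.cumsum(d2)))
    return np.round(q / STEP) * STEP

if __name__ == "__main__":
    ipd = np.array([0.10, 0.35, 0.80, 1.40, 2.90, -2.80])
    rec = decode_ipd(encode_ipd(ipd))
    print(np.allclose(rec, np.round(ipd / STEP) * STEP))  # True: lossless up to quantization
```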
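
Several EEEs fix only bit counts for the frame-level signaling: two bits for the selected stereo coding mode and, in the extended mid/side mode, one presence bit each for phase-difference data and side-prediction data (EEE18, EEE19, EEE30). The sketch below shows one way such a header could be packed; the enum values, the bit order, and the class names are assumptions, since the text quoted here does not define them.

```python
from enum import IntEnum

class StereoMode(IntEnum):
    LEFT_RIGHT = 0
    MID_SIDE = 1
    EXTENDED_MID_SIDE = 2

class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value: int, n: int):
        # Append n bits of value, most significant bit first.
        for i in reversed(range(n)):
            self.bits.append((value >> i) & 1)

def signal_stereo_config(bw: BitWriter, mode: StereoMode,
                         has_ipd: bool = False, has_prediction: bool = False):
    """Two bits per frame for the mode; in extended Mid/Side, one presence bit each
    for phase-difference data and for side-prediction data."""
    bw.write(int(mode), 2)
    if mode == StereoMode.EXTENDED_MID_SIDE:
        bw.write(int(has_ipd), 1)
        bw.write(int(has_prediction), 1)

if __name__ == "__main__":
    bw = BitWriter()
    signal_stereo_config(bw, StereoMode.EXTENDED_MID_SIDE, has_ipd=True, has_prediction=False)
    print(bw.bits)  # [1, 0, 1, 0]
```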
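
For the decoder-side EEEs (EEE24 to EEE29), a matching sketch: the Side signal is rebuilt from the coded residual and the side-prediction gain, the Mid/Side transform is inverted, and the transmitted phase difference restores the original inter-channel phase relation. The /2 forward normalization and the variable names mirror the encoder sketch above and are likewise assumptions.

```python
import numpy as np

def decode_band(mid, residual, g, ipd):
    """Invert the extended Mid/Side transform for one frequency band."""
    side = residual + g * mid              # Side from residual plus prediction (cf. EEE25/EEE28)
    l_hat = mid + side                     # inverse of mid = (l + r_aligned) / 2 ...
    r_aligned = mid - side                 # ... and side = (l - r_aligned) / 2
    r_hat = r_aligned * np.exp(-1j * ipd)  # undo the phase alignment (cf. EEE26/EEE29)
    return l_hat, r_hat

if __name__ == "__main__":
    # Round trip against a hand-rolled encoder for a perfectly correlated pair.
    rng = np.random.default_rng(1)
    l = rng.standard_normal(32) + 1j * rng.standard_normal(32)
    r = l * np.exp(-1j * 0.9)
    ipd = float(np.angle(np.sum(l * np.conj(r))))
    r_al = r * np.exp(1j * ipd)
    mid, side = 0.5 * (l + r_al), 0.5 * (l - r_al)
    g = float(np.real(np.sum(side * np.conj(mid))) / np.sum(np.abs(mid) ** 2))
    l_hat, r_hat = decode_band(mid, side - g * mid, g, ipd)
    print(np.allclose(l_hat, l), np.allclose(r_hat, r))  # True True
```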
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL322986A IL322986A (en) | 2023-03-23 | 2024-03-22 | Joint stereo coding in complex-valued filter bank domain |
| CN202480020649.1A CN120917510A (en) | 2023-03-23 | 2024-03-22 | Joint stereo coding in complex valued filter bank domain |
| KR1020257035043A KR20250164274A (en) | 2023-03-23 | 2024-03-22 | Joint stereo coding in the complex-valued filter bank domain |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363491840P | 2023-03-23 | 2023-03-23 | |
| US63/491,840 | 2023-03-23 | ||
| US202463559764P | 2024-02-29 | 2024-02-29 | |
| US63/559,764 | 2024-02-29 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024194493A1 true WO2024194493A1 (en) | 2024-09-26 |
Family
ID=90544884
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2024/057870 Pending WO2024194493A1 (en) | 2023-03-23 | 2024-03-22 | Joint stereo coding in complex-valued filter bank domain |
Country Status (4)
| Country | Link |
|---|---|
| KR (1) | KR20250164274A (en) |
| CN (1) | CN120917510A (en) |
| IL (1) | IL322986A (en) |
| WO (1) | WO2024194493A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200168232A1 (en) * | 2017-06-01 | 2020-05-28 | Panasonic Intellectual Property Corporation Of America | Encoder and encoding method |
| US20220293111A1 (en) * | 2016-11-08 | 2022-09-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for downmixing or upmixing a multichannel signal using phase compensation |
| US20220415334A1 (en) | 2019-12-05 | 2022-12-29 | Dolby Laboratories Licensing Corporation | A psychoacoustic model for audio processing |
2024
- 2024-03-22 CN CN202480020649.1A patent/CN120917510A/en active Pending
- 2024-03-22 KR KR1020257035043A patent/KR20250164274A/en active Pending
- 2024-03-22 WO PCT/EP2024/057870 patent/WO2024194493A1/en active Pending
- 2024-03-22 IL IL322986A patent/IL322986A/en unknown
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220293111A1 (en) * | 2016-11-08 | 2022-09-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for downmixing or upmixing a multichannel signal using phase compensation |
| US20200168232A1 (en) * | 2017-06-01 | 2020-05-28 | Panasonic Intellectual Property Corporation Of America | Encoder and encoding method |
| US20220415334A1 (en) | 2019-12-05 | 2022-12-29 | Dolby Laboratories Licensing Corporation | A psychoacoustic model for audio processing |
Non-Patent Citations (2)
| Title |
|---|
| LINDBLOM J ET AL: "Flexible sum-difference stereo coding based on time-aligned signal components", APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2005. IEEE WORKSHOP ON NEW PALTZ, NY, USA OCTOBER 16-19, 2005, PISCATAWAY, NJ, USA, IEEE, 16 October 2005 (2005-10-16), pages 255 - 258, XP010854377, ISBN: 978-0-7803-9154-3, DOI: 10.1109/ASPAA.2005.1540218 * |
| NEUENDORF MAX ET AL: "MPEG Unified Speech and Audio Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types", AES CONVENTION 132; APRIL 2012, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 26 April 2012 (2012-04-26), XP040574618 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250164274A (en) | 2025-11-24 |
| CN120917510A (en) | 2025-11-07 |
| IL322986A (en) | 2025-10-01 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| KR100936498B1 (en) | Stereo compatible multichannel audio coding | |
| EP1851997B1 (en) | Near-transparent or transparent multi-channel encoder/decoder scheme | |
| US8046214B2 (en) | Low complexity decoder for complex transform coding of multi-channel sound | |
| AU2010249173B2 (en) | Complex-transform channel coding with extended-band frequency coding | |
| KR101449434B1 (en) | Method and apparatus for encoding/decoding multi-channel audio using plurality of variable length code tables | |
| EP2169666B1 (en) | A method and an apparatus for processing a signal | |
| US20130144630A1 (en) | Multi-channel audio encoding and decoding | |
| EP1914723B1 (en) | Audio signal encoder and audio signal decoder | |
| RU2628898C1 (en) | Irregular quantization of parameters for improved connection | |
| EP2261897A1 (en) | Quantization and inverse quantization for audio | |
| EP2169665A1 (en) | A method and an apparatus for processing a signal | |
| US8483411B2 (en) | Method and an apparatus for processing a signal | |
| EP1938313A1 (en) | Method and apparatus for encoding/decoding multi-channel audio signal | |
| EP2169664A2 (en) | A method and an apparatus for processing a signal | |
| EP2690622B1 (en) | Audio decoding device and audio decoding method | |
| CN113614827B (en) | Method and apparatus for low cost error recovery in predictive coding | |
| WO2024194493A1 (en) | Joint stereo coding in complex-valued filter bank domain | |
| RU2803142C1 (en) | Audio upmixing device with possibility of operating in a mode with or without prediction | |
| AU2024241035A1 (en) | Frame segmentation and grouping for audio encoding | |
| HK1107495B (en) | Near-transparent or transparent multi-channel encoder/decoder scheme |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24714924; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 322986; Country of ref document: IL |
| | WWE | Wipo information: entry into national phase | Ref document number: 202480020649.1; Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 11202505795Y; Country of ref document: SG |
| | WWP | Wipo information: published in national office | Ref document number: 11202505795Y; Country of ref document: SG |
| | WWE | Wipo information: entry into national phase | Ref document number: KR1020257035043; Country of ref document: KR; Ref document number: 1020257035043; Country of ref document: KR |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024714924; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | Wipo information: published in national office | Ref document number: 202480020649.1; Country of ref document: CN |
| | ENP | Entry into the national phase | Ref document number: 2024714924; Country of ref document: EP; Effective date: 20251023 |