HK1093596B

HK1093596B - Device and method for determining a quantiser step size

Info

Publication number: HK1093596B
Application number: HK07100370.3A
Authority: HK
Inventors: Bernhard Grill; Michael Schug; Bodo Teichmann; Nikolaus Rettelbach
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2004-03-01
Filing date: 2005-02-17
Publication date: 2008-05-16

Description

The present invention relates to audio encoders and in particular to transform-based audio encoders, i.e. encoders in which a temporal representation is converted into a spectral representation at the beginning of the encoder pipeline.

A well-known transform-based audio encoder is shown in Fig. 3. The encoder shown in Fig. 3 is described in the international standard ISO/IEC 14496-3: 2001 (E), Subpart 4, page 4, and is also known in the art as an AAC encoder.

The known encoder is described below. An audio signal to be encoded is fed in at an input 1000. It is first fed to a scaling stage 1002, where so-called AAC gain control is performed to set the level of the audio signal. Side information from the scaling is fed to a bitstream formatter 1004, as shown by the arrow between block 1002 and block 1004. The scaled audio signal is then fed to an MDCT filter bank 1006.

Generally speaking, block 1008 serves to window transient signals with shorter windows and more stationary signals with longer windows. Shorter windows give transient signals a higher time resolution (at the expense of frequency resolution), while longer windows give more stationary signals a higher frequency resolution (at the expense of time resolution); longer windows tend to be preferred because they promise a larger coding gain. At the output of the filter bank 1006 there are temporally successive blocks of spectral values, each comprising a number of MDCT coefficients. Where a filter bank delivers sub-band signals instead, each sub-band signal has a specific limited bandwidth determined by the corresponding sub-band channel of the filter bank.

In the following, the case is considered in which the filter bank outputs temporally successive blocks of MDCT spectral coefficients, which generally represent successive short-term spectra of the audio signal to be encoded at input 1000. A block of MDCT spectral values is then fed into a TNS processing block 1010, in which temporal noise shaping (TNS) takes place. The TNS technique is used to shape the temporal form of the quantization noise within each window of the transform. This is achieved by applying a filtering process to parts of the spectral data of each channel. Encoding is performed window by window; the steps below describe how the TNS tool is applied to one window of spectral data, i.e. one block of spectral values.

First, a frequency range is selected for the TNS tool. An appropriate choice is to cover a frequency range from 1.5 kHz up to the highest possible scale factor band with a filter. It should be noted that this frequency range depends on the sampling rate, as specified in the AAC standard (ISO/IEC 14496-3: 2001 (E)).

An LPC (linear predictive coding) calculation is then performed on the spectral MDCT coefficients in the selected target frequency range. For increased stability, coefficients corresponding to frequencies below 2.5 kHz are excluded from this process. Common LPC procedures known from speech processing can be used for the LPC calculation, for example the well-known Levinson-Durbin algorithm.

The result of the LPC calculation is the expected prediction gain PG and the reflection coefficients (PARCOR coefficients).
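The LPC step can be sketched as follows. This is only a minimal illustration of the Levinson-Durbin recursion, not the encoder's actual implementation; the function name `levinson_durbin`, the sign convention of the reflection coefficients, and the use of a precomputed autocorrelation sequence `r` are assumptions made for this example.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations for autocorrelation values r[0..order].

    Returns (a, refl, gain): the prediction-error filter coefficients a
    (with a[0] == 1), the reflection (PARCOR) coefficients, and the
    prediction gain r[0] / residual_energy.
    Assumes r[0] > 0 and a non-degenerate autocorrelation sequence, so
    that the residual energy stays positive.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])              # residual energy of the order-0 predictor
    refl = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err             # reflection coefficient of stage m
        refl.append(k)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k         # residual energy shrinks at every stage
    return a, refl, r[0] / err


# Hypothetical input: autocorrelation of a first-order process (correlation 0.5).
coeffs, parcor, gain = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In this hypothetical example the second reflection coefficient vanishes, because a first-order predictor already removes all the correlation. A TNS decision as described in the text would then compare `gain` against a threshold.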

If the prediction gain does not exceed a certain threshold, the TNS tool is not applied. In that case, control information is written into the bitstream so that a decoder knows that no TNS processing has been performed.

If, however, the prediction gain exceeds the threshold, TNS processing is applied.

The calculated LPC coefficients are then used as the coefficients of the encoder's noise shaping filter, i.e. as prediction filter coefficients. This FIR filter is run over the specified target frequency range. For decoding, an autoregressive filter is used, while for encoding, a so-called moving average filter is used. Finally, the side information for the TNS tool is fed to the bitstream formatter, as shown by the arrow between the TNS processing block 1010 and the bitstream formatter 1004 in Fig. 3.
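The relationship between the moving average filter used for encoding and the autoregressive filter used for decoding can be illustrated with a short sketch. The function names, the coefficient values and the zero initial filter state are assumptions chosen for illustration; the actual AAC TNS tool adds coefficient quantization and band handling that are not shown here.

```python
def tns_encode(spectrum, a):
    """Moving-average (FIR) prediction-error filter run along frequency.

    a[0] is assumed to be 1; a[1:] are the prediction-error coefficients.
    """
    out = []
    for n in range(len(spectrum)):
        acc = spectrum[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc += a[j] * spectrum[n - j]
        out.append(acc)
    return out


def tns_decode(residual, a):
    """Autoregressive (all-pole) filter that inverts tns_encode."""
    out = []
    for n in range(len(residual)):
        acc = residual[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * out[n - j]   # feedback from already decoded values
        out.append(acc)
    return out


# Hypothetical spectral values and filter coefficients.
spectrum = [1.0, 0.8, 0.5, 0.1, -0.3, -0.4]
a = [1.0, -0.5, 0.2]
residual = tns_encode(spectrum, a)
restored = tns_decode(residual, a)   # matches spectrum up to rounding error
```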

The mid/side encoder 1012 is active if the audio signal to be encoded is a multichannel signal, i.e. a stereo signal with a left channel and a right channel. Up to this point, i.e. upstream of block 1012 in Fig. 3, the left and right stereo channels have been processed separately, i.e. scaled, transformed by the filter bank, subjected to TNS processing or not, etc.

In the mid/side encoder, it is first checked whether mid/side encoding makes sense, i.e. whether it yields a coding gain at all. Mid/side encoding yields a coding gain if the left and right channels are similar, since the mid channel, i.e. the sum of the left and right channels, is then almost equal to the left or right channel apart from scaling by the factor 1/2, while the side channel has only very small values, since it is equal to the difference between the left and right channels.
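The effect of mid/side coding on similar channels can be seen in a small sketch. The function names and sample values are hypothetical, and scaling the side channel by 1/2 as well is an assumption of this example (the text only states the factor for the mid channel).

```python
def mid_side_encode(left, right):
    """Mid = (L + R) / 2; side = (L - R) / 2 (side scaling is an assumption)."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def mid_side_decode(mid, side):
    """Recover the left and right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two nearly identical channels: the side channel is almost all zeros.
left = [1.0, 2.0, -1.0]
right = [1.0, 1.5, -1.0]
mid, side = mid_side_encode(left, right)
left_out, right_out = mid_side_decode(mid, side)
```

Because the side channel consists of very small values, it needs few bits after quantization and entropy coding, which is where the coding gain comes from.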

The quantizer 1014 is supplied with a permitted interference per scale factor band by a psychoacoustic model 1020. The quantizer works iteratively, i.e. an outer iteration loop is called first, which then calls an inner iteration loop. Generally speaking, a block of values at the input of the quantizer 1014 is first quantized starting from initial values for the quantizer step size. In particular, the inner loop quantizes the MDCT coefficients, consuming a certain number of bits. The outer loop calculates the distortion and the modified energy of the coefficients using the scale factors and then re-enters the inner loop. This evaluation is carried out per scale factor band, and the scale factors are adjusted from one outer iteration to the next.

When a situation is reached in which the quantization interference introduced by the quantization is below the permissible interference determined by the psychoacoustic model, and the bit requirements are met at the same time, namely that a maximum bit rate is not exceeded, the iteration, i.e. the analysis-by-synthesis process, is terminated. The scale factors obtained are encoded, as performed in block 1014, and fed in coded form to the bitstream formatter 1004, as indicated by the arrow drawn between block 1014 and block 1004. The quantized values are then fed to the entropy encoder 1016, which typically uses several Huffman code tables for different scale factor bands to encode the quantized values. The entropy-coded quantized values are then likewise fed to the bitstream formatter 1004.

As already explained, in this iterative quantization a finer quantizer step size is used whenever the interference introduced by a quantizer step size is greater than the threshold, in the hope that the finer quantization will reduce the quantization noise.

This concept is disadvantageous in that the finer quantizer step size naturally increases the amount of data to be transmitted and thus reduces the compression gain.

It is the object of the present invention to provide a concept for determining a quantizer step size that introduces low quantization interference and at the same time provides a good compression gain.

This object is achieved by a device for determining a quantizer step size according to claim 1, a method for determining a quantizer step size according to claim 9, or a computer program according to claim 10.

The present invention is particularly advantageous when a very good estimate of the first quantizer step size, from which the threshold comparison starts, is already available. Therefore, in a preferred embodiment of the present invention, the first quantizer step size is determined by direct calculation on the basis of the average noise energy and not on the basis of a worst-case scenario. This can either significantly shorten the iteration loops of the prior art or make them entirely unnecessary.

In an alternative embodiment of the present invention, an analysis-by-synthesis approach can be used to estimate the first quantizer step size, as in the prior art, and this can be continued until a termination criterion is reached. The processing according to the invention can then be used to verify, as a final step, whether a larger quantizer step size produces the same or even better results in terms of quantization interference. If it is determined that a larger quantizer step size is as good or even better in terms of the introduced interference, that step size is used for quantization.

According to the invention, any quantizer step size can be used to perform the first threshold comparison, regardless of whether this first quantizer step size has been determined by an analysis-by-synthesis scheme or by a direct calculation.

In a preferred embodiment of the present invention, this concept is used to quantize an audio signal present in the frequency domain. However, the concept can also be used to quantize a time-domain signal containing audio and/or video information.

It should also be noted that the threshold against which the comparison is made is a psychoacoustically or psycho-optically permissible interference, or another threshold below which it is desirable to stay. Thus, this threshold may actually be a permissible interference provided by a psychoacoustic model. However, this threshold may also be a previously determined interference introduced by the original quantizer step size, or any other threshold.

It should be noted that the quantized values do not necessarily have to be Huffman-coded; they can alternatively be encoded with another entropy coding, such as arithmetic coding. Alternatively, the quantized values can also be binary-coded, since with this coding, too, fewer bits are needed to transmit smaller values or values equal to zero than to transmit larger values or, generally, values not equal to zero.

Preferably, the iterative approach can be omitted entirely, or at least to a large extent, when the initial values, i.e. the first quantizer step size, are determined from a direct noise energy estimate. Calculating the quantizer step size from an exact noise energy estimate is considerably faster than a calculation in an analysis-by-synthesis loop, since the values needed for the calculation are directly available.

However, since the quantizer characteristic used is a nonlinear characteristic, the nonlinear characteristic must be taken into account in the noise energy estimation. The simple noise energy estimation for a linear quantizer can no longer be used because it is too imprecise. According to the invention, a quantizer with the following quantization characteristic is used:

y_{i} = round ((\frac{x_{i}}{q})^{α} + s)

The following relation is used to calculate the quantizer step size according to the invention:

\sum_{i} | {Δ x}_{i} |^{2} \approx \frac{q^{2 α}}{12 α^{2}} \cdot \sum_{i} | x_{i} |^{2 (1 - α)}

If α is equal to 3⁄4 , we get the following equation:

\sum_{i} | {Δ x}_{i} |^{2} \approx \frac{q^{3 / 2}}{6.75} \cdot \sum_{i} | x_{i} |^{1 / 2}

In these equations, the left-hand term corresponds to the interference THR allowed in a frequency band, which is supplied by a psychoacoustic module for a scale factor band with the frequency lines i = i1 to i = i2. The above equation allows a nearly exact estimate of the interference introduced by a quantizer step size q for a nonlinear quantizer with the above characteristic and an exponent α not equal to 1, where the rounding function in the quantizer equation performs the actual quantization, i.e. rounding to the nearest integer.

It should be noted that, instead of rounding to the nearest integer (the function nint), any other rounding function can be used, e.g. rounding to the nearest even or odd integer, rounding to the nearest decimal, etc. Generally speaking, the rounding function maps a value from a first set with a certain number of allowed values onto a second, smaller set of values.

In a preferred embodiment of the present invention, the quantized spectral values have already been subjected to TNS processing and, in the case of stereo signals for example, to mid/side coding, provided that the channels were such that the mid/side encoder was activated.

The relationship between the quantizer step size and the scale factor is given by the following equation:

q = 2^{(1 / 4) * scf}

The scale factor for each scale factor band can thus be calculated directly and fed into a corresponding audio encoder:

scf = 8.8585 \cdot (\log_{10} (6.75 \cdot THR) - \log_{10} (FFAC)), \quad FFAC = \sum_{i} | x_{i} |^{1 / 2}

In a preferred embodiment of the present invention, a post-processing iteration based on an analysis-by-synthesis principle may be used to vary the directly calculated quantizer step size for each scale factor band slightly further in order to reach the actual optimum.

However, because the starting values are already calculated very precisely, this iteration is very short compared to the prior art, and it has been shown that in the vast majority of cases the downstream iteration can be omitted altogether.

The preferred approach, based on calculating the step size from the mean noise energy, thus provides a good and realistic estimate, since it does not work with a worst-case scenario, as is the case in the prior art, but uses the expected value of the quantization error as a basis. It therefore allows more efficient coding of the data with a substantially lower bit count at subjectively equivalent quality. Because the iteration can be avoided completely, or the number of iteration steps can be reduced significantly, a considerably faster encoder is obtained. This is particularly noteworthy because the iteration loops in the known encoder account for a major share of the overall coding time.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 shows a block diagram of a device for determining a quantized audio signal;
Fig. 2 shows a flow diagram of the post-processing according to a preferred embodiment of the present invention;
Fig. 3 shows a block diagram of a known encoder according to the AAC standard;
Fig. 4 shows a diagram illustrating the reduction of quantization interference by a larger quantizer step size; and
Fig. 5 shows a block diagram of the device for determining a quantizer step size for quantizing a signal.

Fig. 5 shows a schematic representation of a device for determining a quantizer step size for quantizing a signal that contains audio or video information and is supplied via a signal input 500. The signal is fed to a device 502 for providing a first quantizer step size (QSW) and an interference threshold, hereinafter also referred to as the introducible interference. It should be noted that the interference threshold can be any threshold. Preferably, however, it is a psychoacoustically or psycho-optically introducible interference, i.e. the threshold is selected such that a human listener or viewer still perceives the signal into which the interference has been introduced as undisturbed.

The threshold (THR) and the first quantizer step size are fed to a device 504 for determining the first interference actually introduced by the first quantizer step size. The interference actually introduced is preferably determined by quantizing with the first quantizer step size, requantizing using the first quantizer step size, and calculating the distance between the original signal and the requantized signal. Preferably, when spectral values are processed, corresponding spectral values of the original signal and of the requantized signal are squared and the difference of the squares is then determined. Alternative methods of distance determination can be used.

The device 504 returns a value for the first interference actually introduced by the first quantizer step size. This value is fed, together with the threshold THR, to a device 506 for comparing. The device 506 compares the threshold THR with the first interference actually introduced. If the first interference actually introduced is greater than the threshold, the device 506 activates a device 508 for selecting a second quantizer step size, the device 508 being configured to select the second quantizer step size to be coarser, i.e. larger, than the first quantizer step size. The second quantizer step size selected by the device 508 is fed to a device 510 for determining the second interference actually introduced. To this end, the device 510 receives the original signal and the second quantizer step size and performs a quantization with the second quantizer step size, a requantization with the second quantizer step size and a distance calculation between the requantized signal and the original signal, and feeds the resulting measure of the second interference actually introduced to a device 512 for comparing. The device 512 compares the second interference actually introduced with the first interference actually introduced or with the threshold THR. If the second interference actually introduced is smaller than the first interference actually introduced, or even smaller than the threshold THR, the second quantizer step size is used to quantize the signal.

It should be noted that the representation in Fig. 5 is only schematic. Separate comparison devices need not, of course, be provided for the comparisons in blocks 506 and 512; instead, a single comparison device may be provided and controlled accordingly.
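The comparison flow of Fig. 5 can be summarized in a short sketch. It is a simplified reading of the description, not the claimed device: the function names, the sum of squared differences as the distance measure, the default coarsening factor of one scale factor step, and the behaviour when neither condition is met (falling back to the first step size) are all assumptions of this example.

```python
import math

ALPHA = 0.75  # exponent of the nonlinear quantizer characteristic


def quantize(x, q, s=0.0):
    """y = round((|x| / q) ** ALPHA + s), sign restored (sign handling assumed)."""
    mag = math.floor((abs(x) / q) ** ALPHA + s + 0.5)
    return int(math.copysign(mag, x))


def requantize(y, q):
    """x' = |y| ** (1 / ALPHA) * q, sign restored."""
    return math.copysign(abs(y) ** (1.0 / ALPHA) * q, y)


def interference(values, q):
    """Interference actually introduced: quantize, requantize, measure distance."""
    return sum((x - requantize(quantize(x, q), q)) ** 2 for x in values)


def choose_step_size(values, q_first, thr, coarsen=2 ** 0.25):
    """Return (step size, interference) following the comparisons of Fig. 5."""
    first = interference(values, q_first)        # device 504
    if first <= thr:                             # device 506
        return q_first, first
    q_second = q_first * coarsen                 # device 508: coarser step size
    second = interference(values, q_second)      # device 510
    if second < first or second < thr:           # device 512
        return q_second, second
    return q_first, first                        # assumed fallback


# Hypothetical band with a single spectral value.
q_used, err = choose_step_size([3.0], q_first=2.0, thr=0.5, coarsen=1.5)
```

In this hypothetical case the first step size 2.0 introduces an interference of 1.0, which exceeds the threshold, while the coarser step size 3.0 reproduces the value exactly, so the coarser step size is selected.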

In a preferred embodiment of the present invention, the threshold THR is the psychoacoustically determined maximum introducible interference, in which case the signal is an audio signal. The threshold THR is supplied by a psychoacoustic model that works conventionally and provides, for each scale factor band, an estimated maximum quantization interference that may be introduced in that scale factor band. The maximum introducible interference is based on the masking threshold: it is either identical to the masking threshold or derived from it, for example by coding with a safety margin so that the introducible interference is smaller than the masking threshold, or by coding more aggressively for the sake of bit rate reduction so that the allowed interference exceeds the masking threshold.

With reference to Fig. 1, a preferred implementation of the device 502 for providing the first quantizer step size is described below; in this respect, the functionalities of step 50 of Fig. 2 and of the device 502 of Fig. 5 are the same. Preferably, the device 502 is configured to have the functionalities of device 10 and device 12 of Fig. 1. Furthermore, in this example, the quantizer 514 in Fig. 5 is configured in the same way as the quantizer 14 in Fig. 1.

A complete procedure is then described with reference to Fig. 2, which also tries coarser quantizer step sizes if the introduced interference is greater than the threshold.

Finally, the effect on which the present invention is based is explained with reference to Fig. 4, namely that, despite an enlargement of the quantizer step size, a lower quantization noise and, with it, an increase in compression gain can be obtained.

Fig. 1 shows a device for determining a quantized audio signal that is given as a spectral representation in the form of spectral values. In particular, it should be noted that, with reference to Fig. 3, if neither TNS processing nor mid/side coding has been performed, the spectral values are directly the output values of the filter bank. If, however, only TNS processing but no mid/side coding is performed, the spectral values fed into the quantiser 1015 are residual spectral values as obtained from the TNS prediction filtering.

If TNS processing together with mid/side coding is used, the spectral values fed into the device of the invention are mid-channel spectral values or side-channel spectral values.

The device of the invention first comprises a device for providing a permissible interference, designated by 10 in Fig. 1. The psychoacoustic model 1020 shown in Fig. 3 can serve as this device; it is typically configured to provide a permissible interference or threshold, also referred to as THR, for each scale factor band, i.e. for a group of several spectrally adjacent spectral values. The permissible interference is based on the psychoacoustic masking threshold and indicates how much energy can be introduced into an original audio signal without the interference energy being perceived by the human ear. In other words, the permissible interference is the portion of noise energy that may be introduced artificially (by quantization) while still being masked by the actual audio signal.

The device 10 is configured to calculate the allowed interference THR for a frequency band, preferably a scale factor band, and to feed it to a downstream device 12. The device 12 calculates quantizer step size information for the frequency band for which the allowed interference THR has been specified, and passes the quantizer step size information q to a downstream device 14 for quantizing. The device 14 works according to the quantization rule shown in block 14 of Fig. 1: a spectral value x_i is first divided by q, the result is raised to the power of the exponent α, and an additive term s is then added if necessary.

The output of the device 14 is then the quantized spectral value in the frequency band. As can be seen from the equation shown in block 14, the device 14 is of course fed, in addition to the quantizer step size q, the spectral value to be quantized in the frequency band under consideration.

It should be noted that the device 12 does not necessarily have to calculate the quantizer step size q directly; as alternative quantizer step size information, it can calculate the scale factor as used in known transform-based audio encoders. The scale factor is linked to the actual quantizer step size via the relation shown to the right of block 12 in Fig. 1. If the device 12 is configured to calculate the scale factor scf as quantizer step size information, this scale factor is fed to the device 14 for quantizing, which then uses the value 2^{(1 / 4) scf} instead of the value q in the quantization calculation of block 14.

The formula given in block 12 is derived below.

As stated above, the power-law quantizer shown in block 14 obeys the following relation:

y_{i} = round ((\frac{x_{i}}{q})^{α} + s)

The inverse operation is as follows:

x'_{i} = y_{i}^{1 / α} \cdot q

This equation thus represents the operation necessary for requantization, where y_i is a quantized spectral value and x'_i is a requantized spectral value. Again, q is the quantizer step size, which is related to the scale factor via the relation shown in Fig. 1 to the right of block 12.
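The quantization rule and its inverse can be written down directly. The sketch below assumes that the power law is applied to the magnitude of a spectral value and that the sign is restored afterwards, and it uses s = 0 as the default additive term; neither choice is stated in the text, and the function names are hypothetical.

```python
import math


def quantize(x, q, alpha=0.75, s=0.0):
    """y = round((|x| / q) ** alpha + s), with the sign of x restored."""
    mag = math.floor((abs(x) / q) ** alpha + s + 0.5)   # nint for non-negative input
    return int(math.copysign(mag, x))


def requantize(y, q, alpha=0.75):
    """x' = |y| ** (1 / alpha) * q, with the sign of y restored."""
    return math.copysign(abs(y) ** (1.0 / alpha) * q, y)


q = 2.0
y = quantize(16.0, q)        # (16 / 2) ** 0.75 is about 4.76, which rounds to 5
x_rec = requantize(y, q)     # 5 ** (4 / 3) * 2 is about 17.1, not exactly 16
```

The difference between `x_rec` and the original value is the quantization error whose energy the following derivation estimates.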

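For a single spectral value, the expected quantization noise energy is approximately:

| {Δ x}_{i} |^{2} \approx \frac{q^{2 α}}{12 α^{2}} \cdot | x_{i} |^{2 (1 - α)}

As expected, for α equal to 1 this reduces to q^{2} / 12, the well-known result for a linear quantizer.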

If the above equation is summed over a vector of spectral values, the total noise power in a band determined by the index i is given as follows:

\sum_{i} | {Δ x}_{i} |^{2} \approx \frac{q^{2 α}}{12 α^{2}} \cdot \sum_{i} x_{i}^{2 (1 - α)}

In summary, the expected value of the quantization noise of a vector is determined by the quantizer step size q and a so-called form factor, which describes the distribution of the magnitudes of the vector components.

The form factor, which is the rightmost term in the above equation, depends on the actual input values and only needs to be calculated once, even if the above equation is calculated for different desired noise levels THR.

As already stated, for α equal to 3⁄4 this equation simplifies as follows:

\sum_{i} | {Δ x}_{i} |^{2} \approx \frac{q^{3 / 2}}{6.75} \cdot \sum_{i} | x_{i} |^{1 / 2}

The left side of this equation is therefore an estimate of the quantization noise energy, which in the limiting case equals the permissible noise energy (threshold).
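The noise energy estimate can be evaluated numerically as in the following sketch. The function names and the sample band are hypothetical; the formulas are the ones given above.

```python
def form_factor(values, alpha=0.75):
    """Sum of |x_i| ** (2 * (1 - alpha)); for alpha = 3/4 this is the sum of sqrt(|x_i|)."""
    return sum(abs(x) ** (2.0 * (1.0 - alpha)) for x in values)


def estimated_noise_energy(values, q, alpha=0.75):
    """Expected quantization noise energy: q ** (2 alpha) / (12 alpha ** 2) * form factor."""
    return q ** (2.0 * alpha) / (12.0 * alpha ** 2) * form_factor(values, alpha)


band = [4.0, 9.0, 16.0, 25.0]                      # hypothetical spectral values
ffac = form_factor(band)                           # 2 + 3 + 4 + 5 = 14
noise = estimated_noise_energy(band, q=1.0)        # 14 / 6.75
linear = estimated_noise_energy(band, q=2.0, alpha=1.0)   # 4 lines * q^2 / 12
```

For α = 1 the form factor is simply the number of lines in the band, so the estimate falls back to the linear-quantizer value of q²/12 per line. The form factor depends only on the input values and can be reused for different thresholds.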

The following ansatz is therefore used:

\sum_{i} | {Δ x}_{i} |^{2} = THR

The sum over the square roots of the frequency lines on the right-hand side of the equation is a measure of the uniformity of the frequency lines and is preferably already available in the encoder as the form factor:

\sum_{i} | x_{i} |^{1 / 2} = FFAC

This yields:

THR \approx \frac{q^{3 / 2}}{6.75} \cdot FFAC

The quantizer step size q is specified in AAC as:

q = 2^{(1 / 4) * scf}

Here, scf is the scale factor. If the scale factor is to be determined, the equation can be solved for it using the relationship between step size and scale factor as follows:

THR \approx \frac{2^{(3 / 8) scf}}{6.75} \cdot FFAC

\Leftrightarrow 2^{(3 / 8) scf} = \frac{6.75 \cdot THR}{FFAC}

\Leftrightarrow scf = \frac{8}{3} \log_{2} (\frac{6.75 \cdot THR}{FFAC})

\Leftrightarrow scf = \frac{8}{3 \log_{10} 2} (\log_{10} (6.75 \cdot THR) - \log_{10} (FFAC))

\Leftrightarrow scf = 8.8585 (\log_{10} (6.75 \cdot THR) - \log_{10} (FFAC))

The above equation thus provides a closed-form relationship for the scale factor scf of a scale factor band that has a certain form factor and for which a certain noise threshold THR, typically derived from the psychoacoustic model, is given.
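The closed-form calculation can be checked numerically with a short sketch. The function names and input values are hypothetical. The constant is computed as 8 / (3 log10 2) rather than the rounded 8.8585, and the scale factor is left as a real number; rounding it to an integer is not shown.

```python
import math


def scale_factor(thr, ffac):
    """scf = 8 / (3 log10 2) * (log10(6.75 * THR) - log10(FFAC))."""
    constant = 8.0 / (3.0 * math.log10(2.0))     # approximately 8.8585
    return constant * (math.log10(6.75 * thr) - math.log10(ffac))


def step_size(scf):
    """q = 2 ** (scf / 4)."""
    return 2.0 ** (scf / 4.0)


thr, ffac = 10.0, 14.0               # hypothetical threshold and form factor
scf = scale_factor(thr, ffac)
q = step_size(scf)
thr_back = q ** 1.5 / 6.75 * ffac    # inserting q into the estimate reproduces thr
```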

As already explained, calculating the step size from the mean noise energy gives a better estimate, since it does not assume a worst-case scenario but uses the expected value of the quantization error as a basis.

The concept of the invention is therefore suitable for determining the quantizer step size or, equivalently, the scale factor for a scale factor band without any iterations.

However, if the computation time requirements are not quite as strict, post-processing can still be performed, as described below with reference to Fig. 2. In a first step in Fig. 2, the first quantizer step size (QSW) is estimated (step 50) using the procedure shown in Fig. 1. Then, in a step 52, quantization with the first quantizer step size is performed, preferably according to the quantizer shown in block 14 in Fig. 1.

It should be noted that the quantizer step size q (or scf) calculated by the relation shown in block 12 is an approximation. If the relation given in block 12 of Fig. 1 were exact, it would be found in block 54 that the interference introduced corresponds exactly to the threshold. However, because the relation in block 12 of Fig. 1 is an approximation, the interference introduced may be greater or smaller than the threshold THR.

It should also be noted that the deviation from the threshold will not be very large, although it will still be present. If it is found in step 54 that the interference introduced using the first quantizer step size is smaller than the threshold, i.e. the question in step 54 is answered with no, the right branch in Fig. 2 is taken. An introduced interference smaller than the threshold means that the estimate in block 12 in Fig. 1 was too pessimistic, so that in a step 56 a coarser quantizer step size is set as the second quantizer step size.

The degree to which the second quantizer step size is coarser than the first can be chosen freely, but it is preferable to take relatively small increments, since the estimate in block 50 will already be relatively accurate.

In step 58, the second, coarser (larger) quantizer step size is used to quantize the spectral values, then to requantize them and to calculate the second interference corresponding to the second quantizer step size.

In a step 60, it is then checked whether the second interference corresponding to the second quantizer step size is still smaller than the original threshold. If this is the case, the second quantizer step size is stored (62) and a new iteration is started in order to set an even larger quantizer step size in step 56. With this even larger quantizer step size, step 58, step 60 and, if applicable, step 62 are performed again to start another iteration. If it is determined in step 60 during an iteration that the second interference is no longer smaller than the threshold, i.e. is larger than the threshold, a termination criterion is met; the iteration is terminated and quantization is performed with the last stored quantizer step size (64).
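The right-hand branch of Fig. 2 can be sketched as a simple loop. The function names, the coarsening factor of one scale factor step per iteration and the iteration limit `max_iter` are assumptions of this example and are not taken from the description.

```python
import math

ALPHA = 0.75  # exponent of the nonlinear quantizer characteristic


def interference(values, q):
    """Quantize, requantize and sum the squared differences (assumed distance measure)."""
    total = 0.0
    for x in values:
        y = math.floor((abs(x) / q) ** ALPHA + 0.5)
        x_rec = math.copysign(y ** (1.0 / ALPHA) * q, x)
        total += (x - x_rec) ** 2
    return total


def coarsen_while_below(values, q_first, thr, factor=2 ** 0.25, max_iter=16):
    """Steps 56 to 64: enlarge the step size while the interference stays below thr."""
    stored = q_first                           # last step size known to be acceptable
    q = q_first
    for _ in range(max_iter):
        q *= factor                            # step 56: set a coarser step size
        if interference(values, q) < thr:      # steps 58 and 60
            stored = q                         # step 62: store, then iterate again
        else:
            break                              # termination criterion met
    return stored                              # step 64: quantize with this step size


band = [10.0, -7.0, 3.0, 1.0]                  # hypothetical spectral values
q_final = coarsen_while_below(band, q_first=0.5, thr=1.0)
```

The function is meant to be called only when the first step size already keeps the interference below the threshold, which is the condition for entering this branch.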

Since the first estimated quantizer step size is already a relatively good value, fewer iterations are needed than with poorly estimated starting values. This saves significant computation time in encoding, since the iterations used to calculate the quantizer step size take up the largest proportion of the encoder's computation time.

The left-hand branch in Fig. 2 is described below; it shows the procedure of the invention that is used when the introduced interference is actually greater than the threshold.

The following explains why, especially when the introduced interference is greater than the threshold, an improvement can still be achieved by working with an even larger quantizer step size. It has always been assumed that a finer quantizer step size leads to a lower introduced quantization energy and that a larger quantizer step size leads to a higher introduced quantization interference. This may be true on average, but it does not always hold, especially in sparsely occupied scale factor bands and especially when the quantizer has a nonlinear characteristic.

It can therefore be seen from Fig. 4 that a coarser quantization can lead to a lower quantization error than a finer quantization.
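A single-value example makes this effect concrete. The numbers below are chosen for illustration and are not taken from Fig. 4; the function name and the choice s = 0 are assumptions.

```python
import math


def roundtrip_error(x, q, alpha=0.75):
    """Squared error after quantizing |x| with step size q and requantizing."""
    y = math.floor((abs(x) / q) ** alpha + 0.5)
    x_rec = y ** (1.0 / alpha) * q
    return (abs(x) - x_rec) ** 2


# Finer step size: (3 / 2) ** 0.75 is about 1.36, rounds to 1, requantized to 2.0.
fine = roundtrip_error(3.0, q=2.0)

# Coarser step size: (3 / 3) ** 0.75 is 1, rounds to 1, requantized to 3.0.
coarse = roundtrip_error(3.0, q=3.0)
```

Here the finer step size gives a squared error of 1.0, while the coarser step size reproduces the value exactly, because the value happens to lie on a reconstruction point of the coarser quantizer.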

Thus, as shown in the left-hand branch of Fig. 2, an even larger quantizer step size is tried, starting from the estimate (step 50 in Fig. 2), in order to benefit from the effect shown in Fig. 4 even if the introduced interference is greater than the threshold.

The concept of quantizer step size post-processing, or scale factor post-processing, thus serves to improve the result of the scale factor estimator.

Starting from the quantizer step sizes determined in the scale factor estimator (50 in Fig. 2), new quantizer step sizes that are as large as possible and for which the error energy is less than the threshold are determined in the analysis-by-synthesis stage.

First, the spectrum is quantized with the calculated quantizer step sizes, and the energy of the error signal, preferably the sum of the squares of the differences between original and quantized spectral values, is determined. Alternatively, a corresponding time signal can also be used for error determination, although the use of spectral values is preferred.

The quantizer step size and the error signal are stored as the best result so far.

The scale factor is then varied within a given range around the value originally calculated, in particular by using coarser quantizer step sizes (70).

For each new scale factor, the spectrum is quantized again and the energy of the error signal is calculated. If the error signal is smaller than the smallest one calculated so far, the current quantizer step size is stored, together with the energy of the corresponding error signal, as the best result so far.

The invention takes into account not only smaller but also larger scale factors, in order to benefit from the concept described with reference to Fig. 4, especially when the quantizer is a non-linear quantizer.
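
The search described above for the case where the disturbance exceeds the threshold can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the exponent 0.75, the step ratio of 2^(1/4) per scale factor increment, the search range of four increments, and all names are assumptions chosen for the example.

```python
ALPHA = 0.75  # assumed exponent of the non-linear quantizer


def error_energy(band, q, alpha=ALPHA):
    """Quantize and reconstruct each value, then sum the squared errors.

    Working on magnitudes is sufficient here because the sign is preserved
    by the quantizer and does not affect the squared error.
    """
    total = 0.0
    for x in band:
        index = round((abs(x) / q) ** alpha)
        recon = (index ** (1.0 / alpha)) * q
        total += (abs(x) - recon) ** 2
    return total


def search_min_error(band, q_start, offsets=range(1, 5), ratio=2 ** 0.25):
    """Try the step sizes q_start * ratio**k and keep the lowest-error one.

    Positive offsets k give coarser step sizes; negative offsets could be
    passed in as well to include finer ones.
    """
    best_q = q_start
    best_err = error_energy(band, q_start)  # best result so far
    for k in offsets:
        q = q_start * ratio ** k
        err = error_energy(band, q)
        if err < best_err:
            best_q, best_err = q, err  # store the new best result
    return best_q, best_err


# Hypothetical band and threshold: the estimated step size 0.8 is too noisy.
band = [1.0, 0.0, 0.0, 0.0]
threshold = 0.01
start_err = error_energy(band, 0.8)
best_q, best_err = search_min_error(band, 0.8)
print(start_err > threshold, best_q, best_err < threshold)
```

In this hypothetical case the starting step size violates the threshold, while the next coarser candidate both lowers the error energy and falls below the threshold.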

If, on the other hand, the calculated disturbance is below the threshold, so that the estimate at step 50 was too pessimistic, the scale factor is varied within a given range around the originally calculated value.

For each new scale factor, the spectrum is quantized again and the energy of the error signal is calculated.

If the error signal energy is smaller than the smallest one calculated so far, the current quantizer step size is stored, together with the energy of the corresponding error signal, as the best result so far.

In this case, however, only coarser scale factors are taken into account, in order to reduce the number of bits needed to encode the audio spectrum.
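
One possible reading of this branch combines the coarser-only variation with the goal stated earlier, namely the largest step size whose error energy stays below the threshold. The sketch below follows that reading; it is an assumption about how the selection could be made, and the exponent, step ratio, search range, and names are hypothetical.

```python
ALPHA = 0.75  # assumed exponent of the non-linear quantizer


def error_energy(band, q, alpha=ALPHA):
    """Sum of squared errors after quantizing and reconstructing the band."""
    total = 0.0
    for x in band:
        index = round((abs(x) / q) ** alpha)
        recon = (index ** (1.0 / alpha)) * q
        total += (abs(x) - recon) ** 2
    return total


def coarsest_step_below_threshold(band, q_start, threshold,
                                  max_offset=4, ratio=2 ** 0.25):
    """Return the coarsest candidate step size whose error stays below threshold.

    Every candidate is checked, because the error energy need not grow
    monotonically with the step size for a non-linear quantizer.
    """
    best_q = q_start
    best_err = error_energy(band, q_start)
    for k in range(1, max_offset + 1):  # coarser step sizes only
        q = q_start * ratio ** k
        err = error_energy(band, q)
        if err < threshold:
            best_q, best_err = q, err  # later candidates are coarser
    return best_q, best_err


# Hypothetical case: the estimate 0.8 is already below the threshold.
band = [1.0, 0.0, 0.0, 0.0]
threshold = 0.05
q_sel, err_sel = coarsest_step_below_threshold(band, 0.8, threshold)
print(q_sel, err_sel)
```

A coarser step size generally yields smaller quantization indices, which is why this branch can reduce the bit demand without exceeding the permitted disturbance.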

Depending on the circumstances, the method of the invention may be implemented in hardware or in software. The implementation may be on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can interact with a programmable computer system such that the method is performed.

In general, the invention thus also consists of a computer program product with a program code stored on a machine-readable medium for performing the method of the invention when the computer program product runs on a computer. In other words, the invention can thus be realized as a computer program with a program code for performing the method when the computer program runs on a computer.

Claims

Apparatus for determining a quantizer step size for quantizing a signal comprising audio or video information, the apparatus comprising:
means (502) for providing a first quantizer step size and an interference threshold;

means (504) for determining a first interference introduced by the first quantizer step size;

means (506) for comparing the interference introduced by the first quantizer step size with the interference threshold;

means (508) for selecting a second quantizer step size which is larger than the first quantizer step size if the first interference introduced is greater than the interference threshold;

means (510) for determining a second interference introduced by the second quantizer step size;

means (512) for comparing the second interference introduced with the interference threshold or the first interference introduced; and

means (514) for quantizing the signal with the second quantizer step size if the second interference introduced is smaller than the first interference introduced or is smaller than the interference threshold.
Apparatus as claimed in claim 1, wherein the signal is an audio signal and comprises spectral values of a spectral representation of the audio signal, and wherein the means (502) for providing is configured as a psycho-acoustic model which calculates a permitted interference for a frequency band on the basis of a psycho-acoustic masking threshold.
Apparatus as claimed in claims 1 or 2, wherein the means (504) for determining the first interference introduced, or the means (510) for calculating the second interference introduced is configured to quantize using a quantizer step size, to re-quantize using the quantizer step size, and to calculate a distance between the re-quantized signal and the signal so as to obtain the interference introduced.
Apparatus as claimed in any of the previous claims, wherein the means (502) for providing the first quantizer step size is configured to calculate the quantizer step size in accordance with the following equation: $\sum_{i} |\Delta x_{i}|^{2} \approx \frac{q^{2\alpha}}{12\alpha^{2}} \cdot \sum_{i} x_{i}^{2(1-\alpha)}$ wherein the means (514) for quantizing is configured to quantize in accordance with the following equation: $y_{i} = \mathrm{round}\left(\left(\frac{x_{i}}{q}\right)^{\alpha} + s\right)$ wherein $x_{i}$ is a spectral value to be quantized, wherein $q$ represents the quantizer step size information, wherein $s$ is a figure differing from or equaling zero, wherein $\alpha$ is an exponent different from "1", wherein round is a rounding function which maps a value from a first, larger range of values to a value within a second, smaller range of values, wherein $\sum_{i} |\Delta x_{i}|^{2}$ (THR) is the permitted interference, and wherein $i$ is a run index for spectral values in the frequency band.
Apparatus as claimed in any of the previous claims, wherein the means (508) for selecting is further configured to select a larger quantizer step size when the interference introduced is smaller than the permitted interference.
Apparatus as claimed in any of the previous claims, wherein the means (502) for providing is configured to provide the first quantizer step size as a result of an analysis/synthesis determination.
Apparatus as claimed in any of the previous claims, wherein the means (508) for selecting is configured to alter a quantizer step size for one frequency band independently of a quantizer step size for another frequency band.
Apparatus as claimed in any of the previous claims, wherein the means (502) for providing is configured to determine the first quantizer step size as a result of a preceding iteration step with a coarsening of the quantizer step size, and wherein the interference threshold is an interference introduced in the preceding iteration step for determining the first quantizer step size.
Method for determining a quantizer step size for quantizing a signal comprising audio or video information, the method comprising:
providing (502) a first quantizer step size and an interference threshold;

determining (504) a first interference introduced by the first quantizer step size;

comparing (506) the interference introduced by the first quantizer step size with the interference threshold;

selecting (508) a second quantizer step size which is larger than the first quantizer step size if the first interference introduced is greater than the interference threshold;

determining (510) a second interference introduced by the second quantizer step size;

comparing (512) the second interference introduced with the interference threshold or the first interference introduced; and

quantizing (514) the signal with the second quantizer step size if the second interference introduced is smaller than the first interference introduced or is smaller than the interference threshold.
A computer program having a program code adapted for performing the method as claimed in claim 9, when the computer program runs on a computer.