HK1111259B

HK1111259B - Device and method for producing a data flow and for producing a multi-channel representation

Info

Publication number: HK1111259B
Application number: HK08106159.6A
Authority: HK
Inventors: 沃尔夫冈‧菲泽尔; 马蒂亚斯‧诺伊辛格; 哈拉尔德‧波普; 斯特凡‧盖尔斯贝格
Original assignee: 弗劳恩霍夫应用研究促进协会
Priority date: 2005-03-30
Filing date: 2006-03-15
Publication date: 2009-11-20

Description

The present invention relates to audio signal processing and in particular multi-channel processing techniques based on the creation of a multichannel reconstruction of an original multichannel signal from at least one basic channel or downmix channel and multichannel additional information.

The following describes various techniques for reducing the amount of data required to transmit a multichannel audio signal.

Such techniques are called joint stereo techniques. For this purpose, reference is made to Fig. 3, which shows a joint stereo device 60. This device may implement, for example, the intensity stereo (IS) technique or the Binaural Cue Coding (BCC) technique. Such a device usually receives at least two channels CH1, CH2, ..., CHn as input signals and outputs a single carrier channel as well as parametric multichannel information.

Normally the carrier channel will include subband samples, spectral coefficients, time-domain samples, etc., which provide a relatively fine representation of the underlying signal, while the parametric data will not include such samples or spectral coefficients, but control parameters for a particular reconstruction algorithm, such as weighting by multiplication, time shifting, frequency shifting, etc. The parametric multichannel information will therefore include a relatively coarse representation of the signal or associated channel. Expressed in numbers, the amount of data required by a carrier channel is about 60 to 70 kbit/s, while the amount of data required by the parametric side information is about 1.5 to 2.5 kbit/s. These figures are bit rates for compressed data; BCC parameters, described below, are one example of such parametric data.

On the decoder side, the decoder typically receives a mono signal and the BCC bitstream. The mono signal is transformed into the frequency domain and input into a spatial synthesis block, which also receives decoded ICLD and ICTD values. In the spatial synthesis block, the BCC parameters (ICLD and ICTD) are used to perform a weighting operation on the mono signal to synthesize the multichannel signals, which, after a frequency/time conversion, constitute a reconstruction of the original multichannel audio signal.

In the case of BCC, the joint stereo module 60 outputs the channel side information such that the parametric channel data are quantized and coded ICLD or ICTD parameters, with one of the original channels used as the reference channel for coding the channel side information.

The carrier signal is normally formed as the sum of the participating original channels.

Of course, the above techniques only provide a mono representation for a decoder that can only process the carrier channel but is not able to process the parametric data to generate one or more approximations of more than one input channel.

A typical BCC scheme for multichannel audio coding is shown below in more detail, with reference to Figures 4 to 6.

Fig. 5 shows such a BCC scheme for encoding/transmitting multichannel audio signals. The multichannel audio input signal at an input 110 of a BCC encoder 112 is downmixed in a so-called downmix block 114. In this example, the original multichannel signal at the input 110 is a 5-channel surround signal with a front left channel, a front right channel, a left surround channel, a right surround channel and a center channel. In the preferred embodiment of the present invention, the downmix block 114 generates a sum signal by simple addition of these five channels into a mono signal.
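
The simple addition performed by the downmix block 114 can be illustrated with a minimal Python sketch. The function name and the sample values are hypothetical and chosen for illustration only; scaling or clipping protection, which a practical downmix would probably need, is omitted.

```python
def downmix_to_mono(channels):
    """Sum equally long channels sample by sample into one mono signal."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(samples) for samples in zip(*channels)]


# Five hypothetical channels (front left, front right, left surround,
# right surround, center) with three samples each.
surround = [
    [1, 2, 3],
    [1, 0, -1],
    [0, 1, 0],
    [2, 2, 2],
    [-1, -1, -1],
]
mono = downmix_to_mono(surround)
```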

Other downmix schemes that derive a downmix signal with a single channel from a multichannel input signal are known in the art.

This single channel is output on a sum signal line 115. Side information obtained from the BCC analysis block 116 is output on a side information line 117.

The BCC analysis block is used to calculate interchannel level differences (ICLD) and interchannel time differences (ICTD), as described above. More recently, the BCC analysis block 116 is also capable of calculating interchannel correlation values (ICC values). The sum signal and the side information are transmitted in a quantized and coded format to a BCC decoder 120. The BCC decoder splits the transmitted sum signal into a number of subbands and performs scaling, delaying and other processing steps to provide the subbands of the multichannel audio signal to be output. This processing is performed such that the ICLD, ICTD and ICC parameters of the reconstructed multichannel signal at the output 121 match the corresponding parameters of the original multichannel signal at the input 110 of the BCC encoder 112. For this purpose, the BCC decoder 120 includes a BCC synthesis block 122 and a side information processing block 123.

The internal structure of the BCC synthesis block 122 is explained below with reference to Fig. 6. The sum signal on line 115 is fed into a time/frequency converter unit or filter bank FB 125. At the output of block 125 there is a number N of subband signals or, in an extreme case, a block of spectral coefficients when the audio filter bank 125 performs a 1:1 transformation, i.e. a transformation that generates N spectral coefficients from N time-domain samples.

The BCC synthesis block 122 also includes a delay stage 126, a level modification stage 127, a correlation processing stage 128 and an inverse filter bank stage IFB 129. At the output of stage 129, the reconstructed multichannel audio signal, e.g. five channels in the case of a 5-channel surround system, can be output to a set of loudspeakers 124 as shown in Figure 5 or Figure 4.

The input signal s(n) is converted to the frequency domain or filter bank domain by means of element 125. The signal output from element 125 is copied in such a way that multiple versions of the same signal are obtained, as represented by the copy node 130. The number of versions of the original signal is equal to the number of output channels in the output signal. Each version of the original signal at node 130 is then subjected to a certain delay d1, d2, ..., di, ..., dN. The delay parameters are calculated by the side information processing block 123 in Fig. 5 and derived from the interchannel time differences as calculated by the BCC analysis block 116 of Fig. 5.

The same applies to the multiplication parameters a1, a2, ..., ai, ..., aN, which are also calculated by the side information processing block 123 based on the interchannel level differences as calculated by the BCC analysis block 116.

The ICC parameters calculated by the BCC analysis block 116 are used to control the functionality of block 128 so that certain correlations are obtained between the delayed and manipulated signals at the output of block 128.

It should be noted that in frame-based processing of the audio signal, the BCC analysis is also performed frame by frame, i.e. in a time-variable manner, and that a frequency-selective BCC analysis is obtained, as shown by the filter bank decomposition in Fig. 6. This means that the BCC parameters are obtained for each spectral band. This also means that in the case where the audio filter bank 125 splits the input signal into, say, 32 bandpass signals, the BCC analysis block obtains a set of BCC parameters for each of the 32 bands. Of course, the BCC synthesis block 122 of Fig. 5, which is shown in detail in Fig. 6, likewise performs a reconstruction based on the 32 bands mentioned in the example.

A scheme used to determine individual BCC parameters is described below with reference to Figure 4. Normally, ICLD, ICTD and ICC parameters can be defined between channel pairs, but it is preferred to determine ICLD and ICTD parameters between one reference channel and each other channel, as shown in Figure 4A.
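
The reference-channel scheme of Figure 4A can be sketched in Python as follows. This is a simplified illustration, not the method of the description: it assumes the common convention of expressing a level difference as an energy ratio in dB, and it works on whole blocks rather than per subband. The function names and the small `eps` guard against silent channels are hypothetical.

```python
import math


def channel_energy(samples):
    """Sum of squared sample values of one channel block."""
    return sum(x * x for x in samples)


def icld_db(reference, other, eps=1e-12):
    """Level difference of `other` relative to `reference`, in dB (assumed convention)."""
    return 10.0 * math.log10(
        (channel_energy(other) + eps) / (channel_energy(reference) + eps)
    )


def icld_against_reference(channels, ref_index=0):
    """One ICLD value per non-reference channel, as in the Figure 4A scheme."""
    ref = channels[ref_index]
    return [icld_db(ref, ch) for i, ch in enumerate(channels) if i != ref_index]
```

For five input channels this yields four ICLD values, one for each channel other than the reference channel.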

In general, ICC parameters can be determined in the encoder between all possible channel pairs, as shown in Fig. 4B. However, it has been proposed to calculate ICC parameters only between the two strongest channels at any one time, as shown in Fig. 4C, where at one time an ICC parameter is calculated between channels 1 and 2, and at another time an ICC parameter is calculated between channels 1 and 5.

For the calculation of, for example, the multiplication parameters a1, ..., aN based on the transmitted ICLD parameters, reference is made to AES Convention Paper 5574. The ICLD parameters represent an energy distribution of an original multichannel signal. Without loss of generality, it is preferable, as shown in Figure 4A, to take four ICLD parameters representing the energy difference between the respective channels and the front left channel. In the side information processing block 123, the multiplication parameters a1, ..., aN are derived from the ICLD parameters in such a way that the total energy of all reconstructed output channels is equal (or proportional) to the energy of the transmitted sum signal.
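
One way to satisfy this energy constraint is sketched below in Python. It is an illustrative assumption rather than the procedure of the cited paper: the ICLD values are taken to be in dB relative to the first channel, and the gains are normalized so that their squares sum to one, which keeps the total output energy equal to the sum-signal energy if correlation between the output channels is ignored. The function name is hypothetical.

```python
import math


def gains_from_icld(icld_db):
    """Derive gains a1..aN from N-1 ICLD values (dB, relative to channel 1)."""
    # Relative amplitudes: the reference channel is 1, the others follow from the dB values.
    rel = [1.0] + [10.0 ** (d / 20.0) for d in icld_db]
    # Normalize so that the squared gains sum to one (energy preservation).
    norm = math.sqrt(sum(r * r for r in rel))
    return [r / norm for r in rel]
```

With four ICLD values of 0 dB, for example, all five gains come out equal at 1/sqrt(5).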

Generally, such multichannel coding schemes, particularly parametric ones, generate at least one base channel and the side information, as shown in Figure 5. Typically, block-based schemes are used in which, as shown in Figure 5, the original multichannel signal at input 110 is subjected to block processing by a blocking stage 111, such that the downmix signal, sum signal or at least one base channel is formed from a block of, for example, 1152 samples, while at the same time the BCC analysis generates the corresponding multichannel parameters for this block. After the downmix, the sum signal, i.e. the at least one base channel, is typically encoded with a block-based encoder, such as an MP3 or AAC encoder, to reduce the data rate further. The parameter data are likewise encoded.

Then, at the output of the entire encoder, which thus includes the BCC encoder 112 and a downstream base channel encoder, a common data stream is written in which a block of the at least one base channel follows an earlier block of the at least one base channel and into which the encoded multichannel additional information is also inserted, for example by a bitstream multiplexer.

This insertion is made in such a way that the data stream of base channel data and multichannel additional information always includes a block of base channel data and, in association with this block, a block of multichannel additional data, which then, for example, form a common transmission frame.

The decoder includes a data stream demultiplexer at the input side to split a frame of the data stream into a block of base channel data and a block of associated multichannel additional information. The block of base data is then decoded, for example, by an MP3 decoder or an AAC decoder. This block of decoded base data is then fed to the BCC decoder 120 together with the block of decoded multichannel additional information, if any.

This means that, because of the joint transmission of basic channel data and additional information, the temporal assignment of the additional information to the basic channel data is automatically determined and easily restored by a decoder working frame by frame. Because the two data types are transmitted jointly in a single data stream, the decoder therefore finds the additional information belonging to a block of basic channel data more or less automatically, so that a high-quality multichannel reconstruction is possible. The problem of the multichannel additional information having a time shift relative to the basic channel data therefore does not arise. If such a shift were present, however, it would lead to a significant loss of quality of the multichannel reconstruction, since a block of multichannel additional data would then be processed together with an earlier or later block of basic channel data to which it does not belong.

Furthermore, it is preferred to use encoders/decoders with non-constant output rate to achieve particularly good bit efficiency. Here it is not predictable how long it takes to decode a block of basic channel data. Furthermore, this processing also depends on the actual hardware components used for decoding, such as those that must be present in a PC or digital receiver.

On the other hand, the separation of the common data stream described above into two separate data streams has particular advantages: a conventional receiver, i.e. a pure mono or stereo receiver, is able to receive and play back the audio base data at any time, regardless of the content and version of the multichannel supplementary information.

In contrast, a newer generation receiver can evaluate this additional multichannel data and combine it with the audio base data in such a way that the full extension, here the multichannel sound, can be provided to the user.

A particularly interesting application scenario for the separate transmission of audio base data and extension data is in digital broadcasting, where the multichannel additional information can be used to extend the previously broadcast stereo audio signal to a multichannel format, such as 5.1 with little additional transmission effort. Here the program provider on the broadcaster side generates the multichannel additional information from multichannel sound sources, such as those found on DVD audio/video.

A major advantage of the separation is the compatibility with existing digital broadcasting systems: a conventional receiver, which cannot process this additional information, will be able to receive and play back the two-channel signal without any qualitative limitations, while a newer receiver can process, decode and reconstruct the original 5.1 multichannel signal.

In order to allow the simultaneous transmission of the multichannel additional information as a complement to the stereo signal used so far, the multichannel additional information can, as already described, be combined with the coded downmix audio signal for a digital broadcasting system, so that there is a single data stream, scalable if necessary, that can also be read by an existing receiver, which then ignores the additional data relating to the multichannel additional information.

The receiver therefore sees only one (valid) audio data stream and, if it is a receiver of the newer design, can additionally extract the multichannel additional information from the data stream via a corresponding upstream data distributor, in sync with the associated audio data block, decode it and output it as 5.1 multichannel sound.

However, the disadvantage of this approach is that the existing infrastructure or data routes have to be extended so that they can carry the data signals combined from downmix signal and extension, instead of only the stereo audio signals as before.

However, it is highly problematic for market penetration when existing broadcasting infrastructures have to be changed, i.e. when the problem is not only on the part of the decoders but also on the part of the broadcasters and the standardised transmission protocols.

On the other hand, the separate transmission of basic channel data and multichannel additional information is particularly interesting because existing stereo infrastructures do not need to be changed, so the disadvantages of non-standard compliance described for the first option do not occur here. A broadcasting system only needs to transmit an additional channel, but does not need to change the infrastructure for the already existing stereo channel. The additional effort is therefore borne more or less solely by the receivers, but in such a way that backward compatibility exists, i.e. a user who has a new receiver gets better sound quality than a user who has an old receiver.

As already shown, the magnitude of the time shift can no longer be determined from the received audio signal and the additional information. A time-correct reconstruction and assignment of the multichannel signal in the receiver is therefore no longer ensured. Another example of such a delay problem arises when an already running two-channel transmission system is to be extended to multichannel transmission, for example in a digital radio receiver. Here, it is often the case that the downmix signal is decoded by means of a two-channel audio decoder already present in the receiver, whose delay is not known and therefore cannot be compensated.

WO 2005/011281 A1 discloses a method and device for generating and detecting fingerprints for synchronizing audio and video signals. In particular, a first fingerprint and a second fingerprint are generated that can be used to synchronize at least two signals. For this purpose, a segment of a first signal, such as an audio signal, and a segment of a second signal, such as a video signal, are used at each synchronization time point. The generated fingerprint pairs are stored in a database and transmitted to a synchronization device. During synchronization, fingerprints of the audio signal and fingerprints of the video signal to be synchronized are generated and matched against the fingerprints in the database. If a match is found, the fingerprints also determine the synchronization time point, which is used to synchronize the two signals.

The present invention is intended to provide a concept for generating a data stream or multichannel representation that allows synchronization of basic channel data and multichannel supplementary information.

This task is solved by a device for generating a data stream according to claim 1, a device for generating a multichannel representation according to claim 17, a process for generating a data stream according to claim 26, a process for generating a multichannel representation according to claim 27, a computer program product according to claim 28 or a data stream representation according to claim 29.

According to the invention, on the transmitter side, the association of multichannel supplementary information with base channel data is signaled by determining fingerprint information from the base channel data, which in effect marks the multichannel supplementary information that belongs to exactly this base channel data. In block-based data processing, this marking or signaling of the relationship between the multichannel supplementary information and the fingerprint information is achieved by taking a block of multichannel supplementary information that belongs to a block of base channel data and assigning to it the block fingerprint of exactly that block of base channel data to which the block of multichannel supplementary information belongs.

In other words, a fingerprint of the block of basic channel data with which the multichannel additional information must be reconstructed is assigned to the multichannel additional information. In a block-based transmission, the block fingerprint of the block of basic channel data can be entered into the block structure of the multichannel additional data stream in such a way that each block of multichannel additional information contains the block fingerprint of the associated base data. The block fingerprint can be written immediately after a block of multichannel additional information as used previously, before such a block, or at any known place within that block, so that during multichannel reconstruction the block fingerprint can be read from the data stream and is available for synchronization purposes. The data stream thus contains the normal multichannel additional data together with the block fingerprints.
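
A minimal Python sketch of such a block structure is given below. The byte layout is purely hypothetical and not taken from the description: each block starts with a 16-bit block fingerprint and a 16-bit payload length, followed by the multichannel additional information as opaque bytes. It only illustrates that a reader which knows where the fingerprint sits can recover it together with the associated side information.

```python
import struct

# Hypothetical header: 16-bit fingerprint, 16-bit payload length, big-endian.
HEADER = struct.Struct(">HH")


def pack_block(fingerprint, side_info):
    """Write the block fingerprint in front of one block of side information."""
    return HEADER.pack(fingerprint, len(side_info)) + side_info


def unpack_blocks(stream):
    """Split a byte stream into (fingerprint, side_info) pairs."""
    blocks, pos = [], 0
    while pos < len(stream):
        fingerprint, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        blocks.append((fingerprint, stream[pos:pos + length]))
        pos += length
    return blocks


stream = pack_block(1200, b"\x01\x02") + pack_block(950, b"\x03")
```

Placing the fingerprint first means it can be read or stripped without parsing the side information itself; writing it after the payload would work equally well as long as the position is known.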

Alternatively, the data stream could be written in such a way that, for example, all block fingerprints, provided with additional information such as a block counter, are at the beginning of the data stream generated according to the invention, so that a first section of the data stream contains only block fingerprints and a second section contains the multichannel additional information, written block by block, that belongs to the block fingerprint information. This alternative has the disadvantage of requiring reference information. However, the association of the block fingerprints with the block-wise multichannel additional information may also be given implicitly by the order in which the data stream is written, so that no additional information is necessary.

In this case, in multichannel reconstruction for synchronization purposes, a large number of block fingerprints could simply be read in first to obtain the reference fingerprint information. Gradually, the test fingerprints are added until a minimum number of test fingerprints used for a correlation is available. During this time, the set of reference fingerprints could already be subjected to difference coding, for example, if the correlation in multichannel reconstruction is done using differences while the data stream does not contain difference block fingerprints but absolute block fingerprints.
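
The conversion from absolute block fingerprints to difference fingerprints mentioned here can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that a difference fingerprint is simply the difference between two consecutive absolute fingerprints.

```python
def to_differences(absolute):
    """Turn a sequence of absolute fingerprints into consecutive differences."""
    return [later - earlier for earlier, later in zip(absolute, absolute[1:])]
```

The resulting sequence is one element shorter than the input, which has to be taken into account when the lag found by a later correlation is mapped back to block positions.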

In general, the data stream with the base channel data is processed first on the receiver side, i.e. decoded, and then fed to a multichannel reconstructor. Preferably, this multichannel reconstructor is designed to simply pass the signal through when it does not receive any additional information, so that the two base channels are output as a stereo signal. In parallel, the reference fingerprint information is extracted and the test fingerprint information is calculated from the decoded base data, and then a correlation calculation is performed to obtain the shift of the base channel data relative to the multichannel data.

Once the shift is assumed to be correct and synchronized multichannel additional information has been received, the stereo output is switched to the multichannel output.

This procedure is preferred when the user is not to notice the time required for synchronization. Basic channel data is then processed as soon as it is received, so that during the period in which synchronization, i.e. the shift calculation, takes place, only stereo data can be output, since no synchronized multichannel supplementary information has been found yet.

In preferred embodiments of the present invention, the time for synchronization is usually about 5 seconds, since an optimal shift calculation requires about 200 reference fingerprints as reference fingerprint information. If this delay of about 5 seconds does not matter, as is the case for example with unidirectional transmissions, 5.1 playback can be provided from the start, but only after the time needed for the shift calculation. For interactive applications, such as dialogue or the like, this delay would be disruptive, so that playback starts in stereo and is switched over to multichannel playback once synchronization is complete.
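
The relation between about 200 fingerprints and about 5 seconds can be checked with a short calculation, sketched here in Python. It assumes one fingerprint per block of 1152 samples (the example block size given earlier in the description) and a sampling rate of 44.1 kHz; the sampling rate is an assumption and is not stated in the description.

```python
def sync_time_seconds(n_fingerprints, block_size, sample_rate_hz):
    """Duration of audio covered by n_fingerprints blocks of block_size samples."""
    return n_fingerprints * block_size / sample_rate_hz


# 200 blocks of 1152 samples at an assumed 44.1 kHz: roughly 5.2 seconds.
duration = sync_time_seconds(200, 1152, 44100)
```

At an assumed 48 kHz the same 200 blocks would cover 4.8 seconds, so both common sampling rates are consistent with the stated figure of about 5 seconds.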

According to the invention, the time mapping problem between basic channel data and multichannel additional data is solved by both transmitter and receiver side measures.

On the transmitter side, time-varying and suitable fingerprint information is calculated from the corresponding mono or stereo downmix audio signal. Preferably, this fingerprint information is inserted regularly as a synchronization aid into the transmitted multichannel additional data stream. This is preferably done as a data field within the block-organized side information, e.g. spatial audio coding side information, or in such a way that the fingerprint signal is sent as the first or last information of the data block, so that it can be easily added or removed.

On the receiving side, time-varying and suitable fingerprint information is calculated from the corresponding stereo audio signal, i.e. the basic channel data, two basic channels being preferred according to the invention. Furthermore, the fingerprints are extracted from the multichannel additional information. The time lag between the multichannel additional information and the received audio signal is then determined by correlation methods, such as a calculation of a cross-correlation between the test fingerprint information and the reference fingerprint information. Alternatively, trial-and-error procedures can be performed, in which different test fingerprint information, calculated from the basic channel data on the basis of different block rasters, is compared with the reference fingerprint information, and the time lag is determined from the block raster whose test fingerprint information best matches the reference fingerprint information.
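
The correlation step can be sketched in Python as follows. This is an illustrative stand-in, not the procedure of the description: it uses a normalized cross-correlation (the Pearson coefficient over the overlapping part of the two sequences) and searches a limited range of lags measured in blocks. The function names, the `max_lag` and `min_overlap` parameters and the sign convention are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den > 0 else 0.0


def best_offset(test, reference, max_lag, min_overlap=8):
    """Return the lag k (in blocks) for which test[i + k] best matches reference[i]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        idx = [i for i in range(len(reference)) if 0 <= i + lag < len(test)]
        if len(idx) < min_overlap:
            continue  # too few overlapping fingerprints for a reliable score
        score = pearson([test[i + lag] for i in idx], [reference[i] for i in idx])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


reference = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10, 1, 7]   # from the side information stream
test = [5, 5, 5] + reference                         # base channel delayed by 3 blocks
offset = best_offset(test, reference, max_lag=4)
```

In this convention a positive result means that the base channel data lags the multichannel additional information by that many blocks, so the additional information would have to be delayed accordingly.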

Finally, the audio signal of the base channels is synchronized with the multichannel additional information for subsequent multichannel reconstruction by a downstream delay compensation stage. Depending on the implementation, only an initial delay may be compensated. However, the delay calculation is preferably performed in parallel with the playback, so that if the base channel data and the multichannel additional information drift apart in time despite a compensated initial delay, the delay can be readjusted as needed according to the result of the correlation calculation.

The present invention is advantageous in that no changes need to be made to the base channel data or to the processing path for the base channel data. The base channel data stream fed into a receiver is in no way different from a normal base channel data stream. Changes are only made on the side of the multichannel data stream, which is modified to include the fingerprint information. However, since no standardized procedures exist for the multichannel data stream at present, the change in the multichannel additional data stream does not lead to an unwanted departure from an already standardized and established solution, as would be the case if the base channel data stream were modified.

The scenario of the invention provides particular flexibility in the distribution of multichannel additional information. In particular, where the multichannel additional information is parameter information that is very compact in terms of the required data rate or storage capacity, a digital receiver can be supplied with such data even completely separately from the stereo signal. For example, a user could purchase multichannel additional information from a separate supplier for existing stereo recordings already on his solid-state player or CDs and store it on his playback device. This storage is not problematic as the storage requirements are not particularly large.

The solution according to the invention thus makes it possible, completely independently of the origin of the stereo signal, i.e. whether it comes from a digital broadcasting receiver, from a CD, from a DVD or has been received, for example, via the Internet, to synchronize multichannel additional data that may come from a completely different source with the stereo signal, whereby the stereo signal then serves as the basis for multichannel reconstruction.

Preferred embodiments of the present invention are explained in detail below with reference to the accompanying drawings, in which:
Fig. 1 shows a block diagram of a device according to the invention for generating a data stream;
Fig. 2 shows a block diagram of a device according to the invention for generating a multichannel representation;
Fig. 3 shows a known joint stereo encoder for generating channel data and parametric multichannel information;
Fig. 4 shows a scheme for determining ICLD, ICTD and ICC parameters for BCC coding/decoding;
Fig. 5 shows a block diagram of a BCC encoder/decoder chain;
Fig. 6 shows a block diagram of the BCC synthesis block of Fig. 5;
[Captions of Fig. 7 and Fig. 8 not recoverable.]

Fig. 1 shows a device for generating a data stream for a multichannel reconstruction of an original multichannel signal, where the multichannel signal has at least two channels, according to a preferred embodiment of the present invention. The device includes a fingerprint generator 2, to which at least one base channel derived from the original multichannel signal can be fed via an input line 3. The number of base channels is greater than or equal to 1 and less than the number of channels of the original multichannel signal. If the original multichannel signal is merely a two-channel stereo signal, there is only a single base channel, derived from the two stereo channels. If, however, the original multichannel signal has three or more channels, the number of base channels may also be two. This embodiment is preferred because it allows audio to be played back as normal stereo without additional multichannel data. In a preferred embodiment of the present invention, the original multichannel signal is a surround signal with five channels and a low frequency enhancement (LFE) channel, also known as the subwoofer channel. The five channels are a left surround channel Ls, a left channel L, a center channel C, a right channel R and a rear right or right surround channel Rs. The one or more base channels are also referred to as downmix channels.

The invention prefers block-based processing, where the fingerprint information is composed of a sequence of block fingerprints, a block fingerprint being a measure of the energy of one or more base channels in the block. Alternatively, however, a specific sample of the block or a combination of samples of the block may be used as a block fingerprint, since a sufficiently large number of block fingerprints as fingerprint information yields at least a rough representation of the temporal characteristics of the at least one base channel.
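
A block fingerprint as an energy measure can be sketched in Python as follows. The sketch assumes a single base channel given as a list of samples and uses the plain sum of squares per block as the energy measure; the function name is hypothetical, and quantization of the resulting value is left out.

```python
def block_fingerprints(samples, block_size):
    """Return one energy value per complete block of `block_size` samples."""
    n_blocks = len(samples) // block_size
    return [
        sum(x * x for x in samples[b * block_size:(b + 1) * block_size])
        for b in range(n_blocks)
    ]
```

A trailing incomplete block is ignored here; with two base channels, the energies of both channels in a block could be added to obtain a single fingerprint per block.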

The fingerprint generator 2 provides the fingerprint information at its output and feeds it to a data stream generator 4. The data stream generator 4 is designed to generate a data stream from the fingerprint information and the typically time-variable multichannel supplementary information, whereby the multichannel supplementary information together with the at least one base channel allows the multichannel reconstruction of the original multichannel signal. The data stream generator is designed to generate the data stream at an output 5 such that a relationship between the multichannel supplementary information and the fingerprint information can be derived from the data stream. The data stream with the multichannel supplementary information is thus at least partly separate from the base channel data.

Fig. 2 shows a device according to the invention for generating a multichannel representation of an original multichannel signal from at least one base channel and a data stream, which contains fingerprint information representing a time sequence of the at least one base channel and multichannel additional information which together with the at least one base channel allows the multichannel reconstruction of the original multichannel signal, a relationship between the multichannel additional information and the fingerprint information being derivable from the data stream. The at least one base channel is fed via an input 10 to a fingerprint generator 11 on the receiver or decoder side. The fingerprint generator 11 supplies test fingerprint information via an output 12 to a synchronizer 13. Preferably, the test fingerprint information is derived from the at least one base channel by the same algorithm as in the fingerprint generator 2 of Fig. 1, but the algorithms do not necessarily have to be identical.

For example, the fingerprint generator 2 may generate block fingerprints in absolute coding, while the fingerprint generator 11 on the decoder side determines difference fingerprints, so that the test block fingerprint associated with a block is the difference between two absolute fingerprints. In this case, if absolute block fingerprints are transmitted in the data stream, a fingerprint extractor 14 extracts the fingerprint information from the data stream and at the same time forms differences, so that reference fingerprint information comparable to the test fingerprint information is fed into the synchronizer 13 via an output 15.

Generally speaking, it is preferred that the algorithm for calculating the test fingerprint information on the decoder side and the algorithm for calculating the fingerprint information on the encoder side (referred to as reference fingerprint information in Fig. 2) are at least similar enough that the synchronizer 13 can use these two sets of information to synchronize the multichannel additional data in the data stream received via an input 16 with the data of the at least one base channel.

The data on lines 18 and 20 thus constitute the synchronized multichannel representation. The data stream on line 20 corresponds to the data stream at input 16, apart from any coding of the multichannel additional data and apart from the fact that the fingerprint information has been removed, which may be done in or before the synchronizer 13 depending on the implementation. Alternatively, the fingerprints can already be removed in the fingerprint extractor 14, so that line 19 is not present and a line 19' instead leads from the fingerprint extractor 9 directly into the synchronizer 13.

The synchronizer is thus designed to synchronize the multichannel additional information and the at least one base channel using the test fingerprint information, the reference fingerprint information, and the relationship, derived from the data stream, between the multichannel additional information and the fingerprint information contained in the data stream. As explained further below, this time relationship is preferably determined simply by whether the fingerprint information is located before, after, or within a set of multichannel additional information.

Alternatively, however, a data stream format could be used in which all the fingerprint information is written to a separate part at the beginning of the data stream, followed by all the multichannel additional information. In this case, block fingerprints and blocks of multichannel additional information would not alternate. Other ways of associating fingerprint information with multichannel additional information may also be used. According to the invention, it is only necessary that the relationship between the multichannel additional information and the fingerprint information can be derived from the data stream on the decoder side, so that the fingerprint information can be used to synchronize the multichannel additional information with the base channel data.

Accordingly, the data stream with multichannel information again comprises blocks B1 to B8, each block in Fig. 7c corresponding to the respective block of the original multichannel signal in Fig. 7a and of the one or more base channels in Fig. 7b. For example, to reconstruct block MK1 of the original multichannel signal, the base channel data of block BK1 in Fig. 7b must be combined with the multichannel information P1 of block B1 in Fig. 7c. In the example shown in Fig. 6, this combination is performed by the BCC synthesis block, which again has a block-forming stage at its input in order to obtain block-wise processing of the base channel data.

P3 thus denotes, as shown in Fig. 7c, the multichannel information which, together with the block of values BK3 of the base channels, allows a reconstruction of the block of values MK3 of the original multichannel signal.

According to the invention, each block Bi of the data stream of Fig. 7c is now provided with a block fingerprint. For block B3, this means that the block fingerprint F3 is preferably written after the block P3 of multichannel information. This fingerprint is derived from exactly the block of values BK3. Alternatively, the block fingerprint F3 could also be differentially coded, so that F3 is equal to the difference between the fingerprint of block BK3 of the base channel and the fingerprint of block BK2.

In the scenario described at the beginning, the data stream with the one or more base channels in Fig. 7b is transferred to a multichannel reconstructor separately from the data stream with the multichannel information and fingerprint information of Fig. 7c. If nothing further is done, it may happen that at the multichannel reconstructor, for example at the BCC synthesis block 122 of Fig. 5, the block BK5 is waiting to be processed while, owing to some timing uncertainty, the multichannel information from block B7 rather than block B5 is also waiting to be processed. Without further action, therefore, the base channel data of block BK5 would be reconstructed using the multichannel information P7.

Depending on the embodiment and on the design and accuracy of the fingerprint information, the shift determination according to the invention is not limited to calculating a shift as an integer multiple of a block. With a sufficiently accurate correlation calculation and a sufficiently large number of block fingerprints (which of course affects the time taken to calculate the correlation), it can also achieve a shift accuracy of a fraction of a block, down to a single sample.

Fig. 7d shows a preferred embodiment of a block Bi, for example block B3 of the data stream in Fig. 7c. The block begins with a sync word, which may be one byte long, for example. This is followed by length information, since the multichannel information P3 is preferably scaled, quantized and entropy-coded after calculation, as is known in the art. The length of the multichannel information, which may be parameter information but may also be a waveform signal, e.g. of the side channel, is therefore not known in advance and must be signaled in the data stream. [...] wherein this block of multichannel information P4 is again followed by the block fingerprint derived from the block of base channel data BK4.

As shown in Fig. 7d, either an absolute energy measure or a difference energy measure can be entered as the energy measure. In the latter case, the difference between the energy measure for the base channel data BK3 and the energy measure for the base channel data BK2 would be written into block B3 of the data stream as the block fingerprint.
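
The block layout of Fig. 7d (sync word, length information, multichannel information, block fingerprint) might be sketched as follows. This is only an illustration under stated assumptions: the sync word value `0xA5`, the 2-byte big-endian length field, the single-byte fingerprint and the function names `pack_block` and `unpack_block` are hypothetical; the text only states that the sync word may be one byte long and that length information precedes the multichannel information.

```python
import struct

SYNC_WORD = 0xA5  # assumed value; the text only says the sync word may be one byte long


def pack_block(side_info: bytes, fingerprint: int) -> bytes:
    """Serialize one block Bi: sync word, length, multichannel info Pi, fingerprint Fi."""
    if not 0 <= fingerprint <= 255:
        raise ValueError("fingerprint must fit into 8 bits")
    # Assumed header: 1-byte sync word + 2-byte big-endian length (max 65535 bytes).
    header = struct.pack(">BH", SYNC_WORD, len(side_info))
    return header + side_info + bytes([fingerprint])


def unpack_block(data: bytes, offset: int = 0):
    """Parse one block at `offset`; return (side_info, fingerprint, next_offset)."""
    sync, length = struct.unpack_from(">BH", data, offset)
    if sync != SYNC_WORD:
        raise ValueError("sync word not found at offset %d" % offset)
    start = offset + 3  # skip the 3 header bytes
    side_info = data[start:start + length]
    fingerprint = data[start + length]  # the fingerprint follows the side info
    return side_info, fingerprint, start + length + 1
```

Because the length is signaled explicitly, a reader can step from block to block without decoding the entropy-coded multichannel information, collecting only the fingerprints for synchronization.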

Fig. 8 shows a more detailed representation of the synchronizer, the fingerprint generator 11 and the fingerprint extractor 9 from Fig. 2 in cooperation with the multichannel reconstructor 21. The base channel data is fed into a base channel data buffer 25 and buffered there. Correspondingly, the data stream containing the additional information and the fingerprint information is fed into an additional information buffer 26. Both buffers are generally constructed as FIFO buffers, but buffer 26 has additional capabilities: the fingerprint information can be extracted by the reference fingerprint extractor 9 and can also be removed from the data stream, so that only multichannel additional information without embedded fingerprints is output on a buffer output line 27. The fingerprints may, however, also be removed from the data stream in the time shifter 28 or in any other element.

If absolute fingerprints are used on both the reference and the test side, the fingerprint information calculated by the fingerprint generator 11 and the fingerprint information obtained by the fingerprint extractor 9 can be fed directly into a correlator 29 within the synchronizer 13 of Fig. 2. The correlator then calculates the shift value and delivers it via a shift line 30 to the time shifter 28. The synchronizer 13 is further designed to activate an enable 31 once a valid shift value has been generated and passed to the time shifter 28, so that a switch 32 is closed and the stream of multichannel additional data is fed from buffer 26 via the time shifter 28 and the switch 32 into the multichannel reconstructor 21.
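
The role of the time shifter 28 can be illustrated with a simple block-wise delay line. This is a minimal sketch, assuming the shift is a non-negative whole number of blocks; the class name `TimeShifter` and its `push` method are hypothetical and not taken from the text.

```python
from collections import deque


class TimeShifter:
    """Delay a stream of multichannel additional-data blocks by a fixed number of blocks."""

    def __init__(self, delay_blocks: int):
        if delay_blocks < 0:
            raise ValueError("this sketch only supports non-negative delays")
        # Pre-fill with placeholders so the first `delay_blocks` outputs are empty.
        self._fifo = deque([None] * delay_blocks)

    def push(self, block):
        """Insert the newest block and return the block that is due now (or None)."""
        self._fifo.append(block)
        return self._fifo.popleft()
```

With a delay of two blocks, the block pushed at step t is returned at step t + 2; until then `push` returns `None`, which corresponds to the phase in which no synchronized multichannel additional data is available yet.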

However, in applications where initial time delays are not of great importance, the output of the multichannel reconstructor 21 can be held back until a valid shift is available, at which point the very first block (BK1 of Fig. 7b) with the now correctly delayed multichannel additional data P1 (Fig. 7c) can be fed to the multichannel reconstructor 21, so that output will only be started when multichannel data is available.

The functionality of the correlator 29 of Fig. 8 is described below with reference to Fig. 9. At the output of the test fingerprint calculator 11, a sequence of test fingerprint information is provided, as shown in the top sub-image of Fig. 9. Thus there is a block fingerprint for each block of the base channels, the blocks being designated 1, 2, 3, 4, ..., i. Depending on the correlation algorithm, only the sequence of discrete values is needed for the correlation. Other correlation algorithms, however, can also take as input a curve interpolated between the discrete values, as shown in Fig. 9.

For example, if the data stream contains differentially coded fingerprint information and the correlator is to work on the basis of absolute fingerprints, a differential decoder 35 is activated in Fig. 8. However, it is preferable that the data stream contains absolute fingerprints as an energy measure, since this information about the total energy per block can also be used to advantage by the multichannel reconstructor 21 for level correction purposes. Furthermore, it is preferable to perform the correlation on the basis of difference fingerprints. In this case, block 9 forms differences before the correlation, and block 11 likewise forms differences before the correlation, as already explained.

The correlator 29 now receives the curves or sequences of discrete values shown in the two upper sub-images of Fig. 9 and delivers the correlation result shown in the lower sub-image of Fig. 9. The offset component of this correlation result gives exactly the shift between the two fingerprint information curves. Since the shift is positive here, the multichannel additional information must be shifted in the positive time direction, i.e. delayed. It should be noted that the base channel data could of course equally be shifted in the negative time direction, or that the multichannel additional information could be shifted partly in the positive direction and the base channel data partly in the negative direction, as long as the multichannel additional data and the base channel data end up synchronized with each other.
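
The shift determination by correlation might be sketched as below. This is not the correlator 29 itself but a plain, mean-removed cross-correlation over integer block lags; the function name `estimate_shift`, the `max_lag` parameter and the normalization by overlap length are assumptions made for illustration.

```python
def estimate_shift(test, reference, max_lag):
    """Return the lag (in blocks) that maximizes the cross-correlation.

    `test` holds fingerprints computed from the received base channel,
    `reference` holds fingerprints extracted from the data stream.
    A positive result means reference[t] matches test[t + lag], i.e. the
    data stream is ahead and the multichannel additional information
    would have to be delayed by that many blocks.
    """
    n = min(len(test), len(reference))
    mean_t = sum(test[:n]) / n
    mean_r = sum(reference[:n]) / n
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only use indices t for which both t and t + lag are valid.
        lo, hi = max(0, -lag), min(n, n - lag)
        if hi - lo < 2:
            continue
        score = sum((reference[t] - mean_r) * (test[t + lag] - mean_t)
                    for t in range(lo, hi)) / (hi - lo)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For example, if the reference sequence is the test sequence advanced by three blocks, the function returns 3, and delaying the multichannel additional information by three blocks restores the alignment.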

Depending on the implementation, fewer than 200 blocks or more than 200 blocks may be used. According to the invention, a number between 100 and 300 blocks, and preferably 200 blocks, has been found to yield a reasonable compromise between computation time, correlation computation effort and shift accuracy.

When block 36 is finished, processing continues with a block 37, in which the correlator 29 correlates the 200 calculated test block fingerprints with the 200 calculated reference block fingerprints. The resulting shift is then stored. Next, in a block 38, fingerprints are calculated for a further number of blocks of the base channel data, e.g. 200 blocks, in the same way as in block 36. Correspondingly, another 200 blocks are extracted from the data stream with the multichannel supplementary information. A correlation is then performed in a block 39, and the resulting shift is again stored.

If the deviation between the two stored shift values is below a predetermined threshold, block 41 passes the shift via shift line 30 to the time shifter 28 of Fig. 8 and switch 32 is closed, so that the multichannel output is switched on from that point on. A predetermined value for the deviation threshold is, for example, one or two blocks. This is based on the assumption that if the shift does not change by more than one or two blocks from one calculation to the next, no error has been made in the correlation calculation.
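
The two-pass plausibility check can be sketched as follows. The shift estimator is passed in as a callable so that the sketch stays independent of any particular correlation method; the function name `verified_shift`, its parameters, and the choice to return the more recent of the two estimates are assumptions, since the text does not say which of the two values is forwarded.

```python
def verified_shift(test, reference, estimate, window=200, max_deviation=1):
    """Estimate the shift on two successive windows and accept it only if both agree.

    `estimate(test_window, reference_window)` is any function returning a shift
    in blocks. Returns the accepted shift, or None if the data is insufficient
    or the two estimates differ by more than `max_deviation` blocks.
    """
    if len(test) < 2 * window or len(reference) < 2 * window:
        return None  # not enough blocks buffered yet
    first = estimate(test[:window], reference[:window])
    second = estimate(test[window:2 * window], reference[window:2 * window])
    if abs(second - first) <= max_deviation:
        return second  # assumption: forward the more recent estimate
    return None
```

A return value of `None` would correspond to leaving switch 32 open, so that only the base channels are reproduced until a consistent shift has been found.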

A sliding window with a length of a number of blocks, such as 200, can also be used. For example, a calculation with 200 blocks is performed and a result is obtained. The window is then advanced by one block: one block is removed from the set of blocks used for the correlation calculation and the new block is used instead. The result obtained is then stored in a histogram, as was the previous result. This procedure is repeated for a number of correlation calculations, such as 100 or 200, so that the histogram gradually fills up.
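
The sliding-window procedure might look like the sketch below. Picking the most frequent histogram entry as the final shift is an assumption, as the text only states that the results are collected in a histogram; the function name `histogram_shift` and the pluggable `estimate` callable are likewise hypothetical.

```python
from collections import Counter


def histogram_shift(test, reference, estimate, window=200, runs=100):
    """Collect shift estimates from a sliding window and return the most frequent one.

    `estimate(test_window, reference_window)` is any function returning a shift
    in blocks. The window advances by one block per run.
    """
    histogram = Counter()
    available = min(len(test), len(reference))
    for start in range(runs):
        end = start + window
        if end > available:
            break  # not enough blocks for another full window
        histogram[estimate(test[start:end], reference[start:end])] += 1
    if not histogram:
        return None
    shift, _count = histogram.most_common(1)[0]
    return shift
```

Occasional outliers from individual correlation runs then have little influence, because a single wrong estimate only adds one count to a sparsely filled histogram bin.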

The shift calculation, which takes place in parallel with the output, is carried out in a block 42. When a drift between the data stream with the multichannel information and the data stream with the base channel data has been detected, adaptive or dynamic shift tracking is achieved by feeding an updated shift value via line 30 to the time shifter 28 of Fig. 8.

A preferred embodiment of the encoder-side fingerprint generator 2 shown in Fig. 1 and of the decoder-side fingerprint generator 11 shown in Fig. 2 is described below with reference to Fig. 11.

In general, the multichannel audio signal is divided into blocks of fixed size for the extraction of the multichannel additional data. At the same time as the multichannel additional data is extracted, a fingerprint is calculated per block that is suitable for characterizing the temporal structure of the signal as unambiguously as possible. One example is to use the energy content of the current block of the downmix audio signal, for example in logarithmic form, i.e. in a decibel-related representation. In this case, the fingerprint is a measure of the temporal envelope of the audio signal.

First, in step 1 of Fig. 11, the energy of the downmix audio signal in the current block is calculated, if applicable for a stereo signal. For example, 1152 audio samples from each of the left and right downmix channels are squared and summed. Here s_left(i) represents a time sample at time i of the left base channel, while s_right(i) represents a time sample at time i of the right base channel. For a monophonic downmix signal, the summation over the two channels is omitted.
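
The energy calculation described above can be written down directly. The following is a minimal sketch; the function name `block_energy` and the convention of passing `None` for the right channel of a mono downmix are assumptions for illustration.

```python
def block_energy(left, right=None):
    """Sum of squared samples of one block (e.g. 1152 samples per channel).

    For a stereo downmix, the squared samples of both channels are added;
    for a mono downmix, `right` is omitted.
    """
    energy = sum(s * s for s in left)
    if right is not None:
        energy += sum(s * s for s in right)
    return energy
```

For instance, `block_energy([1, -2, 3], [2, 0, -1])` adds 1 + 4 + 9 from the left channel and 4 + 0 + 1 from the right channel.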

In step 2, a minimum energy limit is applied for the subsequent logarithmic representation. For a decibel-related energy evaluation, it is preferred to use a minimum energy offset so that the logarithm remains computable in the case of zero energy. At a 16-bit audio signal resolution, this energy measure in dB covers a range of 0 to 90 dB.
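
A sketch of the logarithmic representation with a minimum energy offset follows. The offset value of 1.0, which maps silence to 0 dB, and the function name `energy_to_db` are assumptions; the text does not give the numerical value of the offset or state how the energy is normalized.

```python
import math


def energy_to_db(energy, min_energy=1.0):
    """Convert a block energy to decibels, adding an offset so that log10(0) never occurs."""
    # Assumed form: 10 * log10(E + E_min); with E_min = 1.0, zero energy maps to 0 dB.
    return 10.0 * math.log10(energy + min_energy)
```

Clamping the energy to the minimum value instead of adding an offset would be an equally plausible reading of "minimum energy limit"; both avoid taking the logarithm of zero.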

In step 3, it is preferable to use the slope (steepness) of the signal envelope instead of the absolute energy envelope value in order to determine the time offset between the multichannel additional information and the received audio signal accurately. Therefore, only the slope of the energy envelope is used for the correlation measurement. Technically, this signal derivative is calculated by forming the difference between the energy value of the current block and that of the previous block. This step is performed, for example, in the encoder.
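
The difference formation can be sketched in a few lines. The function name `envelope_slope` is hypothetical, and the sketch assumes that no slope value is produced for the very first block, since it has no predecessor.

```python
def envelope_slope(energies_db):
    """Return the block-to-block differences of an energy envelope.

    The result has one element fewer than the input, because the first
    block has no previous block to compare with.
    """
    return [cur - prev for prev, cur in zip(energies_db, energies_db[1:])]
```

The same function could be applied on the decoder side to absolute fingerprints taken from the data stream, so that test and reference fingerprints are compared in the same difference domain.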

Furthermore, it is preferable to scale the energy envelope of the signal for optimal use of the available value range. In order to maximize the numerical range and improve the resolution at low energy values when this fingerprint is subsequently quantized, it is useful to introduce additional scaling (amplification). This can be done either with a fixed, static weighting factor or via a dynamic gain control adapted to the envelope signal.

In addition, as shown in Fig. 11, the fingerprint is quantized in order to prepare it for insertion into the multichannel additional information; here the fingerprint is quantized to 8 bits. In practice, this reduced fingerprint resolution has proven to be a good compromise between bit requirement and reliability of the delay detection.
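
Scaling followed by 8-bit quantization might be sketched as follows. The signed value range (so that negative slopes can be represented), the rounding to the nearest integer, the static gain, and the function name `quantize_fingerprint` are all assumptions; the text only specifies a resolution of 8 bits.

```python
def quantize_fingerprint(value, gain=1.0, bits=8):
    """Scale a fingerprint value and quantize it to a signed integer of `bits` bits."""
    levels = 1 << bits
    lo, hi = -(levels // 2), levels // 2 - 1  # -128 .. 127 for 8 bits
    # Values outside the representable range are clipped (saturated).
    return max(lo, min(hi, round(value * gain)))
```

A larger gain improves the resolution for small slope values at the cost of clipping large ones, which is the trade-off that a dynamic gain control would try to balance.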

In step 6, an optional entropy coding of the fingerprint can additionally be performed. By evaluating statistical properties of the fingerprint, the bit requirement of the quantized fingerprint can be reduced further. Suitable entropy coding methods are, for example, Huffman coding or arithmetic coding. Fingerprint values that occur with different frequencies can be represented by code words of different lengths, thus reducing the average bit requirement of the fingerprint representation.
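
As an illustration of the Huffman option, the sketch below builds a code table from the observed frequencies of quantized fingerprint values. It is a textbook construction, not the coder of the invention; the function name `huffman_code` is hypothetical, and a practical system would typically use a fixed table known to both encoder and decoder.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code table {symbol: bit string} from a sequence of symbols."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}  # a single symbol still needs one bit
    # Heap entries: (frequency, tie-breaker, partial code table of the subtree).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # the two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]
```

Frequent fingerprint values, such as small slopes around zero, receive short code words, so the average number of bits per fingerprint falls below the fixed 8 bits of the uncoded representation.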

The calculation of the multichannel additional data is performed per audio block using the multichannel audio data, and the calculated multichannel additional information is then extended by the newly added synchronization information by appropriate embedding into the bitstream.

The solution according to the invention thus enables the receiver to detect a time offset between the downmix signal and the additional data and to achieve a time-correct alignment, i.e. a delay compensation between the stereo audio signal and the multichannel additional information with an accuracy on the order of ±1/2 audio block.

Depending on the circumstances, the inventive method for generating or decoding may be implemented in hardware or in software. The implementation may be on a digital storage medium, in particular a disk or CD with electronically readable control signals, which can interact with a programmable computer system in such a way that the method is executed. In general, the invention thus also consists of a computer program product with program code stored on a machine-readable medium for executing the method when the computer program product runs on a computer. In other words, the invention may thus be realized as a computer program with program code for executing the method when the computer program runs on a computer.

Claims

Device for generating a data stream for a multichannel reconstruction of an original multi-channel audio signal, wherein the multi-channel audio signal has at least two channels, comprising:
a fingerprint generator (2) for generating fingerprint information from at least one base channel derived from the original multi-channel audio signal, wherein a number of base channels is equal to or larger than 1 and less than a number of channels of the original multi-channel audio signal, wherein the fingerprint information gives a progress in time of the at least one base channel; and

a data stream generator (4) for generating a data stream from the fingerprint information and of time-variable multi-channel additional information which, together with the at least one base channel, allow the multi-channel reconstruction of the original multichannel audio signal, wherein the data stream generator (4) is designed to generate the data stream so that a time connection between the multi-channel additional information and the fingerprint information may be derived from the data stream.
Device of claim 1, wherein the fingerprint generator (2) is designed to process the at least one base channel blockwise to obtain the fingerprint information, wherein the multi-channel additional information is calculated blockwise so that they are to be used together with blocks of the at least one base channel for the multi-channel reconstruction, and wherein the data stream generator (4) is designed to write the multi-channel additional information and the fingerprint information blockwise into the data stream.
Device of claim 2, wherein the fingerprint generator (2) is designed to generate, as fingerprint information for a block of the at least one base channel, a block fingerprint giving a progress in time of the base channel in the block, wherein a block of the multi-channel additional information is to be used together with the block of the base channel for the multi-channel reconstruction, and wherein the data stream generator (4) is designed to write the data stream blockwise so that the block of multi-channel additional information and the block of fingerprint information have a predetermined relationship to each other.
Device of claim 2, wherein the fingerprint generator (2) is designed to calculate a sequence of block fingerprints as fingerprint information for blocks of the at least one base channel that are subsequent in time, wherein the multi-channel additional information is given blockwise for blocks of the at least one base channel that are subsequent in time, and wherein the data stream generator is designed to write the sequence of block fingerprints in a predetermined relationship to the sequence of blocks of the multichannel additional information.
Device of claim 4, wherein the fingerprint generator (2) is designed to calculate a difference between two fingerprint values of two blocks of the at least one base channel as block fingerprint.
Device of one of the preceding claims, wherein the fingerprint generator (2) is designed to perform a quantization and entropy coding of fingerprint values to obtain the fingerprint information.
Device of claim 6, wherein the fingerprint generator (2) is designed to scale fingerprint values with scaling information and to further write the scaling information into the data stream in association with the fingerprint information.
Device of one of the preceding claims, wherein the fingerprint generator (2) is designed to calculate the fingerprint information blockwise, and wherein the data stream generator (4) is designed to write the data stream blockwise so that a block of the data stream comprises a block of multi-channel additional information and a block of fingerprint information associated with the block of multi-channel additional information and a block of the at least one base channel.
Device of one of the preceding claims, wherein there are at least two base channels, and wherein the fingerprint generator (2) is designed to add the at least two base channels sample-wise or spectral value-wise or to square them prior to the addition.
Device of one of the preceding claims, wherein the fingerprint generator (2) is designed to use data on an energy envelope of the at least one base channel as fingerprint information.
Device of claim 10, wherein the fingerprint generator (2) is designed to use data on an energy envelope of the at least one base channel as fingerprint information, and wherein the fingerprint generator (2) is further designed to use a minimum limitation of the energy and to provide a logarithmic representation of a minimum-limited energy.
Device of claim 11, wherein the at least one base channel may be transmitted in coded form to a multichannel reconstructor, wherein the coded form has been generated using a lossy encoder, and wherein there is further a base channel decoder to provide a decoded form of the at least one base channel as input signal for the fingerprint generator (2).
Device of one of the preceding claims, wherein the multi-channel additional data are multi-channel parameter data each associated blockwise with corresponding blocks of the at least one base channel.
Device of claim 13, further comprising:
a multi-channel analyzer (112) for the blockwise generation of both a sequence of blocks of the at least one base channel and a sequence of blocks of the multi-channel additional information,
wherein the fingerprint generator (2) is designed to calculate a block fingerprint value from each block of values of the at least one base channel.
Device of claim 14, wherein the data stream generator (4) is designed to write the data stream into a separate data channel existing in addition to a standard data channel, via which the at least one base channel may be transmitted to a multi-channel reconstruction means.
Device of claim 15, wherein the standard data channel is a standardized channel for a digital stereo radio signal or a standardized channel for transmission via the internet.
Device for generating a multi-channel representation (18, 20) of an original multi-channel audio signal from at least one base channel and a data stream comprising fingerprint information giving a progress in time of the at least one base channel and multichannel additional information which, together with the at least one base channel, allow the multi-channel reconstruction of the original multi-channel audio signal, wherein a connection between the multi-channel additional information and the fingerprint information may be derived from the data stream, comprising:
a fingerprint generator (11) for generating test fingerprint information from the at least one base channel;

a fingerprint extractor (9) for extracting the fingerprint information from the data stream to obtain reference fingerprint information; and

a synchronizer (13) for synchronizing the multichannel additional information and the at least one base channel in time using the test fingerprint information, the reference fingerprint information and a connection of the multi-channel information and the fingerprint information contained in the data stream, which is derived from the data stream, to obtain a synchronized multi-channel representation.
Device of claim 17, further comprising:
a multi-channel reconstructor (21) for reconstructing the multi-channel representation using the synchronized multi-channel representation to obtain a reconstruction of the original multi-channel audio signal.
Device of claim 17 or 18, wherein the data stream comprises a sequence of blocks of multi-channel additional data in time connection with a sequence of reference fingerprint values as reference fingerprint information, wherein the extractor (9) is designed to determine an associated fingerprint value to a block of multichannel additional data based on the time connection; wherein the fingerprint generator (11) is designed to determine a sequence of test fingerprint values as test fingerprint information for a sequence of blocks of the at least one base channel; wherein the synchronizer (13) is designed to calculate an offset between the blocks of multi-channel additional data and the blocks of the at least one base channel based on an offset (30) between the sequence of test fingerprint values and the sequence of reference fingerprint values, and to compensate the offset by delaying (28) the sequence of blocks of the multi-channel additional information using the calculated offset.
Device of one of claims 17 to 19, wherein the fingerprint generator (11) is designed to perform a quantization of fingerprint values to obtain the test fingerprint information.
Device of one of claims 17 to 20, wherein the fingerprint generator (11) is designed to scale fingerprint values with scaling information from the data stream.
Device of one of claims 17 to 21, wherein there are at least two base channels, and wherein the fingerprint generator (11) is designed to add the at least two base channels sample-wise or spectral value-wise or to square them prior to the addition.
Device of one of claims 17 to 22, wherein the fingerprint generator (11) is designed to use data on an energy envelope of the at least one base channel as fingerprint information.
Device of one of claims 17 to 23, wherein the fingerprint generator (11) is designed to use data on an energy envelope of the at least one base channel as fingerprint information, and wherein the fingerprint generator (11) is further designed to use a minimum limitation of the energy and to provide a logarithmic representation of a minimum-limited energy.
Device of one of claims 17 to 24, wherein the data stream is organized blockwise, and a block of multichannel additional information and a block fingerprint are contained in a block of the data stream, wherein the fingerprint generator (11) is designed to calculate a difference between two block fingerprints of the at least one base channel as test fingerprint information, and wherein the fingerprint extractor (9) is further designed to calculate a difference of two block fingerprints in the data stream and to provide it as reference fingerprint information to the synchronizer (13).
Device of one of claims 17 to 25, wherein the synchronizer (13) is designed to calculate an offset between the multi-channel additional data and the at least one base channel in parallel to an audio output and to compensate the offset adaptively.
Device of claim 18, further designed to reproduce the at least one base channel when there are no synchronized multi-channel additional data yet, and to switch (32) from a mono or stereo reproduction of the at least one base channel to a multi-channel reproduction when there are synchronized multi-channel additional data.
Device of one of claims 17 to 27, designed to obtain the data stream and the at least one base channel via bit streams separate from each other, which are received via two logic channels or physical channels different from each other, or are obtained via the same transmission channel which, however, is active at different times.
Method for generating a data stream for a multichannel reconstruction of an original multi-channel audio signal, wherein the multi-channel audio signal has at least two channels, comprising:
generating (2) fingerprint information from at least one base channel derived from the original multichannel audio signal, wherein a number of base channels is equal to or larger than 1 and less than a number of channels of the original multi-channel audio signal, wherein the fingerprint information gives a progress in time of the at least one base channel; and

generating (4) a data stream from the fingerprint information and of time-variable multi-channel additional information which, together with the at least one base channel, allow the multi-channel reconstruction of the original multi-channel audio signal, wherein the data stream is generated so that a time connection between the multi-channel additional information and the fingerprint information may be derived from the data stream.
Method for generating a multi-channel representation (18, 20) of an original multi-channel audio signal from at least one base channel and a data stream comprising fingerprint information giving a progress in time of the at least one base channel and multichannel additional information which, together with the at least one base channel, allow the multi-channel reconstruction of the original multi-channel audio signal, wherein a connection between the multi-channel additional information and the fingerprint information may be derived from the data stream, comprising:
generating (11) test fingerprint information from the at least one base channel;

extracting (9) the fingerprint information from the data stream to obtain reference fingerprint information; and

synchronizing (13) the multi-channel additional information and the at least one base channel using the test fingerprint information, the reference fingerprint information and a connection of the multichannel information and the fingerprint information contained in the data stream, which is derived from the data stream, to obtain a synchronized multichannel representation.
Computer program product having a program code for performing the method of claim 29 or claim 30, when the program code runs on a computer.
Data stream comprising fingerprint information giving a progress in time of at least one base channel derived from an original multi-channel audio signal, wherein a number of base channels is equal to or larger than 1 and less than a number of channels of the original multi-channel audio signal, and multichannel additional information which, together with the at least one base channel, allow the multi-channel reconstruction of the original multi-channel audio signal, wherein a connection between the multi-channel additional information and the fingerprint information may be derived from the data stream.
Data stream of claim 32, comprising control signals to generate a synchronized multi-channel representation of the original multi-channel audio signal, when the data stream is fed into the device of claim 17.