WO1995004348A1

WO1995004348A1 - Multimedia playback system

Info

Publication number: WO1995004348A1
Application number: PCT/US1994/008779
Authority: WO
Inventors: Howard L. Resnikoff; John P. Stautner; John Corydon Huffman
Original assignee: Aware Inc
Current assignee: Aware Inc
Priority date: 1993-08-02
Filing date: 1994-08-02
Publication date: 1995-02-09
Anticipated expiration: 1996-02-02
Also published as: AU7519594A

Abstract

A method for operating a digital computer to generate a decompressed audio/visual signal from compressed material stored on a disk drive. The incoming compressed stream is decoded by a decoder (602) and de-quantizer (604) to generate a set of transform coefficients (615) which are used to reconstruct the time domain signal. The computational requirements of the filters used to synthesize the decompressed signal, whose coefficients and frequency cut-offs are stored in memory (615), are varied in response to the available computational resources of the computer, thereby allowing a single compressed audio/visual program to be played back in real time on a variety of platforms by trading off audio/visual quality against available computational resources.

Description

MULTIMEDIA PLAYBACK SYSTEM

Field of the Invention

The present invention relates to computer systems, and more particularly, to a method for providing multimedia program material that may be used on a variety of computer platforms.

Background of the Invention

Computer based audio and video systems have become increasingly useful in training programs and other multimedia applications. Personal computer based systems using compressed audio and video data promise to provide inexpensive playback solutions and allow distribution of program material on digital disks or over a computer network.

While data compression systems have gone a long way toward solving the problem of storing the audio and video material, the capabilities of personal computers are still limiting. The computational requirements for decompressing an audio or video signal in real time at high resolution are beyond the capability of many personal computers.

One solution to this problem would be to use lower quality playback on computer platforms that lack the computational resources to decode compressed material at high fidelity quality levels. Unfortunately, this solution requires that the material be coded at various quality levels. Hence, each program would need to be stored in a plurality of formats. Different types of users would then be sent the format suited to their application. The cost and complexity of maintaining such multi-format libraries makes this solution unattractive.

In principle, all users could be sent one disk having copies of the material at all the different resolution levels. However, the storage requirements of the multiple formats partially defeats the basic goal of reducing the amount of storage needed to store the material. Furthermore, the above discussion assumes that the computational resources of a particular playback platform are fixed. This assumption is not always true in practice. The computational resources of a computing system are often shared among a plurality of applications that are running in a time-shared environment. Similarly, communication links between the playback platform and shared storage facilities also may be shared. As the playback resources change, the format of the material must change adaptively in systems utilizing a multi-format compression approach. This problem has not been adequately solved in prior art systems.

Broadly, it is the object of the present invention to provide an improved audio/video compression system.

It is yet another object of the present invention to provide an audio/video compression system that allows the compressed material to be played back on a variety of playback platforms with different computational capabilities without maintaining multiple copies of the compressed material.

It is a still further object of the present invention to provide an audio/video compression system in which the bandwidth needed to transmit the audio/video material may be varied in response to changes in the available bandwidth.

These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.

Summary of the Invention

The present invention comprises a method for operating a computer to generate a decompressed audio/visual signal for playback on the computer or equipment connected to the computer. Compressed audio/visual program material is stored on a disk drive connected to the computer. The material is assumed to have been compressed using a perfect, or near perfect, reconstruction filter bank based on a known set of filter coefficients. Information specifying a plurality of sets of filter coefficients for use in decompressing the audio/visual program material is also stored on the disk. Each set of filter coefficients provides a different degree of fidelity in the decompressed audio/visual program and requires a different computational capacity to decompress the audio/visual material in real time. The computer program determines the computational capacity of the computer, selects one of the sets of filter coefficients based on the measured computational capacity, and generates a decompressed audio/visual signal from the compressed audio/visual program material using the selected set of filter coefficients. In one embodiment of the invention, a plurality of sets of filter cut-offs are also stored on the disk. Each set of filter cut-offs provides a different degree of fidelity in the decompressed audio/visual program and requires a different computational capacity to decompress the audio/visual material in real time when used by the decompression code. The invention selects one of the sets of filter cut-offs based on the measured computational capacity and utilizes the selected set of filter cut-offs to reduce the computational capacity needed to generate the decompressed audio/visual signal.

Brief Description of the Drawings

Figure 1 is a block diagram of a typical compression/decompression system.

Figure 2 is a block diagram of a decompression system utilizing the variable computational load techniques of the present invention.

Detailed Description of the Invention

The manner in which the present invention operates may be more easily understood with reference to a typical compression/decompression system as shown in Figure 1 at 10. The input to the system may be a segment of an audio soundtrack, a frame or set of frames in a video sequence, or a segment of a speech soundtrack. The input signal will be assumed to be digitized upon receipt by transform generator 12. Transform generator 12 typically fits the input signal to a linear approximation utilizing a predetermined set of basis functions. The output of transform generator 12 is typically the coefficients of the basis functions in the best fit to the model represented by the basis functions in question. For example, in the case of audio or speech soundtracks (See U.S. Patent 4,885,790 to McAulay, et al.), transform generator 12 typically computes the finite Fourier transformation of the segment of a soundtrack or some other measure of the signal intensity in each of a plurality of frequency bands. In the case in which the signal is a single image, transform generator 12 may compute the coefficients of the image in a sub-band decomposition of the image (See U.S. Patent 5,014,134 to Lawton, et al.). If the input signal is a set of images representing a segment of a video sequence, the output of transform generator 12 may be the coefficients of a sub-band decomposition of a "three-dimensional" image constructed from the set of frames (See U.S. Patent 5,121,191 to Cassereau, et al.).

The transform coefficients are then replaced with approximations by quantizer 14. Typically, the transform coefficients are grouped by the accuracy with which they must be represented to recover a satisfactory approximation of the original signal. In any given group, each coefficient is replaced by an integer value which identifies one of a predetermined number of states for the coefficient. To simplify the discussion, it will be assumed that each coefficient within a given group will be allocated K bits. Denote the coefficients by $\{p_j\}$. Let $P_{min}$ and $P_{max}$ be the minimum and maximum values, respectively, of the set of parameters $\{p_j\}$. In the simplest case, $2^K$ equally spaced levels, denoted by $L_k$, are defined between $P_{min}$ and $P_{max}$. Each coefficient, $p_j$, is then replaced by an integer, k, where $L_k \le p_j < L_{k+1}$. These integers are used in place of the original coefficient values.
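The simplest case described above can be sketched as a uniform quantizer. This is an illustrative sketch only: the function names, the clamping of the maximum value into the top cell, and the midpoint reconstruction rule are assumptions not stated in the text.

```python
def quantize(coeffs, k_bits):
    """Replace each coefficient by an integer index in [0, 2**k_bits - 1]."""
    p_min, p_max = min(coeffs), max(coeffs)
    levels = 2 ** k_bits
    # Spacing between adjacent levels; fall back to 1.0 if all values are equal.
    step = (p_max - p_min) / levels or 1.0
    # Index k satisfies L_k <= p < L_(k+1), with L_k = p_min + k * step.
    # The maximum value itself is clamped into the top cell (assumption).
    indices = [min(int((p - p_min) / step), levels - 1) for p in coeffs]
    return indices, p_min, step


def dequantize(indices, p_min, step):
    """Approximate each coefficient by the midpoint of its cell (assumption)."""
    return [p_min + (k + 0.5) * step for k in indices]


indices, p_min, step = quantize([0.0, 0.1, 0.5, 0.9, 1.0], k_bits=2)
approx = dequantize(indices, p_min, step)
```

With K bits per coefficient, the reconstruction error in this sketch is bounded by half the level spacing, which is why groups that tolerate larger errors can be given a smaller K.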

In general, the manner in which the coefficients are grouped and the value of K will depend on the type of data being compressed. For example, in the case of audio soundtracks, psycho-acoustic effects determine the errors that can be introduced into transform coefficients without significantly altering quality, while in image compression systems, the sub-band coding divides the image into sub-images of varying spatial frequency content. Most of the information is concentrated in low-frequency component images. Hence, only the high-frequency component image data may be coded with lower precision.

The final step in the compression process is to code the approximations generated by quantizer 14 using coder 16, which makes use of the redundancy in the quantized coefficients to further reduce the number of bits needed to represent them. Coder 16 does not introduce further errors into the coefficients. Coder 16 replaces each coefficient by a code that depends on the number of times the coefficient value appears in the set of coefficients. Values that occur more frequently are replaced by codes having fewer bits than values that occur only rarely; hence, a net savings is achieved. Coding algorithms are well known to those skilled in the signal compression arts, and hence, will not be discussed in more detail here.
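The text does not name a particular coding algorithm. As one well-known example of a lossless code in which frequent values receive shorter codes, the following sketch computes Huffman code lengths for a list of quantized values. The choice of Huffman coding and the function name are assumptions made for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct value still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (count, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


quantized = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(quantized)
total_bits = sum(lengths[s] for s in quantized)
```

For this sample, the most frequent value gets the shortest code, and the total is smaller than the 32 bits a fixed 2-bit code would need for the same 16 values.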

The compressed signal is typically stored on CD-ROM or some other media. The compressed signal may also be communicated over a communication channel having a bandwidth which is insufficient to carry the original signal.

An approximation to the original signal is generated by reversing the compression process. A decoder 20 reverses the coding operations introduced by coder 16. Dequantizer 22 then reconstructs the approximations to the original transform coefficients. These approximations are then input to inverse transform generator 24 which applies the inverse of the transformation applied by transform generator 12 to generate a playback signal. If the original transform coefficients had not been replaced by approximations, the playback signal would exactly match the original input signal, provided all of the intermediate computations were carried out with sufficient numerical precision.

In general, a computational bottleneck is encountered in the decompression of the signal which must be performed in real time. The number of frames that must be decompressed per second in a video sequence is determined by the frame rate of the original sequence. Reducing the rate would lead to a slow-motion effect. Similarly, the playback of an audio track at a reduced rate is not acceptable. In principle, the audio/video track can be pre-decompressed in its entirety and then played back at normal speed. However, this requires sufficient fast storage to accommodate the entire work. This amount of storage is often not available. In addition, any communication links between the fast storage and the central processing unit must be sufficiently fast to accommodate the real time output data rates. This requirement can be particularly acute in video presentations since a 1000x1000 pixel motion picture can require a data channel with a capacity of 30 Mbytes per second. Hence, decompression prior to display is preferable in many cases.
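The 30 Mbytes per second figure can be reproduced with simple arithmetic. The sketch below assumes one byte per pixel and 30 frames per second; neither value is stated in the text, so both are labeled assumptions.

```python
def raw_data_rate(width, height, bytes_per_pixel, frames_per_second):
    """Bytes per second needed to move uncompressed frames to the display."""
    return width * height * bytes_per_pixel * frames_per_second


# Assumed values: 1 byte per pixel, 30 frames per second.
rate = raw_data_rate(1000, 1000, bytes_per_pixel=1, frames_per_second=30)
print(rate / 1e6, "Mbytes per second")
```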

In contrast, the compression process can often be performed on a computing platform with much greater capacity than that of the playback platforms. The compression process need only be done once, while the playback process will be performed many times. Hence, a large investment in compression may be amortized over a large number of playbacks. In addition, the compression process may not need to be done in real time. The input signal can be stored on some intermediate storage device and compressed at a fraction of its real time rate.

In many cases, the playback platform will not have sufficient computational capacity to decompress an audio/video sequence at high resolution in real time. The present invention provides a solution to this problem by providing a playback system which generates a lower resolution playback signal in real time. The loss of resolution is determined by the computational capacity of the playback platform.

In one embodiment of the present invention, a computer program is distributed with the compressed audio/video material. The program initially measures the computational capacity of the playback platform when the program is executed by the platform. The capacity is measured by determining the time needed to complete a test program. Once the computational capacity of the playback platform is determined, the decompression algorithms are adjusted such that these algorithms can complete their computations in real time. The adjustments are made in the inverse transform algorithms.
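A minimal sketch of measuring capacity by timing a test program follows. The workload (a multiply-accumulate loop), the function names, and the choice of "operations per second" as the capacity measure are all hypothetical; the text only says that the time to complete a test program is measured.

```python
import time


def run_test_program(n_ops):
    """Hypothetical test workload: a multiply-accumulate loop."""
    acc = 0.0
    for k in range(n_ops):
        acc += (k % 7) * 0.5
    return acc


def measure_capacity(n_ops=200_000, repeats=3):
    """Estimate operations per second from the fastest of several timed runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_test_program(n_ops)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a very coarse timer.
    return n_ops / max(best, 1e-9)


capacity = measure_capacity()
```

Taking the fastest of several runs is one way to reduce the influence of unrelated background activity on the estimate.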

The original transformation that generated the transform coefficients may be viewed as a set of filter banks in which each filter bank generates one of the transform coefficients. Each filter bank consists of a finite impulse response (FIR) filter which generates the correlation between a set of filter coefficients and the input signal samples. Denote the i-th transform coefficient by $F_i$. In general, for each M input signal values received, M transform coefficients are generated according to the relationship

$$F_i = \sum_{k=0}^{W-1} {}^{i}A_k \, x_k \qquad (1)$$

for i = 0 to M-1. Here $x_k$ is the k-th input signal sample. In many applications, W>M. In general, W is an integer multiple of M. If W is greater than M then the filter bank stores the last W input samples received. Each time M input samples are received, the filter bank carries out the calculation shown in Eq. (1). For audio applications, W is usually much greater than M. In video compression systems, W is usually equal to M. However, the following observations apply in either case.
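The calculation of Eq. (1) can be sketched as follows, taking each transform coefficient to be the weighted sum of the last W input samples with one row of weights per coefficient. The function name, the zero-filled initial history, and the oldest-first ordering of the stored samples are assumptions made for illustration.

```python
from collections import deque


def analyze(samples, weights, m):
    """For every m new samples, emit one block of len(weights) coefficients.

    weights[i][k] plays the role of the k-th weight of the i-th filter;
    every row has W entries.
    """
    w = len(weights[0])
    history = deque([0.0] * w, maxlen=w)  # the last W samples, oldest first
    blocks = []
    for start in range(0, len(samples) - m + 1, m):
        history.extend(samples[start:start + m])
        window = list(history)
        blocks.append([sum(a * x for a, x in zip(row, window)) for row in weights])
    return blocks


# W equal to M (2 filters, 2 taps each).
blocks_short = analyze([1, 3, 5, 5], [[1, 1], [1, -1]], m=2)
# W greater than M (2 filters, 4 taps each): older samples are retained.
blocks_long = analyze([1, 3, 5, 5], [[1, 1, 1, 1], [0, 0, 1, -1]], m=2)
```

Each block costs M times W multiply-accumulate operations in this form, which is why the filter length W directly drives the computational workload.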

The weights ${}^{i}A_k$ may be viewed as defining M band pass filters. The center frequency and shape of each filter is determined by ${}^{i}A_k$. In the case of image compression systems, the filters are spatial frequency filters that operate in two dimensions; however, such filters may be written in a form consistent with Eq. (1). In the case of video compression systems in which a three-dimensional image is constructed from a sequence of frames, the filters operate in two spatial dimensions and one time dimension.

The inverse transformation process also involves generating weighted sums. In general, the inverse transformation may be written in the form

$$x_i = \sum_{k=0}^{W-1} {}^{i}A'_k \, F_k \qquad (2)$$

where the coefficients ${}^{i}A'_k$ are related to the filter coefficients ${}^{i}A_k$ described above.

It will be apparent from Eq. (2) that there are two classes of approximations that may be used to trade computational complexity for fidelity of the reconstructed signal. The first method would be to use a reconstruction filter bank with less ideal filters. As the number of samples used by a filter increases, the discrepancy between the sub-band analysis filter performance and that of an ideal band pass filter decreases. For example, an optimized audio filter utilizing 128 samples has a side lobe suppression in excess of 48 dB, while a filter utilizing 512 samples has a side lobe suppression in excess of 96 dB. Hence, if the inverse transform is replaced by one having a smaller W, synthesis quality can be traded for a reduction in computational workload.

Consider two different perfect, or near perfect, reconstruction filter systems. The first system utilizes the filter coefficients that provide the highest quality reconstruction of the original signal when used in the transform and inverse transform generators. Denote the coefficients used to generate the transform coefficients by ${}^{i}A_k(1)$, and the coefficients used to generate the playback signal by ${}^{i}A'_k(1)$, the value of W for this first filter system being denoted by W(1). The second system utilizes filter coefficients which correspond to less ideal filters. Denote the coefficients used to generate the transform coefficients by ${}^{i}A_k(2)$, and the coefficients used to generate the playback signal by ${}^{i}A'_k(2)$, the value of W for this second filter system being denoted by W(2). Here, W(2)<W(1). The ${}^{i}A_k(2)$ are chosen such that the equivalent band pass filters have center frequencies at the same locations as those provided by the first filter system and response characteristics which are similar in shape to those provided by the first filter system. Since W(2)<W(1), the second set of filters will have characteristics that do not quite match those of the first set.

Now consider the case in which the first set of filter coefficients is used to generate the compressed data and the second set of filter coefficients is used to reconstruct the playback signal. The second set of coefficients may be viewed as having W(1) coefficients in which W(1)-W(2) of the coefficients have been replaced by zeros. Since the second set is based on a transform filter set that has a similar shape to the original filter set, the resulting playback signal will be an approximation to the playback signal that would have been obtained using an inverse transform based on the first set of filter coefficients; however, the computational workload will have been reduced by a factor equal to W(2)/W(1).
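The workload argument can be illustrated with a small sketch: a shorter coefficient set gives the same weighted sum as the same set padded with zeros to the full length, but needs only W(2) multiply-accumulate operations instead of W(1). The sample numbers, the function names, and the placement of the zeros at the end of the set are assumptions chosen for illustration.

```python
def weighted_sum(weights, values):
    """Return (sum of weights[k] * values[k], number of multiply-accumulates)."""
    total = 0.0
    for a, f in zip(weights, values):
        total += a * f
    return total, min(len(weights), len(values))


def pad_with_zeros(weights, full_length):
    """View a short coefficient set as a full-length set with zeroed entries."""
    return list(weights) + [0.0] * (full_length - len(weights))


W1, W2 = 8, 2
short_set = [0.75, 0.25]                    # hypothetical set with W(2) = 2
padded_set = pad_with_zeros(short_set, W1)  # same set viewed with W(1) = 8
values = [4.0, 2.0, 1.0, 0.5, 0.25, 0.0, -1.0, 3.0]

full_result, full_macs = weighted_sum(padded_set, values)
short_result, short_macs = weighted_sum(short_set, values)
workload_ratio = short_macs / full_macs     # equals W2 / W1
```

Skipping the zeroed terms changes nothing in the result, so the saving comes purely from doing fewer operations per output sample.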

In many audio compression systems, the filter coefficients used to generate each bandpass filter are obtained by modulating a prototype filter represented by a set of coefficients. The prototype filter coefficients, $h_i$, viewed as a function of i have a more or less sinc-shaped appearance with tails extending from a maximum. The tails provide the corrections which result in the high sidelobe rejection. If the tails are truncated, the filter bands would have substantially the same bandwidths and center frequencies as those obtained from the non-truncated coefficients. However, the rejection of signal energy outside a specific filter's band would be less than that provided by the non-truncated filter. As a result, a compression and decompression system based on the truncated filter would show significantly more aliasing than the non-truncated filter. Since all of the filters in a given filter bank are generated from the single prototype filter, a family of filter banks can be generated by merely changing the point at which the filter is truncated. Hence, the various filter coefficients can be generated without having to store a large number of different sets of filters.
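The idea of deriving a family of filter banks from one stored prototype can be sketched as follows. The windowed low-pass prototype, the simplified cosine modulation (which omits the phase terms a real perfect-reconstruction design would need), and all function names are assumptions for illustration; the text does not specify the prototype or the modulation.

```python
import math


def prototype(length, m):
    """Hypothetical windowed low-pass prototype with cut-off pi / (2 * m)."""
    centre = (length - 1) / 2
    taps = []
    for i in range(length):
        t = math.pi * (i - centre) / (2 * m)
        shape = 1.0 if t == 0 else math.sin(t) / t          # main lobe plus tails
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / length)
        taps.append(shape * window / (2 * m))
    return taps


def truncate(taps, new_length):
    """Keep the central new_length taps, discarding the tails on both sides."""
    drop = (len(taps) - new_length) // 2
    return taps[drop:drop + new_length]


def modulate(taps, band, m):
    """Shift the prototype to the centre frequency of one of m bands."""
    centre = (len(taps) - 1) / 2
    return [2 * h * math.cos(math.pi / m * (band + 0.5) * (i - centre))
            for i, h in enumerate(taps)]


M = 4
full_prototype = prototype(64, M)
short_prototype = truncate(full_prototype, 32)   # cheaper, poorer rejection
full_bank = [modulate(full_prototype, b, M) for b in range(M)]
short_bank = [modulate(short_prototype, b, M) for b in range(M)]
```

Only `full_prototype` would need to be stored in this sketch; each cheaper bank is obtained by choosing a different truncation point and modulating again.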

The second method of trading quality for computational workload operates by setting one or more of the transform coefficients to zero before computing the inverse transform. In most cases of interest, the effect of setting one or more transform coefficients to zero will be apparent. For example, if the high spatial frequency transform coefficients are set to zero in an image compression system, the resulting image will be somewhat blurred compared to that obtained by utilizing all of the transform coefficients in generating the output signal. Similarly, setting the high-frequency coefficients to zero in an audio compression system has the effect of reducing the quality of the regenerated audio track but still providing an audio track that will be recognized by a human listener. In this case, the reconstructed signal would be equivalent to listening to the original soundtrack on a poor quality receiver.

If it is known in advance that certain transform coefficients are zero, then the multiplications and additions involving these transform coefficients may be omitted in generating the playback signal. In particular, Eq. (2) may be rewritten to separate the computations involving the high-frequency coefficients from those involving the lower frequency transform coefficients. The portion of the computation involving the lower frequency transform coefficients is always performed. However, the portion involving the high-frequency transform coefficients is only performed if the computational capacity of the playback platform is sufficiently great. It will be apparent that more gradual degradation patterns may be obtained by having several levels of transform coefficients that may be set to zero if needed.
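The split of Eq. (2) into an always-computed low-frequency part and an optional high-frequency part might look like the sketch below. It assumes the transform coefficients are ordered from low to high frequency and that a single split index separates the two groups; the function name and the sample values are hypothetical.

```python
def reconstruct_sample(weights, coeffs, split, include_high):
    """Weighted sum in the form of Eq. (2), computed in two parts.

    Terms with index below split are always summed. Terms from split
    upward are summed only when include_high is True; otherwise they are
    treated as zero and their multiplications are skipped entirely.
    """
    low = sum(weights[k] * coeffs[k] for k in range(split))
    if not include_high:
        return low
    high = sum(weights[k] * coeffs[k] for k in range(split, len(coeffs)))
    return low + high


weights = [1.0, 1.0, 1.0, 1.0]
coeffs = [4.0, 2.0, 1.0, 0.5]            # assumed ordered low to high frequency
full = reconstruct_sample(weights, coeffs, split=2, include_high=True)
degraded = reconstruct_sample(weights, coeffs, split=2, include_high=False)
```

Using several split indices instead of one would give the more gradual degradation pattern mentioned above.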

In the preferred embodiment of the present invention, several sets of filter coefficients are stored on a CD-ROM disk together with the compressed audio/visual information. One or more decompression programs are also stored on the disk. Different decompression programs are used for different types of processors. When a user loads the decompression program used by his or her processor, the program measures the capacity of the playback platform by measuring the time needed to execute a standard test program. The computer then loads the filter coefficients and the information specifying the maximum frequencies to be used in the decompression of each type of file on the disk. When the user selects a file, the computer automatically decompresses the file using the loaded filter coefficients and frequency cut-offs.
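Selecting a stored filter set and cut-off from a measured capacity could be organized as in the following sketch. The `PlaybackLevel` record, its cost figures, and the headroom factor are all hypothetical; the text does not say how stored levels are described or how capacity is compared against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackLevel:
    name: str
    filter_taps: int   # W for the synthesis filters in this set
    cutoff_band: int   # number of sub-bands kept; higher bands are zeroed
    cost: float        # hypothetical: operations per second for real time


def select_level(levels, capacity, headroom=0.8):
    """Pick the most demanding level that fits within capacity * headroom."""
    budget = capacity * headroom
    affordable = [level for level in levels if level.cost <= budget]
    if affordable:
        return max(affordable, key=lambda level: level.cost)
    # Nothing fits: fall back to the cheapest level available.
    return min(levels, key=lambda level: level.cost)


# Hypothetical table of levels as it might be read from the disk.
LEVELS = [
    PlaybackLevel("high", filter_taps=512, cutoff_band=32, cost=8.0e6),
    PlaybackLevel("medium", filter_taps=256, cutoff_band=24, cost=3.0e6),
    PlaybackLevel("low", filter_taps=128, cutoff_band=16, cost=1.0e6),
]

chosen = select_level(LEVELS, capacity=5.0e6)
```

The headroom factor leaves a margin for other work on the platform; it is a design choice of this sketch rather than something the text prescribes.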

The above-described embodiments of the present invention assume a fixed capacity for the playback platform. However, the actual capacity of a playback platform may vary with time. For example, in multi-tasking systems, the computational capacity of the platform is shared between a number of applications. As the number changes, the capacity available to any given application also changes. For this reason, it is useful to provide a decompression system whose quality may be continually adjusted in response to variations in the available computational capacity.

A block diagram of a decompression system providing the ability to respond to a variable computational load is shown in Figure 2 at 600. The system is preferably implemented on a computer. The incoming compressed stream is decoded by decoder 602 and de-quantizer 604 to generate sets of transform coefficients which are used by inverse transform generator 606 to reconstruct the time domain signal values. The output of inverse transform generator 606 is loaded into a FIFO buffer 608 which feeds a set of D/A converters 610 at a constant rate determined by clock 609. The outputs of the D/A converters are used to drive speakers 612 and display device 613. The various sets of filter coefficients and frequency cut-offs are stored in memory 615. Buffer 608 generates a signal that indicates the number of time domain samples stored therein. This signal is used by controller 614 to adjust the parameters that control the computational complexity of the synthesis operations in inverse transform generator 606. When the number of stored samples falls below a predetermined minimum value, the computational algorithm used by inverse transform generator 606 is adjusted to reduce the computational complexity, thereby increasing the number of time domain samples generated per unit time. For example, controller 614 can replace the filter coefficients currently being used by inverse transform generator 606 by a smaller set of coefficients that approximates the original set. Alternatively, controller 614 can force all of the high-frequency components from bands having frequencies above some predetermined frequency to be zero. In this case, controller 614 also instructs de-quantizer 604 not to unpack the high-frequency components that are not going to be used in the synthesis of the signal. This provides additional computational savings.

If the number of stored values exceeds a second predetermined value, controller 614 adjusts the computational algorithm to regain the output signal quality if inverse transform generator 606 is not currently running in a manner that provides the highest quality. In this case, controller 614 reverses the approximations introduced into inverse transform generator 606 discussed above.
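The behaviour of controller 614 in response to the buffer fill level can be sketched as a two-threshold rule. The class name, the one-step-at-a-time policy, and the numbering of quality levels (0 being the highest quality) are assumptions; the text only specifies a minimum value that triggers cheaper synthesis and a second value that triggers restoring quality.

```python
class QualityController:
    """Adjust the synthesis quality level from the FIFO buffer fill level."""

    def __init__(self, n_levels, low_mark, high_mark):
        if not low_mark < high_mark:
            raise ValueError("low_mark must be below high_mark")
        self.n_levels = n_levels
        self.low_mark = low_mark     # below this: reduce complexity
        self.high_mark = high_mark   # above this: regain quality
        self.level = 0               # 0 = highest quality (assumption)

    def update(self, buffered_samples):
        """Return the level to use for the next block of synthesis."""
        if buffered_samples < self.low_mark and self.level < self.n_levels - 1:
            self.level += 1          # buffer draining: use a cheaper algorithm
        elif buffered_samples > self.high_mark and self.level > 0:
            self.level -= 1          # buffer full enough: restore quality
        return self.level


controller = QualityController(n_levels=3, low_mark=100, high_mark=400)
history = [controller.update(n) for n in [50, 50, 50, 250, 500, 500, 500]]
```

Keeping the two thresholds apart means the level stays unchanged while the fill level is between them, which avoids switching back and forth on every block.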

While decompression system 600 has been discussed in terms of individual computational elements, in the preferred embodiment of the present invention, the functions of decoder 602, de-quantizer 604, inverse transform generator 606, buffer 608, memory 615, and controller 614 are implemented as a program on a general purpose digital computer. In this case, the functions provided by clock 609 may be provided by the computer's clock circuitry.

In the preferred embodiment of the present invention, the various levels of degradation are determined for all audio/visual programs stored on the CD-ROM. The filter coefficients and frequency cut-offs for each level of degradation are stored for use by controller 614. However, it will be apparent to those skilled in the art that the amount of additional information that must be stored with each audio/visual program to specify the various forms of degraded playback is small compared to the size of the compressed files specifying the program material. Hence, the order in which the filter coefficients and high-frequency components are to be changed may be specified for each program. This allows the manner in which the playback of the program material degrades to be optimized to the specific program material.

While the above described embodiments of the present invention have been described in terms of compressed audio/video information stored on disk, it will be apparent to those skilled in the art that the material may be received over a communication link. For example, the compressed material may be broadcast to a number of playback platforms having differing computational capacity. Each platform selects a quality level that is consistent with its playback capability. Hence, one program source may be used by all platforms.

Various modifications to the present invention will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Accordingly, the present invention is to be limited solely by the scope of the following claims.

Claims

WHAT IS CLAIMED IS:

1. A method for operating a computer to generate a decompressed audio/visual signal for playback equipment connected to said computer, said method comprising the steps of: receiving compressed audio/visual program material; storing information specifying a plurality of sets of filter coefficients for use in decompressing said audio/visual program material, each set of filter coefficients providing a different degree of fidelity in the decompressed audio/visual program; determining the computational capacity of said computer; selecting one of said sets of filter coefficients based on said measured computational capacity; and generating a decompressed audio/visual signal from said compressed audio/visual program material using said selected set of filter coefficients.

2. The method of Claim 1 further comprising the steps of: storing a plurality of sets of filter cut-offs, each said set of filter cut-offs providing different degrees of fidelity in said decompressed audio/visual program; selecting one of said sets of filter cut-offs based on said measured computational capacity; and utilizing said selected set of filter cut-offs to reduce the computational capacity needed to generate said decompressed audio/visual signal.

3. An audio/visual recording for use in generating a decompressed audio/visual program for playback on a playback platform, said recording comprising a CD-ROM disk having stored thereon: compressed audio/visual program material; a plurality of sets of filter coefficients for use in the decompression of said audio/visual program material, each set of filter coefficients providing different degrees of fidelity in the decompressed audio/visual program; and a computer program adapted for running on said playback platform, said computer program comprising code for: measuring the computational capacity of said playback platform; selecting one of said sets of filter coefficients based on said measured computational capacity; and generating a decompressed audio/visual signal from said compressed audio/visual program material using said selected set of filter coefficients.

4. The recording of Claim 3 further comprising a plurality of sets of filter cut-offs, each said set of filter cut-offs providing different degrees of fidelity in said decompressed audio/visual program, wherein said computer program further comprises code for selecting one of said sets of filter cut-offs based on said measured computational capacity and utilizing said selected set of filter cut-offs to reduce the computational capacity needed to generate said decompressed audio/visual signal.