HK1117320B - Method and device for indicating quantizer parameters in a video coding system - Google Patents

Publication number: HK1117320B (other version: HK1117320A1)
Application number: HK08111375.4A
Authority: HK (Hong Kong)
Other languages: Chinese (zh)
Inventor: J. Lainema
Original assignee: Nokia Technologies Oy

Description

Method and apparatus for indicating quantizer parameters in a video coding system
This application is a divisional application of application No. 03809010.4, filed on April 23, 2003.
Technical Field
The invention relates to a method, an encoder, a decoder and an apparatus for digital video coding. More particularly, it relates to the indication of Quantization Parameter (QP) values in video coding systems.
Background
Digital video sequences, like ordinary motion pictures recorded on film, comprise a sequence of still images; the illusion of motion is created by displaying the images one after another, typically at a rate of 15 to 30 frames/second. Each frame of an uncompressed digital video sequence comprises an array of image pixels. In one commonly used digital video format, the Quarter Common Intermediate Format (QCIF), a frame comprises an array of 176 x 144 pixels (i.e., 25,344 pixels). Each pixel is represented by a certain number of bits carrying information about the luminance and/or color content of the image area corresponding to the pixel. Typically, a color model called YUV is used to represent the luminance and chrominance of the image content. The luminance, or Y, component represents the intensity (brightness) of the image, while the color content of the image is represented by two chrominance or color difference components, labeled U and V.
A color model based on a luminance/chrominance representation of the image content offers certain advantages compared to a color model based on a representation comprising the original colors (i.e. red, green and blue, RGB). The human visual system is more sensitive to intensity variations than it is to color variations, and by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y), the YUV color model exploits this property. In this way, the amount of information required to encode color information in an image can be reduced with an acceptable reduction in image quality.
The lower spatial resolution of the chrominance components is usually obtained by sub-sampling. Typically, each frame of a video sequence is divided into blocks called "macroblocks", which combine luminance (Y) information with spatially corresponding, sub-sampled chrominance (U, V) information. Fig. 1 illustrates one way in which macroblocks may be formed. As shown in fig. 1, a frame of a video sequence is represented using a YUV color model, each component initially having the same spatial resolution. A macroblock comprises four blocks of luminance information, each an 8 x 8 array of luminance (Y) values, together representing a 16 x 16 pixel area of the original image, plus the two spatially corresponding chrominance components (U and V), sub-sampled by a factor of two in the horizontal and vertical directions to produce one 8 x 8 array of chrominance values for each of U and V. According to certain video coding recommendations, such as International Telecommunication Union recommendation ITU-T H.26L, the block sizes used within macroblocks may be other than 8 x 8, e.g. 4 x 8 or 4 x 4 (see T. Wiegand, "Joint Model Number 1", Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002, sections 2.2 and 2.3). ITU-T recommendation H.26L also allows macroblocks to be grouped together into so-called "slices". More specifically, each slice is formed from a number of consecutive macroblocks in coding order and is coded in such a way that it can be decoded independently, without reference to any other slice of the same frame. This arrangement is advantageous because it tends to limit the propagation of artifacts in the decoded video caused by transmission errors. Although there is no particular restriction on the way slices may be constructed, a simple scheme is to group all the macroblocks in one row of a frame into a slice. This scheme, together with the division of a QCIF-format picture into 16 x 16 macroblocks, is illustrated in fig. 2.
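To make the macroblock structure concrete, the following minimal Python sketch (the helper name is illustrative, not from the patent; simple decimation is assumed for the sub-sampling, whereas practical codecs usually filter first) forms the four 8 x 8 luminance blocks and the two sub-sampled 8 x 8 chrominance blocks for one macroblock:

```python
import numpy as np

def form_macroblock(y, u, v, row, col):
    """Extract one macroblock from full-resolution Y, U and V arrays:
    four 8x8 luminance blocks covering a 16x16 pixel area, plus one
    8x8 block each of U and V, sub-sampled by two in both directions."""
    y_mb = y[row:row + 16, col:col + 16]
    # Four 8x8 luminance blocks in raster order.
    y_blocks = [y_mb[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]
    # Sub-sample the spatially corresponding chrominance by a factor of
    # two (simple decimation for illustration only).
    u_block = u[row:row + 16:2, col:col + 16:2]
    v_block = v[row:row + 16:2, col:col + 16:2]
    return y_blocks, u_block, v_block

# Example: one macroblock from the top-left corner of a QCIF frame.
y = np.zeros((144, 176), dtype=np.uint8)
u = np.zeros((144, 176), dtype=np.uint8)
v = np.zeros((144, 176), dtype=np.uint8)
y_blocks, u_blk, v_blk = form_macroblock(y, u, v, 0, 0)
```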
As can be seen from fig. 2, a QCIF image comprises 11 x 9 macroblocks (in this case grouped into 9 slices of 11 consecutive macroblocks each). If the luminance and chrominance values are represented with 8-bit resolution (i.e., as numbers in the range 0 to 255), the total number of bits required per macroblock is (16 x 16 x 8) + 2 x (8 x 8 x 8) = 3072 bits. The number of bits required to represent an image frame in QCIF format is therefore 99 x 3072 = 304,128 bits. This means that the amount of data required to transmit/record/display an uncompressed video sequence in QCIF format, represented using the YUV color model at a rate of 30 frames/second, is more than 9 Mbps (megabits per second). This is a very high data rate and is impractical for video recording, transmission and display applications because of the enormous storage capacity, transmission channel capacity and hardware performance required.
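These figures can be checked directly; the short calculation below reproduces the numbers quoted above for an uncompressed QCIF sequence at 30 frames/second:

```python
bits_per_macroblock = 16 * 16 * 8 + 2 * (8 * 8 * 8)  # 2048 + 1024 = 3072 bits
bits_per_qcif_frame = 99 * bits_per_macroblock       # 304,128 bits
raw_bitrate = bits_per_qcif_frame * 30               # 9,123,840 bits/s, i.e. > 9 Mbps
print(bits_per_macroblock, bits_per_qcif_frame, raw_bitrate)
```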
If the video data is to be transmitted in real time over a fixed line network such as an ISDN (Integrated Services Digital Network) or a conventional PSTN (Public Switched Telephone Network), the available data transmission bandwidth is typically around 64 kbits/s. In mobile video telephony, where transmission occurs at least partially over a radio communications link, the available bandwidth may be as low as 20 kbits/s. This means that a significant reduction in the amount of information used to represent the video data must be achieved in order to enable the transmission of digital video sequences over low bandwidth communication networks. For this reason, video compression techniques have been developed that reduce the amount of information transmitted while maintaining an acceptable image quality.
Video compression methods are based on reducing those parts of a video sequence that are redundant and perceptually irrelevant. Redundancy in video sequences can be classified into spatial, temporal and spectral redundancy. "spatial redundancy" is a term used to describe the correlation (similarity) between adjacent pixels within a frame. The term "temporal redundancy" denotes the fact that an object appearing in one frame of a sequence may appear in subsequent frames, whereas "spectral redundancy" refers to the correlation between different color components of the same image.
There is often a significant amount of spatial redundancy between the pixels that make up each frame of a digital video sequence. In other words, any pixel value within a frame of the sequence is substantially the same as other pixel values in its immediate vicinity. Typically, video coding systems reduce spatial redundancy using a technique known as "block-based transform coding," in which a mathematical transform, such as a two-dimensional Discrete Cosine Transform (DCT), is applied to blocks of image pixels. This converts the image data from a representation comprising pixel values to a form comprising a set of coefficient values representing spatial frequency components. This alternative representation of the image data significantly reduces spatial redundancy and thus produces a more compressed representation of the image data.
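As an illustration of block-based transform coding, the following sketch applies a separable two-dimensional DCT to an 8 x 8 block using SciPy (an illustrative library choice, not anything prescribed by the text):

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """Separable two-dimensional DCT-II with orthonormal scaling,
    applied first along columns and then along rows."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(float)
coeffs = dct2(block)
# For natural image content the energy concentrates in the low-frequency
# (top-left) coefficients, which is what makes this representation more
# compressible than the raw pixel values.
```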
Individual frames of a video sequence that are compressed using block-based transform coding, without reference to any other frame within the sequence, are referred to as intra-coded or I-frames.
In general, video coding systems not only reduce spatial redundancy within individual frames of a video sequence, but also use a technique called "motion compensated prediction" to reduce temporal redundancy in the sequence. The image content of some (often many) frames in a digital video sequence is "predicted" from one or more other frames in the sequence, called "reference" frames, using motion compensated prediction. A prediction of image content is obtained by tracking the motion of an object or image area between a frame to be encoded (compressed) and one or more reference frames using "motion vectors". As in the case of intra-coding, motion compensated prediction of video frames is typically performed macroblock by macroblock.
A frame of a video sequence compressed using motion compensated prediction is commonly referred to as an inter-coded or P-frame. Motion compensated prediction alone rarely provides a sufficiently accurate representation of the image content of a video frame, and therefore it is usually necessary to provide each inter-coded frame with a so-called "prediction error" (PE) frame. The prediction error frame represents the difference between the motion-compensated prediction of the inter-coded frame and the image content of the frame to be encoded. More specifically, the prediction error frame comprises values representing the difference between pixel values in the frame to be encoded and the corresponding reconstructed pixel values formed from the predicted version of the frame. Consequently, the prediction error frame has characteristics similar to a still image, and block-based transform coding can be applied to it in order to reduce its spatial redundancy and thus the amount of data (number of bits) required to represent it.
To illustrate the operation of the video coding system in more detail, reference will now be made to fig. 3 and 4. Fig. 3 is a simplified diagram of a general-purpose video encoder that uses a combination of intra-frame and inter-frame encoding to generate a compressed (encoded) video bitstream. A corresponding decoder is illustrated in fig. 4 and will be described later in the text.
The video encoder 100 comprises an input 101 for receiving a digital video signal from a camera or other video source (not shown). It further comprises: a transform unit 104 arranged to perform a block-based discrete cosine transform (DCT), a quantizer 106, an inverse quantizer 108, an inverse transform unit 110 arranged to perform an inverse block-based discrete cosine transform (IDCT), combiners 112 and 116, and a frame memory 120. The encoder further comprises a motion estimator 130, a motion field encoder 140 and a motion compensated predictor 150. Switches 102 and 114 are operated by a control manager 160 to switch the encoder between intra and inter video coding modes. The encoder 100 also comprises a video multiplex encoder 170, which forms a single bitstream from the various types of information produced by the encoder 100, for onward transmission to a remote receiving terminal or, for example, for storage on a mass storage medium such as a computer hard drive (not shown).
The encoder 100 operates as follows. Each uncompressed video frame provided from the video source to input 101 is processed macroblock by macroblock, preferably in raster-scan order. When encoding of a new video sequence begins, the first frame to be encoded is encoded as an intra-coded frame. The encoder is then programmed to encode each subsequent frame in inter-coded format, unless one of the following conditions is met: 1) it is determined that a current macroblock of the frame being encoded differs so much, in terms of pixel values, from the reference frame used in its prediction that excessive prediction error information would result, in which case the current macroblock is encoded in intra-coded format; 2) a predefined intra-frame repetition interval has expired; or 3) feedback is received from a receiving terminal indicating a request for a frame in intra-coded format.
The operation of the encoder 100 in the intra-coding mode will now be described. In intra-coding mode, control manager 160 operates switch 102 to accept video input from input line 118. The video signal input is received macroblock by macroblock, and the blocks of luminance and chrominance values that make up each macroblock are passed to a DCT transform block 104. Here, a 2-dimensional discrete cosine transform is performed and a 2-dimensional DCT coefficient array is formed for each block.
The DCT coefficients for each block are passed to a quantizer 106 where they are quantized using a quantization parameter QP. The selection of the quantization parameter QP is controlled by the control manager 160 via control line 115.
In more detail, quantization of the DCT coefficients is performed by dividing each coefficient value by the quantization parameter QP and rounding the result to the nearest integer. In this way, the quantization process yields a set of quantized DCT coefficient values that have lower numerical precision than the values originally produced by the DCT transform block 104. Thus, in general, each quantized DCT coefficient can be represented by a smaller number of data bits than is required to represent the corresponding coefficient before quantization. In addition, some DCT coefficients are reduced to zero by the quantization process, thereby reducing the number of coefficients that must be coded. Both of these effects reduce the amount of data (i.e., data bits) required to represent the DCT coefficients of an image block. Quantization therefore provides a further mechanism by which the amount of data required to represent each image of the video sequence can be reduced. It also introduces an irreversible loss of information, which leads to a corresponding reduction in image quality. Although this reduction in image quality may not always be desirable, quantization of the DCT coefficient values makes it possible to adjust the number of bits required to encode the video sequence, for example to take account of the bandwidth available for transmission of the encoded sequence or the desired quality of the encoded video. More specifically, increasing the QP value used to quantize the DCT coefficients produces a lower quality but more compressed representation of the video sequence. Conversely, reducing the QP value yields a higher quality but less compressed encoded bitstream.
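The divide-and-round quantization and its multiply-back inverse described here can be sketched as follows (the coefficient values are illustrative; real codecs use more elaborate quantizer designs):

```python
import numpy as np

def quantize(coeffs, qp):
    """Divide each DCT coefficient by QP and round to the nearest
    integer, as described above."""
    return np.round(coeffs / qp).astype(int)

def inverse_quantize(levels, qp):
    """Multiply back by QP; the rounding loss cannot be undone."""
    return levels * qp

coeffs = np.array([312.0, -47.0, 15.0, -3.0, 1.0])
for qp in (4, 16):
    reconstructed = inverse_quantize(quantize(coeffs, qp), qp)
    print(qp, reconstructed, np.abs(coeffs - reconstructed).max())
# A larger QP zeroes more coefficients (more compression) at the cost of
# a larger reconstruction error (lower quality), and vice versa.
```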
The quantized DCT coefficients for each block are passed from the quantizer 106 to the video multiplex encoder 170, as indicated by line 125 in fig. 3. The video multiplex encoder 170 orders the transform coefficients of each block using a zigzag scanning procedure, thereby converting each two-dimensional array of quantized transform coefficient values into a one-dimensional array. Typically, the video multiplex encoder 170 then represents each non-zero quantized coefficient in the one-dimensional array by a pair of values, referred to as a level and a run, where the level is the value of the quantized coefficient and the run is the number of consecutive zero-valued coefficients preceding it. The run and level values are further compressed using entropy coding. For example, a method such as variable length coding (VLC) may be used to produce a set of variable length codewords representing each (run, level) pair.
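A minimal sketch of the zigzag scan and the run-level conversion described above (the subsequent entropy coding of the pairs is omitted, and the helper names are illustrative):

```python
import numpy as np

def zigzag_order(n=8):
    """Visit block positions along anti-diagonals, alternating direction,
    so coefficients are read from low to high spatial frequency."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  -rc[1] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level_pairs(block):
    """Emit a (run, level) pair for every non-zero quantized coefficient,
    where run counts the zero coefficients preceding it in scan order.
    (An end-of-block marker for the trailing zeros is omitted here.)"""
    pairs, run = [], 0
    for r, c in zigzag_order(block.shape[0]):
        if block[r, c] == 0:
            run += 1
        else:
            pairs.append((run, int(block[r, c])))
            run = 0
    return pairs

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 20, -3, 1
print(run_level_pairs(block))  # [(0, 20), (0, -3), (1, 1)]
```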
Once the (run, level) pairs have been entropy coded (e.g., using variable length coding), the video multiplex encoder 170 combines them with control information, which has also been entropy coded using a method appropriate for that kind of information, to form a single compressed bitstream 135 of encoded image information. This bitstream comprises the variable length codewords representing the (run, level) pairs together with the control information, including, in particular, information relating to the quantization parameter QP used to quantize the DCT coefficients, and is transmitted from the encoder.
A locally decoded version of the macroblock is also formed in the encoder 100. This is done by passing the quantized transform coefficients of each block output by quantizer 106 through inverse quantizer 108 and applying an inverse DCT transform in inverse transform block 110. Inverse quantization is performed by reversing the quantization operation carried out in the quantizer 106. More specifically, the inverse quantizer 108 attempts to recover the original DCT coefficient values for a given image block by multiplying each quantized DCT coefficient value by the quantization parameter QP. Because of the rounding operation performed as part of the quantization process in quantizer 106, it is generally not possible to recover the original DCT coefficient values exactly. This results in a difference between the recovered DCT coefficient values and the values originally produced by the DCT transform block 104 (it is this difference that constitutes the irreversible loss of information referred to above).
The operations performed by the inverse quantizer 108 and the inverse transform block 110 produce a reconstructed array of pixel values for each block of the macroblock. The resulting decoded image data is input to the combiner 112. In intra-coding mode, switch 114 is set so that the input to combiner 112 via switch 114 is zero. In this way, the operation performed by combiner 112 is equivalent to passing the decoded image data through unchanged.
When a subsequent macroblock of the current frame is received and subjected to the aforementioned encoding and local decoding steps in blocks 104, 106, 108, 110, and 112, a decoded form of the intra-coded frame is established in the frame memory 120. When the last macroblock of the current frame has been intra-coded and subsequently decoded, the frame store 120 contains a fully decoded frame that can be used as a prediction reference frame in encoding a subsequently received image frame in an inter-coded format. Line 122 provides a flag indicating an intra or inter coding format.
Operation of the encoder 100 in inter-coding mode will now be described. In inter-coding mode, the control manager 160 operates switch 102 to receive its input from line 117, which carries the output of combiner 116. The combiner 116 receives the video input signal macroblock by macroblock from input 101. As combiner 116 receives the blocks of luminance and chrominance values that make up a macroblock, it forms corresponding blocks of prediction error information. The prediction error information represents the difference between the block in question and its prediction, produced in the motion compensated prediction block 150. More specifically, the prediction error information for each block of the macroblock comprises a two-dimensional array of values, each of which represents the difference between a pixel value in the block of luminance or chrominance information being encoded and the decoded pixel value obtained by forming a motion compensated prediction for the block, according to the procedure described below.
The prediction error information for each block of the macroblock is passed to the DCT transformation block 104, and the DCT transformation block 104 performs a two-dimensional discrete cosine transform on each block of prediction error values to produce a two-dimensional array of DCT transformed coefficients for each block.
The transform coefficients for each prediction error block are passed to the quantizer 106, where they are quantized using a quantization parameter QP, in a manner analogous to that described above in connection with the intra-coding mode of the encoder. Again, the selection of the quantization parameter QP is controlled by the control manager 160 via control line 115. The accuracy of the prediction error coding can be adjusted according to the available bandwidth and/or the required quality of the encoded video. In a typical discrete cosine transform (DCT) based system, this is done by varying the quantization parameter (QP) used in quantizing the DCT coefficients to a particular accuracy.
The quantized DCT coefficients representing the prediction error information for each block of the macroblock are passed from the quantizer 106 to the video multiplex encoder 170, as indicated by line 125 in fig. 3. As in intra-coding mode, the video multiplex encoder 170 orders the transform coefficients of each prediction error block using a zigzag scanning procedure and then represents each non-zero quantized coefficient as a (run, level) pair. It further compresses the (run, level) pairs using entropy coding, in a manner analogous to that described above in connection with intra-coding mode. The video multiplex encoder 170 also receives motion vector information (described below) from the motion field coding block 140 via line 126, together with control information (including, for example, an indication of the quantization parameter QP) from the control manager 160. It entropy codes the motion vector information and control information and forms a single bitstream 135 of encoded image information comprising the entropy coded motion vectors, prediction error and control information. An indication qz of the quantization parameter QP is provided to the video multiplex encoder 170 via line 124.
The quantized DCT coefficients representing the prediction error information for each block of the macroblock are also passed from quantizer 106 to inverse quantizer 108. Here, they are inverse quantized in a manner similar to that described above with respect to the intra-coding mode operation of the encoder. In inter-coding mode, the quality of the coded video bitstream and the number of bits required to represent a video sequence can be adjusted by varying the quantization level applied to the DCT coefficients representing the prediction error information.
The resulting blocks of inverse quantized DCT coefficients are applied to the inverse DCT transform block 110, where they undergo an inverse DCT transform to produce locally decoded blocks of prediction error values. The locally decoded blocks of prediction error values are then input to the combiner 112. In inter-coding mode, switch 114 is set so that the combiner 112 also receives the predicted pixel values for each block of the macroblock, produced by the motion compensated prediction block 150. The combiner 112 combines each locally decoded block of prediction error values with the corresponding block of predicted pixel values to produce a reconstructed image block, and stores it in the frame memory 120.
When a subsequent macroblock of the video signal is received from the video source and subjected to the aforementioned encoding and decoding steps in blocks 104, 106, 108, 110, 112, a decoded form of the frame is established in the frame store 120. When the last macroblock of the frame has been processed, the frame store 120 contains a fully decoded frame that can be used as a prediction reference frame in encoding a subsequently received image frame in an inter-coded format.
The formation of a prediction for a macroblock of the current frame will now be described. Any frame encoded in inter-coded format requires a reference frame for motion compensated prediction. This necessarily means that, when encoding a video sequence, the first frame to be encoded, whether it is the first frame of the sequence or some other frame, must be encoded in intra-coding mode. This, in turn, means that when the video encoder 100 is switched to inter-coding mode by the control manager 160, a complete reference frame, formed by locally decoding a previously encoded frame, is already available in the encoder's frame memory 120. In general, the reference frame is formed by locally decoding either an intra-coded frame or an inter-coded frame.
The first step in forming a prediction for a macroblock of the current frame is performed by the motion estimation block 130. The motion estimation block 130 receives, via line 128, the blocks of luminance and chrominance values that make up the current macroblock of the frame to be encoded. It then performs a block matching operation to identify a region in the reference frame that substantially corresponds to the current macroblock. To perform the block matching operation, the motion estimation block accesses reference frame data stored in the frame memory 120 via line 127. More specifically, the motion estimation block 130 performs block matching by calculating difference values (e.g., sums of absolute differences) representing the difference in pixel values between the macroblock under examination and candidate best-matching regions of pixels from a reference frame stored in the frame memory 120. A difference value is produced for candidate regions at all possible positions within a predefined search area of the reference frame, and the motion estimation block 130 determines the smallest calculated difference value. The offset between the macroblock in the current frame and the candidate block of pixel values in the reference frame that yields the smallest difference value defines the motion vector for the macroblock.
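The block matching operation can be sketched as an exhaustive search minimizing the sum of absolute differences (SAD); a practical encoder would normally use a faster search pattern than the full search shown here, and the function names are illustrative:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(cur_mb, ref_frame, mb_row, mb_col, search_range=8):
    """Exhaustive block matching over a square search area centred on the
    macroblock's own position in the reference frame; returns the offset
    (motion vector) giving the smallest SAD and the SAD itself."""
    n = cur_mb.shape[0]
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = mb_row + dy, mb_col + dx
            # Skip candidate positions that fall outside the reference frame.
            if 0 <= r <= ref_frame.shape[0] - n and 0 <= c <= ref_frame.shape[1] - n:
                cost = sad(cur_mb, ref_frame[r:r + n, c:c + n])
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```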
Once the motion estimation block 130 has produced a motion vector for the macroblock, it outputs the motion vector to the motion field coding block 140. The motion field coding block 140 approximates the motion vector received from the motion estimation block 130 using a motion model comprising a set of basis functions and motion coefficients. More specifically, the motion field coding block 140 represents the motion vector as a set of motion coefficient values which, when multiplied by the basis functions, form an approximation of the motion vector. Typically, a translational motion model having only two motion coefficients and basis functions is used, but motion models of greater complexity may also be used.
The motion coefficients are passed from the motion field coding block 140 to the motion compensated prediction block 150. The motion compensated prediction block 150 also receives the best-matching candidate region of pixel values identified by the motion estimation block 130 from the frame memory 120. Using the approximate representation of the motion vector produced by the motion field coding block 140 and the pixel values of the best-matching candidate region from the reference frame, the motion compensated prediction block 150 produces an array of predicted pixel values for each block of the macroblock. Each block of predicted pixel values is passed to the combiner 116, where the predicted pixel values are subtracted from the actual (input) pixel values in the corresponding block of the current macroblock, thereby forming a set of prediction error blocks for the macroblock.
Operation of the video decoder 200, shown in fig. 4, will now be described. The decoder 200 comprises: a video multiplex decoder 270, which receives the encoded video bitstream 135 from the encoder 100 and demultiplexes it into its constituent parts; an inverse quantizer 210; an inverse DCT transformer 220; a motion compensated prediction block 240; a frame memory 250; a combiner 230; a control manager 260; and an output 280.
The control manager 260 controls the operation of the decoder 200 according to whether an intra-coded or an inter-coded frame is being decoded. An intra/inter trigger control signal, which causes the decoder to switch between decoding modes, is derived, for example, from picture type information associated with each compressed video frame received from the encoder. The intra/inter trigger control signal is extracted from the encoded video bitstream by the video multiplex decoder 270 and passed to the control manager 260 via control line 215.
Decoding of an intra-coded frame proceeds macroblock by macroblock. The video multiplex decoder 270 separates the encoded information for each block of the macroblock from any control information relating to the macroblock. The encoded information for each block of an intra-coded macroblock comprises variable length codewords representing the VLC coded level and run values of the non-zero quantized DCT coefficients of the block. The video multiplex decoder 270 decodes the variable length codewords using a variable length decoding method corresponding to the encoding method used in the encoder 100, thereby recovering the (run, level) pairs. It then reconstructs the array of quantized transform coefficient values for each block of the macroblock and passes them to the inverse quantizer 210. Any control information relating to the macroblock is also decoded in the video multiplex decoder using an appropriate decoding method and is passed to the control manager 260. In particular, information relating to the level of quantization applied to the transform coefficients (i.e., the quantization parameter QP) is extracted from the encoded bitstream by the video multiplex decoder 270 and provided to the control manager 260 via control line 217. The control manager, in turn, passes this information to the inverse quantizer 210 via control line 218. The inverse quantizer 210 inverse quantizes the quantized DCT coefficients for each block of the macroblock according to the control information relating to the quantization parameter QP and provides the now inverse quantized DCT coefficients to the inverse DCT transformer 220. The inverse quantization operation performed by the inverse quantizer 210 is identical to that performed by the inverse quantizer 108 in the encoder.
The inverse DCT transformer 220 performs an inverse DCT transform on the inverse quantized DCT coefficients for each block of the macroblock to form blocks of decoded image information comprising reconstructed pixel values. The reconstructed pixel values for each block of the macroblock are passed via the combiner 230 to the video output 280 of the decoder, where they can be provided, for example, to a display device (not shown). The reconstructed pixel values for each block of the macroblock are also stored in the frame memory 250. Because motion compensated prediction is not used in the encoding/decoding of intra-coded macroblocks, the control manager 260 controls the combiner 230 to pass each block of pixel values as such to the video output 280 and to the frame memory 250. As subsequent macroblocks of the intra-coded frame are decoded and stored, a decoded frame is progressively assembled in the frame memory 250, and thus becomes available for use as a reference frame for motion compensated prediction in connection with the decoding of subsequently received inter-coded frames.
Inter-coded frames are likewise decoded macroblock by macroblock. The video multiplex decoder 270 receives the encoded video bitstream 135 and separates the encoded prediction error information for each block of an inter-coded macroblock from the encoded motion vector information and any control information relating to the macroblock. As explained above, the encoded prediction error information for each block of the macroblock comprises variable length codewords representing the level and run values of the non-zero quantized transform coefficients of the prediction error block in question. The video multiplex decoder 270 decodes the variable length codewords using a variable length decoding method corresponding to the encoding method used in the encoder 100, thereby recovering the (run, level) pairs. It then reconstructs the array of quantized transform coefficient values for each prediction error block and passes them to the inverse quantizer 210. Any control information relating to the inter-coded macroblock is also decoded in the video multiplex decoder 270 using an appropriate decoding method and is passed to the control manager 260. Information relating to the level of quantization (QP) applied to the transform coefficients of the prediction error blocks is extracted from the encoded bitstream and provided to the control manager 260 via control line 217. The control manager, in turn, passes this information to the inverse quantizer 210 via control line 218. The inverse quantizer 210 inverse quantizes the quantized DCT coefficients representing the prediction error information for each block of the macroblock according to the control information relating to the quantization parameter QP and provides the now inverse quantized DCT coefficients to the inverse DCT transformer 220. Again, the inverse quantization operation performed by the inverse quantizer 210 is identical to that performed by the inverse quantizer 108 in the encoder. The intra/inter flag is provided on line 215.
The inverse quantized DCT coefficients representing the prediction error information for each block are then inverse transformed in an inverse DCT transformer 220 to produce an array of reconstructed prediction error values for each block of the macroblock.
The encoded motion vector information associated with the macroblock is extracted from the encoded video bitstream 135 and decoded by the video multiplex decoder 270. The decoded motion vector information thus obtained is passed via control line 225 to the motion compensated prediction block 240, which reconstructs the motion vector for the macroblock using the same motion model as that used to encode the inter-coded macroblock in the encoder 100. The reconstructed motion vector approximates the motion vector originally determined by the motion estimation block 130 of the encoder. The motion compensated prediction block 240 of the decoder uses the reconstructed motion vector to identify a region of reconstructed pixels in a prediction reference frame stored in the frame memory 250. The region of pixels indicated by the reconstructed motion vector is used to form a prediction of the macroblock in question. More specifically, the motion compensated prediction block 240 forms an array of pixel values for each block of the macroblock by copying the corresponding pixel values from the region of pixels identified in the reference frame. These blocks of pixel values derived from the reference frame are passed from the motion compensated prediction block 240 to the combiner 230, where they are combined with the decoded prediction error information. In practice, the pixel values of each predicted block are added to the corresponding reconstructed prediction error values output by the inverse DCT transformer 220. In this way, an array of reconstructed pixel values is obtained for each block of the macroblock. The reconstructed pixel values are passed to the video output 280 of the decoder and are also stored in the frame memory 250.
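The final reconstruction step described here amounts to a per-pixel addition; the sketch below also clips the result to the valid pixel range, a standard step that the text does not spell out and which is therefore an assumption:

```python
import numpy as np

def reconstruct_block(prediction, prediction_error, bit_depth=8):
    """Decoder-side reconstruction: add the decoded prediction error to
    the motion compensated prediction, then clip to the valid pixel range."""
    pixels = prediction.astype(np.int32) + prediction_error
    return np.clip(pixels, 0, (1 << bit_depth) - 1).astype(np.uint8)
```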
As subsequent macroblocks of the inter-coded frame are decoded and stored, a decoded frame is progressively assembled in the frame memory 250, and thus becomes available for use as a reference frame for motion compensated prediction of other inter-coded frames.
As described above, typical video encoding and decoding systems (often referred to as video codecs) are based on motion compensated prediction and prediction error coding. Motion compensated prediction is obtained by analyzing and coding motion between a video frame and a reconstructed image segment, using motion information. Prediction error coding is used to code the difference between a motion compensated image segment and the corresponding segment of the original image. The accuracy of the prediction error coding can be adjusted according to the available bandwidth and the required quality of the encoded video. In a typical discrete cosine transform (DCT)-based system, this is done by varying the quantization parameter (QP) used in quantizing the DCT coefficients to a particular precision.
It should be noted that, in order to stay synchronized with the encoder, the decoder must know the exact value of QP used for the encoded video sequence. Usually, a QP value is transmitted once per slice, which increases the number of bits required to encode the picture. (As explained earlier, a slice contains part of the picture and is coded independently of the other slices so as to prevent transmission errors from propagating within the picture.) For example, if encoding a single QP value takes 6 bits and 20 pictures, each divided into 10 slices, are transmitted per second, the QP information alone consumes 1.2 kbps.
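The overhead quoted here follows directly from the stated figures:

```python
bits_per_qp = 6           # bits to code one QP value
pictures_per_second = 20
slices_per_picture = 10
qp_bitrate = bits_per_qp * pictures_per_second * slices_per_picture
print(qp_bitrate)         # 1200 bits/s, i.e. 1.2 kbps spent on QP alone
```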
Prior art solutions (e.g., the H.26L video coding recommendation set forth in T. Wiegand, "Joint Model Number 1", Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002) encode the picture/slice QP parameter independently, as a fixed or variable length code. As noted above, this increases the transmission bit rate. More specifically, according to H.26L Joint Model 1, the quantization parameter value QP to be used in quantizing the DCT coefficient values is typically indicated in the encoded bitstream at the beginning of each picture (see T. Wiegand, "Joint Model Number 1", Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002, section 3.3.1). If the macroblocks of a frame are arranged in slices, the QP value is also indicated at the beginning of each slice of the frame (e.g., in an optional slice header portion of the encoded bitstream). In both cases, the QP value is indicated as such or is encoded using an appropriate variable length coding scheme. As explained above, this scheme is expensive in terms of the number of bits required to represent the quantization parameter information, particularly if the frame is divided into many slices and/or the bandwidth available for transmission of the encoded video sequence is low. This is an especially significant problem in mobile video applications, in which the encoded video bitstream is transmitted over a radio communication link. In such situations, the bandwidth available for transmission of the encoded video bitstream may be as low as 20 kbits/s, and the QP information included in the bitstream may represent a significant proportion of the total available bandwidth.
Furthermore, according to H.26L, the QP value may be varied arbitrarily at the macroblock level by inserting a quantizer change parameter (Dquant) in the portion of the encoded bitstream representing a macroblock (see T. Wiegand, "Joint Model Number 1", Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002, section 3.4.7). This further increases the amount of information devoted to the indication of QP-related information in the encoded bitstream.
From the foregoing, it should be apparent that there is a significant need for an improved mechanism for indicating information related to quantization parameter values in a video coding system.
Disclosure of Invention
The present invention improves on prior art solutions for indicating QP-related information by introducing a sequence-level QP. This allows the encoding application to decide on a video sequence specific reference QP to be used in picture/slice QP coding. According to the invention, instead of encoding the absolute value of the picture/slice QP, it is sufficient to encode only the difference between the reference sequence QP and the picture/slice QP actually used. In this way, a full QP need not be transmitted for every picture/slice; instead, a statistically smaller difference value is transmitted and used to reconstruct the picture/slice QP, resulting in a reduced transmission bit rate.
The bit rate saving is most evident when the QP is constant. In that case, it is sufficient to send a single bit per slice to indicate that the sequence QP is to be used when decoding the slice. In the example described above, for instance, the bit rate devoted to QP information is reduced from 1.2 kbps to 0.2 kbps (instead of sending six bits per slice, only one bit now needs to be sent).
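A minimal encoder-side sketch of this scheme, with illustrative field names rather than any normative bitstream syntax, might look as follows:

```python
def encode_slice_qp(slice_qp, sequence_qp):
    """Sketch of the sequence-level QP scheme described above: signal a
    one-bit flag when the slice reuses the sequence-level QP, otherwise
    the flag plus a (statistically small) signed difference."""
    if slice_qp == sequence_qp:
        return {'use_sequence_qp': 1}                 # one bit per slice
    return {'use_sequence_qp': 0,
            'qp_delta': slice_qp - sequence_qp}       # small values -> short codes
```

The saving follows from the statistics of the difference: a constant-QP sequence costs one bit per slice, and a slowly varying QP produces small deltas that entropy code into short codewords.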
According to a first aspect of the present invention, there is provided a method of encoding a digital video sequence, the method being applied in a video encoding application to produce an encoded video bitstream representing the digital video sequence. The digital video sequence comprises a number of frames, each frame of the sequence comprising an array of pixels divided into a plurality of blocks, each block comprising a certain number of pixels. The method comprises encoding a frame of the digital video sequence by applying motion compensated prediction to blocks of pixels, thereby producing corresponding blocks of prediction error values. A transform coding technique is applied to the blocks of prediction error values to produce sets of transform coefficient values representing the blocks of prediction error values, and a level of quantization is applied to the sets of transform coefficient values to produce sets of quantized transform coefficient values. According to the invention, the method further comprises defining a default quantization level that is used throughout the encoding of the digital video sequence to quantize the sets of transform coefficient values.
Advantageously, the method according to the first aspect of the invention is also used to indicate the quantization parameter (QP) values used to quantize the sets of transform coefficient values representing the blocks of pixel values generated for frames encoded in intra-coding mode, in a manner analogous to that described for indicating the quantization parameter values used to quantize the sets of transform coefficient values representing the prediction error values generated for frames encoded in inter-coding mode.
Advantageously, the default quantization level used throughout the encoding of the digital video sequence is specific to the encoded video sequence. Alternatively, the default quantization level is specific to the video coding application.
Preferably, an indication of the default quantization level to be used throughout the encoding of the digital video sequence is provided. More preferably, the indication of the default quantization level is provided in the encoded bitstream representing the digital video sequence. Advantageously, the encoded bitstream, including the indication of the default quantization level used throughout the encoding of the digital video sequence, is transmitted to a video decoding device.
Advantageously, the default quantization level used to quantize the sets of transform coefficient values throughout the encoding of the digital video sequence may be updated during encoding of the digital video sequence, and a further indication is provided that the default quantization level has been updated.
Preferably, the indication that the default quantization level has been updated is transmitted to a video decoding device in the encoded bitstream representing the digital video sequence.
Advantageously, the level of quantization applied to the sets of transform coefficient values may be adjusted, such that the actual level of quantization applied to a set of transform coefficient values differs from the default quantization level used throughout the encoding of the digital video sequence. Preferably, the actual level of quantization applied is represented as a difference relative to the default quantization level. Advantageously, an indication of the difference relative to the default quantization level is provided in the encoded bitstream representing the digital video sequence.
In an embodiment of the video encoding method according to the first aspect of the present invention, the quantization step applied to the set of transform coefficient values may be adjusted from one frame of the digital video sequence to another frame such that the actual quantization step applied to a particular frame of the digital video sequence is different from the default quantization step used throughout the encoding of the digital video sequence. Advantageously, in this embodiment, the actual level of quantization used in the particular frame is represented as a difference relative to the default level of quantization and an indication of the difference relative to the default level of quantization is provided in the encoded bitstream representing the digital video sequence.
In an alternative embodiment of the video encoding method according to the first aspect of the present invention, a frame of the digital video sequence is divided into a plurality of blocks, said plurality of blocks being grouped into one or more segments and the quantization levels applied to said sets of transform coefficient values being adjustable from one segment of a frame to another such that the actual quantization level applied to a set of transform coefficient values for a particular segment of a frame is different from the default quantization level used throughout the digital video sequence encoding process. Advantageously, in an alternative embodiment, the actual quantization level used in the particular segment is represented as a difference value with respect to the default quantization level and an indication of the difference value with respect to the default quantization level is provided in the coded bitstream representing the digital video sequence.
Advantageously, if the default quantization level is to be used for quantizing all sets of transform coefficient values throughout the digital video sequence, a default quantization level indication is provided together with a further indication that said default quantization level is to be used for quantizing all sets of transform coefficient values throughout the digital video sequence.
According to a second aspect of the present invention there is provided a method of decoding an encoded digital video sequence, the method being applied in a video decoding application to produce a decoded digital video sequence. A digital video sequence comprising a number of frames and being divided into a plurality of blocks, wherein each frame of said sequence comprises an array of pixels, each block comprising a certain number of pixels, is encoded by applying a motion compensated prediction to a block of pixels to generate a corresponding block of prediction error values and applying a transform coding technique to said block of prediction error values to generate a set of transform coefficient values representing said block of prediction error values and applying a quantization level to said set of transform coefficient values to generate a set of quantized transform coefficient values representing said block of prediction error values. According to the invention, the decoding method further comprises: a default inverse quantization step is defined that is used throughout the decoding of the digital video sequence to inverse quantize the set of quantized transform coefficient values.
Advantageously, the default inverse quantization level is the same as a default quantization level used for quantizing a set of transform coefficient values during encoding of the video sequence.
Advantageously, the default inverse quantization level defined to be used during decoding of the entire encoded digital video sequence is specific to the encoded video sequence being decoded. Alternatively, the default inverse quantization level is specific to the video decoding application.
Advantageously, the decoding method comprises reproducing an indication of the default inverse quantization level, preferably from a bitstream representing the encoded digital video sequence.
Advantageously, the default inverse quantization level may be updated during decoding of the digital video sequence. Preferably, the updating of the default inverse quantization level is performed in response to an indication, reproduced from the bitstream representing the encoded digital video sequence, of an updated default quantization level used during encoding of the video sequence. Alternatively, the updating of the default inverse quantization level is performed in response to an indication of an updated default quantization level used during encoding of the video sequence, transmitted from a video encoding device.
Advantageously, the inverse quantization step applied to the set of quantized transform coefficient values may be adjusted such that an actual inverse quantization step applied to the set of quantized transform coefficient values is different from a default inverse quantization step used during decoding of the entire encoded digital video sequence. In this case, the actual inverse quantization level is determined by adding a difference value to the default inverse quantization level, the difference value representing the difference between the default inverse quantization level and the actual inverse quantization level applied. Preferably, a difference indication is reproduced from a bit stream representing the encoded digital video sequence.
In an embodiment of the video decoding method according to the second aspect of the invention, the level of inverse quantization applied to the sets of quantized transform coefficient values may be adjusted from one frame of the digital video sequence to another, such that the actual level of inverse quantization applied to the sets of quantized transform coefficients of a particular frame of the digital video sequence differs from the default inverse quantization level used throughout the decoding of the digital video sequence. Advantageously, in this embodiment, the actual inverse quantization level used in the particular frame is determined by adding a frame-specific difference value to the default inverse quantization level, the frame-specific difference value representing the difference between the default inverse quantization level and the actual inverse quantization level used in that particular frame. Preferably, an indication of the frame-specific difference value is reproduced from the bitstream representing the encoded digital video sequence.
In an alternative embodiment of the video decoding method according to the second aspect of the present invention, a frame of the digital video sequence is divided into a plurality of blocks, the plurality of blocks being grouped into one or more segments and the inverse quantization level applied to the set of quantized transform coefficient values being adjustable from one segment of a frame to another segment such that the actual inverse quantization level applied to a set of quantized transform coefficient values of a particular segment of a frame is different from the default inverse quantization level used throughout the decoding of the digital video sequence. Advantageously, in an alternative embodiment, the actual inverse quantization level used in a particular segment is determined by adding a segment-specific difference to the default inverse quantization level, said segment-specific difference representing the difference between the default inverse quantization level and the actual inverse quantization level used in the particular segment. Preferably, an indication of the segment-specific difference value is reproduced from a bitstream representing the encoded digital video sequence.
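A decoder-side counterpart to the encoder sketch given earlier, recovering the actual inverse quantization level for a frame or slice from the sequence-level default and an optional signalled difference (field names again illustrative, not normative syntax):

```python
def decode_qp(default_qp, fields):
    """Recover the actual inverse quantization level: the sequence-level
    default as-is, or the default plus the frame- or segment-specific
    difference reproduced from the bitstream."""
    if fields.get('use_sequence_qp'):
        return default_qp
    return default_qp + fields['qp_delta']

assert decode_qp(20, {'use_sequence_qp': 1}) == 20
assert decode_qp(20, {'use_sequence_qp': 0, 'qp_delta': -3}) == 17
```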
According to a third aspect of the present invention, there is provided an encoder for encoding a digital video sequence to produce an encoded video bitstream representing the digital video sequence, wherein the digital video sequence comprises a number of frames, each frame of the sequence comprising an array of pixels and being divided into a plurality of blocks, each block comprising a certain number of pixels. The video encoder according to the third aspect of the invention is arranged to encode a frame of the digital video sequence by applying motion compensated prediction to blocks of pixels, thereby producing corresponding blocks of prediction error values. It is further arranged to apply a transform coding technique to the blocks of prediction error values to produce sets of transform coefficient values representing the blocks of prediction error values, and to apply a level of quantization to the sets of transform coefficient values to produce sets of quantized transform coefficient values. According to the invention, the video encoder is further arranged to define a default quantization level to be used throughout the encoding of the digital video sequence to quantize the sets of transform coefficient values.
Advantageously, the encoder according to the third aspect of the invention is further arranged to indicate the quantization parameter (QP) values used to quantize the sets of transform coefficient values representing the blocks of pixel values generated for frames encoded in intra-coding mode, in a manner analogous to that described for indicating the quantization parameter values used to quantize the sets of transform coefficient values representing the prediction error values generated for frames encoded in inter-coding mode.
Advantageously, the default quantization level defined by the video encoder is specific to the encoded video sequence.
Advantageously, the video encoder is further arranged to provide an indication of said default quantization level in a bitstream representing the digital video sequence. More advantageously, it is arranged to transmit the encoded bit stream to a corresponding video decoder.
Advantageously, the video encoder is further arranged to update the default quantization level during encoding of the digital video sequence and to provide an indication that the default quantization level has been updated. Preferably, the encoder is further arranged to transmit the updated default quantization level indication to a corresponding video decoder. Advantageously, the encoder includes the updated default quantization level indication in the encoded bitstream representing the digital video sequence.
Preferably, the video encoder is further arranged to adjust the level of quantization applied to the sets of transform coefficient values, and is thereby arranged to apply to a set of transform coefficient values an actual quantization level different from the default quantization level. Preferably, the video encoder is further arranged to represent the actual quantization level as a difference relative to the default quantization level and to provide an indication of the difference relative to the default quantization level in the encoded bitstream representing the digital video sequence.
In an embodiment the video encoder according to the third aspect of the invention is arranged to adjust the quantization level applied to the set of transform coefficient values from one frame of the digital video sequence to another frame. In this way it is arranged to apply an actual quantization level to the set of transform coefficients of a particular frame that is different from the default quantization level used during the encoding of the entire digital video sequence. Advantageously, the video encoder according to this embodiment is further arranged to represent the actual quantization level used in the particular frame as a difference value with respect to the default quantization level and to provide an indication of the difference value with respect to the default quantization level in the encoded bitstream representing the digital video sequence.
In an alternative embodiment, the video encoder according to the third aspect of the invention is further arranged to: grouping blocks into which a frame of the digital video sequence is divided into one or more segments and adjusting a quantization level applied to the set of transform coefficient values from one segment of the frame to another. In this way, the video encoder is arranged to apply an actual quantization level to the set of transform coefficients of a particular segment of a frame that is different from the default quantization level used during the encoding of the entire digital video sequence. Advantageously, the video encoder according to the alternative embodiment is further arranged to represent the actual quantization level used in the particular segment as a difference value with respect to the default quantization level and to provide an indication of the difference value with respect to the default quantization level in the encoded bitstream representing the digital video sequence.
In a particular embodiment, the video encoder is arranged to provide an indication of a default quantization level and an indication that a default level is used to quantize all sets of transform coefficient values throughout the digital video sequence.
Advantageously, the video encoder according to the third aspect of the invention is provided in a multimedia terminal. More preferably, it is implemented in a wireless telecommunication device.
According to a fourth aspect of the present invention, there is provided a decoder for decoding an encoded digital video sequence to produce a decoded video sequence. The digital video sequence comprises a number of frames, each frame of said sequence comprising an array of pixels divided into a plurality of blocks, each block comprising a certain number of pixels. A frame of the digital video sequence is encoded by applying motion-compensated prediction to a block of pixels to generate a corresponding block of prediction error values, applying a transform coding technique to said block of prediction error values to generate a set of transform coefficient values representing said block of prediction error values, and applying a quantization level to said set of transform coefficient values to generate a set of quantized transform coefficient values representing said block of prediction error values. According to the invention, the video decoder is arranged to define a default inverse quantization level, which is used during decoding of the entire digital video sequence to inverse quantize the sets of quantized transform coefficient values.
Preferably, the default inverse quantization level is the same as the default quantization level used to quantize the sets of transform coefficient values during encoding of the video sequence.
Advantageously, the default inverse quantization level defined to be used during decoding of the entire encoded digital video sequence is specific to the encoded video sequence being decoded.
Advantageously, the video decoder is arranged to reproduce an indication of the default inverse quantization level from a bitstream representing the encoded digital video sequence.
Advantageously, the video decoder is arranged to update the default inverse quantization level during decoding of the digital video sequence, preferably by reproducing an indication of the updated default quantization level from a bitstream representing the encoded digital video sequence. Alternatively, the video decoder is arranged to receive an updated default quantization level indication transmitted from the video encoding device.
Preferably, the video decoder is arranged to adjust the inverse quantization level applied to the set of quantized transform coefficient values and is thereby arranged to apply an actual inverse quantization level different from the default inverse quantization level to the set of quantized transform coefficient values. Advantageously, the decoder is arranged to determine the actual inverse quantization level by adding a difference value to the default inverse quantization level, the difference value representing the difference between the default inverse quantization level and the actual inverse quantization level applied. Advantageously, the video decoder is arranged to reproduce an indication of the difference value from a bitstream representing the encoded digital video sequence.
In an embodiment, a video decoder according to the fourth aspect of the present invention is arranged to adjust the inverse quantization level applied to the set of quantized transform coefficient values from one frame of the digital video sequence to another, and is thereby arranged to apply, to the set of quantized transform coefficients of a particular frame of the digital video sequence, an actual inverse quantization level different from the default inverse quantization level used during decoding of the entire encoded digital video sequence. Advantageously, the decoder is arranged to determine the actual inverse quantization level used in the particular frame by adding a frame-specific difference value to the default inverse quantization level, the frame-specific difference value representing the difference between the default inverse quantization level and the actual inverse quantization level used in the particular frame. Preferably, the video decoder is arranged to reproduce an indication of the frame-specific difference value from a bitstream representing the encoded digital video sequence.
In an alternative embodiment, the video decoder according to the fourth aspect of the invention is arranged to decode an encoded video sequence in which the blocks into which a frame of the digital video sequence is divided are grouped into one or more segments. The video decoder is further arranged to adjust the inverse quantization level applied to a set of quantized transform coefficient values from one segment of a frame to another, and to apply, to the set of quantized transform coefficients of a particular segment of a frame, an actual inverse quantization level different from the default inverse quantization level used during decoding of the entire encoded digital video sequence. Preferably, the decoder is arranged to determine the actual inverse quantization level used in the particular segment by adding a segment-specific difference value to the default inverse quantization level, the segment-specific difference value representing the difference between the default inverse quantization level and the actual inverse quantization level used in the particular segment. Preferably, the video decoder is arranged to reproduce an indication of the segment-specific difference value from a bitstream representing the encoded digital video sequence.
According to a fifth aspect of the invention there is provided a multimedia terminal comprising an encoder according to the third aspect of the invention.
According to a sixth aspect of the present invention there is provided a multimedia terminal comprising a decoder according to the fourth aspect of the present invention.
Preferably, the multimedia terminal according to the fifth and/or sixth aspect of the invention is a mobile multimedia terminal arranged to communicate with a mobile telecommunications network using a radio connection.
Drawings
Fig. 1 illustrates a 16 x 16 macroblock format according to the prior art.
Fig. 2 illustrates a QCIF picture subdivided into 16 × 16 macroblocks and grouping of consecutive macroblocks into slices.
Fig. 3 is a schematic block diagram of a generic video encoder according to the prior art.
Fig. 4 is a schematic block diagram of a generic video decoder according to the prior art and corresponding to the encoder shown in fig. 3.
Fig. 5 is a schematic block diagram of a video encoder according to an embodiment of the present invention.
Fig. 6 is a schematic block diagram of a video decoder according to an embodiment of the present invention and corresponding to the encoder shown in fig. 5.
Fig. 7 illustrates a decoding process according to one possible embodiment of the invention. The Quantization Parameter (QP) of each slice is obtained by adding the Sequence QP (SQP) to the slice-specific difference QP value (ΔQPn).
Fig. 8 is a schematic block diagram of a multimedia communication terminal in which the method according to the invention can be performed.
Detailed Description
In a preferred embodiment of the present invention, a video sequence specific Quantization Parameter (QP) is transmitted and used as a reference when encoding and decoding the actual picture/slice quantization parameter. In this way, there is no need to transmit a full QP for every picture/slice, but rather a statistically smaller difference value is transmitted and used to reconstruct the picture/slice QP, thus resulting in a reduced transmission bit rate.
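The saving can be made concrete with a small sketch. The code below assumes that QP values are entropy-coded with a signed Exp-Golomb-style variable-length code (the universal VLC of H.26L is of this kind), so that values near zero cost the fewest bits; the slice QP values and the SQP are invented for illustration.

```python
def ue_bits(v):
    # Length of the unsigned Exp-Golomb code for v >= 0:
    # 2 * floor(log2(v + 1)) + 1 bits.
    return 2 * (v + 1).bit_length() - 1

def se_bits(v):
    # Signed values are mapped to unsigned ones (0 -> 0, 1 -> 1,
    # -1 -> 2, 2 -> 3, ...) before Exp-Golomb coding.
    return ue_bits(2 * abs(v) - (1 if v > 0 else 0))

# Hypothetical slice QPs for one scene: they hover around the sequence QP.
slice_qps = [28, 28, 29, 27, 28, 30, 28, 26, 28, 29]
sqp = 28  # sequence-level reference QP

full_cost = sum(ue_bits(qp) for qp in slice_qps)
diff_cost = ue_bits(sqp) + sum(se_bits(qp - sqp) for qp in slice_qps)

print(f"full QP per slice: {full_cost} bits")   # 90 bits
print(f"SQP + delta QPs  : {diff_cost} bits")   # 33 bits
```

Because the differences cluster around zero, the difference-based representation costs roughly a third of the bits in this example.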
An embodiment of the present invention will now be described with reference to fig. 5 to 8.
Fig. 5 is a schematic block diagram of a video encoder 600 implemented in accordance with a preferred embodiment of the present invention. The structure of the video encoder shown in fig. 5 is substantially the same as that of the prior art video encoder illustrated in fig. 3, except that appropriate modifications are made to those parts of the encoder that perform operations relating to the quantization of DCT transform coefficients and to the signaling of the Quantization Parameter (QP) values used in the video encoding process. All parts of the video encoder that function and operate in the same manner as the aforementioned prior art video encoder are identified with identical reference numerals. Since the present invention relates in particular to the signaling of Quantization Parameter (QP) values at the slice level or frame level, it will be assumed in the following description that the video encoder 600 according to the preferred embodiment of the present invention applies a video encoding method in which the frames of the video sequence to be encoded are divided into macroblocks, the macroblocks are further grouped into slices, and a quantization parameter indication is provided at the start of each frame and at the start of each new slice within a frame. An example of such a video coding method is the previously cited ITU-T H.26L video coding recommendation, as described in T. Wiegand, "Joint Model Number 1" (Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002). Alternatively, the method may be applied in a video coding system in which a quantization parameter indication is given only at the beginning of each frame. Although the following detailed description is written specifically to illustrate the application of the method according to the invention to the indication and signaling of slice-level quantization parameters, it should be understood that the method can be applied in a completely similar way to the representation of frame (picture) level quantization parameters.
The operation of the video encoder 600 will now be considered in detail. When encoding a digital video sequence, encoder 600 operates in a manner similar to that previously described with respect to fig. 3 to generate intra-coded and inter-coded compressed video frames. As explained earlier in the text, in intra-coding mode, a Discrete Cosine Transform (DCT) is applied to each block of image data (pixel values) to produce a corresponding two-dimensional array of transform coefficient values. The DCT operation is performed in the transform block 104 and the resulting coefficients are then passed to a quantizer 106 where they are quantized. In the inter-coding mode, the DCT transform performed in block 104 is applied to the block of prediction error values. The transform coefficients produced as a result of this operation are also passed to quantizer 106, where they are also quantized.
In accordance with the present invention, when a new video sequence is to be encoded, encoder 600 determines a default or reference quantization level to be used throughout the encoding of the digital video sequence to quantize, in quantizer 106, the DCT coefficient values generated in the transform block 104. Throughout the following description, this default or reference quantization level will be referred to as the "sequence-level quantization parameter", or simply SQP. The selection of the SQP for a given video sequence is controlled by the control manager 660 and may be based, for example, on the properties of the sequence to be encoded and on the bandwidth available for transmitting the encoded bitstream produced by the encoder.
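The text leaves the choice of SQP to the control manager 660. The heuristic below is purely illustrative (the logarithmic rate model, its constants and the QP range are assumptions, not part of the described system); it only shows how sequence properties and available bandwidth could drive the choice.

```python
import math

def pick_sequence_qp(target_kbps, width, height, fps, qp_min=0, qp_max=31):
    """Hypothetical SQP heuristic: coarser quantization (higher QP)
    when fewer bits per pixel are available. Constants are illustrative."""
    bits_per_pixel = (target_kbps * 1000.0) / (width * height * fps)
    # Assume each QP step scales the rate by about 2**(1/6), i.e. the
    # quantizer step doubles every six QP values; anchor 0.1 bpp at QP 30.
    qp = 30.0 - 6.0 * math.log2(bits_per_pixel / 0.1)
    return int(max(qp_min, min(qp_max, round(qp))))

# QCIF at 15 frames/s over a 128 kbit/s channel:
print(pick_sequence_qp(target_kbps=128, width=176, height=144, fps=15))  # 19
```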
In a preferred embodiment of the present invention, the encoder 600 determines the SQP as the default or reference quantization level to be used when operating in inter-coding mode (i.e. in the case where the DCT coefficients generated in the transform block 104 represent prediction error values). It should be understood that the method according to the invention can also be applied to the quantization of DCT coefficient values generated in conventional intra-coding modes that do not utilize spatial prediction. However, given the different origin of the transform coefficients in intra- and inter-coding modes (the DCT coefficients generated in intra-coding mode are derived from pixel values, while those generated in inter-coding mode result from applying the DCT to prediction error values), it is not necessarily possible to determine a single SQP value that is optimal for quantization of the DCT coefficients in both intra- and inter-coding modes. Thus, in embodiments where the method according to the invention is used in both intra- and inter-coding modes, two SQP values are preferably used, one providing the most efficient representation of QP value information in intra-coding mode and the other providing the most efficient representation of the QP values used during inter-coding. In all other respects, the method according to the invention can be applied in a completely similar way in intra- and inter-coding modes. Of course, in an alternative embodiment, a single SQP value may be defined and used as the sequence-level quantization parameter for both intra- and inter-coding modes. This is a practical approach especially in modern video coding systems such as that described in T. Wiegand, "Joint Model Number 1" (Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002), where spatial prediction is applied to intra-coded macroblocks and the resulting intra prediction error is then coded with the DCT.
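Expressed as data, the two-SQP embodiment simply carries two reference values in the sequence header and selects between them according to the coding mode. A minimal sketch; the field names are invented for illustration.

```python
from enum import Enum

class CodingMode(Enum):
    INTRA = 0
    INTER = 1

# Hypothetical sequence-header fields for the two-SQP embodiment.
sequence_header = {"sqp_intra": 30, "sqp_inter": 27}

def reference_qp(header, mode):
    # Select the sequence-level reference QP matching the coding mode.
    key = "sqp_intra" if mode is CodingMode.INTRA else "sqp_inter"
    return header[key]

print(reference_qp(sequence_header, CodingMode.INTER))  # 27
```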
It should also be noted that, since most frames in a typical video sequence are encoded as inter frames, the greatest saving in bit rate is achieved by applying the method according to the invention to the representation of QP values in inter-coding mode. Thus, in the preferred embodiment of the present invention, a single SQP value is used, representing the default or reference quantization value applied in quantizing the DCT coefficients that represent prediction error values in inter-coding mode.
Having determined the SQP value to be used for the sequence, the control manager 660 provides an indication of the selected SQP value to the video multiplex encoder 670 via control line 122, and the video multiplex encoder 670 then inserts an indication of the SQP value into the bitstream 635 of encoded image information representing the video sequence. Preferably, this indication is provided in a sequence header portion of the encoded video bitstream 635.
The video encoder 600 then begins encoding the video sequence. As explained in connection with the description of the prior art video encoder 100 illustrated in fig. 3, the first frame of the sequence to be encoded is encoded in intra-frame format. Because the SQP value defined according to the preferred embodiment of the present invention is specific to the quantization of DCT coefficient values produced in inter-coding mode, the operation of encoder 600 in intra-coding mode is completely similar to that of the prior art video encoder and will not be considered in further detail here.
Once the first frame encoding is complete, the control manager 660 switches the video encoder into inter-coding mode. In the inter-coding mode, switch 102 is operative to receive its input from line 117, which includes the output of combiner 116. The combiner 116 receives the video input signal from the input 101 macroblock by macroblock and forms a block of prediction error information for each block of macroblocks. The prediction error information for each block is passed to the DCT transformer 104, and the DCT transformer 104 performs a two-dimensional discrete cosine transform on each block of prediction error values to produce a two-dimensional array of DCT transform coefficients for the block. The transform coefficients for each prediction error block are then passed to the quantizer 106 where they are quantized using a quantization parameter QP, as previously described. The remainder of the inter-coding process continues as previously described in relation to the prior art video encoder 100.
As each macroblock is received, the control manager 660 determines whether the macroblock currently being processed is the first macroblock of a slice. If so, the control manager determines the quantization parameter value QP to be used in quantizing the DCT coefficient values generated in the DCT transformer 104. It should be noted that the QP decision may be based on the bit budget allowed for the frame, the bits already consumed in previous slices of the same frame, and possibly the bits consumed by slices in previous frames. Having done this, the control manager 660 determines the difference (ΔQP) between the previously defined sequence-level quantization parameter value SQP and the actual QP value to be used for the slice. It then passes an indication of this difference to video multiplex encoder 670 via control line 624, and video multiplex encoder 670 includes the indication of the difference ΔQP in bitstream 635. Preferably, this indication is provided in the slice header portion of the encoded video bitstream 635 that contains the slice-specific control information. This process is repeated until all slices of the current frame have been encoded in inter-frame format, at which point the video encoder begins encoding the next frame of the video sequence.
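The slice-level behaviour just described can be sketched as follows. The data structures and the uniform quantizer are placeholders (a real codec derives the step size from QP through a fixed mapping, and the rate-control decision is only stubbed out); the essential point is that only the difference ΔQP = QP - SQP is written into each slice header.

```python
from dataclasses import dataclass, field

@dataclass
class SliceHeader:
    delta_qp: int  # QP difference relative to the sequence-level SQP

@dataclass
class EncodedSlice:
    header: SliceHeader
    levels: list = field(default_factory=list)

def quantize(coeffs, qp):
    # Placeholder uniform quantizer: treat QP itself as the step size.
    step = max(qp, 1)
    return [round(c / step) for c in coeffs]

def encode_slices(slice_coeff_blocks, slice_qps, sqp):
    """Signal only delta_qp = qp - sqp per slice, then quantize every
    coefficient block of the slice with the actual slice QP."""
    out = []
    for blocks, qp in zip(slice_coeff_blocks, slice_qps):
        enc = EncodedSlice(header=SliceHeader(delta_qp=qp - sqp))
        for coeffs in blocks:
            enc.levels.append(quantize(coeffs, qp))
        out.append(enc)
    return out

# One frame, two slices, one 4-coefficient "block" per slice.
encoded = encode_slices(
    slice_coeff_blocks=[[[120.0, -33.0, 8.0, 0.0]], [[90.0, 14.0, -6.0, 1.0]]],
    slice_qps=[28, 30],
    sqp=28,
)
print([s.header.delta_qp for s in encoded])  # [0, 2]
```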
A video decoder 700 implemented according to a preferred embodiment of the present invention will now be described with reference to fig. 6. The structure of the video decoder illustrated in fig. 6 is substantially the same as that of the prior art video decoder shown in fig. 4, except that appropriate modifications are made to those parts of the decoder that perform operations relating to the inverse quantization of the DCT transform coefficients. All parts of the video decoder that function and operate in the same way as the aforementioned prior art video decoder are identified with identical reference numerals.
It is assumed here that the video decoder of fig. 6 corresponds to the encoder described in connection with fig. 5 and is thus capable of receiving and decoding the bitstream 635 transmitted by encoder 600. As previously mentioned, in the preferred embodiment of the present invention, the encoder 600 determines a sequence-level quantization parameter SQP used in inter-coding mode. Accordingly, the decoder 700 is adapted to receive an indication of this SQP value and to use the sequence-level quantization parameter SQP in determining the inverse quantization level to be applied to blocks of quantized transform coefficient values (representing prediction error values) received in the encoded bitstream of an inter-coded frame. In an alternative embodiment of the invention, the same process may also be applied to quantized transform coefficient values extracted from the bitstream of an intra-coded frame. As explained above, in this alternative embodiment, an indication of two SQP values may be provided, one for the intra-coded frames and one for the inter-coded frames of the sequence. In yet another alternative embodiment, a single sequence-level quantization parameter may be indicated and used for frames encoded in both intra- and inter-coding modes.
The operation of the video decoder according to the preferred embodiment of the present invention will now be described in detail. The decoder 700 receives the bit stream 635 and separates it into its component parts. This operation is performed by the video multiplex decoder 770.
When starting to decode a new sequence, the video multiplex decoder 770 first extracts, from the sequence header portion of the received bitstream 635, the information and parameters relating to the sequence as a whole. As explained above in relation to the description of the encoder 600, according to a preferred embodiment of the invention the sequence header portion of the bitstream is modified to contain an indication of the sequence-level quantization parameter SQP used in the quantization of DCT coefficient values generated in inter-coding mode. The video multiplex decoder extracts the indication of the SQP value from the bitstream and, if it is encoded, for example using variable-length coding, applies appropriate decoding to recover the SQP value. It then passes the SQP value to the control manager 760 of the decoder, which stores it in the decoder's memory.
The video decoder 700 then begins decoding the encoded frames of the video sequence; decoding of a frame begins as soon as the decoder starts receiving the information relating to that frame in the video bitstream 635. The video multiplex decoder 770 extracts an intra/inter trigger control signal from the picture type information associated with each compressed video frame received in the encoded bitstream 635. The control manager 760 controls the operation of the decoder in response to the intra/inter trigger control signal, so that the decoder is switched into the correct decoding mode.
In the preferred embodiment of the present invention, the decoding of frames encoded in an intra-frame format is performed in a manner similar to that described above in relation to the operation of the prior art video decoder 200. On the other hand, the decoding of frames encoded in an inter-frame format continues as described below.
When it receives an indication, extracted from the received bitstream by the video multiplex decoder 770, that the next frame to be decoded is an inter-coded frame, the control manager 760 switches the decoder 700 into inter mode. As explained in connection with the description of the encoder 600, according to a preferred embodiment of the invention in which the macroblocks of each frame are grouped into slices, the encoded bitstream 635 includes slice-specific control information, including an indication of the slice-specific QP value expressed as a difference ΔQP relative to the sequence-level quantization parameter SQP. Advantageously, the control information specifically associated with each slice is provided in a header portion of the bitstream specific to said slice. Upon receiving such a bitstream portion, the video multiplex decoder extracts the slice-specific control information from the slice header portion of the bitstream and passes the ΔQP indication for the slice, reproduced from the bitstream, to the control manager 760 via control line 717.
Next, control manager 760 determines the inverse quantization level to be applied to the quantized DCT coefficients of the macroblocks in the slice. This is done by combining the ΔQP value of the slice with the sequence-level quantization parameter SQP previously received and stored in the decoder's memory. As described earlier in the text, the inverse quantization operation performed in the decoder involves multiplying each quantized DCT coefficient by a value equal to the quantization level originally applied, i.e. by the QP value used in the corresponding encoder to quantize the DCT coefficients. Therefore, according to the preferred embodiment of the present invention, the control manager 760 determines the inverse quantization level for the macroblocks of the slice by adding the received ΔQP value of the slice to the SQP. It then passes this value to inverse quantizer 210 via control line 218.
As the coded information for each macroblock of a slice is received in the bitstream 635, the video multiplex decoder 770 separates the coded prediction error information for each block of the macroblock from the coded motion vector information. It reconstructs the quantized DCT transform coefficients representing the prediction error values of each block and passes them to the inverse quantizer 210. Inverse quantizer 210 then inverse quantizes the quantized DCT coefficients according to the slice QP formed by the control manager 760 from the ΔQP and SQP values, and provides the inverse quantized DCT coefficients to inverse DCT transformer 220. The remainder of the decoding process continues as previously described in connection with the prior art video decoder 200.
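Per coefficient, this inverse quantization is a single multiplication by the step implied by the slice QP. A minimal sketch, using the same illustrative QP-as-step-size convention as the encoder sketch above:

```python
def dequantize(levels, qp):
    # Multiply each quantized coefficient back by the quantization step
    # the encoder applied; here the step is simply QP itself.
    step = max(qp, 1)
    return [lvl * step for lvl in levels]

sqp, delta_qp = 28, 2      # from the sequence header and the slice header
slice_qp = sqp + delta_qp  # actual inverse quantization level
print(dequantize([4, -1, 0, 0], slice_qp))  # [120, -30, 0, 0]
```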
The steps of receiving a slice-specific ΔQP value, combining the ΔQP and SQP values, and inverse quantizing the quantized DCT coefficients of each block of the macroblocks within a slice are repeated for each slice of the frame until all slices of the current inter-coded frame have been decoded. At this point, the video decoder 700 begins decoding the next frame of the encoded video sequence.
Fig. 7 illustrates the method of reconstructing slice-specific QP values in accordance with a preferred embodiment of the present invention. As can be seen from the figure, the process comprises the following steps (a code sketch of these steps is given after the list):
1. reproducing a sequence level quantization parameter (SQP);
2. reproducing a picture or slice level difference quantization parameter (Δ QP);
3. adding the difference quantization parameter to the sequence-level quantization parameter to obtain a quantization parameter for a picture or slice;
4. reconstructing the received encoded prediction error coefficients using the picture or slice quantization parameter.
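These four steps map directly onto a few lines of decoder logic, as sketched below. The sketch assumes a toy, already-demultiplexed representation of the bitstream (a dictionary standing in for the parsed sequence and slice headers), not the real H.26L syntax.

```python
def reconstruct_slices(parsed_stream):
    """Steps 1-4 of fig. 7 on a hypothetical, already-parsed stream:
    the sequence QP plus, per slice, a delta QP and the quantized
    prediction error levels of each block."""
    sqp = parsed_stream["sequence_qp"]                  # step 1
    for sl in parsed_stream["slices"]:
        qp = sqp + sl["delta_qp"]                       # steps 2 and 3
        step = max(qp, 1)
        sl["coeffs"] = [[lvl * step for lvl in block]   # step 4
                        for block in sl["levels"]]
    return parsed_stream

stream = {
    "sequence_qp": 28,
    "slices": [
        {"delta_qp": 0, "levels": [[4, -1, 0, 0]]},
        {"delta_qp": 2, "levels": [[3, 0, 0, 0]]},
    ],
}
print(reconstruct_slices(stream)["slices"][1]["coeffs"])  # [[90, 0, 0, 0]]
```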
Fig. 8 shows a terminal device comprising video encoding and decoding equipment that may be adapted to operate in accordance with the present invention. More precisely, the figure illustrates a multimedia terminal 80 implemented according to ITU-T recommendation H.324. The terminal can be regarded as a multimedia transceiver device. It comprises elements that capture, encode and multiplex multimedia data streams for transmission via a communication network, as well as elements that receive, demultiplex, decode and display received multimedia content. ITU-T recommendation H.324 defines the overall operation of the terminal and refers to other recommendations governing the operation of its various components. Such a multimedia terminal may be used in real-time applications such as conversational videotelephony, or in non-real-time applications such as the reproduction and/or streaming of video clips, for example from a multimedia content server on the Internet.
In the context of the present invention, it should be understood that the H.324 terminal shown in fig. 8 is only one of several alternative multimedia terminal implementations suited to application of the method of the invention. It should also be noted that several alternatives exist in relation to the location and implementation of the terminal equipment. As illustrated in fig. 8, the multimedia terminal may be located in a communication device connected to a fixed-line telephone network, such as an analog PSTN (Public Switched Telephone Network). In this case the multimedia terminal is equipped with a modem 91, compliant with ITU-T recommendations V.8 and V.34 and optionally with V.8bis. Alternatively, the multimedia terminal may be connected to an external modem. The modem converts the multiplexed digital data and control signals produced by the multimedia terminal into an analog form suitable for transmission over the PSTN. It further enables the multimedia terminal to receive data and control signals in analog form from the PSTN and to convert them into a digital data stream that can be demultiplexed and processed in an appropriate manner by the terminal.
An H.324 multimedia terminal can also be implemented in such a way that it can be connected directly to a digital fixed-line network, such as an ISDN (Integrated Services Digital Network). In this case the modem 91 is replaced with an ISDN user-network interface, represented in fig. 8 by alternative block 92.
H.324 multimedia terminals may also be adapted for use in mobile communication applications. If used with a wireless communication link, the modem 91 can be replaced with any suitable wireless interface, represented by alternative block 93 in fig. 8. For example, an H.324/M multimedia terminal may comprise a radio transceiver enabling connection to the current second-generation GSM mobile telephone network or to the proposed third-generation UMTS (Universal Mobile Telephone System).
It should be noted that in multimedia terminals designed for two-way communication, that is, for both transmission and reception of video data, it is advantageous to provide both a video encoder and a video decoder implemented according to the present invention. Such an encoder and decoder pair is often implemented as a single combined functional unit referred to as a "codec".
A typical h.324 multimedia terminal will now be described in further detail with reference to fig. 8.
The multimedia terminal 80 includes various elements referred to as "terminal equipment". This includes video, audio and information communication devices, generically represented by reference numerals 81, 82 and 83, respectively. The video equipment 81 may include, for example, a camera for capturing video images, a display for displaying received video content and, optionally, video processing equipment. The audio equipment 82 typically includes, for example, a microphone for capturing spoken messages and a loudspeaker for reproducing received audio content. The audio equipment may further comprise an additional audio processing unit. The information communication device 83 may include a data terminal, a keyboard, an electronic whiteboard or a still image transceiver such as a facsimile unit.
The video equipment 81 is coupled to a video codec 85. The video codec 85 comprises a video encoder 600 and a corresponding video decoder 700, both implemented in accordance with the present invention (see figs. 5 and 6). The video codec 85 is responsible for encoding captured video data in a form suitable for further transmission over the communication link and for decoding compressed video content received from the communication network. In the example illustrated in fig. 8, the video codec is implemented according to ITU-T recommendation H.26L, with appropriate modifications for implementing the method according to the invention in the encoder and the decoder of the video codec.
The audio equipment of the terminal is coupled to an audio codec, denoted by reference numeral 86 in fig. 8. Like the video codec, the audio codec comprises an encoder/decoder pair. It converts audio data captured by the terminal's audio equipment into a form suitable for transmission over the communication link, and converts encoded audio data received from the network back into a form suitable for reproduction, for example on the terminal's loudspeaker. The output of the audio codec is passed to a delay block 87, which compensates for the delay introduced by the video encoding process and thereby ensures synchronization of the audio and video content.
The system control block 84 of the multimedia terminal controls the terminal-to-network signaling using an appropriate control protocol (signaling block 88) to establish a common mode of operation between the transmitting and receiving terminals. The signaling block 88 exchanges information about the encoding and decoding capabilities of the transmitting and receiving terminals and may be used to enable various encoding modes of the video encoder. The system control block 84 also controls the use of data encryption. Information about the type of encryption used in the data transmission is passed from the encryption block 89 to a multiplexer/demultiplexer (MUX/DMUX unit) 90.
During data transmission from the multimedia terminal, the MUX/DMUX unit 90 combines the encoded and synchronized video and audio streams with data input from the information communication device 83 and possible control data, to form a single bitstream. Information provided by the encryption block 89 about the type of encryption (if any) to be applied to the bitstream is used to select the encryption mode. Accordingly, when a multiplexed and possibly encrypted multimedia bitstream is received, the MUX/DMUX unit 90 is responsible for decrypting the bitstream, dividing it into its constituent multimedia components and passing those components to the appropriate codec(s) and/or terminal equipment for decoding and reproduction.
If the multimedia terminal 80 is a mobile terminal, that is, if it is equipped with a radio transceiver 93, it will be understood by those skilled in the art that it may also comprise additional elements. In one embodiment it comprises a user interface having a display and a keyboard, which enables operation of the multimedia terminal 80 by a user; a central processing unit, such as a microprocessor, which controls the blocks responsible for the different functions of the multimedia terminal; a random access memory RAM; a read-only memory ROM; and a digital camera. The microprocessor's operating instructions, that is, program code corresponding to the basic functions of the multimedia terminal 80, are stored in the read-only memory ROM and can be executed as required by the microprocessor, for example under control of the user. In accordance with the program code, the microprocessor uses the radio transceiver 93 to form a connection with a mobile communication network, enabling the multimedia terminal 80 to transmit information to and receive information from the mobile communication network over a radio path.
The microprocessor monitors the state of the user interface and controls the digital camera. In response to a user command, the microprocessor commands the camera to record a digital image into the RAM. Once an image is captured, or during the capture process, the microprocessor segments the image into image segments (e.g., macroblocks) and performs motion compensated encoding of the segments using the encoder to generate a compressed image sequence, as explained in the above description. The user can instruct the multimedia terminal 80 to display the captured image on its display or to transmit the compressed image sequence to another multimedia terminal, a videophone or other telecommunication device connected to a fixed line network (PSTN) using the radio transceiver 93. In a preferred embodiment, the transmission of image data is started as soon as the first segment is encoded so that the receiver can start a corresponding decoding process with minimal delay.
Although described in the context of particular embodiments, it will be apparent to those skilled in the art that many modifications and variations to these teachings are possible. Thus, while the invention has been particularly shown and described with respect to one or more preferred embodiments thereof, it will be understood by those skilled in the art that certain modifications or changes may be made without departing from the scope and spirit of the invention as set forth above.
In particular, according to a second possible embodiment of the invention, a sequence QP is not transmitted, but instead an application specific constant is used as sequence QP.
In a third possible embodiment of the invention, the sequence QP may be updated depending on the changing characteristics of the video sequence if a reliable method of transmitting a new sequence QP is available. The updated SQP value may be included in the encoded bitstream representing the video sequence or can be transmitted directly from the encoder to the decoder in an associated control channel.
In a fourth possible embodiment of the invention, if the QP is constant for the entire video sequence, only the sequence QP value is transmitted, together with an indication that it is to be used as the QP for all pictures/slices.
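In terms of bitstream syntax, this fourth embodiment amounts to one additional indication in the sequence header. The sketch below invents the field names for illustration; when the constant-QP indication is set, no slice-level difference needs to be read at all.

```python
from dataclasses import dataclass

@dataclass
class SequenceHeader:
    sequence_qp: int           # SQP, the reference quantization level
    qp_constant: bool = False  # fourth embodiment: SQP is the QP everywhere

def slice_qp(seq_hdr, delta_qp=0):
    """Return the QP for a picture/slice: the SQP itself when the
    constant-QP indication is set, otherwise SQP plus the signalled
    difference."""
    if seq_hdr.qp_constant:
        return seq_hdr.sequence_qp
    return seq_hdr.sequence_qp + delta_qp

print(slice_qp(SequenceHeader(28, qp_constant=True)))  # 28
print(slice_qp(SequenceHeader(28), delta_qp=-3))       # 25
```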

Claims (4)

1. A method for decoding an encoded video sequence, the encoded video data comprising at least a sequence level and a picture or slice level, the method comprising:
receiving a sequence level quantization parameter;
receiving an indication indicating whether a sequence-level quantization parameter is used when decoding a plurality of blocks belonging to a picture or slice level; and
performing an inverse quantization operation on a set of quantized transform coefficients of a block of the plurality of blocks based on the sequence-level quantization parameter.
2. A method for decoding an encoded video sequence, the encoded video data comprising at least a sequence level and a picture or slice level, the method comprising:
receiving a sequence level quantization parameter;
receiving a picture or slice level indication of a difference between the sequence level quantization parameter and a second quantization parameter, the second quantization parameter being the actual quantization parameter used in the picture or slice level; and
performing an inverse quantization operation on a set of quantized transform coefficients belonging to a plurality of blocks at the picture or slice level based on the second quantization parameter.
3. A decoder for decoding an encoded video sequence, the encoded video data comprising at least a sequence level and a picture or slice level, the decoder comprising:
a demultiplexer for:
receiving a sequence level quantization parameter; and
receiving an indication indicating whether the sequence level quantization parameter is used when decoding a plurality of blocks belonging to a picture or slice level; and
a quantizer for:
performing an inverse quantization operation on a set of quantized transform coefficients of a block of the plurality of blocks based on the sequence level quantization parameter.
4. A decoder for decoding an encoded video sequence, the encoded video data comprising at least a sequence level and a picture or slice level, the decoder comprising:
a demultiplexer for:
receiving a sequence level quantization parameter; and
receiving a picture or slice level indication of a difference between the sequence level quantization parameter and a second quantization parameter, the second quantization parameter being the actual quantization parameter used in the picture or slice level; and
a quantizer for:
performing an inverse quantization operation on a set of quantized transform coefficients belonging to a plurality of blocks at the picture or slice level based on the second quantization parameter.