
HK1119890B - Method of encoding and decoding video signals - Google Patents


Info

Publication number
HK1119890B
HK1119890B (application HK08111615.4A)
Authority
HK
Hong Kong
Prior art keywords
layer
sampling
block
residual data
residual
Prior art date
Application number
HK08111615.4A
Other languages
Chinese (zh)
Other versions
HK1119890A1 (en)
Inventor
尹度铉
朴志皓
朴胜煜
全柄文
金东奭
Original Assignee
LG Electronics Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020060057857A external-priority patent/KR20070012201A/en
Application filed by LG Electronics Inc.
Publication of HK1119890A1
Publication of HK1119890B

Description

Method for encoding and decoding video signal
Technical Field
The present invention relates to a method of encoding and decoding a video signal.
Background
A Scalable Video Coding (SVC) scheme is a video signal coding scheme that encodes a video signal at the highest image quality, yet can present an image at a certain degree of image quality even if only a portion of the resulting picture sequence (a sequence of frames continuously selected from the entire picture sequence) is decoded and used.
A picture sequence encoded using a scalable method thus enables an image to be presented with a certain degree of image quality even if only a partial sequence is received and processed. At a low bit rate, however, the image quality may be significantly degraded. To overcome this problem, a separate auxiliary picture sequence may be provided for lower bit rates, e.g., a picture sequence with a small screen size and/or a low frame rate.
The auxiliary picture sequence is called the base layer, and the main picture sequence is called the enhanced layer or enhancement layer. Because the base layer and its enhancement layers result from encoding the same source video signal, there is redundancy between the video signals of the two layers. Thus, where a base layer is provided, an inter-layer prediction method may be employed to increase encoding efficiency: the video signal of the enhanced layer is predicted using the motion information and/or texture (image data) information of the corresponding base layer, and encoding is performed based on the prediction result.
A prediction method using texture information of a base layer includes an intra base prediction mode and a residual prediction mode.
The intra base prediction mode (simply, the intra base mode) predicts and encodes a macroblock of the enhancement layer based on a corresponding block of the base layer that has been encoded in the intra mode (the block located within the base-layer frame temporally coincident with the frame containing the macroblock and covering, when enlarged by the ratio of the screen sizes of the enhancement layer and the base layer, the area of the macroblock). In this case, the corresponding base-layer block is first decoded into image data and then enlarged by up-sampling at the ratio of the screen sizes of the two layers.
The residual prediction mode is similar to the intra base mode, except that it uses a corresponding base-layer block carrying residual data (an image difference value) instead of one carrying decoded image data. Prediction data for a macroblock of the enhancement layer that has been encoded in the inter mode and has residual data is created based on a corresponding block of the base layer that has also been encoded in the inter mode and has residual data. As in the intra base mode, the corresponding base-layer block holding the residual data is enlarged by up-sampling before use.
Fig. 1 illustrates an embodiment of decoding an enhanced layer image block, which has been encoded in an inter mode and has residual data, using residual data of a base layer.
A residual prediction flag indicating that the image block of the enhanced layer has been encoded in the residual prediction mode is set to '1', and corresponding residual data of the base layer is added to the residual data of the enhanced layer.
Where the spatial resolutions of the base layer and the enhancement layer do not coincide, the residual data of the base layer is first up-sampled. Unlike the up-sampling in the intra base mode, which is performed after decoding into image data, the up-sampling of residual data (hereinafter simply referred to as residual up-sampling) is performed as follows.
1. When the enhancement layer resolution is twice the base layer resolution (the dyadic case), bilinear interpolation is used.
2. In the non-dyadic case, a 6-tap interpolation filter is used.
3. The up-sampling uses only pixels within the same transform block; up-sampling filtering across the boundary of the transform block is not allowed.
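The dyadic bilinear rule above can be sketched in code. This is an illustrative one-dimensional model, not the normative SVC filter: the function name, the half-pel phase positions, and the edge replication used to keep the filter inside the block are all assumptions made for illustration.

```python
import math

def upsample_residual_row(row):
    """Dyadic (x2) bilinear up-sampling of one row of residual samples.

    Each output sample sits at a quarter-sample phase between two input
    samples; at the block edges the nearest sample is replicated, so the
    filter never reaches outside the block (illustrative boundary rule).
    """
    n = len(row)
    out = []
    for i in range(2 * n):
        pos = (i + 0.5) / 2 - 0.5      # output position in input coordinates
        left = math.floor(pos)
        frac = pos - left
        a = row[max(left, 0)]          # replicate at the left boundary
        b = row[min(left + 1, n - 1)]  # replicate at the right boundary
        out.append(round((1 - frac) * a + frac * b))
    return out
```

A 4-sample residual row up-samples to 8 samples, and the two edge outputs are taken from the boundary sample alone, mirroring the rule that filtering must not cross the block boundary.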
Fig. 2 illustrates an example of up-sampling a 4 × 4 residual block in the dyadic case.
Simple bilinear interpolation is used for residual up-sampling, but it is not applied across the boundary of a transform block, in order to avoid using pixels of another transform block. Therefore, as shown in fig. 2, pixels at the boundary of the transform block are up-sampled using only pixels of the block itself, and different operations are performed on boundary pixels depending on the position of the pixel relative to the boundary.
Since transform operations may be performed for different block sizes, the boundary of a transform block must be determined taking the base layer transform block size (e.g., 4 × 4, 8 × 8, …) into account.
When the ratio of the base layer and enhancement layer resolutions is not dyadic, the up-sampling process is substantially the same except that a 6-tap interpolation filter is used; pixels of another transform block are likewise not used for residual up-sampling.
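As a sketch of the non-dyadic path, the 6-tap filter can be modeled as below. The tap values (1, -5, 20, 20, -5, 1) are the well-known AVC half-sample taps; applying them at a half-pel position and clamping indices at the block edge are illustrative assumptions, not the patent's exact procedure.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # AVC-style 6-tap half-sample filter

def sixtap_halfpel(row):
    """Interpolate one half-sample value between each adjacent pair.

    Indices are clamped at the row ends, an illustrative stand-in for the
    rule that filtering must not use pixels of another block.
    """
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, t in enumerate(TAPS):
            idx = min(max(i + k - 2, 0), n - 1)  # clamp to the block interior
            acc += t * row[idx]
        out.append((acc + 16) >> 5)  # taps sum to 32: round with +16, shift 5
    return out
```

A constant row passes through unchanged, which is a quick sanity check that the taps and the rounding normalize correctly.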
In addition, the same up-sampling is applied to both the luminance and chrominance components of the signal.
Fig. 3 illustrates an embodiment of decoding an image block of an enhanced layer encoded in an intra base mode using decoded image data of a base layer.
In up-sampling in the intra base mode, the boundary of the transform block is not considered, and a 6-tap interpolation filter is applied to both luminance and chrominance signals.
Disclosure of Invention
Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an object of the present invention is to provide a method of simply and efficiently upsampling a base layer in inter-layer prediction.
To achieve the above object, the present invention provides a method of encoding a video signal, including creating a bitstream of a first layer by encoding the video signal; and creating a bitstream of the second layer by encoding the video signal based on the first layer; wherein, when residual data corresponding to an image difference within a first layer is up-sampled and used for encoding of a second layer, the residual data is up-sampled based on a block predicted by motion compensation.
Furthermore, the present invention provides a method of decoding an encoded video bitstream, comprising decoding a bitstream of a first layer; and decoding a bitstream of the second layer based on the first layer; wherein, when residual data corresponding to an image difference in the first layer is up-sampled and used for decoding of the second layer, the residual data is up-sampled based on a block predicted by motion compensation.
In one embodiment, when the ratio of the resolutions of the first and second layers is two, the residual data is up-sampled using a bilinear interpolation filter. In contrast, when the ratio of the resolutions of the first and second layers is not two, residual data is up-sampled using a 6-tap interpolation filter.
Further, the present invention provides a method of encoding a video signal, including creating a bitstream of a first layer by encoding the video signal; and creating a bitstream of the second layer by encoding the video signal based on the first layer; wherein, when the first layer is up-sampled and used for encoding of the second layer, different up-sampling methods are applied to the luminance data and the chrominance data, respectively.
Furthermore, the present invention provides a method of decoding an encoded video bitstream, comprising decoding a bitstream of a first layer; and decoding a bitstream of the second layer based on the first layer; wherein, when the first layer is up-sampled and used for decoding of the second layer, different up-sampling methods are applied to the luminance data and the chrominance data, respectively.
In one embodiment, luminance data is upsampled using a 6-tap interpolation filter, while chrominance data is upsampled using a bi-linear interpolation filter. In this case, weights may be applied to the up-sampling of the chrominance data, the weights being determined based on the relative positions and phase shifts between the chrominance data sampling points of the first and second layers.
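A phase-weighted bilinear step of the kind described here might look as follows. The 1/16-unit phase encoding and the function name are assumptions, chosen only to show how a weight derived from the relative position or phase shift enters the interpolation.

```python
def bilinear_phase(a, b, phase16):
    """Weighted bilinear interpolation between chroma samples a and b.

    phase16 in [0, 16] encodes the assumed relative position (phase shift)
    of the enhancement-layer sample between the two base-layer samples, in
    sixteenths: 0 returns a, 16 returns b, 8 gives the rounded midpoint.
    """
    return (a * (16 - phase16) + b * phase16 + 8) >> 4
```

The weight pair (16 - phase16, phase16) is fixed once per sampling-grid alignment, so it can be precomputed per phase rather than per pixel.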
In one embodiment, when residual data corresponding to an image difference within the first layer is up-sampled by a factor of two, corresponding to the ratio of the resolutions of the first and second layers, the samples to be inserted between four specific chrominance data samples are calculated using the same equation. In this case, each sample to be inserted may be calculated as the average value of the two of the four samples located in the diagonal direction.
Drawings
The above and other objects, features and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic diagram illustrating an embodiment of decoding an enhancement layer that has been encoded in an inter mode and has residual data, using the residual data of a base layer;
fig. 2 is a diagram illustrating an example of upsampling a 4 × 4 residual block in the dyadic case;
FIG. 3 is a diagram illustrating an embodiment of decoding an enhancement layer that has been encoded in an intra base mode using decoded image data of a base layer;
FIG. 4 is a schematic diagram illustrating various examples of macroblocks, transform blocks, and partitions;
fig. 5 is a diagram illustrating a process of upsampling luminance and chrominance signals of a base layer having residual data using different methods and decoding an enhanced layer having residual data and encoded in an inter mode using the upsampled result according to a second embodiment of the present invention;
fig. 6 is a diagram illustrating a process of upsampling luminance and chrominance signals of a base layer having decoded image data using different methods and decoding an enhanced layer, which has been encoded in an intra base mode, using the upsampled result, according to a second embodiment of the present invention; and
fig. 7 is a diagram illustrating the relative positions of the respective pixels in a second embodiment of the present invention, in which a chrominance signal having residual data is up-sampled by a factor of two.
Modes for carrying out the invention
Reference should now be made to the drawings, in which the same reference numerals are used throughout the drawings to designate the same or similar components.
Fig. 4 illustrates various examples of macroblocks, transform blocks, and partitions.
The size of a macroblock is typically 16 × 16 pixels. A transform such as the DCT is applied to either 4 × 4 or 8 × 8 blocks, and the block size giving the higher coding efficiency is selected. When a macroblock is encoded using a motion-compensated prediction method, the term partition (also called macroblock type or mode) refers to the division of the macroblock obtained by merging sub-blocks whose motion information coincides, where motion information means the reference index identifying the reference frame of a reference block and/or the motion vector indicating the displacement to the reference block.
For example, the AVC standard determines the smallest unit for which motion information (e.g., mode or partition, reference index, and motion vector) is defined. The motion vector is defined based on a sub-block having a minimum size of 4 × 4, and the reference index is defined based on a sub-block having a minimum size of 8 × 8. The motion vector and the reference index may each also be defined based on a macroblock having a maximum size of 16 × 16. When 4 × 4 sub-blocks having the same motion vector are merged, a motion vector may be defined for a 4 × 8, 8 × 4, 8 × 8, 8 × 16, 16 × 8, or 16 × 16 unit. Likewise, when 8 × 8 sub-blocks having the same reference index are merged, the reference index may be defined for an 8 × 16, 16 × 8, or 16 × 16 unit.
In MB0, the transform block size is 4 × 4, and the partition is composed of one 8 × 8 block, two 8 × 4 blocks, and two 4 × 8 blocks. In MB1, the transform block size is 8 × 8, and the partition is the 16 × 8 mode (that is, two 16 × 8 blocks). In MB2, the transform block size is 8 × 8, and the partition is the 8 × 16 mode (that is, two 8 × 16 blocks). In MB3, the transform block size is 4 × 4, and the partition is the 16 × 16 mode (that is, one 16 × 16 block).
In residual up-sampling, as shown in fig. 2, pixels on the boundary of the block under consideration and pixels inside the block derive new pixels through different operations.
For pixels that are not on the boundary of the block under consideration, a bilinear interpolation filter or a 6-tap interpolation filter can be applied uniformly, whereas pixels on the block boundary require different operations depending on their position relative to the boundary. Residual up-sampling can therefore be simplified by reducing the number of pixels that require individual treatment, i.e., the number of boundary pixels, and increasing the number of pixels that can be processed uniformly.
Thus, in the first embodiment of the present invention, when performing residual upsampling, only the boundaries of the motion compensated prediction partition are considered, and the boundaries of the transform block are not considered.
That is, filtering of upsampling may be applied outside the boundaries of the transform block as long as the boundaries of the transform block are not the boundaries of the motion compensated prediction partition. In this case, the boundary of the base layer (not the boundary of the enhancement layer) is used as the boundary of the transform block and the boundary of the motion-compensated prediction partition.
In MB0, the boundary of the two upper 8 × 8 prediction blocks, the boundary of the two lower-left 8 × 4 prediction blocks, and the boundary of the two lower-right 4 × 8 prediction blocks, rather than the boundaries of the 4 × 4 transform blocks, are regarded as block boundaries, and different operations are applied to the pixels on those boundaries in residual up-sampling.
In MB1, the boundaries of the two 16 × 8 prediction blocks, rather than the boundaries of the 8 × 8 transform blocks, are the block boundaries that determine whether to apply the up-sampling filter. Likewise, in MB2, the boundaries of the two 8 × 16 prediction blocks, rather than the boundaries of the 8 × 8 transform blocks, are regarded as block boundaries. Also, in MB3, the boundary of the 16 × 16 macroblock, not the boundaries of the 4 × 4 transform blocks, is regarded as the block boundary.
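The boundary test of the first embodiment reduces to comparing prediction-partition indices rather than transform-block indices. The following sketch assumes uniform partitions of a given width and height inside one macroblock; the function name and signature are illustrative, not from the patent.

```python
def crosses_partition_boundary(x0, y0, x1, y1, part_w, part_h):
    """True if two samples lie in different motion-compensated prediction
    partitions of size part_w x part_h inside a macroblock.

    The transform-block size is deliberately ignored, as in the first
    embodiment: only partition boundaries restrict the up-sampling filter.
    """
    return (x0 // part_w, y0 // part_h) != (x1 // part_w, y1 // part_h)
```

For MB1 (two 16 × 8 partitions), samples at rows 7 and 8 fall on either side of the single partition boundary, while the 8 × 8 transform-block boundaries inside each partition no longer block the filter.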
In general, the components of a video signal related to the chrominance information Cb and Cr are processed separately from the component related to the luminance information Y. The sampling ratio of the luminance signal to the chrominance signals is typically 4:2:0, with the chrominance sampling points located between the luminance sampling points; that is, the chrominance signal has fewer sampling points than the luminance signal. The reason is that the human visual system is more sensitive to the luminance signal than to the chrominance signal.
Thus, in the second embodiment of the present invention, different up-sampling filters are applied to the luminance signal and the chrominance signal. The up-sampling filter applied to the chrominance signal is simpler than that applied to the luminance signal.
Fig. 5 illustrates a process of up-sampling luminance and chrominance signals of a base layer having residual data using different methods and decoding an enhanced layer having residual data, which has been encoded in an inter mode, using the up-sampling result according to a second embodiment of the present invention.
Fig. 6 illustrates a process of up-sampling luminance and chrominance signals of a base layer having decoded image data using different methods and decoding an enhanced layer, which has been encoded in an intra base mode, using the up-sampling result according to a second embodiment of the present invention.
As shown in fig. 5 and 6, different filtering methods are applied to the luminance and chrominance signals. The up-sampling method 1 in fig. 5 and 6 is a filtering method of up-sampling a luminance signal, and the up-sampling method 2 in fig. 5 and 6 is a filtering method of up-sampling a chrominance signal.
In the second embodiment of the present invention, for example, a 6-tap interpolation filter may be used as a filter for up-sampling a luminance signal, and for example, a bilinear interpolation filter may be used as a filter for up-sampling a chrominance signal.
Meanwhile, residual data consists of the difference between the image block to be encoded and a reference block having similar image data, so the absolute values of the data are small and the variation between adjacent pixels is small. Also, as described above, the chrominance signal stimulates the human visual system less than the luminance signal.
This means that a simpler method can be applied for upsampling of chrominance signals with residual data than for upsampling of luminance signals with residual data. Furthermore, it also means that a simpler method can be applied to the upsampling of the chrominance signal with residual data (residual prediction mode) than to the upsampling of the chrominance signal with decoded image data (intra base mode).
Thus, for example, when the ratio of the resolutions of the base layer and the enhancement layer is dyadic, the residual up-sampling of the chrominance signal within a boundary (the boundary of a transform block or of a motion-compensated prediction partition) (residual up-sampling method 2 in fig. 5) is defined as h = v = d = (A + D + 1) >> 1 or h = v = d = (B + C + 1) >> 1, so the amount of calculation necessary for up-sampling can be reduced.
In this case, the relative positions of A, B, C, D, h, v and d are illustrated in fig. 7. The pixels to be inserted between the chrominance sampling points A, B, C and D are not calculated using different equations; rather, each is calculated with the same equation, as the average value of the two corresponding pixels located in the diagonal direction.
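Under the layout of fig. 7, the simplified dyadic chroma rule can be written directly. Which diagonal pair is averaged follows the document's stated alternative; the function name and the flag argument are assumptions for illustration.

```python
def chroma_diag_upsample(A, B, C, D, use_AD=True):
    """Compute the three inserted samples h, v, d between the four chroma
    residual samples arranged as A B / C D (fig. 7).

    A single equation serves all three positions: the rounded average of
    one diagonal pair, (A + D + 1) >> 1 or, alternatively, (B + C + 1) >> 1.
    """
    avg = (A + D + 1) >> 1 if use_AD else (B + C + 1) >> 1
    return avg, avg, avg  # h, v, d all take the same value
```

Because h, v and d share one value, the per-macroblock cost drops from three distinct interpolations to a single add-and-shift.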
Conversely, when the ratio of the resolutions of the base layer and the enhancement layer is not dyadic, the residual up-sampling of the chrominance signal within the boundary of a transform block or of a motion-compensated prediction partition may be performed using a bilinear interpolation filter whose weights are determined taking into account the relative positions and/or phase shifts between the chrominance sampling points (pixels) of the base layer and the enhancement layer.
Also, in the up-sampling of the chrominance signal in the intra base mode (up-sampling method 2 in fig. 6), the pixels of the base-layer chrominance signal are up-sampled using a bilinear interpolation filter, which is simpler than the 6-tap interpolation filter used for the luminance signal. In this case, too, the weights may be determined taking into account the relative positions and/or phase shifts between the chrominance sampling points of the base layer and the enhancement layer.
Meanwhile, up-sampling of the base layer is performed both when an enhancement layer is encoded in the residual prediction mode or the intra base mode, and when an enhancement layer so encoded is decoded.
Thus, the up-sampling method according to the present invention can be applied to an encoding and decoding apparatus that encodes and decodes a video signal using an inter-layer prediction method.
Also, a decoding apparatus to which the up-sampling method according to the present invention is applied may be installed in a mobile communication terminal or a recording medium playback apparatus.
Thus, when the base layer is up-sampled in inter-layer prediction, the number of pixels requiring special processing is reduced, so that a more simplified operation can be applied, the up-sampling efficiency is improved, and the amount of calculation is reduced.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims (3)

1. A method of decoding a video signal, comprising:
obtaining a reference block of a current block in a first layer;
generating up-sampled residual data by using a bilinear interpolation filter and the residual data of the first layer;
acquiring residual data of a current block in a second layer based on the up-sampled residual data; and
decoding the current block based on the residual data of the current block.
2. The method as claimed in claim 1, wherein the bilinear interpolation filter operates taking into account the relative positions of sampling points.
3. The method of claim 1, wherein if the residual data of the first layer corresponds to chrominance data, the residual data of the first layer is upsampled using a bilinear interpolation filter, and a weight is applied to the bilinear interpolation filter, the weight being determined based on a relative position or a phase shift between chrominance data sampling points of the first layer and the second layer.
HK08111615.4A 2005-07-21 2006-07-21 Method of encoding and decoding video signals HK1119890B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US70104305P 2005-07-21 2005-07-21
US60/701,043 2005-07-21
KR1020060057857A KR20070012201A (en) 2005-07-21 2006-06-27 Method of encoding and decoding video signal
KR10-2006-0057857 2006-06-27
PCT/KR2006/002882 WO2007011189A1 (en) 2005-07-21 2006-07-21 Method of encoding and decoding video signals

Publications (2)

Publication Number Publication Date
HK1119890A1 HK1119890A1 (en) 2009-03-13
HK1119890B true HK1119890B (en) 2011-08-26

