WO2013137588A1

WO2013137588A1 - Scalable video decoding/encoding method, and scalable video decoding/encoding device using same

Info

Publication number: WO2013137588A1
Application number: PCT/KR2013/001825
Authority: WO
Inventors: 전용준; 박준영; 전병문; 박승욱; 임재현; 박내리; 김철근
Original assignee: 엘지전자 주식회사
Priority date: 2012-03-12
Filing date: 2013-03-06
Publication date: 2013-09-19

Abstract

The present invention relates to a scalable video encoding method and apparatus. The scalable video encoding method according to the present invention includes the steps of: performing prediction for an input layer; and compressing motion information for each layer for each access unit. As a result, when motion information on a base layer is used in an enhancement layer, uncompressed motion information can be used. Therefore, the accuracy of the motion information is improved to enhance prediction performance.

Description

Scalable video decoding / encoding method and decoding / encoding device using same

The present invention relates to video compression techniques, and more particularly, to a method and apparatus for performing scalable video coding.

Recently, demand for high-resolution, high-quality images has been increasing in various applications. As the resolution and quality of an image increase, the amount of information in the image also increases. As the volume of information grows, devices with varying capabilities and networks with varying environments are emerging. With the emergence of such devices and networks, the same content can be used at different levels of quality.

Specifically, as the video quality that terminal devices can support and the network environments diversify, video of ordinary quality may be used in one environment, while higher-quality video may be used in another.

Accordingly, in order to provide the video services that users require in various environments, it is necessary to provide scalability in video quality (for example, image quality, resolution, image size, and frame rate) based on high-efficiency encoding/decoding methods for high-capacity video. In addition, various image processing methods associated with such scalability should be discussed.

On the other hand, the higher the resolution and the higher quality of the image, the more the information about the image increases. Therefore, when image information is transmitted using a medium such as a conventional wired / wireless broadband line or when image information is stored using an existing storage medium, information transmission cost and storage cost increase. Therefore, a high efficiency image compression technique can be used to effectively transmit, store, and reproduce high resolution, high quality image information.

An object of the present invention is to provide a method for compressing motion information used for inter-layer prediction and an apparatus using the same.

Another object of the present invention is to provide a scalable video coding method, and a coding apparatus using the same, in which motion information used for inter-layer prediction is derived in a base layer with increased accuracy, thereby improving prediction performance.

Another object of the present invention is to provide a scalable video coding method and a coding apparatus using the same, which enable use of uncompressed motion information when using motion information of a base layer in an enhancement layer.

Another object of the present invention is to provide a scalable video coding method capable of storing motion information of a layer by reflecting a resolution ratio between layers, and a coding apparatus using the same.

An embodiment of the present invention may include a prediction step of performing prediction on an input layer, and compressing motion information for each layer for each access unit.

In this case, the layer may include a base layer and at least one enhancement layer, and the prediction may include performing the prediction of the enhancement layer based on the motion information of the base layer. Compression of the motion information of the base layer and the motion information of the enhancement layer may be performed after the prediction of the enhancement layer.
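The ordering described above (predict every layer of an access unit first, compress motion information afterwards) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the "compression" shown (keeping the top-left motion vector of each 4x4 group of stored entries) and the names `compress_motion_field`, `code_access_unit`, and `predict` are hypothetical and do not come from the text.

```python
def compress_motion_field(field, factor=4):
    """Keep one motion vector per factor x factor group (the top-left one).

    The subsampling rule is an assumption made for illustration only.
    """
    return [row[::factor] for row in field[::factor]]


def code_access_unit(layers, predict):
    """Process one access unit: predict all layers, then compress.

    `layers` is ordered base layer first. `predict(layer, ref_motion)` is a
    hypothetical callback that returns the layer's uncompressed motion field.
    """
    uncompressed = []
    for layer in layers:
        # The enhancement layer sees the base layer's uncompressed motion.
        ref_motion = uncompressed[0] if uncompressed else None
        uncompressed.append(predict(layer, ref_motion))
    # Compression is deferred until every layer of the access unit is predicted.
    return [compress_motion_field(field) for field in uncompressed]
```

Deferring the compression step is what lets the enhancement layer read the base layer's motion field at full precision.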

Further, another embodiment of the present invention may include a prediction unit that performs prediction on an input layer, and a memory that compresses motion information about each layer for each access unit.

According to an embodiment of the present invention, the present invention may provide a method for compressing motion information used for interlayer prediction and an apparatus using the same.

Further, according to an embodiment of the present invention, when the base layer derives motion information used for inter-layer prediction, the accuracy of the motion information may be increased to improve the performance of the prediction.

In addition, according to an embodiment of the present invention, when the motion information of the base layer is used in the enhancement layer, uncompressed motion information can be used, thereby improving the accuracy of the motion information to improve prediction performance.

Meanwhile, according to another exemplary embodiment of the present invention, motion information of a layer may be stored by reflecting the resolution ratio between layers, thereby increasing data storage efficiency.

FIG. 1 is a block diagram schematically illustrating a video encoding apparatus supporting scalability according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating an example of interlayer prediction in an encoding apparatus that performs scalable coding according to the present invention.

FIG. 3 is a block diagram schematically illustrating a video decoding apparatus supporting scalability according to an embodiment of the present invention.

FIG. 4 is a block diagram illustrating an example of interlayer prediction in a decoding apparatus that performs scalable coding according to the present invention.

FIG. 5 is a diagram schematically illustrating a layer structure of scalable coding to which the present invention is applied. The layers are illustrated to explain inter prediction and inter-layer prediction.

FIG. 6 is a diagram for describing a method of compressing and storing motion information used for inter prediction.

FIG. 7 is a control flowchart illustrating a method of compressing motion information used for inter-layer prediction according to the present invention.

FIG. 8 is a diagram for describing a method of storing motion information used for inter-layer prediction, according to another embodiment of the present invention.

As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the invention to the specific embodiments. The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the spirit of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and should be understood not to exclude in advance the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Meanwhile, the components in the drawings described in the present invention are shown independently for convenience in describing their different characteristic functions in the video encoding apparatus/decoding apparatus; this does not mean that each component is implemented as separate hardware or separate software. For example, two or more components may be combined into one component, or one component may be divided into a plurality of components. Embodiments in which components are integrated and/or separated are also included in the scope of the present invention, provided they do not depart from its spirit.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and redundant description of the same components is omitted.

In a video coding method supporting scalability (hereinafter referred to as 'scalable coding'), input signals may be processed for each layer. Depending on the layer, the input signals (input images) may differ in at least one of resolution, frame rate, bit depth, color format, and aspect ratio.

In the present specification, scalable coding includes scalable encoding and scalable decoding.

In scalable encoding / decoding, prediction between layers is performed by using differences between layers, that is, based on scalability, thereby reducing overlapping transmission / processing of information and increasing compression efficiency.

Referring to FIG. 1, the encoding apparatus 100 includes an encoder 105 for layer 1 and an encoder 155 for layer 0.

Layer 0 may be a base layer, a reference layer, or a lower layer, and layer 1 may be an enhancement layer, a current layer, or an upper layer.

The encoder 105 of layer 1 includes an inter/intra predictor 110, a transform/quantizer 115, a filter 120, a decoded picture buffer (DPB) 125, an entropy coding unit 130, a unit parameter predictor 135, a motion predictor/rescaler 140, a texture predictor/rescaler 145, a parameter predictor 150, and a multiplexer (MUX) 185.

The encoding unit 155 of the layer 0 includes an inter / intra prediction unit 160, a transform / quantization unit 165, a filtering unit 170, a DPB 175, and an entropy coding unit 180.

The inter/intra predictors 110 and 160 may perform inter prediction and intra prediction on the input image. The inter/intra predictors 110 and 160 may perform prediction in a predetermined processing unit. The processing unit for prediction may be a coding unit (CU), a prediction unit (PU), or a transform unit (TU).

For example, the inter/intra predictors 110 and 160 may determine whether to apply inter prediction or intra prediction on a CU basis, determine a prediction mode on a PU basis, and perform prediction on a PU basis or a TU basis. The prediction performed includes generation of a prediction block and generation of a residual block (residual signal).

Through inter prediction, a prediction block may be generated by performing prediction based on information of at least one picture of a previous picture and / or a subsequent picture of the current picture. Through intra prediction, prediction blocks may be generated by performing prediction based on pixel information in a current picture.
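As a rough sketch of how inter prediction produces a prediction block and a residual block, the following assumes integer-sample motion vectors and ignores picture boundaries. The function names and the list-of-rows picture layout are illustrative assumptions, not part of the described apparatus.

```python
def motion_compensate(reference, x, y, mv, width, height):
    """Copy the width x height block that the motion vector points to.

    Integer-sample motion only, with no picture-boundary handling
    (both are simplifications made for this sketch).
    """
    ref_x, ref_y = x + mv[0], y + mv[1]
    return [row[ref_x:ref_x + width] for row in reference[ref_y:ref_y + height]]


def residual_block(current, prediction):
    """Residual = current block minus prediction block, sample by sample."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]
```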

As a mode or method of inter prediction, there are a skip mode, a merge mode, a motion vector predictor (MVP) mode method, and the like. In inter prediction, a reference picture may be selected with respect to the current PU that is a prediction target, and a reference block corresponding to the current PU may be selected within the reference picture. The inter / intra predictor 160 may generate a prediction block based on the reference block.

The prediction block may be generated in integer-sample units or in fractional (sub-integer) sample units. In this case, the motion vector may likewise be expressed in integer-pixel units or in fractional-pixel units.

In inter prediction, motion information, that is, information such as an index of a reference picture, a motion vector, and a residual signal, is entropy encoded and transmitted to a decoding apparatus. When the skip mode is applied, residuals may not be generated, transformed, quantized, or transmitted.

In intra prediction, the prediction mode may have 33 directional prediction modes and at least two non-directional modes. The non-directional modes may include a DC prediction mode and a planar mode. In intra prediction, a prediction block may be generated after applying a filter to a reference sample.
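To illustrate one of the non-directional modes, the sketch below fills a block with the rounded average of the neighboring reference samples, which is the basic idea of DC prediction. It assumes equally long `top` and `left` reference rows and omits any reference-sample or boundary filtering; the function name is hypothetical.

```python
def dc_prediction(top, left):
    """Fill an N x N block with the rounded mean of the top and left neighbors.

    Assumes len(top) == len(left) == N; reference-sample filtering is omitted.
    """
    n = len(top)
    total = sum(top) + sum(left)
    dc = (total + n) // (2 * n)  # integer average with rounding
    return [[dc] * n for _ in range(n)]
```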

The PU may be a block of various sizes / types, for example, in the case of inter prediction, the PU may be a 2N × 2N block, a 2N × N block, an N × 2N block, an N × N block (N is an integer), or the like. In the case of intra prediction, the PU may be a 2N × 2N block or an N × N block (where N is an integer). In this case, the PU of the N × N block size may be set to apply only in a specific case. For example, the NxN block size PU may be used only for the minimum size CU or only for intra prediction. In addition to the above-described PUs, PUs such as N × mN blocks, mN × N blocks, 2N × mN blocks, or mN × 2N blocks (m <1) may be further defined and used.
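The symmetric partition shapes listed above can be enumerated with a small helper. The mode strings and the function name are assumptions for illustration; sizes are given as (width, height) for a CU of size 2N x 2N.

```python
def pu_partitions(mode, n):
    """Return the (width, height) of each PU inside a 2N x 2N CU."""
    size = 2 * n
    shapes = {
        "2Nx2N": [(size, size)],   # one PU covering the whole CU
        "2NxN": [(size, n)] * 2,   # two PUs stacked vertically
        "Nx2N": [(n, size)] * 2,   # two PUs side by side
        "NxN": [(n, n)] * 4,       # four square PUs
    }
    return shapes[mode]
```

Whatever the mode, the PU areas add up to the CU area, which is a quick sanity check for any additional partition types.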

The transform/quantization units 115 and 165 perform a transform on the residual block in units of transform blocks to generate transform coefficients, and quantize the transform coefficients.

The transform block is a block of samples and is a block to which the same transform is applied. The transform block can be a transform unit (TU) and can have a quad tree structure.

The transform/quantization units 115 and 165 may generate a 2D array of transform coefficients by performing a transform according to the prediction mode applied to the residual block and the size of the block. For example, if intra prediction is applied to a residual block and the block is a 4x4 residual array, the residual block is transformed using a discrete sine transform (DST); otherwise, the residual block may be transformed using a discrete cosine transform (DCT).
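The selection rule in the example above can be written as a small decision function. The string labels and the function name are illustrative assumptions.

```python
def select_transform(pred_mode, width, height):
    """Pick DST for 4x4 intra-predicted residual blocks, DCT otherwise."""
    if pred_mode == "intra" and (width, height) == (4, 4):
        return "DST"
    return "DCT"
```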

The transform/quantization units 115 and 165 may quantize the transform coefficients to generate quantized transform coefficients.
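As a simplified picture of this step, the sketch below applies uniform scalar quantization with a single step size. The text does not specify the quantizer, so the uniform step, the round-half-away-from-zero rule, and the function name are all assumptions.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization of a 2D coefficient array.

    Rounds half away from zero; a single step size is assumed.
    """
    def q(c):
        level = int(abs(c) / step + 0.5)
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coeffs]
```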

The transform/quantization units 115 and 165 may transfer the quantized transform coefficients to the entropy coding units 130 and 180. In this case, the transform/quantization units 115 and 165 may rearrange the two-dimensional array of quantized transform coefficients into a one-dimensional array according to a predetermined scan order and transfer it to the entropy coding units 130 and 180. Also, for inter prediction, the transform/quantization units 115 and 165 may transfer the reconstructed block, generated based on the residual and the prediction block, to the filtering units 120 and 170.
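The rearrangement of a two-dimensional coefficient array into a one-dimensional array can be sketched with a diagonal scan. The text only says "a predetermined scan order", so the particular order used here (anti-diagonals, each traversed from bottom-left to top-right) is an assumption, as are the function names.

```python
def diagonal_scan_order(size):
    """Positions (row, col) ordered by anti-diagonal, bottom-left to top-right."""
    return sorted(((r, c) for r in range(size) for c in range(size)),
                  key=lambda rc: (rc[0] + rc[1], -rc[0]))


def scan_2d_to_1d(block):
    """Flatten a square coefficient block following the diagonal scan."""
    return [block[r][c] for r, c in diagonal_scan_order(len(block))]
```

A decoder would walk the same list of positions to place the received coefficients back into a two-dimensional array.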

The entropy coding units 130 and 180 may perform entropy encoding on the quantized transform coefficients. Entropy encoding may use, for example, an encoding method such as Exponential Golomb or Context-Adaptive Binary Arithmetic Coding (CABAC).
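For reference, order-0 unsigned Exponential Golomb coding can be sketched in a few lines. Bits are represented as a string of '0' and '1' characters for readability; the function names are illustrative, and the decoder assumes its input holds exactly one codeword.

```python
def exp_golomb_encode(value):
    """Order-0 unsigned Exp-Golomb codeword for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(binary) - 1) + binary  # prefix of leading zeros


def exp_golomb_decode(bits):
    """Decode a single order-0 unsigned Exp-Golomb codeword."""
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:2 * zeros + 1], 2) - 1
```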

The filtering units 120 and 170 may apply a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) to the reconstructed picture.

The deblocking filter may remove distortion generated at the boundary between blocks in the reconstructed picture. The adaptive loop filter (ALF) may perform filtering based on a value obtained by comparing the reconstructed image with the original image after the block is filtered through the deblocking filter. The SAO restores the offset difference from the original image on a pixel-by-pixel basis to the residual block to which the deblocking filter is applied, and is applied in the form of a band offset and an edge offset.
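The band-offset form of SAO can be illustrated as follows. The division of the sample range into 32 equal bands and the use of a short offset table covering consecutive bands follow the HEVC design and are assumptions here, since the text does not give these details; the function and parameter names are hypothetical.

```python
def sao_band_offset(samples, start_band, offsets, bit_depth=8):
    """Add a per-band offset to samples whose band is covered by `offsets`.

    Assumes 32 equal-width intensity bands (an HEVC-style choice).
    """
    shift = bit_depth - 5              # 2**5 = 32 bands
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        idx = (s >> shift) - start_band
        if 0 <= idx < len(offsets):
            s = min(max(s + offsets[idx], 0), max_val)  # clip to valid range
        out.append(s)
    return out
```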

The filtering units 120 and 170 need not apply all of the deblocking filter, ALF, and SAO; they may apply only the deblocking filter, only the deblocking filter and the ALF, or only the deblocking filter and the SAO.

The DPBs 125 and 175 may receive the reconstructed block or the reconstructed picture from the filtering units 120 and 170 and store it. The DPBs 125 and 175 may provide the reconstructed block or picture to the inter/intra predictors 110 and 160, which perform inter prediction.

Information output from the entropy coding unit 180 of the layer 0 and information output from the entropy coding unit 130 of the layer 1 may be multiplexed by the MUX 185 and output as a bitstream.

Meanwhile, the encoding unit 105 of layer 1 may include the unit parameter predictor 135, the motion predictor/rescaler 140, the texture predictor/rescaler 145, the parameter predictor 150, and the like for inter-layer prediction, which performs prediction on the image of layer 1 using the information of layer 0.

The unit parameter predictor 135 may derive unit (CU, PU, and/or TU) information of the base layer for use as unit information of the enhancement layer, or may allow unit information of the enhancement layer to be determined based on the unit information of the base layer.

The motion predictor 140 performs inter-layer motion prediction. Inter-layer motion prediction is also called inter-layer inter prediction. The motion predictor 140 may perform prediction on the current block of the current layer (enhancement layer) using the motion information of the reference layer (base layer).

If necessary, the motion predictor 140 may scale motion information of the reference layer.
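One plausible form of this scaling is to multiply each motion vector component by the resolution ratio between the layers. The sketch below assumes that rule; the function name, the (width, height) tuples, and rounding to the nearest integer are illustrative choices rather than details taken from the text.

```python
def scale_motion_vector(mv, base_size, enh_size):
    """Scale a base-layer motion vector to enhancement-layer resolution.

    `base_size` and `enh_size` are (width, height) tuples; the result is
    rounded to the nearest integer unit.
    """
    scale_x = enh_size[0] / base_size[0]
    scale_y = enh_size[1] / base_size[1]
    return (round(mv[0] * scale_x), round(mv[1] * scale_y))
```

With equal resolutions in both layers (for example, quality scalability), the ratio is 1 and the vector is returned unchanged.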

The texture predictor 145 may perform texture prediction based on the information of the layer 0. Texture prediction is also called intra base layer (BL) prediction. Texture prediction may be applied when the reference layer is reconstructed by intra prediction as an I slice. In texture prediction, the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer. In this case, the texture of the reference block may be scaled by upsampling.
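Upsampling the reference texture can be sketched with nearest-neighbor sample repetition. Practical codecs use interpolation filters instead; nearest-neighbor repetition is a simplifying assumption made here only to show the resampling step, and the function name is hypothetical.

```python
def upsample_nearest(block, factor=2):
    """Enlarge a 2D block by repeating each sample `factor` times per axis."""
    out = []
    for row in block:
        wide = [s for s in row for _ in range(factor)]   # stretch horizontally
        out.extend(list(wide) for _ in range(factor))    # repeat vertically
    return out
```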

The parameter predictor 150 may derive the parameters used in the base layer to reuse them in the enhancement layer or predict the parameters for the enhancement layer based on the parameters used in the base layer.

Meanwhile, for convenience of description, the encoding unit 105 of layer 1 is described as including the MUX 185, but the MUX may be a device or module separate from the encoding unit 105 of layer 1 and the encoding unit 155 of layer 0.

Referring to FIG. 2, the predictor 210 of the layer 1 includes an inter / intra predictor 220 and an interlayer predictor 230.

The prediction unit 210 of the layer 1 may perform interlayer prediction necessary for the prediction of the layer 1 from the information of the layer 0.

For example, the interlayer prediction unit 230 may receive the layer 0 information from the inter/intra predictor 250 and/or the filter 260 of layer 0 and perform the interlayer prediction necessary for the prediction of layer 1.

The inter / intra predictor 220 of the layer 1 may perform inter prediction or intra prediction using the information of the layer 1.

Also, the inter / intra predictor 220 of the layer 1 may perform prediction based on the information of the layer 0 using the information transmitted from the interlayer predictor 230.

In addition, the filtering unit 240 of layer 1 may perform filtering based on the information of layer 1 or based on the information of layer 0. The information of layer 0 may be transferred from the filtering unit 260 of layer 0 to the filtering unit 240 of layer 1, or from the interlayer prediction unit 230 of layer 1 to the filtering unit 240 of layer 1.

The information transmitted from layer 0 to the interlayer prediction unit 230 may be at least one of information about a unit parameter of layer 0, motion information of layer 0, texture information of layer 0, and filter parameter information of layer 0.

Accordingly, the interlayer prediction unit 230 may include some or all of the unit parameter predictor 135, the motion predictor 140, the texture predictor 145, and the parameter predictor 150 that perform interlayer prediction in FIG. 1.

In addition, in layer 1, the inter / intra predictor 220 may correspond to the inter / intra predictor 110 of FIG. 1, and the filter 240 may correspond to the filter 120 of FIG. 1. In layer 0, the inter / intra predictor 250 may correspond to the inter / intra predictor 160 of FIG. 1, and the filter 260 may correspond to the filter 170 of FIG. 1.

Referring to FIG. 3, the decoding apparatus 300 includes a decoder 310 of layer 1 and a decoder 350 of layer 0.

The decoding unit 310 of layer 1 may include an entropy decoding unit 315, a reordering unit 320, an inverse quantization unit 325, an inverse transform unit 330, a prediction unit 335, a filtering unit 340, and a memory 345.

The decoding unit 350 of layer 0 may include an entropy decoding unit 355, a reordering unit 360, an inverse quantization unit 365, an inverse transform unit 370, a prediction unit 375, a filtering unit 380, and a memory 385.

When a bitstream including the image information is transmitted from the encoding apparatus, the DEMUX 305 may demultiplex the information for each layer and deliver it to the decoding unit of the corresponding layer.
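The multiplexing and demultiplexing of per-layer information can be pictured with packets tagged by a layer identifier. The packet format (a `(layer_id, payload)` tuple), the per-access-unit interleaving, and the function names are hypothetical; the text does not describe the bitstream syntax.

```python
def mux(packets_by_layer):
    """Interleave per-layer payloads, one access unit at a time.

    `packets_by_layer` maps layer_id -> list of payloads (one per access unit).
    """
    layer_ids = sorted(packets_by_layer)
    stream = []
    for access_unit in zip(*(packets_by_layer[i] for i in layer_ids)):
        stream.extend(zip(layer_ids, access_unit))
    return stream


def demux(stream):
    """Split a tagged stream back into per-layer payload lists."""
    layers = {}
    for layer_id, payload in stream:
        layers.setdefault(layer_id, []).append(payload)
    return layers
```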

The entropy decoding units 315 and 355 may perform entropy decoding corresponding to the entropy coding scheme used in the encoding apparatus. For example, when CABAC is used in the encoding apparatus, the entropy decoding units 315 and 355 may also perform entropy decoding using CABAC.

Among the information decoded by the entropy decoding units 315 and 355, information for generating a prediction block is provided to the prediction units 335 and 375, and the residual values entropy-decoded by the entropy decoding units 315 and 355, that is, the quantized transform coefficients, may be input to the reordering units 320 and 360.

The reordering units 320 and 360 may rearrange the information of the bitstreams entropy-decoded by the entropy decoding units 315 and 355, that is, the quantized transform coefficients, based on the reordering method used in the encoding apparatus.

For example, the reordering units 320 and 360 may rearrange the quantized transform coefficients of the one-dimensional array back into coefficients of a two-dimensional array. The reordering units 320 and 360 may generate a two-dimensional array of coefficients (quantized transform coefficients) by performing scanning based on the prediction mode applied to the current block (transform block) and/or the size of the transform block.

The inverse quantizers 325 and 365 may generate transform coefficients by performing inverse quantization based on the quantization parameter provided by the encoding apparatus and the coefficient values of the rearranged block.
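A simplified view of inverse quantization is to multiply each quantized level by a step size derived from the quantization parameter (QP). The mapping used below, in which the step size doubles every 6 QP values, is the relation commonly used to describe H.264/HEVC quantization; the text does not state it, so it is an assumption, as are the function names. Scaling lists and integer arithmetic are omitted.

```python
def quantization_step(qp):
    """Step size that doubles every 6 QP values (assumed H.264/HEVC-style)."""
    return 2 ** ((qp - 4) / 6)


def dequantize(levels, qp):
    """Scale quantized levels back to approximate transform coefficients."""
    step = quantization_step(qp)
    return [[level * step for level in row] for row in levels]
```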

The inverse transformers 330 and 370 may perform the inverse of the transform performed by the transform unit of the encoding apparatus. For example, the inverse transform units 330 and 370 may perform an inverse DCT and/or inverse DST corresponding to the discrete cosine transform (DCT) and discrete sine transform (DST) performed by the encoding apparatus.

The DCT and/or DST in the encoding apparatus may be selectively performed according to a plurality of pieces of information, such as the prediction method, the size of the current block, and the prediction direction, and the inverse transformers 330 and 370 of the decoding apparatus may perform the inverse transform based on the transform information used in the encoding apparatus.

The inverse transformers 330 and 370 may inverse transform the transform coefficients or the block of transform coefficients to generate the residual or the residual block.

The prediction units 335 and 375 may generate a prediction block for the current block based on the prediction block generation related information transmitted from the entropy decoding units 315 and 355 and on previously decoded blocks and/or picture information provided by the memories 345 and 385.

When the prediction mode for the current block is an intra prediction mode, the prediction units 335 and 375 may perform intra prediction on the current block based on pixel information in the current picture.

When the prediction mode for the current block is the inter prediction mode, the prediction units 335 and 375 may perform inter prediction on the current block based on information included in at least one of a previous picture or a subsequent picture of the current picture. Some or all of the motion information required for inter prediction may be derived in correspondence with the information received from the encoding apparatus.

When the skip mode is applied as the mode of inter prediction, residual is not transmitted from the encoding apparatus, and the prediction block may be a reconstruction block.

Meanwhile, the prediction unit 335 of layer 1 may perform inter prediction or intra prediction using only information in layer 1, or may perform inter layer prediction using information of another layer (layer 0).

For example, the prediction unit 335 of layer 1 may perform prediction on the current block by using one of the motion information of layer 1, the texture information of layer 1, the unit information of layer 1, and the parameter information of layer 1. In addition, the prediction unit 335 of layer 1 may perform prediction on the current block by using a plurality of pieces of information among the motion information of layer 1, the texture information of layer 1, the unit information of layer 1, and the parameter information of layer 1.

The predictor 335 of layer 1 may receive motion information of layer 0 from the predictor 375 of layer 0 to perform motion prediction. Inter-layer motion prediction is also called inter-layer inter prediction. By inter-layer motion prediction, prediction of a current block of the current layer (enhancement layer) may be performed using motion information of the reference layer (base layer). The prediction unit 335 may scale and use the motion information of the reference layer when necessary.

The predictor 335 of layer 1 may receive texture information of layer 0 from the predictor 375 of layer 0 to perform texture prediction. Texture prediction is also called intra base layer (BL) prediction. Texture prediction may be applied when the reference layer is reconstructed by intra prediction as an I slice. In texture prediction, the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer. In this case, the texture of the reference block may be scaled by upsampling.

The predictor 335 of layer 1 may receive unit parameter information of layer 0 from the predictor 375 of layer 0 to perform unit parameter prediction. By unit parameter prediction, unit (CU, PU, and/or TU) information of the base layer may be used as unit information of the enhancement layer, or unit information of the enhancement layer may be determined based on unit information of the base layer.

The predictor 335 of layer 1 may receive parameter information regarding the filtering of layer 0 from the predictor 375 of layer 0 to perform parameter prediction. By parameter prediction, the parameters used in the base layer can be derived and reused in the enhancement layer, or the parameters for the enhancement layer can be predicted based on the parameters used in the base layer.

The adders 390 and 395 may generate reconstruction blocks using the prediction blocks generated by the predictors 335 and 375 and the residual blocks generated by the inverse transformers 330 and 370. In this case, the adders 390 and 395 may be viewed as separate units (reconstruction block generation units) that generate the reconstruction block.
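The adder's operation amounts to a sample-wise sum of the prediction block and the residual block. The sketch below also clips the result to the valid sample range for the bit depth, which is standard practice in video codecs but is not stated in the text and should be read as an assumption; the function name is hypothetical.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Reconstruction = prediction + residual, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]
```

In skip mode no residual is transmitted, so the same function called with an all-zero residual returns the prediction block unchanged.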

Blocks and/or pictures reconstructed by the adders 390 and 395 may be provided to the filtering units 340 and 380.

The filtering units 340 and 380 may apply deblocking filtering, sample adaptive offset (SAO), and/or ALF to the reconstructed blocks and/or pictures.

The filtering units 340 and 380 need not apply all of the deblocking filter, ALF, and SAO; they may apply only the deblocking filter, only the deblocking filter and the ALF, or only the deblocking filter and the SAO.

Referring to the example of FIG. 3, the filtering unit 340 of layer 1 may also perform filtering on the reconstructed picture by using parameter information transmitted from the prediction unit 335 of layer 1 and/or the filtering unit 380 of layer 0. For example, in layer 1, the filtering unit 340 may apply filtering to layer 1 or between layers using a parameter predicted from the parameters of the filtering applied in layer 0.

The memories 345 and 385 may store the reconstructed picture or block for use as a reference picture or reference block. The memories 345 and 385 may output the stored reconstructed picture through a predetermined output unit (not shown) or a display (not shown).

In the example of FIG. 3, the reordering unit, the inverse quantization unit, and the inverse transform unit are described as separate units. However, as in the encoding apparatus of FIG. 1, the decoding apparatus may also be configured so that these functions are combined in a single module.

Likewise, although the prediction unit is described as a single unit in the example of FIG. 3, as in the example of FIG. 1, the prediction unit of layer 1 may include an interlayer prediction unit that performs prediction using information of another layer (layer 0) and an inter/intra predictor that performs prediction without using information of the other layer (layer 0).

FIG. 4 is a block diagram illustrating an example of interlayer prediction in a decoding apparatus that performs scalable coding according to the present invention. Referring to FIG. 4, the predictor 410 of layer 1 includes an inter/intra predictor 420 and an interlayer predictor 430.

The prediction unit 410 of the layer 1 may perform interlayer prediction necessary for the prediction of the layer 1 from the information of the layer 0.

For example, the interlayer prediction unit 430 may receive the layer 0 information from the inter/intra prediction unit 450 and/or the filtering unit 460 of layer 0 and perform the interlayer prediction necessary for the prediction of layer 1.

The inter / intra predictor 420 of the layer 1 may perform inter prediction or intra prediction using the information of the layer 1.

Also, the inter / intra predictor 420 of the layer 1 may perform prediction based on the information of the layer 0 using the information transmitted from the interlayer predictor 430.

The filtering unit 440 of layer 1 may perform filtering based on the information of layer 1 or based on the information of layer 0. The information of layer 0 may be transferred from the filtering unit 460 of layer 0 to the filtering unit 440 of layer 1, or from the interlayer prediction unit 430 of layer 1 to the filtering unit 440 of layer 1.

The information transmitted from layer 0 to the interlayer prediction unit 430 may be at least one of information about a unit parameter of layer 0, motion information of layer 0, texture information of layer 0, and filter parameter information of layer 0.

In layer 1, the predictor 410 may correspond to the predictor 335 of FIG. 3, and the filter 440 may correspond to the filter 340 of FIG. 3. In layer 0, the predictor 450 may correspond to the predictor 375 of FIG. 3, and the filter 460 may correspond to the filter 380 of FIG. 3.

In addition, although not shown, the inter-layer prediction unit 430 may include a motion prediction unit, a texture prediction unit, a unit parameter prediction unit, and a parameter prediction unit according to the type of inter-layer prediction (eg, motion prediction, texture prediction, unit parameter prediction, and parameter prediction).

In scalable video coding, interlayer prediction may be performed to predict information of a current layer by using information of another layer. As described in the example of FIGS. 1 to 4, motion prediction, texture prediction, unit prediction, parameter prediction, and the like may be considered as examples of inter-layer prediction.

The current picture 510 having a POC of n may perform inter prediction using information of another picture without referring to information of another layer. For example, if the current picture 510 is a P picture, inter prediction is performed using information of a previous picture having a POC smaller than n. If the current picture 510 is a B picture, inter prediction may be performed using information of a previous picture and of a picture having a POC larger than n. The prediction information used for inter prediction includes information about a prediction mode (predMode) such as a skip mode, a merge mode, or a motion vector predictor (MVP) mode, a reference picture index, and a motion vector.
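The POC-based choice of reference pictures described above can be illustrated with a minimal sketch. The function name `candidate_reference_pocs` and the string labels "P" and "B" are hypothetical and serve only this illustration; the sketch follows the simplified description in this paragraph and is not a complete reference picture list construction.

```python
def candidate_reference_pocs(current_poc, picture_type, decoded_pocs):
    """Return the POCs that the current picture may reference.

    Assumption of this sketch: a P picture references only pictures with a
    smaller POC, and a B picture may also reference pictures with a larger POC.
    """
    if picture_type == "P":
        return sorted(p for p in decoded_pocs if p < current_poc)
    if picture_type == "B":
        return sorted(p for p in decoded_pocs if p != current_poc)
    raise ValueError("picture_type must be 'P' or 'B'")


if __name__ == "__main__":
    print(candidate_reference_pocs(4, "P", [0, 2, 6, 8]))
    print(candidate_reference_pocs(4, "B", [0, 2, 6, 8]))
```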

In inter prediction, the decoding apparatus and the encoding apparatus may use motion information of neighboring blocks of the current block. The neighboring blocks include spatial neighboring blocks and temporal neighboring blocks, and the motion information of a neighboring block is a motion vector and a reference picture index in the merge mode, and a motion vector in the MVP mode. The motion information for the decoded picture is compressed and stored in the device memory or the decoded picture buffer (DPB).

Meanwhile, in the case of inter layer prediction on the current picture 510, prediction is performed based on information of another layer 520 of the same POC. In the present specification, a method of predicting information of a current layer using information of another layer is referred to as inter-layer prediction for convenience of description.

Information of the current layer that is predicted using information of another layer (ie, predicted by inter-layer prediction) may include texture, motion information, unit information, predetermined parameters (eg, filtering parameters, etc.).

In addition, information of another layer used for prediction for the current layer (ie, used for inter-layer prediction) may include texture, motion information, unit information, and predetermined parameters (eg, filtering parameters).

As an example of inter-layer prediction, in inter-layer unit parameter prediction, unit (CU, PU, and / or TU) information of a base layer may be derived and used as unit information of an enhancement layer, or unit information of the enhancement layer may be determined based on unit information of the base layer.

In addition, the unit information may include information at each unit level. For example, CU information may include information about a partition (CU, PU and / or TU), information on transform, information on prediction, and information on coding. PU information may include information on a PU partition and information on prediction (eg, motion information, information on a prediction mode, etc.). TU information may include information about a TU partition and information on transform (transform coefficient, transform method, etc.).

In addition, the unit information may include only the partition information of the processing unit (eg, CU, PU, TU, etc.).

Interlayer motion prediction, another example of interlayer prediction, is also called interlayer inter prediction. According to inter-layer inter prediction, prediction of a current block of layer 1 (current layer or enhancement layer) may be performed using motion information of layer 0 (reference layer or base layer).

In case of applying inter-layer inter prediction, motion information of a reference layer may be scaled.
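As a minimal sketch of such scaling, the following assumes that a reference layer motion vector is multiplied by the ratio of the picture dimensions of the two layers. The function name `scale_motion_vector` and the rounding rule are assumptions made for illustration and are not taken from this specification.

```python
from fractions import Fraction


def scale_motion_vector(mv, ref_size, cur_size):
    """Scale a reference-layer motion vector to the current layer's resolution.

    mv       -- (x, y) motion vector of the reference-layer block
    ref_size -- (width, height) of the reference-layer picture
    cur_size -- (width, height) of the current-layer picture
    """
    sx = Fraction(cur_size[0], ref_size[0])
    sy = Fraction(cur_size[1], ref_size[1])
    # Python's round() on a Fraction rounds to the nearest integer (ties to
    # even); this rounding rule is an assumption of the sketch.
    return (round(mv[0] * sx), round(mv[1] * sy))


if __name__ == "__main__":
    print(scale_motion_vector((3, -5), (960, 540), (1920, 1080)))
```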

As another example of inter-layer prediction, inter-layer texture prediction is also called intra base layer (BL) prediction. Inter layer texture prediction may be applied when a reference block in a reference layer is reconstructed by intra prediction.

In texture prediction, the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer. In this case, the texture of the reference block may be scaled by upsampling.
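The upsampling of a reference block texture can be sketched as follows. Nearest-neighbor replication with an integer factor is used here only because it is the simplest case; the specification does not state which interpolation filter is applied, and the function name `upsample_nearest` is hypothetical.

```python
def upsample_nearest(block, factor):
    """Upsample a 2-D list of samples by an integer factor.

    Each sample is replicated factor x factor times (nearest neighbor).
    A practical codec would typically use an interpolation filter instead.
    """
    out = []
    for row in block:
        # Repeat every sample horizontally.
        wide = [sample for sample in row for _ in range(factor)]
        # Repeat the widened row vertically.
        out.extend([list(wide) for _ in range(factor)])
    return out


if __name__ == "__main__":
    reference_block = [[1, 2], [3, 4]]
    for row in upsample_nearest(reference_block, 2):
        print(row)
```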

In another example of inter-layer prediction, inter-layer parameter prediction may derive a parameter used in the base layer to reuse it in the enhancement layer or predict a parameter for the enhancement layer based on the parameter used in the base layer.

As an example of interlayer prediction, interlayer texture prediction, interlayer motion prediction, interlayer unit information prediction, and interlayer parameter prediction have been described. However, the interlayer prediction applicable to the present invention is not limited thereto.

For example, the prediction unit may use, as inter-layer prediction, inter-layer residual prediction, which predicts the residual of the current layer using residual information of another layer and performs prediction on the current block in the current layer based on the predicted residual.

Meanwhile, motion information necessary to perform inter prediction is stored in a memory after a decoding process of a picture. In the present specification, the decoding process may mean a process of reconstructing an image by using the generated prediction block and the residual block. In addition, the decoding process may include filtering of the reconstructed image.

Therefore, when decoding of the current picture is completed, motion information for the current picture is stored to be used for prediction of another picture, and the motion information may be stored for each motion data storage unit.

When a minimum unit for setting motion information is expressed as a motion information block, one motion information storage unit may be configured of a plurality of motion information blocks.

The motion information for the plurality of motion information blocks may be stored as one representative value for each motion information storage unit. That is, the representative value set for each motion information storage unit is used as the motion information of every motion information block belonging to that motion information storage unit.

The representative value may be an average value of the motion information, that is, an average value of the motion vector and a minimum value of the reference picture index. Alternatively, the representative value may be motion information of a specific motion information block. For example, the representative value may be motion information of a motion information block located at the upper left of the motion information storage unit.
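The averaging option mentioned above can be sketched as follows, assuming each motion information block is represented as a tuple (mv_x, mv_y, ref_idx). The tuple layout, the function name `representative_average`, and the rounding of the average are assumptions of this illustration.

```python
def representative_average(blocks):
    """Derive one representative value for a motion information storage unit.

    blocks -- list of (mv_x, mv_y, ref_idx) tuples, one per motion
              information block in the storage unit
    Returns (average mv_x, average mv_y, minimum ref_idx).
    """
    n = len(blocks)
    avg_x = round(sum(b[0] for b in blocks) / n)  # rounding is an assumption
    avg_y = round(sum(b[1] for b in blocks) / n)
    min_ref = min(b[2] for b in blocks)
    return (avg_x, avg_y, min_ref)


if __name__ == "__main__":
    unit = [(4, 8, 1), (6, 8, 0), (2, 0, 2), (8, 0, 1)]
    print(representative_average(unit))
```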

This is to reduce the memory for storing the motion information, and the size of the block serving as the motion information storage unit may be preset or signaled from the encoder.

FIG. 6 exemplarily illustrates a method of compressing and storing motion information used for inter prediction. In FIG. 6, a case in which a unit composed of 16 motion information blocks B0 to B15 is used as one motion information storage unit is described as an example. For convenience of description, in FIG. 6, the motion vector of each motion information block BI (I = 0, ..., 15) is specified as (XI, YI).

The decoding apparatus may use a motion vector (X0, Y0) of the motion information block B0 at the upper left in the motion information storage unit shown in FIG. 6 as a representative value for the motion information storage unit 600. That is, motion vector values of (X0, Y0) are also assigned to the other 15 motion information blocks B1 to B15.

In other words, when the first motion information block B0 in the motion information storage unit 600 is not intra coded, the motion vector (X0, Y0) of the motion information block B0 is stored in a memory (DPB) as a representative motion vector for the sixteen motion information blocks B0 to B15. When the motion information block B0 is intra coded, the motion vector (0, 0) is stored in the motion vector buffer as a motion vector value representing the motion information storage unit 600.
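The rule for the storage unit of FIG. 6 can be sketched as follows. The class `MotionBlock` and the function `compress_storage_unit` are hypothetical names introduced for illustration, and the blocks B0 to B15 are assumed to be listed in raster order so that the first entry is the upper-left block.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MotionBlock:
    mv: Tuple[int, int]      # motion vector (X, Y) of the block
    is_intra: bool = False   # True if the block is intra coded


def compress_storage_unit(unit: List[MotionBlock]) -> Tuple[int, int]:
    """Return the representative motion vector of one storage unit.

    unit[0] is assumed to be the upper-left block B0. If B0 is intra
    coded, (0, 0) is used as the representative motion vector.
    """
    first = unit[0]
    return (0, 0) if first.is_intra else first.mv


if __name__ == "__main__":
    unit = [MotionBlock((i, -i)) for i in range(1, 17)]
    print(compress_storage_unit(unit))
```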

FIG. 6 illustrates storage of a motion vector as an example of motion information. As described above, motion information including a reference index and a motion vector may also be compressed and stored as a predetermined representative value for a plurality of blocks. In the same manner, information about whether inter prediction or intra prediction is used, or about the prediction mode used for inter prediction, may be compressed and stored as a predetermined representative value.

On the other hand, in the case of inter-layer prediction, since the enhancement layer is predicted and decoded based on the information of the base layer, the prediction and decoding of the base layer may be performed before the enhancement layer prediction. In this case, a method of storing motion information for each layer will be described.

FIG. 7 is a control flowchart illustrating a method of compressing motion information used for inter-layer prediction according to the present invention. In the present embodiment, the video decoding apparatus 300 of FIG. 3 supporting scalability will be described as an example.

First, the prediction unit 375 performs prediction on the base layer picture (S701).

The prediction unit 375 may generate the prediction block for the current block based on the prediction block generation related information transmitted from the entropy decoding unit 355 and the previously decoded block and / or picture information provided in the memory 385. Thereafter, the enhancement layer is predicted based on the motion information of the base layer (S702).

In case of applying inter-layer prediction, prediction of the current block of the current layer may be performed using motion information of the reference layer, or prediction of the current block of the current layer may be performed using the texture of the reference layer.

For convenience of description, interlayer prediction using motion information of a reference layer is called interlayer motion prediction, and interlayer prediction using texture of a reference layer is called interlayer texture prediction.

The motion information of the reference layer used for inter-layer motion prediction may be a motion vector and / or a reference picture index of a block in a reference layer corresponding to the current block. The texture of the reference layer used for inter layer texture prediction may be a reconstructed texture of the block in the reference layer corresponding to the current block.

According to the present embodiment, since the motion information of the base layer that can be used for prediction and decoding of the enhancement layer is uncompressed information, the accuracy of prediction can be improved. As a result, the image quality of the enhancement layer to be reconstructed is improved, so that an image close to the original image can be restored.

As described above, when prediction and decoding of all layers in the access unit are completed, motion information of the base layer and the enhancement layer is compressed (S703).

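In this case, the access unit refers to the set of layers displayed at the same time. Because compression is deferred until all layers of the access unit have been decoded, the performance of inter-layer prediction can be improved while the amount of stored data is still reduced.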
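The ordering of steps S701 to S703 can be sketched as follows. The function `decode_access_unit` and its callback parameters `decode_layer` and `compress` are hypothetical and stand in for the actual prediction and compression processes; the sketch only illustrates that each layer receives the uncompressed motion information of the preceding layer and that compression is performed once per access unit.

```python
def decode_access_unit(layers, decode_layer, compress):
    """Decode all layers of one access unit, then compress their motion info.

    layers       -- layers ordered from base layer to highest enhancement layer
    decode_layer -- callable(layer, ref_motion) returning the uncompressed
                    motion information of that layer (assumed interface)
    compress     -- callable(motion) returning compressed motion information
    """
    uncompressed = []
    ref_motion = None  # the base layer has no reference layer
    for layer in layers:
        # S701 / S702: the layer is predicted using the uncompressed motion
        # information of the previously decoded layer.
        motion = decode_layer(layer, ref_motion)
        uncompressed.append(motion)
        ref_motion = motion
    # S703: compression happens only after every layer has been decoded.
    return [compress(motion) for motion in uncompressed]
```

With this ordering, the enhancement layer never observes compressed base layer motion information within the same access unit.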

According to another embodiment of the present invention, even when the motion information of the base layer is not used when the enhancement layer is predicted, the motion information of the base layer and the enhancement layer may be compressed and stored for each access unit. That is, even when inter layer prediction is not performed, motion information about layers that may be displayed at a specific time point may be compressed at the time when prediction for all layers is completed.

The motion information may be compressed and stored as a representative value for a block of a predetermined size as described in FIG. 6.

The above description may be equally applied to a process of using motion information of a base layer for inter prediction on an enhancement layer in the video encoding apparatus 100, and the motion information of the base layer may be compressed and stored after prediction and / or encoding of the enhancement layer is performed.

As described above, the motion information may be compressed and stored in an information storage unit having a specific size. According to the present embodiment, the information storage unit in which the motion information of the enhancement layer is stored may be scaled according to the resolution ratio between the base layer and the enhancement layer, and may be enlarged accordingly as shown in FIG. 8.

For example, when the resolution of the enhancement layer is a times the resolution of the base layer and the information storage unit 810 of the base layer is an N × N sample block, the information storage unit 820 of the enhancement layer may be set to an aN × aN sample block. The motion vector (X0, Y0) of the upper-left information block B0 in the aN × aN sample block may be used as a representative value for the information storage unit 820. According to the present exemplary embodiment, the motion information of each layer may be stored by reflecting the resolution ratio between layers, thereby increasing data storage efficiency.
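The scaling of the storage unit can be sketched as follows. The motion field is assumed to be a two-dimensional grid holding one motion vector per motion information block, and the storage unit size is expressed in blocks rather than samples for brevity. The function names are hypothetical, and intra coded blocks are ignored in this sketch.

```python
def enhancement_storage_unit_size(base_unit_size, resolution_ratio):
    """Storage unit side length for the enhancement layer (a * N)."""
    return base_unit_size * resolution_ratio


def compress_motion_field(field, unit):
    """Replace every motion vector by that of the upper-left block of its unit.

    field -- 2-D list of (x, y) motion vectors, one per motion information block
    unit  -- side length of the storage unit, in blocks
    """
    return [
        [field[(r // unit) * unit][(c // unit) * unit] for c in range(len(row))]
        for r, row in enumerate(field)
    ]


if __name__ == "__main__":
    field = [[(r, c) for c in range(4)] for r in range(4)]
    base_unit = 2
    enh_unit = enhancement_storage_unit_size(base_unit, 2)
    print(compress_motion_field(field, base_unit)[3])
    print(compress_motion_field(field, enh_unit)[3])
```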

The motion information of the base layer and the enhancement layer may be compressed and stored at once. Alternatively, only one of the motion information of the base layer and the motion information of the enhancement layer may be compressed and stored. When the motion information of the enhancement layer having a high resolution is stored, the motion information for the base layer may be obtained by downsampling the stored motion information. Conversely, when the motion information of the base layer having a low resolution is stored, the motion information of the enhancement layer may be derived by upsampling the motion information of the base layer.
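Deriving the motion information of one layer from the stored motion information of the other can be sketched as follows. The sketch assumes an integer resolution ratio, a motion field given as a two-dimensional grid of motion vectors, and motion vectors that are scaled together with the grid; the specification does not detail the sampling method, so these choices and the function names are illustrative assumptions.

```python
def upsample_motion_field(field, ratio):
    """Derive an enhancement-layer motion field from a base-layer field.

    Each base-layer vector covers ratio x ratio enhancement-layer blocks
    and is multiplied by the ratio (assumption of this sketch).
    """
    out = []
    for row in field:
        wide = [(x * ratio, y * ratio) for (x, y) in row for _ in range(ratio)]
        out.extend([list(wide) for _ in range(ratio)])
    return out


def downsample_motion_field(field, ratio):
    """Derive a base-layer motion field from an enhancement-layer field.

    Every ratio-th block is kept and its vector is divided by the ratio,
    truncating toward zero (assumption of this sketch).
    """
    return [
        [(int(x / ratio), int(y / ratio)) for (x, y) in row[::ratio]]
        for row in field[::ratio]
    ]


if __name__ == "__main__":
    base = [[(2, -4)]]
    enhancement = upsample_motion_field(base, 2)
    print(enhancement)
    print(downsample_motion_field(enhancement, 2))
```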

In the present specification, for convenience of description, an array of samples reconstructed at a specific time point (for example, a picture order count (POC) or an access unit (AU)) for each layer in a multi-layer structure in which scalable video coding is supported is referred to as a 'picture.'

In this regard, the entire sample array reconstructed or to be reconstructed at a specific time in the decoded and output layer (current layer) may be called a picture and may be distinguished from the sample array reconstructed or to be reconstructed in the referenced layer. The sample array reconstructed or to be reconstructed at a specific time point in the referenced layer may be referred to as a representation, a reference layer picture, a reference layer sample array, a reference layer texture, or the like. In this case, one decoded picture reconstructed in the current layer may be output for one AU.

In the exemplary system described above, the methods are described based on a flowchart as a series of steps or blocks, but the invention is not limited to the order of steps, and certain steps may occur in a different order from, or concurrently with, other steps described above. In addition, since the above-described embodiments may include examples of various aspects, a combination of the embodiments should also be understood as an embodiment of the present invention. Accordingly, it is intended that the present invention cover all other replacements, modifications and variations that fall within the scope of the following claims.

Claims

Performing prediction on the base layer;

Performing prediction of an enhancement layer based on the motion information of the base layer;

And compressing the motion information of the base layer and the motion information of the enhancement layer for each access unit.

A first predictor for predicting the base layer;

A second predictor configured to predict an enhancement layer based on the motion information of the base layer;

And a memory for compressing the motion information of the base layer and the motion information of the enhancement layer for each access unit.