WO2013000575A1 - Methods and devices for scalable video coding - Google Patents
Methods and devices for scalable video coding
- Publication number
- WO2013000575A1 (PCT/EP2012/002718)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dct
- video data
- coefficients
- block
- resolution
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—using hierarchical techniques, e.g. scalability
- H04N19/33—using hierarchical techniques, e.g. scalability in the spatial domain
- H04N19/10—using adaptive coding
- H04N19/102—using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/134—using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/169—using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/18—the unit being a set of transform coefficients
- H04N19/187—the unit being a scalable video layer
- H04N19/60—using transform coding
- H04N19/61—using transform coding in combination with predictive coding
Definitions
- the present invention concerns methods for encoding and decoding an image, and associated apparatuses and programs.
- the invention is particularly useful for the encoding of digital video sequences made of images or "frames".
- Video compression algorithms such as those standardized by the standardization organizations ITU, ISO, and SMPTE, exploit the spatial and temporal redundancies of images in order to generate bitstreams of data of smaller size than original video sequences.
- These powerful video compression tools known as spatial (or intra) and temporal (or inter) predictions, make the transmission and/or the storage of video sequences more efficient.
- HEVC (High Efficiency Video Coding)
- the HEVC codec design is similar to that of most previous so-called block-based hybrid transform codecs such as H.263, H.264, MPEG-1, MPEG-2, MPEG-4 and SVC.
- the HEVC codec exploits the spatial and temporal redundancies of the images in order to generate data bit streams of reduced size compared with the original video sequences. Such compression makes the transmission and/or storage of the video sequences more effective.
- Video encoders and/or decoders are often embedded in portable devices with limited resources, such as cameras or camcorders.
- Conventional embedded codecs can process at best high-definition (HD) digital videos, i.e. 1080x1920 pixel frames.
- Real-time encoding and decoding are however constrained by the limited resources of portable devices, especially slow access to the working memory (e.g. random access memory, or RAM) and the capacity of the central processing unit (CPU).
- UHD typically has four times the definition (4k2k pixels) of an HD video, HD being the current standard definition. Furthermore, very ultra high definition, which is sixteen times that definition (i.e. 8k4k pixels), is being considered for the longer term.
- a method of encoding video data comprising:
- encoding video data having a first resolution in conformity with HEVC to generate video data of a base layer; decoding the base-layer video data in conformity with HEVC; upsampling the decoded base-layer video data to generate decoded video data having a second resolution higher than said first resolution; forming a difference between the generated decoded video data having said second resolution and further video data, having said second resolution and corresponding to said first-resolution video data, to generate data of a residual image; and compressing the residual-image data to generate video data of an enhancement layer.
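The encoder-side flow of this aspect can be sketched as below. The sketch is illustrative only: `upsample2x` uses nearest-neighbour repetition as a hypothetical stand-in for whatever interpolation filter the codec actually specifies, and the compression of the residual is omitted.

```python
import numpy as np

def upsample2x(base):
    """Nearest-neighbour 2x upsampling; a hypothetical stand-in for the
    interpolation filter used to upsample the decoded base layer."""
    return base.repeat(2, axis=0).repeat(2, axis=1)

def make_residual(full_res, decoded_base):
    """Enhancement-layer residual: the full-resolution frame minus the
    upsampled decoded base layer (the 'difference' step of the method)."""
    return full_res.astype(np.int64) - upsample2x(decoded_base).astype(np.int64)
```

At the decoder the reconstruction mirrors this: the decoded residual is added back to the upsampled decoded base layer.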
- the compression of the residual-image data does not involve temporal prediction.
- the compression of the residual-image data does not involve spatial prediction.
- the compression of the residual data employs a method embodying one or more of the 5th, 9th, 11th and 15th aspects of the present invention described hereinafter.
- the compression of the residual-image data comprises applying a discrete cosine transformation (DCT) to obtain DCT coefficients.
- the method further comprises employing a parametric probabilistic model of the DCT coefficients.
- a parametric probabilistic model is preferably obtained for each type of DCT coefficient.
- the DCT may be a block-based DCT, in which case such a parametric probabilistic model is preferably obtained for each DCT coefficient position within a DCT block.
- the method may further comprise fitting the parametric probabilistic model onto respective collocated DCT coefficients of some or all DCT blocks of the residual image.
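Fitting a zero-mean GGD to the collocated coefficients of the DCT blocks can be done, for instance, by moment matching. The sketch below is illustrative (not taken from the patent): it inverts the standard moment ratio E|X|² / E[X²] = Γ(2/β)² / (Γ(1/β)Γ(3/β)) by bisection, which is valid because this ratio is monotonically increasing in β.

```python
import numpy as np
from math import gamma

def fit_ggd(samples):
    """Fit a zero-mean Generalised Gaussian GGD(alpha, beta) by moment
    matching: the ratio (E|X|)^2 / E[X^2] equals
    gamma(2/b)^2 / (gamma(1/b) * gamma(3/b)), monotonically increasing
    in b, so it can be inverted by bisection."""
    x = np.asarray(samples, dtype=np.float64)
    m1 = np.mean(np.abs(x))
    m2 = np.mean(x * x)
    r = m1 * m1 / m2
    lo, hi = 0.1, 10.0
    for _ in range(60):
        b = 0.5 * (lo + hi)
        rb = gamma(2.0 / b) ** 2 / (gamma(1.0 / b) * gamma(3.0 / b))
        if rb < r:
            lo = b
        else:
            hi = b
    beta = 0.5 * (lo + hi)
    # with beta known, E|X| = alpha * gamma(2/beta) / gamma(1/beta)
    alpha = m1 * gamma(1.0 / beta) / gamma(2.0 / beta)
    return alpha, beta
```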
- the compression of the residual-image data comprises employing the parametric probabilistic model to choose quantizers from a pool of available quantizers and quantizing the residual data using the chosen quantizers.
- the available quantizers of said pool are preferably pre-computed quantizers dedicated to a parametric probabilistic model.
- the compression of the residual-image data comprises entropy encoding of quantized symbols and employing a parametric probabilistic model to obtain a probabilistic distribution of possible symbols of an alphabet associated with each DCT coefficient, which alphabet is used for the entropy encoding.
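Given the fitted model and a quantizer's interval limits, the probability of each possible symbol is the integral of the model density over the corresponding quantization interval. A minimal numerical sketch (midpoint rule; finite, wide-enough boundaries assumed; function names hypothetical):

```python
import math

def ggd_pdf(alpha, beta):
    """Density of a zero-mean GGD(alpha, beta)."""
    c = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return lambda x: c * math.exp(-((abs(x) / alpha) ** beta))

def symbol_probs(pdf, boundaries, n=200):
    """Probability of each quantized symbol: the integral of the model
    density over each quantization interval (midpoint rule). The outermost
    boundaries are assumed finite and wide enough to capture the mass."""
    probs = []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        h = (b - a) / n
        probs.append(h * sum(pdf(a + (k + 0.5) * h) for k in range(n)))
    return probs
```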
- the same parametric probabilistic model is preferably employed both for choosing quantizers and for entropy encoding.
- the parametric probabilistic model is a Generalised Gaussian Distribution GGD(α, β) having a zero mean. This requires only two parameters, which makes it possible to supply the parameters to the decoder with low overhead.
- parameters of the parametric probabilistic model are determined in dependence upon one or more of:
- information about parameters of the parametric probabilistic model is supplied to a decoder.
- information about a set of saturated coefficients determined at the encoder to minimize a rate is supplied to a decoder.
- information about the chosen quantizers is supplied to a decoder.
- the quantizers are chosen based on a rate-distortion criterion.
- decoding, in conformity with HEVC, encoded video data of a base layer of the bitstream to obtain video data having a first resolution, and upsampling the first-resolution video data to generate video data having a second resolution higher than said first resolution;
- decoding compressed video data of an enhancement layer of the bitstream to obtain data of a residual image and forming a sum of the generated second-resolution video data and the residual-image data to generate decoded video data having said second resolution.
- the decoding of the compressed video data of the enhancement layer does not involve temporal prediction.
- the decoding of the compressed video data of the enhancement layer does not involve spatial prediction.
- the decoding of the compressed video data of the enhancement layer employs a method embodying one or more of the 2nd, 6th, 13th, 17th and 20th aspects of the present invention described hereinafter.
- the compressed video data of the enhancement layer comprises encoded discrete cosine transformed (DCT) coefficients.
- the method further comprises employing a parametric probabilistic model of the DCT coefficients.
- such a parametric probabilistic model is obtained for each type of DCT coefficient.
- the DCT may be a block-based DCT, in which case such a parametric probabilistic model is obtained for each DCT coefficient position within a DCT block.
- the decoding of the compressed video data of the enhancement layer comprises employing the parametric probabilistic model to choose quantizers from a pool of available quantizers and using the chosen quantizers for inverse quantization of the encoded DCT coefficients.
- the available quantizers of said pool are pre-computed quantizers dedicated to a parametric probabilistic model.
- the decoding of the compressed video data of the enhancement layer comprises entropy decoding of encoded quantized symbols obtained from the compressed video data to generate quantized DCT coefficients and employing a parametric probabilistic model to obtain a probabilistic distribution of possible symbols of an alphabet associated with each DCT coefficient, which alphabet is used for the entropy decoding.
- the same parametric probabilistic model is preferably employed both for choosing quantizers and for entropy decoding.
- the parametric probabilistic model is a Generalised Gaussian Distribution GGD(α, β) having a zero mean.
- Information about parameters of the parametric probabilistic model can be received from an encoder.
- information about a set of saturated coefficients determined at an encoder to minimize a rate is received from the encoder.
- information about the chosen quantizers is received from an encoder.
- the quantizers are preferably chosen based on a rate-distortion criterion.
- apparatus for encoding video data comprising:
- the means for compressing the residual data comprises apparatus embodying one or more of the 3rd, 7th, 10th, 12th and 16th aspects of the present invention described hereinafter.
- apparatus for decoding a scalable bitstream comprising: means for decoding, in conformity with HEVC, encoded video data of a base layer of the bitstream to obtain video data having a first resolution;
- the means for decoding of the compressed video data of the enhancement layer comprises apparatus embodying one or more of the 4th, 8th, 14th, 18th and 21st aspects of the present invention described hereinafter.
- 5th to 8th aspects of the present invention may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
- the features of the 5th to 8th aspects of the invention may be provided in combination with the features of the 1st to 4th aspects, but this is not essential and it is possible to use the features of the 5th to 8th aspects independently of the features of the 1st to 4th aspects.
- a method for encoding at least one block of pixels comprising:
- the step of transforming pixel values corresponds for instance to a transformation from the spatial domain (pixels) to the frequency domain (e.g. into coefficients each corresponding to a specific spatial frequency).
- the transforming step includes applying a block-based Discrete Cosine Transform; each of said coefficient types may then correspond to a respective coefficient index or position.
- Said estimated quality parameter depends for instance on the sum of the squared estimated distortions associated with quantizers of the set. Such a parameter is linked to the PSNR as explained below and is thus of particular interest.
- the method comprises a step of computing said pixel values by subtracting values obtained by decoding a base layer from values representing pixels of an image.
- the pixel values to be encoded thus correspond to residual values and form, once encoded, an enhancement layer.
- a set of quantizers may be determined for each of said two encoding modes. As explained below, the optimal quantizer may vary greatly depending on the encoding mode concerned.
- the step of determining the set of quantizers includes selecting, among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type such that the sum of the rates associated with selected quantizers is minimal and the estimated quality parameter resulting from the distortions associated with said selected quantizers corresponds to a predetermined target. Selection among optimal quantizers is an interesting practical solution to perform the necessary optimisation.
- said selecting the optimal quantizers for each coefficient type may for instance be performed by minimisation under Karush-Kuhn-Tucker necessary conditions, possibly by a fixed point algorithm.
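One way to realise such a rate-minimising selection under a quality constraint is a Lagrangian sweep: for a fixed multiplier λ, picking for each coefficient type the quantizer minimising D + λR satisfies the KKT necessary conditions, and λ is then adjusted by bisection until the total distortion meets the target. The sketch below is illustrative and assumes per-type tables of (rate, distortion) pairs for the pre-computed optimal quantizers are available, and that the target is reachable:

```python
def select_quantizers(candidates, target_distortion):
    """Pick one pre-computed optimal quantizer per coefficient type so that
    total rate is minimal while total distortion stays at or below a target.
    candidates[t] is a list of (rate, distortion) pairs for type t."""
    def pick(lam):
        # per-type minimiser of D + lam * R (KKT stationarity for fixed lam)
        choice = [min(c, key=lambda rd: rd[1] + lam * rd[0]) for c in candidates]
        return choice, sum(r for r, _ in choice), sum(d for _, d in choice)
    lo, hi = 0.0, 1e6  # lam = 0 gives the minimum-distortion selection
    for _ in range(80):
        lam = 0.5 * (lo + hi)
        _, _, dist = pick(lam)
        if dist > target_distortion:
            hi = lam  # too much distortion: penalise rate less
        else:
            lo = lam  # target met: push lam up to save more rate
    return pick(lo)
```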
- the method may also include, for at least one coefficient type, a step of determining a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type, wherein optimal quantizers, among which the optimal quantizer for said at least one coefficient type is selected, are determined based on said probabilistic model.
- the optimisation process may thus be performed based on the probabilistic model identified for the concerned coefficient type, which is a convenient way to take into account effective values of the coefficient in the optimisation process.
- Said probabilistic model is for instance one of a plurality of probabilistic models each defined by model parameters; parameters defining optimal quantizers may be stored in association with corresponding model parameters. Optimal quantizers are thus tabulated depending on the various probabilistic models that may be encountered. Distinct probabilistic models may be determined for the plurality of coefficient types. This improves flexibility in the optimisation process.
- optimal quantizers associated with a rate and distortion, and among which the optimal quantizer for said at least one coefficient type is selected, are determined off-line, i.e. in an apparatus possibly distinct from the encoding device and prior to the step of transforming pixel values. Parameters defining the resulting optimal quantizers may then be stored as mentioned above in the encoding device in order to perform the optimisation process.
- a quantizer is for instance defined by at least one parameter taken among a number of quantization intervals, a limit value for a quantization interval and a centroid of a quantization interval.
- Quantization and encoding may be performed for each and every coefficient resulting from the transforming step.
- the method comprises a step of determining, for each coefficient type, an estimated value representative of a ratio between a distortion variation (e.g. here a decrease) provided by encoding a coefficient having the concerned type and a rate increase resulting from encoding said coefficient; the step of quantizing coefficients may then be applied only to a subset of said plurality of coefficients (i.e. the coefficients subjected to the quantization step form such a subset), and the estimated values for coefficient types of coefficients in said subset may then be larger than the highest estimated value over coefficient types of coefficients not included in said subset.
- the step of determining the set of quantizers may include a step of determining sets of quantizers each associated with a possible subset of said plurality of coefficients and a step of selecting, among said sets of quantizers, a set for which the sum of rates each associated with a quantizer of the set is minimal, thus selecting an associated subset.
- the number of coefficients to be quantized and encoded is thus determined during the optimisation process.
- a plurality of possible subsets are considered; however, as the coefficient types are ordered by decreasing encoding merit, only N+1 subsets need be considered if N is the total number of coefficients.
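The prefix property above can be sketched as follows: once the coefficient types are sorted by decreasing merit, the only candidate subsets worth considering are the N+1 prefixes of that order, instead of all 2^N subsets (function name hypothetical):

```python
def candidate_subsets(merits):
    """merits[i] is the estimated ratio (distortion decrease / rate increase)
    for coefficient type i. Sort the types by decreasing merit and return the
    N+1 prefixes of that order as the candidate subsets to encode."""
    order = sorted(range(len(merits)), key=lambda i: merits[i], reverse=True)
    return [order[:k] for k in range(len(order) + 1)]
```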
- a method for decoding a set of data representing at least one block of pixels comprising:
- the step of determining the set of quantizers may include estimating distortion for a coefficient type based on a parameter defining a distribution of the values of coefficients of said coefficient type and included in the set of data.
- apparatus for encoding at least one block of pixels comprising:
- apparatus for decoding a set of data representing at least one block of pixels comprising:
- 9th and 10th aspects of the present invention may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
- the features of the 9th and 10th aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th aspects, but this is not essential and it is possible to use the features of the 9th and 10th aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects.
- a method for encoding at least one block of pixels comprising: - transforming pixel values for said block into a set of coefficients each having a coefficient type;
- using the estimated value or ratio for each coefficient type, it is possible to order the various coefficient types by decreasing estimated value, i.e. by decreasing merit of encoding as explained below, and the encoding process may be applied only to coefficients having the higher values (i.e. ratios), forming the subset defined above.
- Said distortion variation is for instance provided when no prior encoding has occurred for the coefficient having the concerned type. This amounts to taking into account the initial merit which makes it possible to keep an optimal result, as shown below.
- the method may include a step of computing said pixel values by subtracting values obtained by decoding a base layer from values representing pixels of an image.
- the pixel values are for instance representative of residual data to be encoded into an enhancement layer.
- the estimated value for the concerned coefficient type may be computed depending on a coding mode of a corresponding block in the base layer (i.e. for each of a plurality of such coding modes).
- the number of coefficients to be quantized and encoded is thus determined during the optimisation process.
- a plurality of possible subsets are considered; however, as the coefficient types are ordered by decreasing encoding merit, only N+1 subsets need be considered if N is the total number of coefficients.
- a quantizer is selected for each of a plurality of coefficient types, for each of a plurality of block sizes and for each of a plurality of base layer coding modes.
- the step of selecting quantizers may be performed by selecting, among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type associated with the possible subset concerned, such that the sum of the rates associated with selected quantizers is minimal and the global distortion resulting from the distortions associated with said selected quantizers corresponds to a predetermined distortion.
- an optimal quantizer may be selected for each of a plurality of block sizes and for each of a plurality of base layer coding modes.
- the method may further include, for at least one coefficient type, a step of determining a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type, wherein said estimated value for said at least one coefficient type is computed based on said probabilistic model. Ordering of coefficients according to their encoding merit may thus be performed based on the probabilistic model identified for the various coefficient types, which is a convenient way to take into account effective values of the coefficient in the process.
- Said estimated value for a given coefficient type may in practice be computed using a derivative of a function associating rate and distortion of optimal quantizers for said coefficient type.
- rate and distortion of optimal quantizers are for example stored for a great number of possible values of the various parameters, as explained below. This allows a practical implementation.
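With the (rate, distortion) points of the optimal quantizers tabulated, the derivative can be approximated by a finite difference; for example, the initial merit (before any encoding of the coefficient) is the distortion drop per unit of rate between the first two table entries. An illustrative sketch (function name hypothetical):

```python
def merit_from_rd_table(rd_pairs):
    """Estimate the encoding merit of a coefficient type as -dD/dR at zero
    rate, from the stored (rate, distortion) table of its optimal quantizers.
    A finite difference over the first two points stands in for the
    derivative of the rate-distortion function."""
    pts = sorted(rd_pairs)
    (r0, d0), (r1, d1) = pts[0], pts[1]
    return (d0 - d1) / (r1 - r0)
```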
- said estimated value for a given coefficient type may be
- the step of transforming pixel values corresponds for instance to a transformation from the spatial domain (pixels) to the frequency domain (e.g. into coefficients each corresponding to a specific spatial frequency).
- the transforming step includes for instance applying a block based Discrete Cosine Transform; each of said coefficient types may then correspond to a respective coefficient index.
- apparatus for encoding at least one block of pixels comprising:
- 11th to 14th aspects of the present invention may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
- the features of the 11th to 14th aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th and/or 9th and 10th aspects, but this is not essential and it is possible to use the features of the 11th to 14th aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects and of the 9th and 10th aspects.
- a method for encoding a first image comprising blocks of pixels comprising:
- This aspect of the invention is applicable to any block-based video codec for which a probabilistic distribution of transformed coefficients, such as DCT coefficients, is available.
- the probabilistic distribution of the corresponding quantized symbols may then be directly obtained, based on parameters of the quantization (i.e. of the used quantizer).
- the mean length (i.e. entropy) of the used binary codes is substantially improved compared to the conventional approaches, and is made closer to the theoretical entropy of the probabilistic distributions. Better compression of the image is thus obtained.
- the grouping of alphabets is based on the probabilistic distributions of the symbols, enabling efficient grouping of the alphabets to improve the mean length of the binary codes used. This is because, as will become apparent below, the probabilistic distributions make it possible to distinguish the alphabets having small entropy and those having high entropy, to adapt the grouping. An alphabet with small entropy (i.e. generally having the worst mean overhead) may then be "hidden" into a new alphabet with an alphabet having high entropy (which has a low mean overhead).
- apparatus for encoding a first image comprising blocks of pixels comprising:
- an alphabet generator for generating alphabets of symbols, each alphabet defining the possible symbols of an associated transformed block coefficient due to the quantization
- a grouping module configured to group, using the probabilistic distributions, at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two transformed block coefficients associated with the grouped alphabets;
- an encoding module for encoding the quantized symbols using the obtained binary codes.
- a method for decoding a data bit-stream of an encoded first image comprising: - obtaining, from the bit-stream, blocks of encoded symbols corresponding to quantized block coefficients;
- apparatus for decoding a data bit-stream of an encoded first image comprising:
- a parser for obtaining, from the bit-stream, blocks of encoded symbols corresponding to quantized block coefficients;
- an alphabet generator for generating alphabets of symbols, each alphabet defining the possible symbols of an associated quantized block coefficient
- a grouping module configured to group, using the probabilistic distributions, at least two alphabets into a new alphabet, the new alphabet defining, as symbols, the combinations of possible symbols of the at least two quantized block coefficients associated with the grouped alphabets;
- the encoding/decoding apparatuses and the decoding method may have features and advantages that are analogous to those set out above and below in relation to the encoding method embodying the 11 th aspect, in particular that of improving the mean length of the binary codes used.
- the encoding method may further comprise determining an entropy value for each of the alphabets, and the grouping of at least two alphabets comprises grouping the alphabet having the smallest entropy with the alphabet having the highest entropy.
- the grouping can then be achieved with few computations, since the entropy values can be determined directly from the probabilistic distribution of the corresponding symbols.
- the basic idea is that the alphabets with the largest overhead are most of the time the alphabets with the smallest entropy.
- after grouping, the largest probability is bounded by the largest probability of the alphabet with high entropy. So, it is ensured that no high probability remains after grouping and the coding overhead is thus reduced.
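By way of illustration (not part of the claimed method), this bounding effect may be sketched in Python as follows; the distributions are illustrative, and the assumption that the two coefficients are statistically independent (so that pair probabilities multiply) is ours:

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy (bits/symbol) of a {symbol: probability} alphabet."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def group(dist_a, dist_b):
    """Group two alphabets into a new alphabet whose symbols are pairs.
    Assuming independent coefficients, the probability of a pair is the
    product of the individual probabilities."""
    return {(a, b): pa * pb
            for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items())}

# A small-entropy alphabet (one dominant symbol, hence large coding
# overhead) grouped with a high-entropy one:
low = {0: 0.95, 1: 0.05}
high = {-1: 0.25, 0: 0.5, 1: 0.25}
merged = group(low, high)
# The largest pair probability is 0.95 * 0.5 = 0.475: the dominant
# probability of the small-entropy alphabet is scaled down by the
# high-entropy one, so no very high probability remains after grouping.
```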
- the grouping of at least two alphabets comprises grouping the two alphabets which maximize a reduction in distance between the entropy of the alphabets and the entropy of the binary codes constructed from the alphabets.
- the entropy of the binary codes may also be seen as the effective mean length of those codes.
- the reduction may be understood as a difference between the above-defined distance when using both alphabets before grouping, and the above-defined distance when using the resulting new alphabet.
- each distance relating to an alphabet is weighted by the entropy of that alphabet.
- This strategy gives priority to grouping alphabets having small entropy rather than alphabets with larger entropy, when those alphabets have binary codes with similar distances to the entropy.
- the entropy of the binary codes constructed from the alphabets is modelled by the entropy of a corresponding Shannon code.
- Huffman codes are generated using Huffman's algorithm.
- the entropy (or mean length) of Huffman codes cannot be anticipated without constructing the Huffman tree itself, which is a costly and impractical operation, in particular when an iterative grouping process is implemented.
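The Shannon-code model mentioned above may be sketched as follows (Python; the use of the Shannon codeword length ceil(-log2 p) as a stand-in for the Huffman mean length is the modelling choice described in the text, while the code itself is our illustration):

```python
import math

def entropy(dist):
    """Theoretical entropy (bits/symbol) of a {symbol: probability} alphabet."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def shannon_mean_length(dist):
    """Mean codeword length of a Shannon code: a symbol of probability p
    receives a codeword of ceil(-log2 p) bits. Unlike the Huffman mean
    length, this is computed directly from the probabilistic
    distribution, without building any tree."""
    return sum(p * math.ceil(-math.log2(p)) for p in dist.values() if p > 0)

dist = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
# For dyadic probabilities the Shannon code is optimal, so its mean
# length coincides with the entropy (1.75 bits here); in general it
# only upper-bounds the entropy.
```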
- the method further comprises iteratively grouping alphabets using the probabilistic distributions, wherein the probabilistic distribution of a new alphabet is calculated based on the probabilistic distributions of the at least two alphabets involved in the corresponding grouping.
- This iterative process makes it possible to group alphabets several times, thus further reducing the mean coding overhead per symbol.
- the encoding of the first image is therefore made more efficient in terms of bitrate.
- alphabets are iteratively grouped as long as a resulting new alphabet comprises fewer symbols than a predefined number.
- Such a limit on the number of symbols is defined to ensure that the associated Huffman trees (or equivalent) are small enough to be handled in real time by the codec. Indeed, an alphabet with a very large number of symbols would require a lot of memory to generate a corresponding Huffman tree.
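The iterative grouping bounded by an alphabet-size limit may be sketched as follows (Python; the smallest-with-largest pairing strategy follows the text, while the independence assumption, names and limit value are illustrative):

```python
import math
from itertools import product

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def iterative_grouping(alphabets, max_symbols=256):
    """Repeatedly group the smallest-entropy alphabet with the
    largest-entropy one, as long as the resulting alphabet stays below
    max_symbols (keeping the Huffman trees cheap to build).
    'alphabets' is a list of {symbol: probability} dicts, assumed
    independent."""
    alphabets = list(alphabets)
    while len(alphabets) > 1:
        alphabets.sort(key=entropy)
        low, high = alphabets[0], alphabets[-1]
        if len(low) * len(high) > max_symbols:
            break  # the Huffman tree would become too large to handle
        merged = {(a, b): pa * pb
                  for (a, pa), (b, pb) in product(low.items(), high.items())}
        alphabets = alphabets[1:-1] + [merged]  # new alphabet replaces the two
    return alphabets

alphabets = [{0: 0.99, 1: 0.01},
             {0: 0.5, 1: 0.5},
             {-1: 0.3, 0: 0.4, 1: 0.3}]
grouped = iterative_grouping(alphabets, max_symbols=16)
```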
- the step of grouping groups at least two alphabets associated with at least two transformed block coefficients of the same block of pixels.
- This strategy defines an intra-block grouping.
- Grouping within a block maintains the random spatial accessibility of the encoding stream with a precision of one block, in particular when temporal and spatial predictions are disabled.
- the same grouping of alphabets is performed for a plurality of blocks within the first image.
- This plurality may comprise all the blocks of the first image, or a set of blocks that are collocated with base layer blocks encoded with the same coding mode (e.g. Intra prediction, P image Inter prediction, B image Inter prediction, Skip mode).
- such blocks often use the same quantizations, and thus the same alphabets for the symbols to encode.
- the at least two transformed block coefficients are adjacent coefficients within the same block.
- the encoding method further comprises grouping at least two alphabets associated with transformed block coefficients collocated in at least two blocks of pixels.
- those grouped alphabets may be alphabets already grouped by the intra-block grouping (i.e. representing several block coefficients within a block).
- This strategy defines an inter-block grouping.
- the at least two blocks of pixels are from the same macroblock dividing the first image.
- the non-dependency from macroblock to macroblock also enables massive parallelization of the encoding and decoding processes, as well as low memory bandwidth consumption, since all encoding/decoding operations can be performed locally in each macroblock, independently from the other macroblocks.
- a new alphabet resulting from the grouping of at least two alphabets replaces the at least two grouped alphabets.
- the probabilistic distribution is the same for the alphabets associated with transformed block coefficients collocated within a plurality of the blocks of the first image.
- the plurality may comprise all the blocks of the first image, but also a part of it (e.g. related to the same base coding mode).
- the obtaining of the probabilistic distribution for an alphabet associated with a transformed block coefficient comprises fitting a Generalized Gaussian Distribution (GGD) model onto the transformed block coefficients collocated with said transformed block coefficient within said plurality of blocks.
- the probabilistic distribution for the alphabet can then be directly deduced from the fitted GGD, using the quantization parameters.
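Deducing the probabilistic distribution of an alphabet from a fitted GGD and the quantization parameters may be sketched as follows (Python; the uniform quantizer, the midpoint integration and the tail cut-off are our simplifying assumptions, not the claimed quantization scheme):

```python
import math

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha and shape beta."""
    c = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return c * math.exp(-((abs(x) / alpha) ** beta))

def symbol_probabilities(alpha, beta, step, n_levels):
    """Probability of each quantizer output symbol, obtained by
    integrating the fitted GGD over each quantization cell (midpoint
    rule). Uniform quantizer of cell width 'step', levels
    -n_levels..n_levels; the outermost cells absorb the tails."""
    def integrate(a, b, n=200):
        h = (b - a) / n
        return sum(ggd_pdf(a + (k + 0.5) * h, alpha, beta) for k in range(n)) * h
    probs = {}
    for m in range(-n_levels, n_levels + 1):
        lo, hi = (m - 0.5) * step, (m + 0.5) * step
        if m == -n_levels:
            lo = -50.0 * alpha   # proxy for -infinity
        if m == n_levels:
            hi = 50.0 * alpha    # proxy for +infinity
        probs[m] = integrate(lo, hi)
    return probs

probs = symbol_probabilities(alpha=1.0, beta=2.0, step=1.0, n_levels=4)
```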
- the encoding method further comprises, prior to grouping, discarding alphabets having an associated entropy or a size (in terms of number of symbols) smaller than a predefined value.
- the obtaining of binary codes comprises obtaining a Huffman code from each of the remaining alphabets and from their associated probabilistic distributions.
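The construction of a Huffman code from an alphabet and its probabilistic distribution may be sketched as follows (Python; a textbook heap-based Huffman construction, not necessarily the codec's implementation):

```python
import heapq
from itertools import count

def huffman_code(dist):
    """Build a binary Huffman code {symbol: bitstring} from a
    {symbol: probability} alphabet."""
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in dist.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in dist}  # degenerate single-symbol alphabet
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

dist = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
code = huffman_code(dist)
# For this dyadic distribution the codeword lengths are 1, 2, 3, 3 bits,
# giving a mean length of 1.75 bits, equal to the entropy.
```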
- the encoding method further comprises encoding a low resolution version of the first image into an encoded low resolution image; generating a residual enhancement image by subtracting an upsampled decoded version of the encoded low resolution image from said first image; and the step of transforming is applied to the blocks of pixels forming the residual enhancement image.
- the main steps of the encoding method i.e. the transforming of blocks, the quantizing of coefficients, the obtaining of alphabets, the obtaining of probabilistic distributions, the grouping of alphabets, the obtaining of binary codes and the encoding of the symbols
- the invention drastically reduces the complexity of encoding that residual enhancement image compared to conventional encoding schemes, while keeping a good compression ratio if the temporal and spatial predictions are disabled for that encoding.
- the alphabet associated with a transformed block coefficient of a block depends on a coding mode of the block collocated with that block in the encoded low resolution image.
- the coding mode of the corresponding blocks in the base layer enables fitting a Generalized Gaussian Distribution (GGD) to the effective values taken by various transformed block coefficients (DCT coefficients) with more accuracy. Therefore, as the probabilities of symbols are closer to reality, the entropy coding gets closer to the theoretical entropy of the data source to code.
- the same grouping of alphabets is performed for the blocks within the residual enhancement image that are collocated with blocks encoded with the same encoding mode in the encoded low resolution image.
- the decoding method may further comprise, for each alphabet of symbols, obtaining parameters from the bit-stream and applying these parameters to a probabilistic distribution model to obtain the probabilistic distribution of the possible symbols defined in said alphabet. By transmitting only parameters, the bitrate of the bit-stream is reduced.
- the 15th to 19th aspects of the present invention may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
- the features of the 15th to 19th aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th and/or 9th and 10th and/or 11th to 14th aspects but this is not essential and it is possible to use the features of the 15th to 19th aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects and of the 9th and 10th aspects and of the 11th to 14th aspects.
- a method for encoding a first image comprising blocks of pixels comprising:
- the mean length (i.e. entropy) of the binary codes used is substantially improved compared to the conventional approaches, and is made closer to the theoretical entropy of the probabilistic distributions. Better compression of the image is thus obtained.
- the overhead resulting from the encoding of the flag may be substantially limited in order to obtain, thanks to the present aspect of the invention, an overall gain in compression of about 5%.
- the flag according to the present aspect of the invention is furthermore used to restrict the alphabets in order to further decrease the entropy of the generated binary codes.
- the encoding apparatus comprising:
- an alphabet generator for generating alphabets of values, each defining the possible values of an associated quantized coefficient
- a flag generator for associating a respective flag with each block of quantized coefficients, said flag specifying a magnitude
- an alphabet restriction means for restricting alphabets corresponding to the quantized coefficients of each block, based on the magnitude specified in the flag associated with that block;
- an encoding module for encoding the quantized coefficients using the generated binary codes.
- a method for decoding a data bit-stream of an encoded first image comprising:
- the decoding apparatus comprising:
- an alphabet generator for generating alphabets of values for the encoded quantized coefficients, each alphabet defining the possible values taken by a quantizer associated with an encoded quantized coefficient
- an alphabet restriction means for restricting alphabets corresponding to the encoded quantized coefficients of each block, based on the magnitude specified in the flag associated with that block;
- a binary code generator for generating binary entropy codes for the possible values of each restricted alphabet;
- a decoding module for decoding the encoded quantized coefficients using the generated binary codes.
- a bit-stream of an encoded first image comprising:
- binary codes of a block belong to binary entropy codes generated from the alphabets restricted based on the magnitude specified in the flag associated with that block.
- the bitstream may be carried by a carrier medium such as a signal. It may also be recorded on a recording medium.
- the encoding/decoding apparatuses and the decoding method may have features and advantages that are analogous to those set out above and below in relation to the encoding method embodying the 15th aspect of the present invention, in particular that of improving the mean length of the binary codes used.
- each flag is selected from a set (or alphabet) of flags, each flag of the set having an associated probability of occurrence.
- the set of flags is a restricted set from a larger set of flags. This provision is useful since the number of possible values for the quantized coefficients is very large, thus defining a huge number of possible magnitudes (and therefore flags) for the blocks. Reducing this number to a restricted number enables the flag encoding overhead to be kept as low as possible compared to the compression gain obtained by the restriction of the alphabets as set out above.
- the encoding method may further comprise restricting the larger set of flags based on probabilities of occurrence of the flags of that larger set, a probability of occurrence of a flag corresponding to the probability that a block of quantized coefficients has a maximum magnitude equal to the magnitude specified in the flag.
- restricting the larger set of flags comprises selecting flags of the larger set that define intervals of magnitudes corresponding to a probability of occurrence greater than a predefined amount, for example 10 %.
- the probability of occurrence of an interval corresponds to the probability that a block of quantized coefficients has a maximum magnitude comprised within that interval. It may be determined from the probabilities of occurrences of the flags of the larger set.
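The restriction of the flag set by probability of occurrence may be sketched as follows (Python; the probabilities, the 10% threshold and the function name are illustrative; keeping the largest-magnitude flag follows the provision set out below that every block must remain losslessly encodable):

```python
def restrict_flag_set(flag_probs, threshold=0.10):
    """Keep only the flags whose magnitude interval occurs with
    probability above 'threshold'. The largest-magnitude flag is always
    kept, so blocks it is used for have their alphabets unrestricted and
    can still be encoded without loss. flag_probs maps a magnitude to
    the probability that a block's maximum magnitude falls in the
    interval covered by that flag."""
    magnitudes = sorted(flag_probs)
    kept = [m for m in magnitudes if flag_probs[m] > threshold]
    if magnitudes[-1] not in kept:
        kept.append(magnitudes[-1])
    return kept

# Example: most blocks are all-zero or of low magnitude, so only the
# frequent flags plus the unrestricted (largest-magnitude) one survive.
probs = {0: 0.55, 1: 0.25, 2: 0.08, 3: 0.07, 4: 0.05}
kept = restrict_flag_set(probs)
```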
- the magnitude specified in a flag associated with a block is determined based on the quantized coefficient values of that block. This ensures restricting the alphabets taking into account how they are used within the considered block.
- the magnitude specified in a flag associated with a block represents a maximum magnitude of the quantized coefficient values of that block. This makes it possible to restrict the alphabets in a way that deletes the values above the maximum magnitude (i.e. unused values).
- the specified maximum magnitude can be taken by one of the quantized coefficient values, but can also be an upper bound as suggested below.
- the larger set comprises at least flags specifying, as magnitude, all the possible values defined in the alphabets. It is then easy to select from this set a flag corresponding to the maximum magnitude of a block.
- restricting the alphabets of a block may comprise deleting, from the alphabets, the possible values that have a magnitude (i.e. absolute value) larger than the magnitude specified in the flag associated with the block.
- a value d_B is determined from the maximum magnitude M_B specified in the flag associated with the block; in particular, d_B is maximal when all the quantized coefficient values a_m of the block are zero.
- This value d_B defines the minimum number of possible values that can be deleted from each edge (or extreme) of the intervals of possible values of the alphabets, while keeping a lossless entropy encoding of the quantized coefficient values a_m. This is because, due to the above construction of d_B, it is certain that the d_B most leftward and d_B most rightward values of those intervals (i.e. the d_B most positive and negative values) are not used in the considered block.
- restricting the alphabets of a block may comprise deleting the d_B most negative and d_B most positive possible values from each of the alphabets.
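This edge deletion may be sketched as follows (Python; the alphabet range and the d_B value are illustrative):

```python
def restrict_alphabet(alphabet, d_b):
    """Delete the d_B most negative and d_B most positive values of an
    alphabet of quantizer outputs. The flag guarantees those values are
    unused in the block, so the entropy coding stays lossless."""
    values = sorted(alphabet)
    return values[d_b:len(values) - d_b]

# Alphabet of a quantized DCT coefficient, e.g. levels -4..4:
alphabet = list(range(-4, 5))
# A flag signalling a small maximum magnitude for the block allows
# d_B = 2 values to be dropped from each edge:
restricted = restrict_alphabet(alphabet, 2)
# restricted == [-2, -1, 0, 1, 2]
```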
- the number of flags in the restricted set is at most three. It appears that the best results are obtained in that situation. As a variant, four flags could however be kept.
- one flag in the restricted set is the flag specifying the largest magnitude in the larger set. This ensures that every block can be entropy encoded without loss, because, when this flag is used for a block, the alphabets are not restricted at all when encoding that block.
- one flag in the restricted set specifies a magnitude equal to zero.
- the flags associated with the blocks of the same macroblock are concatenated into one flag associated with the macroblock. This ensures a more efficient encoding of that information, based on the grouping of alphabets as described in more detail below.
- the flag associated with the macroblock may be entropy encoded based on probabilities of occurrence of the possible values for such a macroblock flag, those possible values resulting from the product of the sets of flags from which the concatenated flags are selected.
- the encoding method further comprises obtaining at least one probability p_MB,n (in particular p_MB,0) of occurrence that a macroblock comprises only quantized coefficients having a magnitude less than a given corresponding value n (i.e. a macroblock with only coefficients equal to zero when considering p_MB,0); and the probabilities of occurrence of the possible values for a macroblock flag depend on said at least one probability p_MB,n.
- the sets of flags from which the concatenated flags are selected are the same set restricted from a larger set of flags. Only a few probabilities associated with the flags of that restricted set have to be handled in that case.
- the encoding method further comprises encoding a low resolution version of the first image into an encoded low resolution image; generating a residual enhancement image by subtracting an upsampled decoded version of the encoded low resolution image from said first image; and wherein the blocks of quantized coefficients are obtained from the residual enhancement image.
- the present aspect of the invention reduces the complexity of encoding that residual enhancement image compared to conventional encoding schemes, while keeping a good compression ratio if the temporal and spatial predictions are disabled for that encoding.
- the flag associated with a block depends on a coding mode of the block collocated with that block in the encoded low resolution image.
- the coding mode of the corresponding blocks in the base layer enables fitting a Generalized Gaussian Distribution (GGD) to the effective values taken by various transformed block coefficients (DCT coefficients) with more accuracy. Therefore, as the probabilities of symbols are closer to reality, the entropy coding gets closer to the theoretical entropy of the data source to code.
- the decoding method may further comprise obtaining, from the bit-stream, probabilities associated with each of a set of possible flags; and entropy decoding encoded flags from the bit-stream based on the obtained probabilities to obtain said flag associated with each block.
- restricting the alphabets of a block may comprise deleting, from the alphabets, the possible values that have a magnitude (i.e. absolute value) larger than the magnitude specified in the flag associated with the block.
- restricting the alphabets of a block may comprise deleting the d_B most negative and d_B most positive possible values from each of the alphabets.
- the number of flags in the set is restricted to three at most.
- one flag in the restricted set specifies a magnitude equal to zero.
- the decoding method further comprises obtaining an encoded macroblock flag associated with a macroblock made of blocks; entropy decoding said encoded macroblock flag based on the obtained probabilities; and splitting the decoded macroblock flag into flags associated with the blocks of the macroblock.
- using such macroblock flags concatenating the flags for the blocks of the macroblock substantially improves the entropy encoding of the flags.
- the decoding method further comprises decoding a first bit-stream to obtain a decoded low resolution version of the first image; obtaining a decoded residual enhancement image from the decoded quantized coefficients; and adding the obtained decoded residual enhancement image to an upsampled version of the decoded low resolution version of the first image, to obtain a decoded first image.
- decoding a first enhancement image may involve using the motion information of the corresponding first base image in order to obtain residual blocks from a "reference" decoded UHD image (temporally corresponding to the decoded reference base image used for predicting the first base image). Such blocks may then be used to correct the enhancement image data directly obtained from the bit-stream.
- using temporal prediction information of the base layer to decode the enhancement layer is known from the SVC standard (SVC standing for "Scalable Video Coding").
- the 20th and 21st aspects of the invention intend to improve the efficiency of a decoding method based on predicting the enhancement layer. This aims at improving the quality of reconstructed high resolution (e.g. UHD) images, while keeping low complexity at the encoding and decoding sides.
- the features of the 20th and 21st aspects of the invention may be provided in combination with the features of the 1st to 4th and/or 5th to 8th and/or 9th and 10th and/or 11th to 14th and/or 15th to 19th aspects but this is not essential and it is possible to use the features of the 20th and 21st aspects independently of the features of the 1st to 4th aspects and of the 5th to 8th aspects and of the 9th and 10th aspects and of the 11th to 14th aspects and of the 15th to 19th aspects.
- a method for decoding a scalable video bit-stream comprising decoding a base layer from the bit-stream, decoding an enhancement layer from the bit-stream and adding the enhancement layer to the base layer to obtain a decoded video of high resolution images, wherein decoding the enhancement layer comprises:
- the method for decoding a scalable video bit-stream comprises:
- the decoding of the low resolution version comprising using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image;
- each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video
- decoding a first enhancement image temporally corresponding to a first low resolution image comprises:
- the blocks of the encoded enhancement layer obtained from the bit-stream and the residual blocks obtained using the motion information of the base layer are merged together to form parts of the decoded enhancement image.
- this decoded enhancement image is then added to an up-sampled decoded base image to obtain a decoded high resolution (e.g. UHD) image.
- the quality of the decoded high resolution image is improved compared to known techniques. This is due to the use of two probabilistic distributions that model both the original transformed coefficients and an error of temporal prediction, when merging the transformed coefficients (e.g. DCT coefficients).
- the first probabilistic distribution corresponding to the transformed coefficients encoded in the bit-stream may be obtained from the bit-stream itself, for instance from parameters contained therein. These may represent statistical modelling of the original transformed coefficients (i.e. before quantization and encoding).
- the second probabilistic distributions that correspond to the blocks predicted using the motion information of the base layer provide information about the noise of temporal prediction. In particular they provide modelled information on the difference between those predicted coefficients and the transformed coefficients. Since the original transformed coefficients are not known by the decoder, the decoded and dequantized transformed coefficients known at the decoding side are used in place of the original transformed coefficients. The inventors have observed that using those coefficients rather than the original ones provides modelling that is quite close to reality.
- probabilistic mergers, such as the expected value example below, provide good statistical results.
- at low bitrate, the temporally predicted blocks may more often provide relevant information on the original DCT coefficients than the quantization level obtained by the dequantization; at high bitrate, the opposite occurs.
- the invention allows gains up to several dBs in rate-distortion performance at almost no cost of additional complexity at the decoder, and at the cost of zero additional rate when the parameters have already been transmitted.
- the approach according to the present aspect of the invention does not necessarily have to be performed at the decoding side. For example, it may be switched off in case of very low complexity decoders. Further, the encoding is independent of the switching decision.
- apparatus for decoding a scalable video bit-stream comprising a base layer decoder configured to decode a base layer from the bit-stream, an enhancement layer decoder configured to decode an enhancement layer from the bit-stream and a video building unit configured to add the enhancement layer to the base layer to obtain a decoded video, wherein the enhancement layer decoder is further configured to:
- the decoding apparatus comprises:
- a base decoder configured to decode a low resolution version of the video, using motion information to temporally predict blocks of a low resolution image from blocks of a decoded reference low resolution image;
- an enhancement decoder configured to decode an enhancement version of the video, each enhancement image of the enhancement version having a high resolution and temporally corresponding to a low resolution image of the low resolution video;
- an image building unit configured to add each decoded enhancement image to an up-sampled version of the corresponding low resolution image, to obtain a decoded video of decoded high resolution images
- enhancement decoder is further configured to:
- the step of merging may merge a dequantized transformed coefficient with a collocated coefficient in the transformed residual blocks (meaning collocated blocks and collocated coefficients within those blocks), using first and second probabilistic distributions associated with these collocated coefficients, on a quantization interval associated with the value of the corresponding quantized transformed coefficient (i.e. the value before the quantized transformed coefficient is dequantized). This ensures an accurate merged transformed coefficient to be provided, given its quantized value that has been transmitted in the encoded bit-stream.
- the first and second probabilistic distributions are integrated using Riemann sums over that quantization interval during the merging step. This provision makes it possible to perform a probabilistic merger of transformed coefficients, on low complexity decoders.
- the step of merging comprises calculating the expectation of a block coefficient, given the quantization interval associated with the value of the corresponding quantized transformed coefficient and given its corresponding value in the transformed residual blocks, based on the first and second probabilistic distributions.
- calculating the expectation x_i of a block coefficient i comprises calculating the following value:

x_i = ( ∫_Qm x · PDF(x) · PDF_N(Y_0 − x) dx ) / ( ∫_Qm PDF(x) · PDF_N(Y_0 − x) dx ), where:
- PDF is the first probabilistic distribution associated with the block coefficient i;
- PDF_N is the second probabilistic distribution;
- Y_0 is the value of the coefficient collocated with said block coefficient i in the transformed residual blocks;
- Q_m is the quantization interval associated with the value of the quantized transformed coefficient collocated with said block coefficient i.
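The expectation-based merger over a quantization interval, evaluated with Riemann sums as described above, may be sketched as follows (Python; the GGD parameters, the interval and the prediction value are illustrative, and all names are ours):

```python
import math

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha and shape beta."""
    c = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return c * math.exp(-((abs(x) / alpha) ** beta))

def merge_coefficient(q_lo, q_hi, y0, pdf_coef, pdf_noise, n=400):
    """Expectation of a DCT coefficient x over its quantization interval
    [q_lo, q_hi], given the temporally predicted value y0: each x is
    weighted by PDF(x) * PDF_N(y0 - x) and the two integrals of the
    merging formula are evaluated as midpoint Riemann sums."""
    h = (q_hi - q_lo) / n
    num = den = 0.0
    for k in range(n):
        x = q_lo + (k + 0.5) * h
        w = pdf_coef(x) * pdf_noise(y0 - x)
        num += x * w
        den += w
    return num / den if den > 0 else 0.5 * (q_lo + q_hi)

pdf = lambda x: ggd_pdf(x, 4.0, 1.0)    # model of the original coefficients
pdf_n = lambda x: ggd_pdf(x, 1.0, 2.0)  # model of the prediction noise
# Quantization interval [2, 6); the temporal prediction says the
# coefficient is close to 5, so the merged value is pulled above the
# interval midpoint 4 while staying inside the interval:
x_hat = merge_coefficient(2.0, 6.0, 5.0, pdf, pdf_n)
```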
- the probabilistic distributions are generalized Gaussian distributions
- the obtaining of the second probabilistic distribution comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks and the dequantized transformed coefficients. In that case, the second probabilistic distribution is statistically obtained based on the coefficients that are actually handled by the decoder.
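Fitting a GGD model onto observed samples (here, the differences mentioned above) may be sketched as follows (Python; Mallat's moment-ratio method with bisection on the shape parameter is used as one plausible fitting procedure, not necessarily the one of the invention):

```python
import math, random

def fit_ggd(samples):
    """Fit GGD scale alpha and shape beta to zero-mean samples by
    matching the ratio (E|x|)^2 / E[x^2], which is a monotonically
    increasing function of beta; beta is found by bisection."""
    m1 = sum(abs(x) for x in samples) / len(samples)
    m2 = sum(x * x for x in samples) / len(samples)
    rho = m1 * m1 / m2
    g = lambda b: math.gamma(2.0 / b) ** 2 / (math.gamma(1.0 / b) * math.gamma(3.0 / b))
    lo, hi = 0.1, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if g(mid) < rho:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.gamma(1.0 / beta) / math.gamma(2.0 / beta)
    return alpha, beta

random.seed(0)
# Gaussian samples correspond to beta = 2 (and alpha = sigma * sqrt(2)):
samples = [random.gauss(0.0, 1.0) for _ in range(20000)]
alpha, beta = fit_ggd(samples)
```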
- the obtaining of the first probabilistic distribution comprises obtaining parameters from the bit-stream and applying these parameters to a probabilistic distribution model.
- the low resolution or base image temporally corresponding to a first enhancement image to decode is an image bi-directionally predicted from reference low resolution or base images using motion information in each of the two directions
- the decoding of the first enhancement image comprises obtaining transformed residual blocks for each direction and merging together the transformed residual blocks in both directions with the dequantized blocks of dequantized transformed coefficients.
- This approach proves to be more precise than an approach which first determines a single transformed residual block based on prediction in both directions. This is because a motion prediction noise estimation in each direction is separately obtained, improving a probabilistic merger.
- the merging can be based on calculating an expectation.
- the step of merging may comprise calculating the merger value x_i of a block coefficient i using the formula:
- obtaining residual blocks comprises:
- These steps define the temporal prediction of the enhancement layer based on the images already reconstructed. They produce another enhancement layer (since each obtained block is the difference with the base layer) from which a modelling of the temporal prediction noise can be performed.
- before using the motion information, that motion information is up-sampled (or interpolated) into high resolution. This is because the reference image on which that information is about to be used is of high resolution.
- the motion information that is up-sampled comprises, for a given block, a motion vector and a temporal residual block; and the obtaining of the motion predictor blocks comprises
- It may further comprise a reference image index identifying said reference low resolution image, when the encoding with temporal prediction uses multiple reference images.
- the decoding method may further comprise filtering, using a deblocking filter, the obtained decoded high resolution images; wherein parameters (e.g. the filter strength parameter or the quantization-dependent parameter) of the deblocking filter depend on the first and second probabilistic distributions used during the merger.
- the second probabilistic distributions are obtained for blocks collocated with enhancement image blocks of the corresponding low resolution or base image that are encoded with the same coding mode.
- the coding mode of the base (low resolution) layer is for example the INTER mode, which may be further subdivided into an INTER P-prediction mode and an INTER B-prediction mode, or the SKIP mode (as defined in H.264).
- the first probabilistic distributions are obtained respectively for each of a plurality of channels, wherein a channel is associated with collocated coefficients having the same block coefficient position in their respective blocks. Furthermore, a channel may be restricted to the blocks collocated with base layer blocks having the same coding mode.
- the sets of aspects described above (5th to 8th aspects, 9th & 10th aspects, 11th to 14th aspects, 15th to 19th aspects and 20th & 21st aspects) can be applied independently of one another but any two or more sets can be combined. As noted above, too, they can also be applied independently of the 1st to 4th aspects or in combination with the 1st to 4th aspects.
- the encoding and decoding of the base layer are in conformity with HEVC.
- Other preferred features of the 1st to 4th aspects of the invention, as described above, can still be applied when HEVC encoding and decoding are not used.
- Combinations with the sets of aspects described above (5th to 8th aspects, 9th & 10th aspects, 11th to 14th aspects, 15th to 19th aspects and 20th & 21st aspects) can be made when HEVC encoding and decoding are not used.
- aspects of the present invention relate to programs which, when executed by a computer or processor, cause the computer or processor to carry out encoding methods embodying any of, or any combination of, the 1st, 5th, 9th, 11th and 15th aspects of the present invention.
- Further aspects of the present invention relate to programs which, when executed by a computer or processor, cause the computer or processor to carry out decoding methods embodying any of, or any combination of, the 2nd, 6th, 13th, 17th and 20th aspects of the present invention. Further aspects of the present invention relate to programs which, when loaded into a computer or processor, cause the computer or processor to function as the encoding apparatuses of any of, or any combination of, the 3rd, 7th, 10th, 12th and 16th aspects of the present invention.
- the programs may have features and advantages that are analogous to those set out above.
- a program embodying the invention may be provided by itself or may be carried in, on or by a carrier medium.
- the carrier medium may be a storage medium or recording medium.
- the storage medium is computer-readable.
- the carrier medium may also be a transmission medium such as a signal.
- Such a signal may be transmitted through a network such as the Internet to enable a program embodying the present invention to be distributed via the network.
- FIG. 1 schematically shows an encoder for a scalable codec
- FIG. 3 schematically illustrates the enhancement video encoding module of the encoder of Figure 1 ;
- FIG. 3a is a more detailed schematic illustration of the enhancement video encoding module of the encoder of Figure 1 according to the invention.
- FIG. 4 schematically illustrates the enhancement video decoding module of the decoder of Figure 2;
- FIG. 4a is a more detailed schematic illustration of the enhancement video decoding module of the decoder of Figure 2 according to the invention.
- FIG. 6 illustrates a structure of a 4:2:0 macroblock
- FIG. 7 illustrates an example of a quantizer based on Voronoi cells
- FIG. 8 schematically illustrates the alphabet of symbols and associated probabilities for the quantizer of Figure 7;
- FIG. 9 illustrates the video stream structure for the luminance, with indication of the alphabets involved for DCT coefficients
- Figure 11 illustrates the intra-block grouping of Figure 10
- Figure 13 shows steps of an exemplary method for the intra-block grouping of Figure 11 ;
- Figure 14 shows steps of an exemplary method for the inter-block grouping of Figure 12;
- FIG. 15 shows steps of an entropy encoding method according to the invention
- Figure 17 illustrates the performance of grouping alphabets, in comparison to the performances illustrated in Figure 5;
- FIG. 19 illustrates an implementation of entry points to allow spatial random access as illustrated in Figure 18;
- FIG. 20 shows a particular hardware configuration of a device able to implement methods according to the invention
- FIG. 23 illustrates the coding structure of the bitstream with macroblock flags according to one embodiment of the invention
- FIG. 26 shows exemplary rate-distortion curves, each curve corresponding to a specific number of quanta
- Figure 27 shows the rate-distortion curve obtained by taking the upper envelope of the curves of Figure 26;
- Figure 28 depicts several rate-distortion curves obtained for various possible parameters of the DCT coefficient distribution;
- FIG. 31 is a more detailed schematic illustration of the decoder of Figure 2 according to another embodiment of the invention.
- Figure 32 illustrates the prediction of the enhancement layer in the decoder of Figure 31 ;
- Figure 34 illustrates the performance of the Figure 31 decoder.
- an embodiment of the present invention seeks to provide a UHD codec with low complexity based on scalable encoding.
- HD is merely an example of a first (lower) resolution
- UHD is merely an example of a second (higher) resolution.
- the invention can be applied to any case in which two or more different resolutions of video data are available or can be obtained.
- Video data having the first (lower) resolution can, for example, be obtained from video data having the second (higher) resolution by downsampling.
- some video sources, such as video cameras, may simultaneously deliver video data having the first and second resolutions.
- the UHD video is encoded into a base layer and one or more enhancement layers.
- the base layer results from the encoding of a low resolution version (version having the first resolution) of the UHD images, in particular having a HD resolution, with a standard existing codec (e.g. H.264 or HEVC - High Efficiency Video Coding).
- the conventional block-based codec such as H.264, implements a Skip mode as described for example in US application No 2009/0262835.
- in a Skip mode, when a macroblock (or block) residual resulting from the prediction is full of zeros (and possibly when the motion vector for the macroblock is zero, i.e. no motion), a Skip Macroblock flag is set to 1 in the bit stream instead of coding the motion vectors or the residual. This markedly reduces the number of bits to be coded and improves the compression rate.
- this Skip mode takes advantage of the dependence between the quantized DCT coefficients, in particular of the strong correlation between the small values of the quantized DCT coefficients. For instance, it is generally observed that there are many more DCT blocks in which all the quantized DCT coefficients are zeroes (so-called skipped blocks) than expected from a theory without dependence. In other words, the probability of a skipped block is much higher than the product of the probabilities of having a zero for each quantized DCT coefficient.
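The dependence described above can be illustrated numerically. The sketch below uses hypothetical data (not from the patent): quantized blocks whose coefficients share a per-block "activity" switch, which creates exactly the kind of correlation discussed. It compares the measured share of all-zero blocks with the product of the per-coefficient zero probabilities predicted by an independence assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated data: half of the blocks carry texture, the rest
# are entirely zero, so coefficients within a block are strongly dependent.
n_blocks, n_coeffs = 100_000, 16
active = rng.random(n_blocks) < 0.5
coeffs = rng.integers(-2, 3, size=(n_blocks, n_coeffs))  # values in [-2, 2]
coeffs[~active] = 0                                      # inactive blocks all-zero

p_zero = (coeffs == 0).mean(axis=0)          # marginal P(coefficient == 0)
independent_model = np.prod(p_zero)          # prediction without dependence
measured = (coeffs == 0).all(axis=1).mean()  # actual P(skipped block)

print(f"independent model: {independent_model:.6f}")
print(f"measured skipped : {measured:.4f}")
```

With these illustrative parameters the independence model predicts roughly 0.6^16 (a fraction of a percent) while about half of the blocks are actually skipped, mirroring the observation in the text.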
- an enhancement image is obtained from subtracting an interpolated (or up-scaled or upsampled) decoded image of the base layer from the corresponding original UHD image.
- the enhancement images which are residuals or pixel differences with UHD resolution, are then encoded into an enhancement layer.
- Figure 1 illustrates such approach at the encoder 10.
- An input raw video 11 is down-sampled 12 to obtain a so-called base layer, for example with HD resolution, which is encoded by a standard base video coder 13, for instance H.264/AVC or HEVC. This results in a base layer bit stream 14.
- the encoded base layer is decoded 15 and up-sampled 16 into the initial resolution (UHD in the example) to obtain the up-sampled decoded base layer.
- the latter is then subtracted 17, in the pixel domain, from the original raw video to get the residual enhancement layer X.
- the information contained in X is the error or pixel difference due to the base layer encoding and the up-sampling. It is also known as a "residual".
- a conventional block division is then applied, for instance a homogenous
- a block-based DCT transform 18 is applied to each block to generate DCT blocks forming the DCT image X DCT having the initial UHD resolution.
- This DCT image X_DCT is encoded into X_DCT,Q by an enhancement video encoding module 19 to produce an enhancement layer bit stream 20.
- the encoded bit-stream EBS resulting from the encoding of the raw video 11 is made of:
- Figure 2 illustrates the associated processing at the decoder 30 receiving the encoded bit-stream EBS.
- Part of the processing consists in decoding the base layer bit-stream 14 by the standard base video decoder 31 to produce a decoded base layer.
- the decoded base layer is then up-sampled 32 into the initial resolution, i.e. UHD resolution.
- both the enhancement layer bit-stream 20 and the parameters 21 are used by the enhancement video decoding and dequantization module 33 to generate the dequantized DCT image.
- this dequantized image is the result of the quantization and then the inverse quantization applied to the original DCT image X_DCT.
- An inverse DCT transform 34 is then applied to each block of the dequantized image to obtain the decoded residual (of UHD resolution) in the pixel domain. Each decoded residual block is then added 35 to the corresponding block in the up-sampled decoded base layer to obtain decoded images of the video.
- Filter post-processing for instance with a deblocking filter 36, is finally applied to obtain the decoded video 37 which is output by the decoder 30.
- Reducing UHD encoding and decoding complexity relies on simplifying the encoding of the enhancement images at the enhancement video encoding module 19 compared to the conventional encoding scheme.
- the inventors dispense with the temporal prediction and possibly the spatial prediction when encoding the UHD enhancement images. This is because the temporal prediction is very expensive in terms of memory bandwidth consumption, since it often requires accessing other enhancement images as reference images. Low-complexity codecs may then be designed, in particular at the encoding side.
- Figure 3 illustrates an embodiment of the enhancement video encoding module 19 (or “enhancement layer encoder”) that is provided by the inventors.
- the enhancement layer encoder models 190 the statistical distribution of the DCT coefficients within the DCT blocks of a current enhancement image by fitting a parametric probabilistic model.
- This fitted model becomes the channel model of DCT coefficients and the fitted parameters are output in the parameter bit-stream 21 coded by the enhancement layer encoder.
- a channel model may be obtained for each DCT coefficient position within a DCT block based on fitting the parametric probabilistic model onto the corresponding collocated DCT coefficients throughout all the DCT blocks of the image X DCT or of part of it.
- quantizers may be chosen 191 from a pool of pre-computed quantizers dedicated to each parametric channel.
- the chosen quantizers are used to perform the quantization 192 of the DCT coefficients.
- an entropy encoder 193 is applied to the quantized DCT image
- the associated enhancement video decoder 33 is shown in Figure 4.
- the channel models are reconstructed and quantizers are chosen 330 from the pool of quantizers.
- a dequantization 332 is then performed by using the chosen quantizers, to obtain the dequantized DCT image.
- channel modelling, the selection of quantizers and the adaptation of entropy coding are some of the additional tools introduced above.
- those additional tools may be used for the encoding of any image, regardless of the enhancement nature of the image, and furthermore regardless of its resolution.
- the invention is particularly advantageous when encoding images without prediction.
- predefined quantizers may be used instead of the selection 191 of optimal quantizers.
- Figure 5 illustrates the performance of conventional entropy encoding.
- the y-axis corresponds to the average picture quality obtained when decoding an encoded video bit-stream.
- the x-axis corresponds to the average bitrate obtained in the encoding of the corresponding video.
- the dashed curve shows a rate distortion curve obtained, with bitrate values equal to the theoretical entropy of the coded source.
- the entropy bitrate is the lowest rate bound that can be achieved by any entropy coder.
- the plain curve illustrates the rate-distortion curve obtained when a conventional Huffman coding/decoding process is used, for example as defined in the standard H.264.
- the bitrate usually obtained is substantially higher than the entropy rate.
- this is due to the fact that the Huffman coding may be as much as one bit away from the theoretical entropy for a single symbol to encode.
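The "up to one bit per symbol" overhead can be checked on a small example. The sketch below builds Huffman code lengths for an illustrative skewed three-symbol distribution (typical of quantized residual coefficients) and compares the mean code length with the entropy; the distribution and function name are assumptions for illustration only.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Code lengths of a binary Huffman code for the given distribution."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, i2, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # merging adds one bit to every member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, s1 + s2))
    return lengths

# A skewed distribution: the dominant symbol still costs a full bit
probs = [0.9, 0.05, 0.05]
H = -sum(p * log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(f"entropy H = {H:.3f} bits, Huffman mean length = {L:.3f} bits")
```

Here H is about 0.57 bit while the Huffman mean length is 1.1 bits: the code respects H(A) <= L(A) < H(A) + 1 but the relative overhead is large, which is exactly the situation the grouping of alphabets addresses.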
- the present invention seeks to improve this entropy encoding situation, in particular with the aim of getting closer to the entropy rate.
- a low resolution version of the initial image has been encoded into an encoded low resolution image, referred above as the base layer; and a residual enhancement image has been obtained by subtracting an interpolated high resolution (or up-sampled) decoded version of the encoded low resolution image from said initial image.
- that residual enhancement image is then transformed from the spatial domain (i.e. pixels) into the (spatial) frequency domain, using for example a block-based DCT transform, to obtain an image of transformed block coefficients.
- this is the DCT image X_DCT, which comprises a plurality of DCT blocks, each comprising DCT coefficients.
- the residual enhancement image has been divided into blocks B_k, for instance 8x8 blocks (but other divisions may be considered), on which the DCT transform is applied.
- Blocks are grouped into macroblocks MB k .
- a very common case for so-called 4:2:0 YUV video streams is a macroblock made of 4 blocks of luminance Y, 1 block of chrominance U and 1 block of chrominance V, as illustrated in Figure 6.
- other configurations may be considered.
- a macroblock MB_k is made of 16x16 pixels of luminance Y and the chrominance has been down-sampled by a factor of two both horizontally and vertically to obtain 8x8 pixels of chrominance U and 8x8 pixels of chrominance V.
- the four blocks within a macroblock MB k are referenced
- a probabilistic distribution P of each DCT coefficient is determined using a parametric probabilistic model. This step is referenced 190 in the Figure.
- since the image X_DCT is a residual image, i.e. its content is essentially a noise-like residual, it is efficiently modelled by Generalized Gaussian Distributions (GGD) having a zero mean: DCT(X) ≈ GGD(α, β),
- the DCT coefficients cannot all be modelled by the same parameters and, practically, the two parameters α, β may depend on:
- a DCT channel is thus defined as the DCT coefficients collocated (i.e. having the same index) within a plurality of DCT blocks (possibly all the blocks of the image).
- a DCT channel can therefore be identified by the corresponding index i ;
- Intra blocks of the base layer do not behave the same way as Inter blocks.
- Blocks with a coded residual in the base layer do not behave the same way as blocks without such a residual (i.e. Skipped blocks).
- blocks coded with a non-nil texture data according to the coded-block-pattern syntax element as defined in H.264/AVC do not behave the same way as those blocks without non-nil texture data.
- the collocation of blocks should take into account that down-sampling.
- the four blocks of the n-th macroblock in the residual enhancement layer with UHD resolution are collocated with the n-th block of the base layer having HD resolution. That is why, generally, all the blocks of a macroblock in an enhancement image have the same base coding mode.
- the modelling 190 has to determine the parameters of 64 DCT channels for each base coding mode.
- since the luminance component Y and the chrominance components U and V have dramatically different source contents, they must be encoded in different DCT channels. For example, if it is decided to encode the luminance component Y on one channel and the chrominance components UV on another channel, 128 channels are needed for each base coding mode.
- At least 64 pairs of parameters for each base coding mode may appear as a substantial amount of data to transmit to the decoder (see parameters 21 ).
- experience proves that this is quite negligible compared to the volume of data needed to encode the residuals of Ultra High Definition (4k2k or more) videos.
- such a technique is preferably implemented on large videos, rather than on very small videos because the parametric data would be too costly.
- a set of DCT blocks corresponding to the same base coding mode is now considered.
- the invention may then be applied to each set corresponding to each base coding mode.
- the invention may be directly applied to the entire image, regardless the base coding modes.
- the Generalized Gaussian Distribution model is fitted onto the DCT block coefficients of the DCT channel, i.e. the DCT coefficients collocated within the DCT blocks with the same base coding mode. Since this fitting is based on the values of the DCT coefficients before quantization (of the DCT blocks having the same base coding mode in the example), the probabilistic distribution is a statistical distribution of the DCT coefficients within a considered channel i.
- the fitting may be simply and robustly obtained using the moment of order k of the absolute value of a GGD: E[|X|^k] = α^k·Γ((k+1)/β)/Γ(1/β), so that the ratio of the squared first moment to the second moment, E[|X|]²/E[X²] = Γ(2/β)²/(Γ(1/β)·Γ(3/β)), depends only on β.
- the value of the parameter β can thus be estimated by computing the above ratio of the first and second moments, and then the inverse of the above function of β.
- this inverse function may be tabulated in a memory of the encoder instead of computing Gamma functions in real time, which is costly.
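A possible sketch of this moment-based fitting is given below. It matches the ratio of the squared first absolute moment to the second moment, which depends only on β. Where the text suggests a tabulated inverse, this sketch uses a bisection for brevity; the function name, search interval and iteration count are illustrative assumptions.

```python
from math import gamma

def fit_ggd(samples):
    """Fit a zero-mean GGD(alpha, beta) by matching the first two absolute moments.

    The ratio m1^2/m2 = Gamma(2/b)^2 / (Gamma(1/b)*Gamma(3/b)) depends only
    on beta and is increasing in beta, so it can be inverted numerically
    (the patent tabulates this inverse instead of computing it in real time).
    """
    n = len(samples)
    m1 = sum(abs(x) for x in samples) / n      # first absolute moment
    m2 = sum(x * x for x in samples) / n       # second moment
    target = m1 * m1 / m2

    def ratio(b):
        return gamma(2 / b) ** 2 / (gamma(1 / b) * gamma(3 / b))

    lo, hi = 0.1, 10.0                         # assumed plausible beta range
    for _ in range(60):                        # bisection on the monotone ratio
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ratio(mid) < target else (lo, mid)
    beta = 0.5 * (lo + hi)
    alpha = m1 * gamma(1 / beta) / gamma(2 / beta)   # from E|X| = a*G(2/b)/G(1/b)
    return alpha, beta

# Usage sketch: Laplacian samples are a GGD with beta = 1, alpha = scale
import random
random.seed(0)
samples = [random.expovariate(1.0) * random.choice([-1, 1]) for _ in range(100000)]
alpha, beta = fit_ggd(samples)
print(f"alpha ~ {alpha:.2f}, beta ~ {beta:.2f}")   # close to (1, 1)
```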
- a quantization 192 of the DCT coefficients is then performed, to obtain quantized DCT coefficients (i.e. symbols or values).
- the quantization of those coefficients may involve optimal quantizers chosen (step 191) for each DCT channel i based on the corresponding probabilistic distribution P_i(x) of the DCT coefficients.
- the quantizers may be predefined prior to the encoding. Since the quantization is not the core of the present invention, it is here assumed that a quantizer is selected for each DCT channel and each base coding mode as defined above, meaning that various quantizers are generally used for quantizing various DCT coefficients.
- Figure 7 illustrates an exemplary Voronoi cell based quantizer.
- a quantizer is made of M Voronoi cells distributed over the values of the DCT coefficient to be quantized.
- Each cell corresponds to an interval [t_m, t_m+1], called quantum Q_m.
- Each cell has a centroid c m , as shown in the Figure.
- intervals are used for quantization: a DCT coefficient comprised in the interval [t_m, t_m+1] is quantized by a symbol a_m associated with that interval.
- centroids are used for de-quantization: a symbol a m associated with an interval is de-quantized into the centroid value c m of that interval.
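Such an interval/centroid quantizer can be sketched as below. The thresholds and centroids are illustrative, not taken from the pool of pre-computed quantizers described in the text, and the clamping of out-of-range values is an assumption.

```python
import bisect

class VoronoiQuantizer:
    """Scalar quantizer: intervals [t_m, t_m+1) for quantization,
    centroids c_m for de-quantization, as in Figure 7."""

    def __init__(self, thresholds, centroids):
        assert len(thresholds) == len(centroids) + 1
        self.thresholds = thresholds      # t_0 < t_1 < ... < t_M
        self.centroids = centroids        # one centroid per cell

    def quantize(self, x):
        """Return the symbol index m of the cell containing x."""
        m = bisect.bisect_right(self.thresholds, x) - 1
        return min(max(m, 0), len(self.centroids) - 1)   # clamp saturated values

    def dequantize(self, m):
        return self.centroids[m]

# Five cells centred on 0 (symbols -2..2 map to indices 0..4), illustrative values
q = VoronoiQuantizer([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5],
                     [-2.0, -1.0, 0.0, 1.0, 2.0])
print(q.quantize(0.3), q.dequantize(q.quantize(0.3)))   # -> 2 0.0
```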
- Figure 8 illustrates what is named an alphabet of quantization for a given quantizer.
- This Figure shows an example of symbols a_m corresponding to the Voronoi cells. In practice, those symbols are consecutive integer values centred on 0, i.e. values from [-MAX; MAX] as shown in the Figure.
- the alphabet A comprises all the symbols or values a m defined by the Voronoi cells of the quantizer used.
- the alphabet A comprises all the possible symbols or values for a DCT coefficient due to the quantization performed on that coefficient.
- the explanations below concentrate on macroblocks associated with the same coding mode, i.e. using the same alphabets. This is because the residual enhancement image may be segmented according to the base coding modes, and then processed segment by segment, each segment corresponding to the same coding mode.
- the probabilities {p_i,m} for an alphabet A_i thus define the probabilistic distribution of the possible symbols or values defined therein.
- the probabilistic distribution is the same for the alphabets associated with DCT coefficients collocated within a plurality of the blocks of the image.
- Such probabilities may be computed off-line and stored in a memory of the encoder, in order to decrease the complexity of real-time encoding. This is for example possible when the parameters α, β for modelling the distribution of the DCT coefficients are chosen from a limited number of possible parameters, and when the possible quantizers are known in advance.
- the restriction 194 of alphabets leads to generating an additional stream with the encoded bit-stream EBS, namely the magnitude probability bit-stream 22.
- the grouping 195 of alphabets does not need such an additional stream.
- the second processing operation, i.e. the grouping of alphabets before entropy encoding the DCT coefficients based on the resulting grouped alphabets, is explained below.
- the restriction of alphabets 194 is an extension of the Skip flag as introduced in H.264, for example. It results in restricted alphabets that fit well the encoding of the DCT blocks, given the maximum magnitude of their quantized DCT coefficients.
- a flag is associated with each block of quantized DCT coefficients.
- That flag for a DCT block specifies a magnitude, which may be determined based on the quantized coefficient values in the associated block.
- the magnitude is the extended information compared to conventional Skip flags.
- the encoding method allows restriction of alphabets A corresponding to the quantized DCT coefficients of each block, based on the magnitude specified in the flag associated with that block.
- the magnitude specified in a flag associated with a block represents a maximum magnitude of the quantized coefficient values of that block.
- This specified maximum magnitude may be the maximum magnitude taken by one of the quantized coefficient values, but may also be an upper bound of those values.
- Figure 21 illustrates an embodiment of the main steps of the restriction 194 of alphabets.
- the method begins at step 2100 with the alphabets A_i of the quantized DCT coefficients considered (e.g. from the blocks associated with the same base coding mode) and their associated probabilistic distributions {p_i,m}. Step 2105 consists in computing the probabilities of occurrence P_f,n as defined above.
- the alphabet F comprises the flags f_n, where n defines the maximum magnitude (i.e. the maximum absolute value) of the quantized DCT coefficients a_m in a block.
- the alphabet F comprises the flags specifying, as maximum magnitude, all the possible values defined in the alphabets A_i.
- a flag f_n from the alphabet F is associated with each of the DCT blocks B_k, based on the maximum magnitude n taken by a quantized DCT coefficient therein. Those flags specify the maximum magnitudes taken by one of the quantized coefficient values of their respective associated block. Those flags will be entropy encoded as further described below.
- this step may follow step 2115, and then a flag from the restricted alphabet F' as defined below may be selected and directly associated with each DCT block rather than a flag from the larger alphabet F.
- the flags may specify only an upper bound of the magnitude taken by the quantized coefficient values of their respective associated block.
- the next step 2115 consists in restricting this set or alphabet F of flags into a restricted alphabet F', based on probabilities of occurrence P_f,n of the flags f_n of F.
- since the block flag f_n may take many different values for a block (i.e. the alphabet F may be large), the encoding cost of the flag may turn out to be bigger than the overhead reduction obtained by using the Huffman coding based on the restricted alphabets A_i as defined below.
- the alphabet F is thus restricted to the most significant flag values, which will be used to restrict the alphabets A_i as further described below.
- the number of flags selected to constitute the restricted alphabet F' is at most three. This appears to be a good tradeoff between the costs of encoding the flags and of encoding the DCT coefficients based on the restricted alphabets A_i as defined below.
- one of the flags in the restricted alphabet F' is the flag specifying the largest magnitude in the alphabet F.
- the values a and b may be selected to define intervals of magnitudes (i.e.
- the probability of occurrence is the probability that a block has a maximum magnitude comprised in [0,a] or [a,b].
- a may be chosen as the smallest value with at least 10% probability of occurrence, and thus computed based on the following formula: a = min { t | P(n ≤ t) ≥ 0.1 }.
- P(n ⁇ t) means the probability of occurrence of a block having a maximum magnitude equal to or less than t.
- the other value b may be chosen as the smallest value with the next at least 10% probability of occurrence from the previous value a; b verifies the following formula: b = min { t > a | P(a < n ≤ t) ≥ 0.1 }.
- a predefined amount of 40% may be used to determine a.
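Under this cumulative-probability reading of the rule, the selection of the restricted flag alphabet F' can be sketched as below. The function name and the handling of coinciding values are assumptions; the 10% threshold is a parameter and may be replaced by another predefined amount such as 40%.

```python
def restrict_flag_alphabet(pf, threshold=0.10):
    """Pick {a, b, max} from flag probabilities pf[n] = P(block max magnitude == n).

    a: smallest n with P(n <= a) >= threshold;
    b: smallest n > a with P(a < n <= b) >= threshold;
    the flag specifying the largest magnitude is always kept.
    """
    n_max = len(pf) - 1
    cum, a = 0.0, n_max
    for n, p in enumerate(pf):
        cum += p
        if cum >= threshold:
            a = n
            break
    cum, b = 0.0, n_max
    for n in range(a + 1, n_max + 1):
        cum += pf[n]
        if cum >= threshold:
            b = n
            break
    return sorted({a, b, n_max})      # at most three flag values

print(restrict_flag_alphabet([0.5, 0.2, 0.1, 0.1, 0.1]))  # -> [0, 1, 4]
```

In this illustrative distribution half of the blocks are fully skipped, so a = 0 mirrors the conventional Skip flag.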
- the conventional Skip flag corresponds to f_0, i.e. a maximum magnitude of zero.
- one flag in the restricted alphabet F' specifies a (maximum) magnitude equal to zero.
- step 2120 consists in restricting the alphabets A_i corresponding to each quantized DCT coefficient in a current block based on the (maximum) magnitude specified in the flag associated with the current block to process.
- the restricting operation is as follows:
- the distance of DCT coefficient magnitudes to the maximum possible magnitude allowed by the quantizers (when they may be determined) is used.
- M_i is the maximum value of the alphabet A_i of the quantizer of the i-th DCT coefficient; this means that the values taken by this quantizer are in the interval [-M_i, M_i].
- M_B = max_i M_i is the largest of these maxima over the block.
- the block distance to the maximum magnitude is defined as: d_B = M_B - n, where n is the maximum magnitude taken by a quantized DCT coefficient of the block.
- the corresponding plane level is M_B - d_B.
- d_B = 0 (a plane level equal to M_B) means that the quantizers have not been planed.
- Figure 22 illustrates such plane levels for the quantizers of six DCT coefficients.
- restricting the alphabets of a block comprises deleting the d_B most negative and d_B most positive possible values from each of the alphabets. This corresponds to dropping the d_B outermost quanta from each alphabet, until the sole central quantum remains.
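This restriction step can be sketched as follows, assuming alphabets represented as sorted symbol lists centred on 0; the function name is illustrative.

```python
def restrict_alphabet(symbols, d_b):
    """Drop the d_b most negative and d_b most positive symbols, keeping at
    least the central quantum (symbols are assumed sorted, centred on 0)."""
    keep = max(len(symbols) - 2 * d_b, 1)
    start = (len(symbols) - keep) // 2
    return symbols[start:start + keep]

print(restrict_alphabet([-2, -1, 0, 1, 2], 1))  # -> [-1, 0, 1]
```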
- the corresponding probabilities may then be defined (respectively as the probability that the plane level of a given block is equal to n, and as the probability that the plane level of a whole macroblock is equal to n).
- the grouping 195 may be applied.
- the probabilistic distributions (i.e. the probabilities {p_i,m}) of the alphabets are used to group at least two alphabets into a new alphabet, the new alphabet replacing the at least two alphabets and defining, as symbols, the combinations of possible symbols for the at least two transformed block coefficients associated with the grouped alphabets.
- variable-length-code (VLC) binary codes generated from an alphabet alone have an integer number of bits per symbol, which moves their mean length away from the theoretical entropy of the alphabet (by up to one bit).
- the VLC code of each alphabet is no worse than one bit (the above "one bit cost") from the entropy and satisfies the following inequalities: H(A) ≤ L(A) ≤ H(A) + 1, where the entropy H(A) is defined by H(A) = -Σ_m p_m·log2(p_m), and the mean length of the VLC code is defined by L(A) = Σ_m p_m·L_m, L_m being the length in bits of the binary code associated with the symbol a_m.
- the entropy encoding may be made as close as wanted to the theoretical entropy, but this would not be practical since large Huffman trees would then have to be handled by the encoder.
- Different strategies for grouping the alphabets may be implemented as described below, providing VLC binary codes that are closer, to a greater or lesser extent, to the theoretical entropy of the alphabets.
- Figure 10 illustrates an overall method of processing the alphabets associated with the quantized DCT symbols, according to an embodiment of the invention.
- the method begins at step 1000 with the alphabets A_i of the quantized DCT symbols considered (e.g. from the blocks associated with the same base coding mode) and their associated probabilistic distributions {p_i,m}. At step 1005, which is optional, alphabets having an associated entropy or a size (in terms of number of symbols) less than a predefined value are discarded from consideration for grouping alphabets.
- the encoding method may avoid generating corresponding VLC codes (e.g. Huffman trees) as described below, and may avoid encoding the DCT symbols coming from this alphabet.
- at step 1010, it is decided to group alphabets according to an intra-block grouping. It is recalled here that the grouping of alphabets is a product of those alphabets.
- Such intra-block grouping groups at least two alphabets associated with at least two DCT coefficients of the same block of pixels.
- the at least two DCT coefficients for which the grouping is performed are adjacent coefficients within the same block.
- the same grouping of alphabets may be performed for a plurality of blocks within the first image, in particular the blocks of the same macroblock (because they are associated with the same base coding mode).
- Figure 11 illustrates this situation for two adjacent blocks B_k and B_k+1.
- step 1015 is an inter-block grouping of the alphabets remaining after step 1010; these may be initial alphabets resulting from the quantization and/or new alphabets resulting from the intra-block grouping.
- Such inter-block grouping groups at least two alphabets associated with DCT coefficients collocated in at least two blocks of pixels.
- the at least two blocks of pixels may be from the same macroblock dividing the DCT image X DCT .
- a new alphabet resulting from intra- or inter-block grouping replaces the alphabets that are grouped, and a probabilistic distribution of that new alphabet is also computed as defined above (i.e. based on the probabilistic distributions of the at least two alphabets involved in the grouping). Such a probabilistic distribution of the new alphabet may be useful when considering whether or not that new alphabet should be grouped again, and is also required to generate VLC codes.
- after step 1015, a set of alphabets and associated probabilistic distributions is obtained (step 1020).
- Intra-block grouping and then inter-block grouping are performed iteratively as long as a resulting new alphabet comprises fewer symbols than a predefined number. That means that the iterative process is continued up to a given upper limit on the cardinality of the grouped alphabets. This ensures that the associated VLC codes, e.g. conventional Huffman trees, are small enough to be handled in real time by the codec.
- a first method for intra-block grouping of alphabets is based on the entropy of those alphabets. This is because the alphabets with very small entropy are those with the largest overhead, due to the upper bound of one bit that may be asymptotically reached. Conversely, the alphabets with high entropy generally have a lower overhead.
- the method comprises determining an entropy value for each of the alphabets, and the grouping of at least two alphabets comprises grouping the available alphabet having the smallest entropy with the available alphabet having the highest entropy.
- the entropy value derives from the probabilistic distribution of the alphabet.
- due to the constraint below on alphabet size, the smallest (respectively the largest) entropy alphabet may not be the alphabet having the smallest (respectively the largest) entropy as such, but the smallest (respectively largest) among those alphabets that satisfy the constraint.
- the alphabet with the smallest entropy is chosen for a grouping with the compatible (i.e. alphabet product size is smaller than the given upper limit) alphabet having the highest entropy.
- the grouping based on the probabilistic distributions (indirectly the entropy) of the alphabets is iteratively repeated until there are no more alphabets satisfying the constraint. It is to be noted that a group of alphabets resulting from the intra-block grouping is considered as an alphabet for the next intra-block grouping iteration.
- the initial state 1300 comprises the alphabets A_1, ..., A_n of a DCT block that is to be processed, as well as their associated probabilistic distributions {p_1,m1}, ..., {p_n,mn} ("m_i" being an index varying from 0 to the number of symbols within the corresponding alphabet A_i). To each alphabet A_i is appended the list of the corresponding DCT coefficients, initially only the DCT coefficient with index i.
- at step 1305, the entropy H(A_i) of each alphabet is computed, and the alphabet A_min having the minimum entropy is selected.
- the alphabet A_max which has the maximum entropy amongst the other alphabets, while satisfying the constraint on the cardinality of the product A_min x A_max (cardinality less than an upper limit denoted MAX_ALPHABET_SIZE in the Figure), is then selected.
- step 1320 is executed during which:
- the probabilistic distribution {p_new,m} of the new alphabet is calculated from the probabilistic distributions of the grouped alphabets: the symbol combining a_min,m and a_max,m' has probability p_min,m·p_max,m';
- the intra-block grouping ends at step 1325 with a plurality of alphabets (some grouping several initial alphabets A_i) and their associated probabilistic distributions.
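Steps 1305 to 1320 of this first grouping method can be sketched as below. Alphabets are represented only by their probability lists; MAX_ALPHABET_SIZE, the stopping behaviour and all names are simplified assumptions for illustration.

```python
from math import log2
from itertools import product

MAX_ALPHABET_SIZE = 64      # illustrative upper limit on grouped cardinality

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def group(pa, pb):
    """Product alphabet: p_new for the pair (a_m, b_m') is pa_m * pb_m'."""
    return [x * y for x, y in product(pa, pb)]

def intra_block_grouping(alphabets):
    """Repeatedly group the minimum-entropy alphabet with the maximum-entropy
    alphabet compatible with the size limit (steps 1305-1320)."""
    alphabets = [list(a) for a in alphabets]
    while True:
        order = sorted(range(len(alphabets)), key=lambda i: entropy(alphabets[i]))
        i_min = order[0]
        partner = next((i for i in reversed(order) if i != i_min and
                        len(alphabets[i]) * len(alphabets[i_min]) <= MAX_ALPHABET_SIZE),
                       None)
        if partner is None:          # no compatible pair left: step 1325
            return alphabets
        merged = group(alphabets[i_min], alphabets[partner])
        # the merged alphabet is itself a candidate for the next iteration
        alphabets = [a for k, a in enumerate(alphabets) if k not in (i_min, partner)]
        alphabets.append(merged)
```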
- Such intra-block grouping efficiently improves the entropy of VLC codes generated from the resulting alphabets. This is because in the resulting alphabet, the largest probability (which induces a large overhead) is bounded by the highest probability of the alphabet with high entropy. Therefore, the remaining highest probability induces reduced overhead.
- a second method for intra-block grouping of alphabets is now described. Compared to the first method, this method is more efficient but requires more calculation. Instead of only calculating the entropy of the alphabets, the second method selects alphabets such that the reduction of coding overhead is maximum, meaning that steps 1305 and 1310 differ (the other steps are the same).
- the second method comprises grouping two alphabets which maximizes a reduction in distance between the entropy of the alphabets and the entropy of the binary codes constructed from the alphabets.
- the binary codes are generated from the alphabet and the associated probabilistic distribution, based on a Huffman code.
- the distance of a Huffman code to the entropy cannot be anticipated without a costly and impractical operation of constructing the Huffman tree itself.
- the entropy of the binary codes constructed from the alphabets is modelled by the entropy of a corresponding Shannon code. This is because the distance d(Aj) as defined below provides an upper bound of the Huffman code distance to the entropy of the alphabet Aj.
- this entropy is used to estimate the overhead or distance to the entropy of the alphabet.
- This distance of the Shannon code to the entropy of the alphabet A is defined as follows: d(A) = Σ_m p_m · ⌈−log2(p_m)⌉ − H(A).
- the second method for the intra-block coding consists in selecting (corresponding to the two steps 1305 and 1310) the two alphabets Ai and Aj which provide the maximum reduction together while satisfying the constraint on cardinality:
- step 1320 is performed, ensuring that the group of alphabets resulting from the intra-block grouping is considered as an alphabet for the next intra-block grouping iteration, with an associated probabilistic distribution.
- the grouping of the alphabets is iteratively repeated until there are no more alphabets satisfying the constraint on the size of A_i × A_j.
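The Shannon-code distance and the pair selection of the second method can be sketched as follows (an illustrative sketch assuming the standard Shannon code length ⌈−log2 p⌉; function names and the size limit are assumptions):

```python
import math
from itertools import product

def shannon_distance(dist):
    """Upper bound on the Huffman code redundancy: expected Shannon code
    length (with lengths ceil(-log2 p)) minus the entropy, in bits/symbol."""
    expected_len = sum(p * math.ceil(-math.log2(p)) for p in dist if p > 0)
    h = -sum(p * math.log2(p) for p in dist if p > 0)
    return expected_len - h

def best_pair(alphabets, max_size=64):
    """Second method (steps 1305/1310): pick the pair (i, j) whose grouping
    maximizes the reduction d(Ai) + d(Aj) - d(Ai x Aj), subject to the
    cardinality constraint on the product alphabet."""
    best, best_gain = None, 0.0
    for i in range(len(alphabets)):
        for j in range(i + 1, len(alphabets)):
            da, db = alphabets[i], alphabets[j]
            if len(da) * len(db) > max_size:
                continue
            joint = [pa * pb for pa, pb in product(da, db)]
            gain = (shannon_distance(da) + shannon_distance(db)
                    - shannon_distance(joint))
            if gain > best_gain:
                best, best_gain = (i, j), gain
    return best, best_gain
```

The third method would simply weight each `shannon_distance(d)` term by the entropy of its alphabet, giving priority to low-entropy alphabets when the distances are similar.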
- a third method for the intra-block grouping derives from the second method, wherein each distance relating to an alphabet is weighted by the entropy of that alphabet. This gives priority to grouping alphabets having small entropy rather than alphabets with larger entropy, when those alphabets have binary codes with similar distances to the entropy.
- the alphabets to group are selected based on the following formula:
- step 1010 ends with a set of M alphabets (some grouping several initial alphabets A_i) and associated probabilistic distributions.
- Figure 14 illustrates an example of inter-block grouping that may be applied to these alphabets obtained at the end of step 1010. Let us denote those alphabets by A'_i when beginning the inter-block grouping 1015.
- the inter-block grouping is performed between blocks B k of the same macroblock MB k .
- Such a situation has been previously introduced with reference to Figure 12.
- the inter-block grouping is performed iteratively by forming the product alphabet of an alphabet A'_i multiplied by itself, which is possible because the same base coding mode applies within the macroblock (permitting the same intra-block grouping to be performed in all the blocks of that macroblock).
- initially, the result is one alphabet per block B_k. If an alphabet is grouped once with itself, the result is one alphabet per 2 blocks (i.e. half the macroblock).
- the initial state 1400 comprises the M alphabets A'_i, as well as their associated probabilistic distributions {p_i,m}. nbCoeffs defines the number of DCT coefficients that are appended to a considered alphabet.
- Step 1405 initialises an index a enabling the iterative processing of the inter-block grouping through steps 1425 and 1430.
- Each iteration a processes the alphabet A' a , trying to group it with itself, once or twice (output "no" of test 1420 driving the second grouping of this alphabet).
- the only condition for allowing a grouping is the given upper limit (MAX_ALPHABET_SIZE) on the cardinality of the inter-block grouped alphabet.
- When a grouping of the alphabet A'_a with itself is possible, step 1415 is performed. In particular, this step updates the appended list of DCT coefficients concerned by the new alphabet obtained. This list doubles each time the alphabet is grouped with itself.
- Step 1415 also updates the probabilistic distribution of that new alphabet.
- This iterative processing is over when all the alphabets A'_i have been processed, meaning that they have either been grouped twice or they cannot satisfy the inter-block grouping constraint regarding cardinality.
- Step 1015 of inter-block grouping ends with a set of M' alphabets (some grouping several initial alphabets for several blocks) and associated probabilistic distributions.
- Such a set is obtained for a specific macroblock with which a specific base coding mode is associated, as introduced above.
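The self-grouping step of Figure 14 can be sketched as follows (illustrative only; the maximum of two groupings follows the description above, the other names are assumptions):

```python
def self_group(dist, nb_coeffs, max_size=64, max_groupings=2):
    """Inter-block grouping sketch: group an alphabet with itself, at most
    twice, while the product cardinality stays under the limit. The list of
    appended DCT coefficients (here just their count) doubles each time."""
    nb_block = 1
    for _ in range(max_groupings):
        if len(dist) ** 2 > max_size:
            break
        # step 1415: product distribution of the alphabet with itself
        dist = [pa * pb for pa in dist for pb in dist]
        nb_coeffs *= 2
        nb_block *= 2
    return dist, nb_coeffs, nb_block
```

An alphabet grouped twice thus covers 4 blocks with a single symbol per group of blocks, at the cost of a squared-then-squared cardinality.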
- binary entropy codes for the symbols defined in the M' alphabets are generated, in such a way that the quantized DCT symbols are then encoded using those generated binary codes.
- the same order of the DCT symbols is applied at the encoding and at the decoding, since the decoder performs the same operations as the encoder (grouping based on the probabilities and probabilistic distribution parameters that it will receive). For example, this order may be the zigzag scanning order, taking into account the grouping of alphabets (i.e. grouping of DCT coefficients): the smallest index of a DCT coefficient amongst the grouped indexes may be used to represent the encoding/decoding position of the grouped coefficients, along the zigzag order.
- the binary codes may be VLC codes obtained from the probabilistic distribution, based on a Huffman code. Huffman trees are thus generated to obtain the corresponding VLC binary codes for each of the M' alphabets.
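Building the Huffman VLC codes from a probabilistic distribution can be sketched as follows (a standard Huffman construction; the tie-breaking by insertion order is an implementation choice, not specified by the text):

```python
import heapq, itertools

def huffman_codes(probs):
    """Huffman VLC codes (bit strings) for symbols 0..n-1 given their
    probabilities; returns a dict symbol -> code."""
    counter = itertools.count()  # tie-breaker so tuples never compare nodes
    heap = [(p, next(counter), sym) for sym, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(counter), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: a symbol index
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes
```

One such tree would be generated per grouped alphabet, from its associated probabilistic distribution.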
- the entropy coding of the quantized DCT coefficients of a macroblock is illustrated in Figure 15.
- the process shown in this Figure is based on the fact that it is possible to alternate between many Huffman trees during the encoding without altering the decoding capability: the generated bit-stream 20 is still a prefix code and there is no need to add extra syntax to specify which DCT coefficients of the blocks have been grouped within a single alphabet (i.e. which DCT coefficients are encoded by the same binary code), since the decoder will perform the same grouping operations and obtain the same grouped alphabets.
- the algorithm of Figure 15 has to be processed for each macroblock of the DCT image X DCT to encode the entire residual enhancement image.
- the process begins (step 1500) with the obtaining of the restricted flag alphabet F' as constructed above, based on the current base coding mode.
- the flags f_n of the macroblock are encoded with the flag value f_a, f_b or f_∞, depending on which interval [0,a], ]a,b] or ]b,∞[ the value n belongs to.
- This substitution of f_n by one of f_a, f_b and f_∞ can however be done previously, when restricting the alphabets.
- the flag associated with each block is entropy encoded based on the probabilities of occurrence of the flags of the alphabet F', used to construct VLC (e.g. Huffman) codes.
- the flags associated with the blocks of the same macroblock are concatenated into one flag associated with the macroblock.
- the probabilities of occurrence of the possible values for a macroblock flag are calculated based on the probabilities p_MB,n defined above; the other probabilities for a macroblock flag have to be normalized. For example, if only p_MB,0 is considered for that normalization:
- the probabilities p_MB,n may also be used, in which case the normalization is adapted accordingly.
- the macroblock flag f MB is thus encoded using a Huffman coding based on the above probabilities.
- Figure 23 illustrates the order of coding: a flag f_MB followed by the Huffman coding of its associated macroblock data, then another flag followed by the Huffman coding of the next associated macroblock, and so on.
- Such a structure preserves the spatial random access capability in the stream.
- step 1504 consists in obtaining the set S of M' symbol groups (i.e. of the M' alphabets issued from the intra- and inter-block grouping) corresponding to that coding mode.
- - g.nbDCTCoeffs represents the total number of DCT coefficients coded by each symbol of the current alphabet g (i.e. the number of DCT coefficients appended to that alphabet);
- - g.nbBlock represents the value nbBlock obtained during the inter-block grouping for the alphabet g (see Figure 14);
- - quantized_coeff[g.dct_i + block_index*64] is the value of the i-th quantized DCT coefficient stored in the list appended to the current alphabet g, for the current symbol (since a symbol may represent several initial DCT coefficients).
- the current symbol being encoded is represented by the block_index variable.
- the M' alphabets are processed iteratively through steps 1505, 1545 (which detects the last alphabet) and 1550 (which selects another alphabet).
- the algorithm encodes one or more symbols of that alphabet, depending on the number of 8x8 DCT blocks (variable nbBlock obtained from Figure 14 for that alphabet) associated with the current alphabet.
- the number of symbols coded for the current alphabet depends on the number of blocks it covers.
- a loop is performed to encode the symbols of that current alphabet g, through steps 1510 (which initializes block_index), 1530 (which increments block_index according to nbBlock) and 1535 (which checks whether all the blocks of the macroblock have been processed).
- the symbol value symbolVal is computed at step 1515. This consists in combining quantized DCT coefficient values together so as to obtain the right symbol value.
- the Huffman code associated with that symbol value is obtained at step 1520 from the constructed Huffman trees. Then that Huffman code is written in the output video bitstream 20 at step 1525.
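The combination of quantized DCT coefficient values into a single symbol value (step 1515) can be illustrated by a mixed-radix packing. This is a hypothetical rule: the text does not specify the exact combination, so the packing order and function names are assumptions; whatever rule is chosen must be mirrored at the decoder.

```python
def symbol_value(coeff_indexes, alphabet_sizes):
    """Pack several quantized-coefficient indexes into one symbol value,
    treating each index as a digit in a mixed-radix number."""
    val = 0
    for idx, size in zip(coeff_indexes, alphabet_sizes):
        val = val * size + idx
    return val

def split_symbol(val, alphabet_sizes):
    """Inverse operation, as would be performed at decoding."""
    out = []
    for size in reversed(alphabet_sizes):
        out.append(val % size)
        val //= size
    return list(reversed(out))
```

The symbol value is then looked up in the Huffman tree of the grouped alphabet (step 1520) and its code written to the bit-stream (step 1525).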
- the DCT coefficients are encoded using the restricted alphabets.
- - all blocks whose flag value is f_n with n ≤ a are encoded with flag value f_a, i.e. with alphabets having magnitudes less than a (for the first implementation of the flag as described above);
- the decoding is to be performed successively for each encoded macroblock.
- step 1602 is the decoding of the macroblock flag f_MB, and step 1604 includes obtaining the alphabets restricted given the flags f_n.
- the process comprises obtaining probabilities associated with each flag of the alphabet F' of possible flags; the encoded flags are then entropy decoded from the bit-stream based on the obtained probabilities to obtain said flag associated with each block. Those probabilities, namely the probabilities associated with the flags f_a, f_b and f_∞, make it possible to reconstruct the Huffman trees.
- the decoding process comprises splitting the decoded macroblock flag into flags associated with the blocks of the macroblock.
- the decoder performs the same restriction as the one performed by the encoder:
- restricting the alphabets of a block comprises deleting, from the alphabets, the possible values that have a magnitude (i.e. absolute value) larger than the magnitude specified in the flag associated with the block;
- restricting the alphabets of a block comprises deleting the d B most negative and d B most positive possible values from each of the alphabets.
- the magnitude n specified in the flag is equal to M_B - d_B;
- M_B can be retrieved from the probabilistic distributions of the DCT coefficients (i.e. from the parameters (α_i, β_i) and the associated quantizers), in order to deduce d_B.
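The two restriction implementations above can be sketched as follows (illustrative helper names; an alphabet is represented as a plain list of possible coefficient values):

```python
def restrict_by_magnitude(values, n):
    """First implementation: keep only values whose magnitude
    (absolute value) does not exceed the magnitude n given by the flag."""
    return [v for v in values if abs(v) <= n]

def restrict_by_trimming(values, d):
    """Second implementation: delete the d most negative and the d most
    positive values from the (sorted) alphabet."""
    values = sorted(values)
    return values[d:len(values) - d]
```

Since the decoder applies the same restriction as the encoder, both ends end up with identical restricted alphabets and therefore identical Huffman trees.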
- the decoding may comprise:
- the parameters (α_i, β_i) of each alphabet A_i can be obtained from the bit-stream 21 and then applied to the above-defined probabilistic distribution model GGD(α, β) to obtain the probabilistic distribution of each alphabet (requiring integrating that model over the considered quanta Q_m as explained above).
- Given the knowledge of the quantizers by the decoder, the latter is then able to perform the same grouping operations as the encoder, resulting in the same grouped alphabets.
- the Figure shows that a bitrate close to the theoretical entropy rate is achieved due to the intra and inter-block grouping of the invention.
- the entropy coding scheme according to the present embodiment also has spatial random access properties, when the video encoding process is without inter frame (temporal) and intra block (spatial) predictions.
- the entropy coding according to the present embodiment has no dependence between macroblocks.
- once an entry point of the generated bit-stream 20 and the index of the associated macroblock are given, it is possible to perform the entropy decoding from that point, without decoding other parts of the encoded video.
- the bit-stream has the random spatial access property because it is possible to decode only a part of the image (a region of interest) once the associated entry points are given.
- Figure 18 illustrates how the residual enhancement image may be subdivided into spatial zones made of macroblocks, with entry points in order to allow efficient random access compliant coding.
- the position of the entry points may be encoded in the header of the bit- stream 20 in order to facilitate easy extraction from the server side and allow the reconstruction of a valid stream on the decoder side.
- Figure 19 shows the meta-organization of a bit-stream header.
- the slice header shown in the Figure re-uses the slice header of the H.264/AVC video compression standard.
- a new field, "coded slice length", indicates the length in bytes of the coded slice. The entry points can therefore be easily computed from the "coded slice length" fields.
- Another advantage of this independence between macroblocks of the residual enhancement image is the possibility to perform parallel entropy decoding on the decoder side. Each decoding thread starts decoding from one of the entry points as defined above.
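Accumulating the entry points from the "coded slice length" fields is straightforward; a minimal sketch (the header offset and the list of slice lengths are assumed inputs parsed from the header of Figure 19):

```python
def entry_points(header_offset, coded_slice_lengths):
    """Byte offsets at which each coded slice starts, obtained by
    accumulating the 'coded slice length' fields after the header."""
    points, offset = [], header_offset
    for length in coded_slice_lengths:
        points.append(offset)
        offset += length
    return points
```

Each decoding thread can then be handed one of these offsets and decode its slice independently, since macroblocks carry no inter-macroblock dependency.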
- This invention provides a simple, efficient and block based entropy coder of the quantized DCT coefficients.
- the possible absence of intra prediction between blocks and the judicious choice of the entropy coder provides random spatial access into the video bitstream, i.e. the bitstream can be decoded from any entry point in the bit-stream.
- a device implementing the invention is for example a microcomputer 50, a workstation, a personal digital assistant, or a mobile telephone connected to various peripherals.
- the device is in the form of a photographic apparatus provided with a communication interface for allowing connection to a network.
- the peripherals connected to the device comprise for example a digital camera 64, or a scanner or any other image acquisition or storage means, connected to an input/output card (not shown) and supplying image data to the device.
- the device 50 comprises a communication bus 51 to which there are connected: - a central processing unit CPU 52 taking for example the form of a microprocessor;
- a read only memory 53 in which may be contained the programs whose execution enables the methods according to the invention. It may be a flash memory or EEPROM;
- a random access memory (RAM) 54; this RAM memory 54 stores in particular the various images and the various blocks of pixels as the processing is carried out (transform, quantization, storage of the reference images) on the video sequences;
- a hard disk 58 or a storage memory, such as a memory of compact flash type, able to contain the programs of the invention as well as data used or produced on implementation of the invention;
- an optional diskette drive 59 or another reader for a removable data carrier, adapted to receive a diskette 63 and to read/write thereon data processed or to be processed in accordance with the invention;
- a communication interface 60 connected to the telecommunications network 61 , the interface 60 being adapted to transmit and receive data.
- the device 50 is preferably equipped with an input/output card (not shown) which is connected to a microphone 62.
- the communication bus 51 permits communication and interoperability between the different elements included in the device 50 or connected to it.
- the representation of the bus 51 is non-limiting and, in particular, the central processing unit 52 may communicate instructions to any element of the device 50 directly or by means of another element of the device 50.
- the diskettes 63 can be replaced by any information carrier such as a compact disc (CD-ROM) rewritable or not, a ZIP disk or a memory card.
- an information storage means which can be read by a micro-computer or microprocessor, integrated or not into the device for processing a video sequence, and which may possibly be removable, is adapted to store one or more programs whose execution permits the implementation of the method according to the invention.
- the executable code enabling the coding device to implement the invention may equally well be stored in read only memory 53, on the hard disk 58 or on a removable digital medium such as a diskette 63 as described earlier.
- the executable code of the programs is received by the intermediary of the telecommunications network 61, via the interface 60, to be stored in one of the storage means of the device 50 (such as the hard disk 58) before being executed.
- the central processing unit 52 controls and directs the execution of the instructions or portions of software code of the program or programs of the invention, the instructions or portions of software code being stored in one of the aforementioned storage means.
- the program or programs which are stored in a non-volatile memory for example the hard disk 58 or the read only memory 53, are transferred into the random-access memory 54, which then contains the executable code of the program or programs of the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.
- the device implementing the invention or incorporating it may be implemented in the form of a programmed apparatus.
- a device may then contain the code of the computer program(s) in a fixed form in an application specific integrated circuit (ASIC).
- the device described here and, particularly, the central processing unit 52 may implement all or part of the processing operations described in relation with Figures 1 to 19 and 21 to 34, to implement methods according to the present invention and constitute devices according to the present invention.
- the quality of a video or still image may be measured by the so-called Peak-Signal-to-Noise-Ratio or PSNR, which depends on the L2-norm of the encoding error in the pixel domain, i.e. the sum over the pixels of the squared difference between the original pixel value and the decoded pixel value.
- PSNR may be expressed in dB as:
- the distortion is thus a measure of the distance between the original coefficient (here the coefficient before quantization) and the decoded coefficient (here the dequantized coefficient).
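The relation between PSNR and the mean squared error can be sketched as follows (a standard definition, assuming 8-bit samples with peak value 255; the sample lists are flat pixel arrays):

```python
import math

def psnr(original, decoded, peak=255.0):
    """PSNR in dB from the mean squared error between original and
    decoded pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(peak * peak / mse)
```

Minimising the distortion D for a given rate thus directly maximises the PSNR of the decoded image.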
- R is the total rate made of the sum of individual rates R n for each DCT coefficient.
- the rate R n depends only on the distortion D n of the associated n-th DCT coefficient.
- the rate-distortion minimization problem (A) can be split into two consecutive sub-problems without losing the optimality of the solution:
- problem (B) is turned into a continuum of problems (B_lambda) having the following Lagrange formulation
- this algorithm is performed here for each of a plurality of possible probabilistic distributions (in order to obtain the pre-computed optimal quantizers for the possible distributions to be encountered in practice), and for a plurality of possible numbers M of quanta. It is described below when applied to a given probabilistic distribution P and a given number M of quanta.
- the GGD representing a given DCT channel will be normalized before quantization (i.e. homothetically transformed into a unity standard deviation GGD), and will be de-normalized after de-quantization.
- the parameters (in particular here the parameter α, or equivalently the standard deviation σ) of the concerned GGD model are sent to the decoder in the video bit-stream.
- the current values of the limits t_m and centroids c_m define a quantization, i.e. a quantizer, with M quanta, which solves the problem (B_lambda), i.e. minimises the cost function for a given value λ, and has an associated rate value R_λ and a distortion value D_λ.
- Such a process is implemented for many values of the Lagrange parameter λ (for instance 100 values comprised between 0 and 50). It may be noted that for λ equal to 0 there is no rate constraint, which corresponds to the so-called Lloyd quantizer.
- optimal quantizers of the general problem (B) are those associated to a point of the upper envelope of the rate-distortion curves making this diagram, each point being associated with a number of quanta (i.e. the number of quanta of the quantizer leading to this point of the rate-distortion curve).
- This upper envelope is illustrated on Figure 27.
- Each curve may in practice be stored in the encoder in a table containing, for a plurality of points on the curve, the rate and distortion (coordinates) of the point concerned, as well as features defining the associated quantizer (here the number of quanta and the values of limits t_m and centroids c_m for the various quanta). For instance, a few hundred quantizers may be stored for each β up to a maximum rate, e.g. of 5 bits per DCT coefficient, thus forming the pool of quantizers mentioned in Figure 3. It may be noted that a maximum rate of 5 bits per coefficient in the enhancement layer makes it possible to obtain good quality in the decoded image. Generally speaking, it is proposed to use a maximum rate per DCT coefficient equal to or less than 10 bits, for which value near-lossless coding is provided.
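The resolution of a problem (B_lambda) can be illustrated by a simple entropy-constrained Lloyd iteration on training samples. This is only a sketch: the patent pre-computes quantizers on GGD models, whereas this toy version works on empirical samples, and the iteration count and initialisation are assumptions.

```python
import math

def ec_lloyd(samples, M, lam, iters=30):
    """Entropy-constrained Lloyd iteration: alternately reassign samples to
    quanta with a rate-penalised nearest-neighbour rule, then update the
    centroids, lowering D + lam * R on the training samples."""
    samples = sorted(samples)
    lo, hi = samples[0], samples[-1]
    # initial centroids spread uniformly over the sample range
    c = [lo + (hi - lo) * (m + 0.5) / M for m in range(M)]
    p = [1.0 / M] * M
    for _ in range(iters):
        buckets = [[] for _ in range(M)]
        for x in samples:
            # squared-error distortion plus lam times the ideal code
            # length -log2(p_m) of the quantum
            m = min(range(M),
                    key=lambda q: (x - c[q]) ** 2 - lam * math.log2(p[q]))
            buckets[m].append(x)
        for m in range(M):
            if buckets[m]:
                c[m] = sum(buckets[m]) / len(buckets[m])
            p[m] = max(len(buckets[m]) / len(samples), 1e-12)
    return c, p  # centroids and quantum probabilities
```

With `lam = 0` the rate penalty vanishes and the iteration reduces to the plain Lloyd quantizer, consistent with the remark above.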
- σ_n is the normalization factor of the DCT coefficient, i.e. the GGD model associated with the DCT coefficient has σ_n for standard deviation, and the derivative is non-positive in view of the monotonicity just mentioned.
- an estimation of the merit M of encoding may be obtained by computing the ratio of the benefit on distortion to the cost of encoding:
- the ratio of the first-order variations provides an explicit expression of the merit:
- the initial merit M_n^0 is defined as the merit of encoding at zero rate, i.e. before any encoding; this initial merit M_n^0 can thus be expressed as follows:
- the initial merit is thus an upper bound of the merit:
- non-nil-rate encoded DCT coefficients whose indexes are a left segment of the tuple (n_1, n_2, ..., n_k, ...).
- the number of possible configurations that should be envisaged when deciding which coefficients to encode drops from 2^N (a decision for each coefficient whether it should be encoded) to N + 1 (after ordering by decreasing initial merit, the number of encoded coefficients may vary from 0 to N).
- the encoding priority just mentioned does not specify whether a DCT coefficient is more or less encoded than another DCT coefficient; it indicates however that, if a DCT coefficient is encoded at a non-zero rate, then all coefficients with higher priority must be encoded at a non-zero rate.
- the encoding priority provides an optimal encoding order that may be compared to the non-optimal conventional zigzag scan coding order used in the MPEG, JPEG, H.264 and HEVC video coding standards.
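The drop from 2^N to N+1 configurations can be made concrete with a small sketch (hypothetical helper; the initial merit values are assumed to have been computed as above):

```python
def priority_order(initial_merits):
    """Order DCT channels by decreasing initial merit; only the N+1
    'left segment' configurations of that order need be examined."""
    order = sorted(range(len(initial_merits)),
                   key=lambda n: initial_merits[n], reverse=True)
    # configuration k encodes the k highest-merit channels at non-zero rate
    configurations = [order[:k] for k in range(len(order) + 1)]
    return order, configurations
```

Any channel encoded at non-zero rate forces all higher-priority channels to be encoded too, which is exactly why only these nested configurations are candidates.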
- the parameter λ in the KKT function above is unrelated to the parameter λ used earlier in the Lagrange formulation of the optimisation problem meant to determine optimal quantizers.
- the n-th condition is said to be saturated. In the present case, it indicates that the n-th DCT coefficient is not encoded.
- R(t+1) = Σ_{n∈I⁰} R_n(t+1), where R_n(t+1) is the rate associated with the distortion D_n(t+1).
- the selected quantizer is the quantizer associated with the corresponding optimal distortion just obtained. It may be recalled in this respect that features (number of quanta, centroids and limit values) defining this quantizer are stored at the encoder in association with the distortion it generates, as already explained.
- For instance, one may take, from the list of optimal quantizers corresponding to the associated parameter β of the DCT channel model, the quantizer with the least rate among quantizers having a distortion less than or equal to the target distortion D_n.
- quantization is performed by the chosen (or selected) quantizers to obtain the quantized data X DCT Q representing the DCT image.
- these data are symbols corresponding to the index of the quantum (or interval, or Voronoi cell in 1D) in which the value of the concerned coefficient of X_DCT falls.
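Mapping a coefficient to its quantum index, and back to a centroid at dequantization, can be sketched as (assuming sorted decision limits t_1 < ... < t_{M-1} and one centroid per quantum, as stored in the quantizer pool):

```python
import bisect

def quantize(x, limits):
    """Index of the quantum (1-D Voronoi cell) in which the DCT
    coefficient value x falls, given the sorted decision limits."""
    return bisect.bisect_right(limits, x)

def dequantize(index, centroids):
    """Inverse operation: replace the quantum index by its centroid c_m."""
    return centroids[index]
```

The entropy coder then operates on these indexes rather than on the raw coefficient values.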
- the entropy coding may be performed by any known coding technique like VLC coding or arithmetic coding.
- Context-adaptive coding (CAVLC or CABAC) may also be used.
- the encoded data can then be transmitted together with parameters allowing in particular the decoder to use the same quantizers as those selected and used for encoding as described above.
- the transmitted parameters may include the parameters defining the distribution for each DCT channel, i.e. the parameter α (or equivalently the standard deviation σ) and the parameter β computed at the encoder side for each DCT channel.
- the decoder may deduce the quantizers to be used (a quantizer for each DCT channel) thanks to the selection process explained above at the encoder side (the only difference being that the parameters, for instance β, are computed from the original data at the encoder side whereas they are received at the decoder side).
- Dequantization (step 332 of Figure 4) can thus be performed with the selected quantizers (which are the same as those used at encoding because they are selected the same way).
- the parameters transmitted in the data stream include a parameter representative of the set I⁰ of non-saturated coefficients which was determined at the encoder side to minimize the rate (i.e. the set for which the minimum rate R_min was obtained).
- the process of selecting quantizers to be used is thus faster (part I of the process described above only).
- the transmitted parameters may include identifiers of the various quantizers used in the pool of quantizers (this pool being common to the encoder and the decoder) and the standard deviation σ (or equivalently the parameter α).
- Dequantization (step 332 of Figure 4) can thus be performed at the decoder by use of the identified quantizers.
- Quantizers are chosen 330 from the pool of quantizers, possibly based on these probabilistic distributions.
- an entropy decoder 331 is applied to the received enhancement layer bit-stream 20 to obtain the quantized DCT image X DEC .
- conventional Huffman codes can be used, possibly taking into account the probabilistic distributions;
- a dequantization (or inverse quantization) 332 is then performed by using the chosen quantizers for each coefficient, to obtain a dequantized version of the DCT image.
- the dequantized version is referenced X_DEC, since it is different from the original version X_DCT due to the lossy quantization.
- the present embodiment particularly focuses on the rest of the decoding process, starting from the dequantized DCT coefficients of that dequantized image, as described now with reference to Figure 31.
- the decoding method comprises a step of merging 38 the dequantized DCT blocks X_DEC of dequantized DCT coefficients with DCT residual blocks (or "predictor blocks") Y.
- the DCT residual blocks Y are generated by an enhancement prediction module 40 which is further detailed below.
- the DCT blocks X_DEC form a first version of the residual enhancement image currently decoded, while the DCT residual blocks Y form, at least partly, a second version of the same residual enhancement image, that is temporally predicted based on base layer motion information and an already decoded UHD image of the video, as explained below.
- the merging of the blocks X_DEC with the blocks Y may be a probabilistic merging process that is based on the parameters 21 (i.e. the probabilistic distributions of the DCT coefficients as determined by the encoder) and on a second probabilistic distribution that characterizes the temporal prediction of the enhancement layer by the module 40.
- the second probabilistic distribution is a probabilistic distribution of the differences between the coefficients of the DCT residual blocks Y and the dequantized DCT coefficients of the dequantized DCT blocks X_DEC.
- Figure 32 illustrates the generation of the DCT residual blocks Y, i.e. of transformed residual blocks of the enhancement image associated with a current image I to decode.
- This prediction successively consists in: temporally predicting the current enhancement image in the pixel domain (thanks to up-sampled motion information); computing the pixel difference data between the temporally predicted image and the up-sampled reconstructed base image; and then applying a DCT transform to the difference image.
- This motion information comprises, for each block or macroblock, a base motion field BMF (including a motion vector and a reference image index) and a base residual BR, as well-known by one skilled in the art of video coding.
- an image I_B of the base layer may be obtained by interpolating other decoded base images.
- the available motion information for those other decoded base images may also be interpolated to provide motion information specific to blocks or macroblocks of the interpolated base image I_B.
- the following explanation also applies to such a base image I_B.
- the corresponding motion information is up-sampled 400 into high resolution corresponding to the resolution of the enhancement layer (e.g. UHD). It is shown in the Figure by the references UMF (up-sampled motion field) and UR (up-sampled residual).
- this up-sampling comprises for each base macroblock:
- up-sampling the partition consists in multiplying the width and height of macroblock partitions by a factor of 2;
- This texture up-sampling process may use an interpolation filter that is identical to that used in inter-layer residual prediction mechanisms of the SVC scalable video compression standard; and - the up-sampling, by two (x- and y- coordinates), of the motion vector associated with the base macroblock.
- the generation of a DCT residual macroblock Y comprises a motion compensated prediction step 405 from the decoded UHD image that temporally corresponds to the reference base image I_B^R used for the decoding of the base layer, and based on the up-sampled motion information UMF and UR. That decoded UHD image is considered, for the temporal prediction, as the reference decoded image I_UHD^R.
- This motion compensation 405, in the pixel domain, leads to obtaining, using the motion information, motion predictor blocks from the decoded reference high resolution image I_UHD^R.
- the up-sampled prediction information is applied to the reference decoded image I_UHD^R to determine predicted macroblocks.
- the motion compensation results in a partially reconstructed image. This is because the macroblocks reconstructed by prediction are obtained at spatial positions corresponding to INTER macroblocks in the base image I_B only (because there is no motion information for other macroblocks). In other words, no predicted block is generated for the macroblocks collocated with INTRA macroblocks in the base layer.
- residual blocks are obtained by subtracting 410 each motion predictor block from a corresponding (i.e. collocated) up-sampled block in the up-sampled decoded base image (which is obtained by the up-sampling 32 of Figure 2).
- This step calculates the difference image (or residual) between the temporally predicted image and the up-sampled reconstructed base layer image.
- This difference image has the same nature as the residual enhancement image.
- the module 40 ends by applying 415 a block-based transform, e.g. a DCT on 8x8 blocks, to the obtained residual blocks to obtain transformed residual blocks that are the DCT residuals Y discussed above.
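Steps 410 and 415 can be sketched together as below; the plain-numpy orthonormal DCT and the function names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(
        np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def dct_residuals(upsampled_base, predicted, bs=8):
    """Subtract each motion predictor block from the collocated block of
    the up-sampled decoded base image (step 410), then apply an 8x8 2-D
    DCT to each residual block (step 415), yielding the DCT residuals Y."""
    diff = upsampled_base.astype(np.float64) - predicted.astype(np.float64)
    c = dct_matrix(bs)
    out = np.empty_like(diff)
    h, w = diff.shape
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            block = diff[by:by + bs, bx:bx + bs]
            out[by:by + bs, bx:bx + bs] = c @ block @ c.T
    return out
```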
- a plurality of DCT residual macroblocks Y is obtained for the current image to decode, generally representing a partial predicted enhancement image.
- the next steps of the decoding method according to the invention may be applied to the entirety of that plurality of macroblocks Y, or to a part of it, depending for example on the base coding mode (P-image Inter prediction, B-image Inter prediction, Skip mode), in which case only the DCT predictor macroblocks Y and the dequantized DCT macroblocks X_DEC collocated with base macroblocks having the same coding mode are handled together.
- macroblocks Y collocated with P, B and SKIP base macroblocks are considered separately, as was done at the encoder when determining the probabilistic distribution of each DCT channel.
- a probabilistic distribution may be obtained for the entire set of coefficients of the considered DCT residual macroblocks, or for each DCT channel i, in which case the explanation below should be applied to the DCT coefficients of the same channel.
- Each DCT residual macroblock Y, made of DCT coefficients for the current image to decode, is considered as a version of the original DCT coefficients that would have been altered through a communication channel. It has been observed that the quantity Y - X_DCT (i.e. the noise of the residual Y compared to the DCT coefficients before encoding) can be well modelled by a generalized Gaussian distribution as introduced above: (Y - X_DCT) ~ GGD(alpha_N, beta_N)
- the modelling of the predictor noise thus comprises fitting a Generalized Gaussian Distribution model onto the differences between the coefficients in the transformed residual blocks Y and the dequantized transformed coefficients.
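One common way to fit the two GGD parameters is moment matching on the absolute first and second moments; this is a standard technique offered here as a sketch, not necessarily the fitting procedure used in the patent:

```python
import math
import numpy as np

def fit_ggd(samples):
    """Moment-matching fit of a Generalized Gaussian Distribution
    p(x) ~ exp(-(|x| / alpha) ** beta) to noise samples such as Y - X_DCT.

    The shape parameter beta is found by inverting the moment ratio
    E[|x|]^2 / E[x^2] = Gamma(2/b)^2 / (Gamma(1/b) * Gamma(3/b)),
    which is increasing in b, using a simple bisection search.
    Returns (alpha, beta).
    """
    x = np.asarray(samples, dtype=np.float64)
    m1 = np.mean(np.abs(x))
    m2 = np.mean(x * x)
    r = m1 * m1 / m2                      # target moment ratio

    def ratio(b):
        return math.gamma(2.0 / b) ** 2 / (
            math.gamma(1.0 / b) * math.gamma(3.0 / b))

    lo, hi = 0.1, 10.0
    for _ in range(80):                   # bisection on the increasing ratio
        mid = 0.5 * (lo + hi)
        if ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.gamma(1.0 / beta) / math.gamma(2.0 / beta)
    return alpha, beta
```

For a Gaussian (beta = 2) with standard deviation sigma, this parameterisation gives alpha = sigma * sqrt(2), which provides a quick sanity check of the fit.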
- the merged value can take the form of a probabilistic estimate of the original DCT coefficient value, given the known quantization interval of this DCT coefficient and a side approximation of the coefficient resulting from its motion-compensated temporal prediction (blocks Y).
- a merged value according to the invention may be the expectation (or the "expected value") of the considered coefficient, given the quantization interval Q_m associated with the value of the corresponding quantized transformed coefficient in X_DEC and given its corresponding value Y_0 in the residual blocks Y.
- the quantization interval Q_m is directly retrieved from the quantized DCT coefficient obtained from the bit-stream 20, since its value a_m is the index of the quantization interval Q_m for the quantizer used.
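Under the GGD noise model, the merged value described above is the mean of the posterior truncated to the quantization interval. A numerical sketch (the function name, midpoint-rule integration, and fallback behaviour are assumptions for illustration) could be:

```python
import numpy as np

def merged_value(y0, q_low, q_high, alpha, beta, n=4096):
    """Expectation of the original DCT coefficient X given that
    dequantization located it in [q_low, q_high) and that the temporal
    prediction provides Y_0 = y0 with noise X - Y_0 ~ GGD(alpha, beta).

    The truncated, unnormalised posterior is integrated numerically;
    the grid spacing cancels between numerator and denominator.
    """
    x = np.linspace(q_low, q_high, n)
    w = np.exp(-(np.abs(x - y0) / alpha) ** beta)   # unnormalised posterior
    z = w.sum()
    if z == 0.0:                # all mass underflowed: fall back to centre
        return 0.5 * (q_low + q_high)
    return float((x * w).sum() / z)
```

When Y_0 sits at the centre of the interval the estimate is the interval centre; when Y_0 lies outside the interval, the estimate is pulled toward the nearer interval bound, which is exactly the refinement the merging step contributes over plain dequantization.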
- the values X_i calculated for the DCT coefficients of all the considered macroblocks are stored in memory to form, at least partially, the merged enhancement image corresponding to the current image to decode.
- the present embodiment has been illustrated and provides significant rate-distortion improvements of several dBs. As explained in detail, the present embodiment relies on:
- Figure 33 illustrates the performance of the present embodiment, in which the rate-distortion curves are plotted when the merging according to the present embodiment is respectively not implemented and implemented.
- the Figure shows that an improvement in the codec's rate-distortion performance is obtained, especially at low bitrates. This may be understood intuitively: the quantization intervals get larger as the bitrate decreases, which increases the relevant information brought by the temporal DCT residuals Y compared to the quantization level obtained by the dequantization step 332.
- the present embodiment also works at zero bitrate (meaning that no enhancement-layer bitstream 20 is encoded or received by the decoder).
- the parameters 21 (alpha, beta for each DCT channel) are received and are used together with the parameters alpha_N, beta_N calculated with the present embodiment to improve the decoding quality of the base layer by several dBs.
- the above performance is obtained at no complexity cost on the encoding side and with no additional bitrate when the parameters 21 are already needed and transmitted (e.g. for selecting the quantizers and/or for entropy decoding).
- the complexity increase due to the merging step remains reasonable at the decoding side.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention relates to a method for encoding video data. The method according to the invention comprises: encoding video data having a first resolution in accordance with the HEVC coding technique (13), so as to obtain base-layer video data (14); and decoding the base-layer video data in accordance with the HEVC coding technique (15). The decoded base-layer video data are then up-sampled (16) so as to generate decoded video data having a second resolution higher than the first resolution. A difference is measured (17) between the generated decoded video data having the second resolution and other video data which have said second resolution and which correspond to the video data having the first resolution, so as to generate residual image data. The residual image data are then compressed (18, 19) so as to generate enhancement-layer video data (20). The method according to the invention can be used to obtain a reduced-complexity UHD codec based on scalable coding.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1111199.4 | 2011-06-30 | ||
| GB201111199A GB2492397A (en) | 2011-06-30 | 2011-06-30 | Encoding and decoding residual image data using probabilistic models |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2013000575A1 true WO2013000575A1 (fr) | 2013-01-03 |
Family
ID=44511904
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2012/002718 WO2013000575A1 (fr) | 2011-06-30 | 2012-06-28 | Procédés et dispositifs pour un codage vidéo extensible |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2492397A (fr) |
| WO (1) | WO2013000575A1 (fr) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016032033A1 (fr) * | 2014-08-29 | 2016-03-03 | 전자부품연구원 | Dispositif et procede de diffusion en flux de contenu 4k uhd en nuage |
| US9491459B2 (en) | 2012-09-27 | 2016-11-08 | Qualcomm Incorporated | Base layer merge and AMVP modes for video coding |
| CN114630129A (zh) * | 2022-02-07 | 2022-06-14 | 浙江智慧视频安防创新中心有限公司 | 一种基于智能数字视网膜的视频编解码方法和装置 |
| CN115550669A (zh) * | 2022-11-30 | 2022-12-30 | 摩尔线程智能科技(北京)有限责任公司 | 一种视频转码方法及装置、电子设备和存储介质 |
| WO2023169303A1 (fr) * | 2022-03-10 | 2023-09-14 | 华为技术有限公司 | Procédé et appareil de codage et de décodage, dispositif, support de stockage et produit programme informatique |
| CN117459737A (zh) * | 2023-12-22 | 2024-01-26 | 中国科学技术大学 | 一种图像预处理网络的训练方法和图像预处理方法 |
| WO2025001621A1 (fr) * | 2023-06-30 | 2025-01-02 | 中兴通讯股份有限公司 | Procédé et appareil d'amélioration de qualité d'image pour débit de données de couche vidéo |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2506348B (en) * | 2012-09-14 | 2017-03-08 | Canon Kk | Method for encoding an image of pixels, corresponding decoding method and device |
| US10870398B2 (en) * | 2015-07-28 | 2020-12-22 | Ford Global Technologies, Llc | Vehicle with hyperlapse video and social networking |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6700933B1 (en) * | 2000-02-15 | 2004-03-02 | Microsoft Corporation | System and method with advance predicted bit-plane coding for progressive fine-granularity scalable (PFGS) video coding |
| WO2005057933A1 (fr) * | 2003-12-08 | 2005-06-23 | Koninklijke Philips Electronics N.V. | Procede de compression evolutive spatiale avec zone morte |
| US20090262835A1 (en) | 2001-12-17 | 2009-10-22 | Microsoft Corporation | Skip macroblock coding |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004180057A (ja) * | 2002-11-28 | 2004-06-24 | Toa Corp | デジタルデータの符号化装置および符号化方法 |
| CN1728827A (zh) * | 2004-07-26 | 2006-02-01 | 皇家飞利浦电子股份有限公司 | 一种视频流分级压缩方法及装置 |
| CN101077013A (zh) * | 2004-12-13 | 2007-11-21 | 皇家飞利浦电子股份有限公司 | 可缩放的图片编码 |
| WO2007044556A2 (fr) * | 2005-10-07 | 2007-04-19 | Innovation Management Sciences, L.L.C. | Procede et appareil pour decodeur video scalable faisant appel a un flux d'amelioration |
| US20090141809A1 (en) * | 2007-12-04 | 2009-06-04 | Sony Corporation And Sony Electronics Inc. | Extension to the AVC standard to support the encoding and storage of high resolution digital still pictures in parallel with video |
| US8295356B2 (en) * | 2008-03-07 | 2012-10-23 | International Business Machines Corporation | Method and system for coding mode selection in video compression systems |
| CN101577825B (zh) * | 2009-05-15 | 2011-09-07 | 武汉大学 | 压缩视频超分辨率中交互式量化噪声计算方法 |
- 2011
- 2011-06-30 GB GB201111199A patent/GB2492397A/en not_active Withdrawn
- 2012
- 2012-06-28 WO PCT/EP2012/002718 patent/WO2013000575A1/fr active Application Filing
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6700933B1 (en) * | 2000-02-15 | 2004-03-02 | Microsoft Corporation | System and method with advance predicted bit-plane coding for progressive fine-granularity scalable (PFGS) video coding |
| US20090262835A1 (en) | 2001-12-17 | 2009-10-22 | Microsoft Corporation | Skip macroblock coding |
| WO2005057933A1 (fr) * | 2003-12-08 | 2005-06-23 | Koninklijke Philips Electronics N.V. | Procede de compression evolutive spatiale avec zone morte |
Non-Patent Citations (3)
| Title |
|---|
| BRUNO MACCHIAVELLO ET AL: "A STATISTICAL MODEL FOR A MIXED RESOLUTION WYNER-ZIV FRAMEWORK", 26. PICTURE CODING SYMPOSIUM; LISBON, 7 November 2007 (2007-11-07), XP030080372 * |
| LASSERRE S ET AL: "Low Complexity Scalable Extension of HEVC intra pictures based on content statistics", 9. JCT-VC MEETING; GENEVA; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JCTVC-I0190, 26 April 2012 (2012-04-26), XP030052774 * |
| T.M. COVER; J.A. THOMAS: "Elements of Information Theory", 2006, WILEY |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9491459B2 (en) | 2012-09-27 | 2016-11-08 | Qualcomm Incorporated | Base layer merge and AMVP modes for video coding |
| WO2016032033A1 (fr) * | 2014-08-29 | 2016-03-03 | 전자부품연구원 | Dispositif et procede de diffusion en flux de contenu 4k uhd en nuage |
| CN114630129A (zh) * | 2022-02-07 | 2022-06-14 | 浙江智慧视频安防创新中心有限公司 | 一种基于智能数字视网膜的视频编解码方法和装置 |
| WO2023169303A1 (fr) * | 2022-03-10 | 2023-09-14 | 华为技术有限公司 | Procédé et appareil de codage et de décodage, dispositif, support de stockage et produit programme informatique |
| CN115550669A (zh) * | 2022-11-30 | 2022-12-30 | 摩尔线程智能科技(北京)有限责任公司 | 一种视频转码方法及装置、电子设备和存储介质 |
| WO2025001621A1 (fr) * | 2023-06-30 | 2025-01-02 | 中兴通讯股份有限公司 | Procédé et appareil d'amélioration de qualité d'image pour débit de données de couche vidéo |
| CN117459737A (zh) * | 2023-12-22 | 2024-01-26 | 中国科学技术大学 | 一种图像预处理网络的训练方法和图像预处理方法 |
| CN117459737B (zh) * | 2023-12-22 | 2024-03-29 | 中国科学技术大学 | 一种图像预处理网络的训练方法和图像预处理方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| GB2492397A (en) | 2013-01-02 |
| GB201111199D0 (en) | 2011-08-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2013000575A1 (fr) | Procédés et dispositifs pour un codage vidéo extensible | |
| US8553769B2 (en) | Method and device for improved multi-layer data compression | |
| US7391807B2 (en) | Video transcoding of scalable multi-layer videos to single layer video | |
| US6671322B2 (en) | Video transcoder with spatial resolution reduction | |
| US10178391B2 (en) | Methods and devices for data compression using a non-uniform reconstruction space | |
| US9142036B2 (en) | Methods for segmenting and encoding an image, and corresponding devices | |
| US20050226335A1 (en) | Method and apparatus for supporting motion scalability | |
| US7088780B2 (en) | Video transcoder with drift compensation | |
| GB2492396A (en) | Decoding a Scalable Video Bit-Stream | |
| US20150063436A1 (en) | Method for encoding and decoding an image, and corresponding devices | |
| EP2575364A1 (fr) | Procédés et dispositifs pour la compression de données à l'aide d'un espace de reconstruction non uniforme | |
| US20130230096A1 (en) | Methods for encoding and decoding an image, and corresponding devices | |
| US6898241B2 (en) | Video transcoder with up-sampling | |
| GB2492394A (en) | Image block encoding and decoding methods using symbol alphabet probabilistic distributions | |
| US20130230102A1 (en) | Methods for encoding and decoding an image, and corresponding devices | |
| Grois et al. | Optimization methods for H. 264/AVC video coding | |
| US20250294159A1 (en) | Decoder-Side Motion Vector Refinement with Scaling | |
| Wang et al. | Rate distortion optimized quantization for H. 264/AVC based on dynamic programming | |
| GB2492395A (en) | Entropy encoding and decoding methods using quantized coefficient alphabets restricted based on flag magnitude | |
| WO2013000973A2 (fr) | Procédé de codage et de décodage d'une image, et dispositifs correspondants | |
| US20130230101A1 (en) | Methods for encoding and decoding an image, and corresponding devices | |
| WO2025174678A1 (fr) | Procédé multi-quantificateur pour compression d'image et de vidéo | |
| WO2025188896A1 (fr) | Construction de candidats à la fusion | |
| WO2025165911A1 (fr) | Amélioration de dérivation de candidats à la fusion à l'aide d'une mise en correspondance intra-modèle | |
| WO2024229038A2 (fr) | Systèmes et procédés de quantification dépendante sur la base d'états de déquantification actuels |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12733596; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 12733596; Country of ref document: EP; Kind code of ref document: A1 |