WO2024238680A1 - Chroma prediction mode signaling
- Publication number: WO2024238680A1 (application PCT/US2024/029503)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- intra prediction
- encoded
- chroma
- prediction mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/176—Adaptive coding characterised by the coding unit being an image region, the region being a block, e.g. a macroblock
- H04N19/186—Adaptive coding characterised by the coding unit being a colour or a chrominance component
- H04N19/593—Predictive coding involving spatial prediction techniques
- H04N19/70—Coding characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- Digital video streams may represent video using a sequence of frames or still images.
- Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos.
- a digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data.
- Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques.
- Video coding can exploit spatial and temporal correlations in video signals to achieve good compression efficiency.
- pixels of the current frame and/or a reference frame can be used to generate a prediction block that corresponds to a current block to be encoded. Differences between the prediction block and the current block can be encoded, instead of the values of the current block themselves, to reduce the amount of data encoded.
- This disclosure relates generally to encoding and decoding video data and more particularly relates to techniques for signaling a chroma prediction mode.
- An aspect of the teachings herein is a method for reconstructing a current block.
- the method includes receiving, from an encoded bitstream, an encoded block corresponding to the current block, where the encoded block includes at least a luma block and a chroma block, determining, from the encoded bitstream, that the chroma block was encoded by an intra prediction mode that uses the luma block for coding the chroma block, decoding the chroma block using pixel values of the luma block without generating an ordered list of intra prediction modes, and reconstructing the current block using the chroma block.
- determining that the chroma block was encoded by the intra prediction mode that uses the luma block includes entropy decoding a flag from the encoded bitstream that indicates that the chroma block was encoded by the intra prediction mode that uses the luma block.
- the flag is a binary flag.
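The decoding flow claimed above can be sketched as follows. This is a minimal illustration, not an actual codec implementation; the names `read_bit`, `read_mode_index`, and `build_mode_list` are hypothetical stand-ins for the entropy decoder and the mode-list construction.

```python
def decode_chroma_mode(read_bit, read_mode_index, build_mode_list):
    """Sketch of the claimed flow: a binary flag selects the
    chroma-from-luma path, and only otherwise is an ordered
    list of intra prediction modes generated and indexed."""
    if read_bit():            # flag: chroma was coded from the luma block
        return "CFL"          # no ordered mode list is generated
    modes = build_mode_list() # defined-order list of remaining modes
    return modes[read_mode_index()]
```

Note that the list-building step is skipped entirely on the flagged path, which is the source of the decoder-side savings described in this disclosure.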
- An aspect of the teachings herein is another method for reconstructing a current block.
- the method can include receiving, from an encoded bitstream, an encoded block corresponding to the current block that includes at least a luma block and a chroma block, determining, from the encoded bitstream, that the chroma block was not encoded by an intra prediction mode that uses the luma block for coding the current block, generating, in a defined order, a list of intra prediction modes for decoding the chroma block, determining, from the encoded bitstream, an identifier of an intra prediction mode for the chroma block, decoding the chroma block using the intra prediction mode of the list of intra prediction modes identified by the identifier, and reconstructing the current block using the chroma block.
- determining that the chroma block was not encoded by the intra prediction mode that uses the luma block includes entropy decoding a flag from the encoded bitstream that indicates that the chroma block was not encoded by the intra prediction mode that uses the luma block.
- Entropy decoding the flag can include entropy decoding a binary flag using a first probability model.
- determining the identifier comprises entropy decoding a mode index from the encoded bitstream using a second probability model.
- An aspect of the teachings herein is a non-transitory, computer-readable storage medium storing instructions that, when executed, cause a processor to perform any one of the methods described herein.
- An aspect of the teachings herein is another apparatus for reconstructing a current block.
- the apparatus includes a processor.
- the processor is configured to receive, from an encoded bitstream, an encoded block corresponding to the current block, where the encoded block includes at least a luma block and a chroma block, determine, from the encoded bitstream, if the chroma block was encoded by an intra prediction mode that uses the luma block for coding the chroma block, and in response to determining that the chroma block was encoded by the intra prediction mode that uses the luma block, decode the chroma block using pixel values of the luma block.
- the processor is further configured to, in response to determining that chroma block was not encoded by the intra prediction mode that uses the luma block, generate a list of intra prediction modes in a defined order, determine, from the encoded bitstream, an identifier of an intra prediction mode for the chroma block, decode the chroma block using the intra prediction mode of the list of intra prediction modes that is identified by the identifier, and reconstruct the current block using the chroma block.
- to determine if the chroma block was encoded by the intra prediction mode that uses the luma block includes to decode a symbol that identifies whether the chroma block was encoded by the intra prediction mode that uses the luma block.
- to determine if the chroma block was encoded by the intra prediction mode that uses the luma block includes to entropy decode a binary symbol from the encoded bitstream using a first probability model, wherein the binary symbol identifies whether the chroma block was encoded by the intra prediction mode that uses the luma block.
- to determine the identifier includes to entropy decode a mode index using a probability model.
- An aspect of the teachings herein is an apparatus for encoding a current block.
- the apparatus includes a processor configured to determine whether to encode a chroma block of the current block by an intra prediction mode that uses a luma block of the current block for coding the chroma block, in response to determining to encode the chroma block by the intra prediction mode that uses the luma block, encode the chroma block into an encoded bitstream using pixel values of the luma block, and encode a binary symbol into the encoded bitstream indicating that the chroma block was encoded by the intra prediction mode that uses the luma block.
- the processor is configured to, in response to determining to encode the chroma block by an intra prediction mode other than the intra prediction mode that uses the luma block, generate a list of intra prediction modes in a defined order, determine an index that identifies the intra prediction mode for the chroma block, encode the index into the encoded bitstream, encode the chroma block into the encoded bitstream using the intra prediction mode, and encode the luma block into the encoded bitstream.
- the list excludes the intra prediction mode that uses the luma block for coding the chroma block.
- An aspect of the teachings herein is a non-transitory, computer-readable storage medium storing an encoded bitstream.
- the encoded bitstream includes an encoded block including a luma block and a chroma block, a binary flag indicating whether the chroma block was encoded using an intra prediction mode that uses the luma block to encode the chroma block, and only when the binary flag indicates that the chroma block was not encoded using the intra prediction mode that uses the luma block, a mode index identifying an intra prediction mode used to encode the chroma block from an ordered list of intra prediction modes.
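The conditional syntax described for the stored bitstream can be sketched from the encoder side. The helper names `write_bit` and `write_index` are illustrative stand-ins for entropy-coded writes, not part of any actual codec API.

```python
def write_chroma_mode_syntax(is_cfl, mode_index, write_bit, write_index):
    """Sketch of the conditional syntax: the binary flag is always
    written; the mode index into the ordered list appears only when
    the flag indicates a mode other than chroma-from-luma."""
    write_bit(1 if is_cfl else 0)
    if not is_cfl:
        write_index(mode_index)
```

When the flag is set, the bitstream carries a single symbol for the chroma mode; the index (and the ordered-list derivation it implies) is present only on the other path.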
- FIG. 1 is a schematic of a video encoding and decoding system.
- FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
- FIG. 3 is a diagram of a typical video stream to be encoded and subsequently decoded.
- FIG. 4 is a block diagram of an encoder according to implementations of this disclosure.
- FIG. 5 is a block diagram of a decoder according to implementations of this disclosure.
- FIG. 6 is a diagram of a representation of a portion of a frame in accordance with implementations of this disclosure.
- a video stream can be compressed by a variety of techniques to reduce the bandwidth required to transmit or store the video stream.
- a video stream can be encoded into a bitstream (i.e., a compressed bitstream), which involves compression.
- the compressed bitstream can then be transmitted to a decoder that can decode (decompress, reconstruct, etc.) the compressed bitstream to prepare it for viewing or further processing. Compression of the video stream often exploits spatial and temporal correlation of video signals through spatial and/or motion-compensated prediction.
- Spatial prediction may also be referred to as intra prediction.
- Intra prediction uses previously encoded and decoded pixels from at least one block of the current image or frame other than a current block to be encoded to generate a block (also called a prediction block) that resembles the current block.
- the particular previously encoded and decoded pixels used and how they are combined to form the prediction block are dictated by the intra prediction mode.
- a decoder receiving the encoded signal can recreate the current block.
- Multiple intra prediction modes are available to both the luma blocks and the chroma blocks forming a block of an image or frame. For example, there may be up to 14 possible intra prediction modes.
- Unique to a chroma block is a so-called chroma-from-luma (CFL) mode.
- the CFL mode uses pixel values of the corresponding luma block to generate a prediction block for the chroma block.
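A simplified sketch of how a CFL-style mode can form a chroma prediction from luma pixel values is shown below. The DC-plus-scaled-AC form and the `alpha` parameter are assumptions loosely modeled on CFL as used in practice (e.g., in AV1), not details taken from this disclosure.

```python
def cfl_predict(luma, dc_pred, alpha):
    """Simplified chroma-from-luma sketch: the chroma prediction is a
    DC term plus a scaled, zero-mean ("AC") luma contribution.
    luma is a flat list of (possibly subsampled) luma pixel values."""
    avg = sum(luma) / len(luma)
    return [dc_pred + alpha * (v - avg) for v in luma]
```

The scaling factor captures how strongly the chroma plane tracks local luma variation; subtracting the mean keeps the DC level of the chroma prediction independent of the luma DC level.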
- One way to signal the intra prediction mode of a luma block or a chroma block is to signal a mode index that can be used to identify the prediction mode from an ordered set or list of intra prediction modes.
- When one intra prediction mode is used more frequently than the others, for example the CFL mode for intra prediction of chroma blocks, expressly signaling whether that prediction mode is used can be more efficient than identifying that prediction mode from the list. Further details of this improved signaling are described hereinbelow after a description of the environment in which the teachings herein may be implemented.
- a network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream.
- the video stream can be encoded in the transmitting station 102 and the encoded video stream can be decoded in the receiving station 106.
- the network 104 can be, for example, the Internet.
- the network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.
- the receiving station 106 in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.
- the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below.
- the receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view and further encodes and transmits its own video bitstream to the video conference server for decoding and viewing by other participants.
- the computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image such as the image of a user operating the computing device 200.
- the image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200.
- the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.
- the computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200.
- the sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates the computing device 200.
- Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into a single unit, other configurations can be utilized.
- the operations of the CPU 202 can be distributed across multiple machines (wherein individual machines can have one or more processors) that can be coupled directly or across a local area or other network.
- the memory 204 can be distributed across multiple machines such as a network-based memory or memory in multiple machines performing the operations of the computing device 200.
- the bus 212 of the computing device 200 can be composed of multiple buses.
- the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards.
- FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded.
- the video stream 300 includes a video sequence 302.
- the video sequence 302 includes multiple adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304.
- the adjacent frames 304 can then be further subdivided into individual frames, e.g., a frame 306.
- the frame 306 can be divided into a series of planes or segments 308.
- the segments 308 can be subsets of frames that permit parallel processing, for example.
- the segments 308 can also be subsets of frames that can separate the video data into separate colors.
- a frame 306 of color video data can include a luminance plane and two chrominance planes.
- the segments 308 may be sampled at different resolutions.
- FIG. 4 is a block diagram of an encoder 400 according to implementations of this disclosure.
- the encoder 400 can be implemented, as described above, in the transmitting station 102 such as by providing a computer software program stored in memory, for example, the memory 204.
- the computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4.
- the encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102.
- the encoder 400 may be a hardware encoder.
- the encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408.
- the encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks.
- the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416.
- Other structural variations of the encoder 400 can be used to encode the video stream 300.
- respective frames 304 can be processed in units of blocks.
- respective blocks can be encoded using intra-frame prediction (also called intra-prediction) or inter-frame prediction (also called inter-prediction).
- a prediction block can be formed.
- for intra-prediction, a prediction block may be formed from samples in the current frame that have been previously encoded and reconstructed.
- for inter-prediction, a prediction block may be formed from samples in one or more previously constructed reference frames.
- the prediction block can be subtracted from the current block at the intra/inter prediction stage 402 to produce a residual block (also called a residual).
- the transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms.
- the quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated.
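The divide-and-truncate quantization described above, and the decoder-side dequantization that multiplies back, can be sketched as follows. This is an illustrative simplification; practical quantizers add rounding offsets and per-coefficient scaling.

```python
def quantize(coeffs, q):
    """Divide each transform coefficient by the quantizer value and
    truncate, as described above; Python's int() truncates toward zero."""
    return [int(c / q) for c in coeffs]

def dequantize(qcoeffs, q):
    """The decoder's dequantization multiplies back; the fractional
    part lost to truncation is not recoverable (the lossy step)."""
    return [c * q for c in qcoeffs]
```

For example, a coefficient of 100 quantized with q = 8 becomes 12, and dequantizes to 96 rather than 100, which is why the reconstruction path must use the same quantized values the decoder will see.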
- the quantized transform coefficients are then entropy encoded by the entropy encoding stage 408.
- the entropy-encoded coefficients, together with other information used to decode the block, which may include for example the type of prediction used, transform type, motion vectors and quantizer value, are then output to the compressed bitstream 420.
- the compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding.
- the compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms will be used interchangeably herein.
- the reconstruction path in FIG. 4 can be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420.
- the reconstruction path performs similar functions to those functions that take place during the decoding process that are discussed in more detail below, including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative residual block (also called a derivative residual).
- the prediction block that was predicted at the intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed block.
- the loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.
- Other variations of the encoder 400 can be used to encode the compressed bitstream 420.
- a non-transform-based encoder can quantize the residual signal directly without the transform stage 404 for certain blocks or frames.
- an encoder can have the quantization stage 406 and the dequantization stage 410 combined in a common stage.
- FIG. 5 is a block diagram of a decoder 500 according to implementations of this disclosure.
- the decoder 500 can be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204.
- the computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the receiving station 106 to decode video data in the manner described herein.
- the decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.
- the decoder 500 may be a hardware decoder.
- the decoder 500, like the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.
- When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients.
- the dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that can be identical to that created by the inverse transform stage 412 in the encoder 400.
- the decoder 500 can use the intra/inter prediction stage 508 to create the same prediction block as was created in the encoder 400, e.g., at the intra/inter prediction stage 402.
- the prediction block can be added to the derivative residual to create a reconstructed block.
- the loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts.
- the post filtering stage 514 can be a deblocking filter that is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516.
- the output video stream 516 can also be referred to as a decoded video stream, and the terms will be used interchangeably herein.
- Other variations of the decoder 500 can be used to decode the compressed bitstream 420.
- the decoder 500 can produce the output video stream 516 without the post filtering stage 514.
- FIG. 6 is a block diagram of a representation of a portion 600 of a frame, such as the frame 306 shown in FIG. 3, in accordance with implementations of this disclosure.
- the portion 600 of the frame includes four 64x64 blocks 610, in two rows and two columns in a matrix or Cartesian plane.
- Each 64x64 block may include four 32x32 blocks 620.
- Each 32x32 block may include four 16x16 blocks 630.
- Each 16x16 block may include four 8x8 blocks 640.
- Each 8x8 block 640 may include four 4x4 blocks 650.
- Each 4x4 block 650 may include 16 pixels, which may be represented in four rows and four columns in each respective block in the Cartesian plane or matrix.
- the pixels may include information representing an image captured in the frame, such as luminance information, color information, and location information.
- a block such as a 16x16 pixel block as shown, may include a luminance block 660, which may include luminance pixels 662; and two chrominance blocks 670, 680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680.
- the chrominance blocks 670, 680 may include chrominance pixels 690.
- the luminance block 660 may include 16x16 luminance pixels 662 and each chrominance block 670, 680 may include 8x8 chrominance pixels 690 as shown. Although one arrangement of blocks is shown, any arrangement may be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM blocks may be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks may be used. In some implementations, Nx2N blocks, 2NxN blocks, or a combination thereof may be used.
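The relationship between the luma and chroma block sizes shown in FIG. 6 (a 16x16 luma block paired with 8x8 chroma blocks) corresponds to halving the chroma resolution in each dimension, commonly called 4:2:0 sampling. A small sketch, with the shift-based subsampling factors being an illustrative convention:

```python
def chroma_plane_size(luma_w, luma_h, subsample_x=1, subsample_y=1):
    """Chroma plane dimensions for a given luma block size, where each
    subsampling factor is a power-of-two shift (1 halves a dimension,
    as in 4:2:0 sampling; 0 would leave it at full resolution)."""
    return (luma_w >> subsample_x, luma_h >> subsample_y)
```

This size relationship is also why a CFL-style mode must subsample or average the luma pixels before using them to predict the smaller chroma block.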
- video coding may include ordered block-level coding.
- Ordered block-level coding may include coding blocks of a frame in an order, such as raster-scan order, wherein blocks may be identified and processed starting with a block in the upper left corner of the frame, or portion of the frame, and proceeding along rows from left to right and from the top row to the bottom row, identifying each block in turn for processing.
- the 64x64 block in the top row and left column of a frame may be the first block coded and the 64x64 block immediately to the right of the first block may be the second block coded.
- the second row from the top may be the second row coded, such that the 64x64 block in the left column of the second row may be coded after the 64x64 block in the rightmost column of the first row.
- coding a block may include using quad-tree coding, which may include coding smaller block units within a block in raster-scan order.
- the 64x64 block shown in the bottom left corner of the portion of the frame shown in FIG. 6 may be coded using quad-tree coding wherein the top left 32x32 block may be coded, then the top right 32x32 block may be coded, then the bottom left 32x32 block may be coded, and then the bottom right 32x32 block may be coded.
- Each 32x32 block may be coded using quad-tree coding wherein the top left 16x16 block may be coded, then the top right 16x16 block may be coded, then the bottom left 16x16 block may be coded, and then the bottom right 16x16 block may be coded.
- Each 16x16 block may be coded using quad-tree coding wherein the top left 8x8 block may be coded, then the top right 8x8 block may be coded, then the bottom left 8x8 block may be coded, and then the bottom right 8x8 block may be coded.
- Each 8x8 block may be coded using quad-tree coding wherein the top left 4x4 block may be coded, then the top right 4x4 block may be coded, then the bottom left 4x4 block may be coded, and then the bottom right 4x4 block may be coded.
- 8x8 blocks may be omitted for a 16x16 block, and the 16x16 block may be coded using quad-tree coding wherein the top left 4x4 block may be coded, then the other 4x4 blocks in the 16x16 block may be coded in raster-scan order.
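The recursive top-left, top-right, bottom-left, bottom-right ordering described above can be sketched as a short recursion. The block sizes and the fixed stopping size are illustrative; real coders decide the split adaptively per block.

```python
def quadtree_order(x, y, size, min_size):
    """List sub-block origins in the top-left, top-right, bottom-left,
    bottom-right order described above, recursing until min_size."""
    if size == min_size:
        return [(x, y)]
    half = size // 2
    out = []
    for dy in (0, half):       # top row of sub-blocks first
        for dx in (0, half):   # left sub-block before right
            out += quadtree_order(x + dx, y + dy, half, min_size)
    return out
```

For a 32x32 block split into 16x16 units this yields the four origins (0,0), (16,0), (0,16), (16,16), matching the coding order given in the text.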
- video coding may include compressing the information included in an original, or input, frame by, for example, omitting some of the information in the original frame from a corresponding encoded frame.
- coding may include reducing spectral redundancy, reducing spatial redundancy, reducing temporal redundancy, or a combination thereof.
- reducing spectral redundancy may include using a color model based on a luminance component (Y) and two chrominance components (U and V or Cb and Cr), which may be referred to as the YUV or YCbCr color model, or color space.
- using the YUV color model may include using a relatively large amount of information to represent the luminance component of a portion of a frame and using a relatively small amount of information to represent each corresponding chrominance component for the portion of the frame.
- a portion of a frame may be represented by a high-resolution luminance component, which may include a 16x16 block of pixels, and by two lower resolution chrominance components, each of which represents the portion of the frame as an 8x8 block of pixels.
- a pixel may indicate a value, for example, a value in the range from 0 to 255, and may be stored or transmitted using, for example, eight bits.
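The savings from chroma subsampling can be counted directly. The sketch below assumes the 8-bit, 4:2:0-style layout of the example above (full-resolution luma, two chroma planes subsampled by 2 in each dimension); the function name is illustrative:

```python
def yuv420_bytes(width, height, bits_per_sample=8):
    """Bytes to store a region in 4:2:0-style YUV: a full-resolution luma
    plane plus two chroma planes subsampled by 2 in each dimension."""
    bytes_per = bits_per_sample // 8
    luma = width * height * bytes_per
    chroma = 2 * (width // 2) * (height // 2) * bytes_per
    return luma + chroma

# A 16x16 region: 256 luma samples plus two 8x8 chroma blocks = 384 bytes,
# versus 768 bytes for the same region at full resolution in all three planes.
```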
- reducing spatial redundancy may include transforming a block into the frequency domain using, for example, a discrete cosine transform (DCT).
- a unit of an encoder, such as the transform stage 404 shown in FIG. 4, may perform a DCT using transform coefficient values based on spatial frequency.
- reducing temporal redundancy may include using similarities between frames to encode a frame using a relatively small amount of data based on one or more reference frames, which may be previously encoded, decoded, and reconstructed frames of the video stream.
- a block or pixel of a current frame may be similar to a spatially corresponding block or pixel of a reference frame.
- a block or pixel of a current frame may be similar to a block or pixel of a reference frame at a different spatial location, and reducing temporal redundancy may include generating motion information indicating the spatial difference, or translation, between the location of the block or pixel in the current frame and the corresponding location of the block or pixel in the reference frame.
- reducing temporal redundancy may include identifying a portion of a reference frame that corresponds to a current block or pixel of a current frame.
- a reference frame, or a portion of a reference frame, which may be stored in memory may be searched to identify a portion for generating a prediction to use for encoding a current block or pixel of the current frame with maximal efficiency.
- the search may identify a portion of the reference frame for which the difference in pixel values between the current block and a prediction block generated based on the portion of the reference frame is minimized and may be referred to as motion searching.
- the portion of the reference frame searched may be limited.
- the portion of the reference frame searched, which may be referred to as the search area, may include a limited number of rows of the reference frame.
- identifying the portion of the reference frame for generating a prediction may include calculating a cost function, such as a sum of absolute differences (SAD), between the pixels of portions of the search area and the pixels of the current block.
- the spatial difference between the location of the portion of the reference frame for generating a prediction in the reference frame and the current block in the current frame may be represented as a motion vector.
- the difference in pixel values between the prediction block and the current block may be referred to as differential data, residual data, a prediction error, or as a residual block.
- generating motion vectors may be referred to as motion estimation, and a pixel of a current block may be indicated based on location using Cartesian coordinates as f_{x,y}. Similarly, a pixel of the search area of the reference frame may be indicated based on location using Cartesian coordinates as r_{x,y}.
- a motion vector (MV) for the current block may be determined based on, for example, a SAD between the pixels of the current frame and the corresponding pixels of the reference frame.
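A minimal full-search sketch of the SAD-based motion search described above. The frame layout (lists of pixel rows), the function names, and the search-window handling are illustrative assumptions, not the specification's actual procedure:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def motion_search(cur, ref, bx, by, bsize, search_range):
    """Full search over a limited window of the reference frame; returns
    the motion vector (dx, dy) minimizing SAD for the current block, and
    the minimal SAD value."""
    cur_block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    best, best_sad = (0, 0), float('inf')
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x < 0 or y < 0 or y + bsize > len(ref) or x + bsize > len(ref[0]):
                continue
            cand = [row[x:x + bsize] for row in ref[y:y + bsize]]
            s = sad(cur_block, cand)
            if s < best_sad:
                best_sad, best = s, (dx, dy)
    return best, best_sad
```

Limiting `search_range` corresponds to restricting the search area to a limited number of rows (and columns) of the reference frame, as noted above.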
- a frame may be stored, transmitted, processed, or any combination thereof, in any data structure such that pixel values may be efficiently represented for a frame or image.
- a frame may be stored, transmitted, processed, or any combination thereof, in a two-dimensional data structure such as a matrix as shown, or in a one-dimensional data structure, such as a vector array.
- a representation of the frame such as a two-dimensional representation as shown, may correspond to a physical location in a rendering of the frame as an image.
- a location in the top left corner of a block in the top left corner of the frame may correspond with a physical location in the top left corner of a rendering of the frame as an image.
- block-based coding efficiency may be improved by partitioning input blocks into one or more prediction partitions, which may be rectangular, including square, partitions for prediction coding.
- video coding using prediction partitioning may include selecting a prediction partitioning scheme from among multiple candidate prediction partitioning schemes.
- candidate prediction partitioning schemes for a 64x64 coding unit may include rectangular size prediction partitions ranging in sizes from 4x4 to 64x64, such as 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32, 32x64, 64x32, or 64x64.
- video coding using prediction partitioning may include a full prediction partition search, which may include selecting a prediction partitioning scheme by encoding the coding unit using each available candidate prediction partitioning scheme and selecting the best scheme, such as the scheme that produces the least rate-distortion error.
- encoding a video frame may include identifying a prediction partitioning scheme for encoding a current block, such as block 610.
- identifying a prediction partitioning scheme may include determining whether to encode the block as a single prediction partition of maximum coding unit size, which may be 64x64 as shown, or to partition the block into multiple prediction partitions, which may correspond with the sub-blocks, such as the 32x32 blocks 620, the 16x16 blocks 630, or the 8x8 blocks 640, as shown, and may include determining whether to partition into one or more smaller prediction partitions. For example, a 64x64 block may be partitioned into four 32x32 prediction partitions. Three of the four 32x32 prediction partitions may be encoded as 32x32 prediction partitions and the fourth 32x32 prediction partition may be further partitioned into four 16x16 prediction partitions.
- identifying the prediction partitioning scheme may include using a prediction partitioning decision tree.
- video coding for a current block may include identifying an optimal prediction coding mode from multiple candidate prediction coding modes, which may provide flexibility in handling video signals with various statistical properties and may improve the compression efficiency.
- a video coder may evaluate each candidate prediction coding mode to identify the optimal prediction coding mode, which may be, for example, the prediction coding mode that minimizes an error metric, such as a rate-distortion cost, for the current block.
- the complexity of searching the candidate prediction coding modes may be reduced by limiting the set of available candidate prediction coding modes based on similarities between the current block and a corresponding prediction block.
- the complexity of searching each candidate prediction coding mode may be reduced by performing a directed refinement mode search.
- metrics may be generated for a limited set of candidate block sizes, such as 16x16, 8x8, and 4x4; the error metrics associated with the block sizes may be ranked in descending order; and additional candidate block sizes, such as 4x8 and 8x4 block sizes, may be evaluated.
- block-based coding efficiency may be improved by partitioning a current residual block into one or more transform partitions, which may be rectangular, including square, partitions for transform coding.
- video coding using transform partitioning may include selecting a uniform transform partitioning scheme.
- a current residual block such as block 610, may be a 64x64 block and may be transformed without partitioning using a 64x64 transform.
- a residual block may be transform partitioned using a uniform transform partitioning scheme.
- a 64x64 residual block may be transform partitioned using a uniform transform partitioning scheme including four 32x32 transform blocks, using a uniform transform partitioning scheme including sixteen 16x16 transform blocks, using a uniform transform partitioning scheme including sixty-four 8x8 transform blocks, or using a uniform transform partitioning scheme including 256 4x4 transform blocks.
- video coding using transform partitioning may include identifying multiple transform block sizes for a residual block using multiform transform partition coding.
- multiform transform partition coding may include recursively determining whether to transform a current block using a current block size transform or by partitioning the current block and multiform transform partition coding each partition.
- for example, where the bottom left block 610 shown in FIG. 6 is a 64x64 residual block, multiform transform partition coding may include determining whether to code the current 64x64 residual block using a 64x64 transform or to code the 64x64 residual block by partitioning the 64x64 residual block into partitions, such as four 32x32 blocks 620, and multiform transform partition coding each partition.
- determining whether to transform partition the current block may be based on comparing a cost for encoding the current block using a current block size transform to a sum of costs for encoding each partition using partition size transforms.
- one way to identify an intra prediction mode used to code a luma block or a chroma block is to signal a mode index that can be used to identify the intra prediction mode from a set or list of ordered prediction modes.
- the intra prediction modes are ordered based on intra prediction modes of neighboring blocks, co-located blocks, or some combination thereof.
- the mode index can be used to infer the prediction mode after the ordering.
- the available intra prediction modes may be ordered in separate sets, and a set index may also be signaled to infer the set from which to select the prediction mode using the mode index.
- a defined cardinality of intra prediction modes may be adaptively selected and included in only one intra mode set.
- the ordering process is defined (e.g., predefined at each of the encoder and decoder).
- An example of the ordering process is as follows. First, if the co-located luma block prediction mode is an intra prediction directional mode, the directional mode is added to the top of the list (e.g., index 0). Thereafter, available intra prediction non-directional modes may be added in a defined order. In some implementations, the following non-directional modes may be added to the list in order: UV_DC_PRED, UV_SMOOTH_PRED, UV_SMOOTH_V_PRED, UV_SMOOTH_H_PRED, UV_PAETH_PRED.
- the list may be filled with directional modes (e.g., based on those used for neighboring/co-located chroma blocks), while excluding the one already added to the top of the list, if any.
- the CFL mode may be added to the bottom of the list (e.g., at the lowest rank).
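The ordering process above can be sketched as follows. The directional mode names and angles here are illustrative placeholders, not the codec's actual identifiers, and only the non-directional ordering follows the list given above:

```python
NONDIRECTIONAL = ["UV_DC_PRED", "UV_SMOOTH_PRED", "UV_SMOOTH_V_PRED",
                  "UV_SMOOTH_H_PRED", "UV_PAETH_PRED"]
# Hypothetical directional-mode names for illustration only.
DIRECTIONAL = [f"UV_D{a}_PRED" for a in (45, 67, 90, 113, 135, 157, 180, 203)]

def build_chroma_mode_list(colocated_luma_mode=None):
    """Order candidate chroma intra modes: the co-located luma block's
    directional mode first (if any), then the non-directional modes in a
    defined order, then the remaining directional modes, with the CFL mode
    at the bottom of the list (the lowest rank)."""
    modes = []
    if colocated_luma_mode in DIRECTIONAL:
        modes.append(colocated_luma_mode)       # index 0
    modes.extend(NONDIRECTIONAL)
    modes.extend(m for m in DIRECTIONAL if m not in modes)  # no duplicate
    modes.append("UV_CFL_PRED")
    return modes
```

The mode index signaled in the bitstream is then simply a position in this list, which is why the CFL mode's bottom rank translates into a large index.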
- An index (the mode index) is signaled in the bitstream using entropy coding with a defined cumulative distribution function (CDF) table, also referred to herein as a probability model.
- An encoder, such as the encoder 400, selects a prediction mode using a tradeoff between the number of bits (e.g., the rate) required for coding a block using the prediction mode and the quality (e.g., the degradation or distortion level) of the image.
- the prediction modes (or a subset thereof) are used to code the block, and a prediction mode that best meets the requirements of the tradeoff is selected. In some implementations, this is referred to as a rate-distortion optimization (RDO) decision process or simply as the RDO.
- a single prediction mode is often selected at a higher rate than the remaining prediction modes in the RDO.
- the CFL mode may be frequently selected.
- the low rank in the list means that the CFL mode has a relatively high mode index, which conventionally results in a longer signal within the bitstream.
- One way to address this issue is to give the CFL mode a higher rank in the ordering (also called a reordering) process, thus resulting in a smaller mode index (closer to 0) to be coded.
- Another way to address this issue is to modify the entropy coding of the CFL mode.
- FIG. 7 is a flowchart diagram of a method or process 700 for reconstructing a current block of an image using chroma prediction mode signaling.
- the process 700 can be implemented, for example, as a software program that may be executed by computing devices such as the receiving station 106 of FIG. 1.
- the software program can include machine- readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214, and that when executed by a processor, such as CPU 202, may cause the computing device to perform the process 700.
- the process may be implemented using specialized hardware or firmware. Multiple processors, memories, or both, may be used.
- the process 700 may be performed at a decoder, such as at the decoder 500.
- an encoded block corresponding to a current block is received.
- the encoded block is received from an encoded bitstream, such as the compressed bitstream 420 from FIG. 4.
- the encoded block includes a luma block and a chroma block that are encoded into the bitstream.
- while the teachings herein are discussed with regard to a chroma block of the encoded block, in general the encoded block would include two chroma blocks. Accordingly, the teachings may be applied to each of the chroma blocks. Alternatively, the teachings may be applied to one chroma block to signal a single intra prediction mode.
- the process 700 determines if the chroma block was encoded using an intra prediction mode that uses (e.g., pixels of) the luma block, more conventionally known as a CFL mode. Determining if the chroma block was encoded using an intra prediction mode that uses the luma block is accomplished by reading a signal or indicator from the encoded bitstream. For example, if the chroma block was encoded in this way, the signal (or flag) in the encoded bitstream may be set to 1. The signal may be decoded from the header for the block in some implementations.
- the signal may be decoded at an entropy decoding stage, such as the entropy decoding stage 502.
- the symbol representing the signal may be entropy decoded using a dedicated CDF table with a context size of 3 (e.g., two bits).
- the context of the symbol may be the sum of the count of CFL modes of above and left neighbor blocks of the current block. For example, the prediction modes of the top block adjacent to the current block, the left block adjacent to the current block, and the top-left corner block adjacent to the current block are considered. For each block using the CFL mode, the count is increased for the context, and the CDF table associated with the total count is used to encode and to decode the flag or signal.
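The context derivation might be sketched as follows. This is a simplified illustration using only the above and left neighbors; the actual CDF tables are defined by the codec, and the per-context probabilities below are made-up placeholders:

```python
def cfl_flag_context(above_is_cfl, left_is_cfl):
    """Context index for entropy coding the CFL flag: the count of the
    above and left neighbor blocks coded with the CFL mode (0, 1, or 2),
    giving a context size of 3."""
    return int(bool(above_is_cfl)) + int(bool(left_is_cfl))

# Illustrative per-context probability that the flag is 1 (CFL used);
# a real codec would use defined/adapted CDF tables instead.
P_CFL = {0: 0.2, 1: 0.5, 2: 0.8}
```

The intuition is that when neighboring blocks used the CFL mode, the current block is more likely to use it too, so the flag codes more cheaply under that context.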
- the process 700 decodes the chroma block using that intra prediction mode.
- if the process determines that the chroma block was encoded using this method, the process will set the intra prediction mode to a CFL mode (such as UV_CFL_PRED) and optionally set the angle that indicates the directional prediction angle (angle_delta) to 0.
- This mode makes the decoder 500 decode the chroma block by modifying the pixel values of the luma block after they are decoded and the luma block is reconstructed to form a prediction block for the chroma block.
- Modifying the pixel values of the reconstructed luma block may be performed by any known technique for a CFL mode.
- the resulting chroma prediction block may be added to the residual for the chroma block decoded from the bitstream to generate the chroma block.
- a predicted chroma pixel value Pred_C of a chroma-from-luma prediction block may be represented by the following equation:

  Pred_C = DC_C + a * (Rec_Y - DC_Y)

- DC_Y may be the average of the current reconstructed luma block samples (e.g., pixel values, downsampled if the chroma blocks are subsampled), and DC_C is the average of the chroma samples (pixel values) neighboring the current chroma pixel.
- DC_Y can be calculated based on neighboring (e.g., downsampled) reconstructed luma samples to match the samples used for the calculation of DC_C.
- corresponding (e.g., co-located) luma samples Rec_Y (e.g., downsampled luma values) are used, and a is a scaling factor that can be explicitly signaled in (and decoded from) the bitstream or can be implicitly derived by the decoder.
- the scaling factor may be derived implicitly by using neighboring reconstructed chroma samples Rec_C (e.g., chroma pixel values from an adjacent chroma block) and their corresponding luma samples and using a least square error or some other technique to minimize the difference between the pixel values.
- the least square error may be represented by the following equation, which can be solved for the scaling factor a by minimizing the function Sum:

  Sum = Σ_i (Rec_C(i) - (DC_C + a * (Rec_Y(i) - DC_Y)))^2
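The implicit derivation can be sketched in closed form: with DC_C and DC_Y taken as the means of the neighboring samples, the a that minimizes Sum is the standard least-squares slope. The function names and flat sample layout are illustrative assumptions, not the codec's actual procedure:

```python
def derive_cfl_alpha(rec_c, rec_y):
    """Least-squares scaling factor a minimizing
    Sum = sum_i (rec_c[i] - (DC_C + a * (rec_y[i] - DC_Y)))**2,
    where rec_c are neighboring reconstructed chroma samples and rec_y
    their corresponding (e.g., downsampled) luma samples."""
    dc_c = sum(rec_c) / len(rec_c)
    dc_y = sum(rec_y) / len(rec_y)
    num = sum((c - dc_c) * (y - dc_y) for c, y in zip(rec_c, rec_y))
    den = sum((y - dc_y) ** 2 for y in rec_y)
    return num / den if den else 0.0

def cfl_predict(rec_y_block, dc_y, dc_c, alpha):
    """Pred_C = DC_C + a * (Rec_Y - DC_Y), applied per luma sample."""
    return [dc_c + alpha * (y - dc_y) for y in rec_y_block]
```

Solving in closed form avoids any iterative minimization: the numerator is the chroma-luma covariance and the denominator the luma variance over the neighbor samples.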
- the process 700 in response to determining that the chroma block was not encoded using the intra prediction mode that uses the luma block, the process 700 generates a list of prediction modes in a defined sequence. For example, an ordered list of intra prediction modes as described above may be generated by first including the associated intra prediction mode of the co-located luma block if the prediction mode is a directional intra prediction mode. Then the following non-directional modes may be added to the list in order: UV_DC_PRED, UV_SMOOTH_PRED, UV_SMOOTH_V_PRED, UV_SMOOTH_H_PRED, UV_PAETH_PRED. Then the list may be filled with the remaining directional modes (e.g., based on angle or adjacent block intra prediction modes). The list preferably omits the CFL mode (UV_CFL_PRED).
- the process 700 determines an identifier of an intra prediction mode for the chroma block from the encoded bitstream.
- the identifier may be determined by reading the mode index from the encoded bitstream.
- the mode index is the index of the intra prediction mode in the list of intra prediction modes generated by operation 708. If the intra prediction mode identified is a directional intra prediction mode, then the angle_delta is set accordingly for the identified intra prediction mode. If the selected intra prediction mode is a non-directional intra prediction mode, the angle may be set to 0. That is, if the current intra prediction mode is determined to not be UV_CFL_PRED, then the mode index may be read from the encoded bitstream, and the ordered list of intra prediction modes is generated.
- the mode index may be encoded using a separate CDF table.
- the process 700 decodes the chroma block using the identified intra prediction mode from the list of generated intra prediction modes.
- the intra prediction mode may be a non-directional intra prediction mode such as UV_DC_PRED, in which case the angle_delta may be set to 0, and the decoder will decode the chroma block using UV_DC_PRED as the identified intra prediction mode.
- the intra prediction mode may be determined to be a directional intra prediction mode. In this case, the angle_delta is set according to the identified intra prediction mode and the decoder decodes the chroma block using the set parameters. In either case, the resulting prediction block may be added to the residual for the chroma block that is decoded from the bitstream to reconstruct the chroma block.
- the process 700 reconstructs the current block using the chroma and luma block as decoded.
- the techniques described herein can reduce computing time at the decoder. For example, if it is determined that the chroma block was encoded by the intra prediction mode that uses the luma block for encoding the chroma block (e.g., the CFL mode), the ordered list of intra prediction modes does not have to be generated by the decoder. That is, decoding the chroma block using the luma block may be done without selecting an intra prediction mode from an ordered list of intra prediction modes.
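The decoder-side saving can be illustrated with stubbed-out bitstream readers. The function names and callback shapes here are hypothetical; the point is only the control flow, in which the ordered list is never built when the CFL flag is set:

```python
def decode_chroma_mode(read_cfl_flag, read_mode_index, build_mode_list):
    """Decoder-side selection of the chroma intra prediction mode.
    `read_cfl_flag` returns the entropy-decoded CFL flag, `read_mode_index`
    the mode index, and `build_mode_list` constructs the ordered list of
    intra prediction modes (excluding the CFL mode)."""
    if read_cfl_flag():
        return "UV_CFL_PRED"   # ordered-list generation is skipped entirely
    modes = build_mode_list()
    return modes[read_mode_index()]
```

When the flag indicates the CFL mode, the comparatively expensive list ordering (which depends on neighboring and co-located block modes) is never executed.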
- a conventional rate-distortion optimization (RDO) technique may be used to select the prediction mode for the chroma block except that it is desirable that the encoder search the CFL mode first.
- the signal (e.g., a binary symbol) indicating the CFL mode may be encoded into the bitstream (e.g., in the header for the block or for the chroma block), along with the chroma residual, as applicable.
- when the CFL mode is not selected by the encoder and the encoding mode is an intra prediction mode (not an inter prediction mode), the encoder generates the same ordered list that the decoder will generate and encodes the mode index for the decoder to receive and use in decoding the chroma block.
- the most common intra prediction mode selected for a chroma block is a CFL mode.
- the techniques can be used where the most common intra prediction mode is other than the CFL mode by modifying the query at operation 704 to determine whether the chroma block was encoded using the most common intra prediction mode, when applicable reorder the remaining intra prediction modes, and, optionally, perform RDO at the encoder by testing the most common intra prediction mode first.
- an additional, subsequent inquiry may be made to determine whether the luma block was encoded using the most common intra prediction mode within the set. If the luma block was not encoded using the most common intra prediction mode of the set, the remaining intra prediction modes may be reordered, the mode index decoded, and the luma block reconstructed.
- the set index for a luma block can be decoded. Each set of intra prediction modes may have a most common intra prediction mode. Then, the query at operation 704 checks to see whether the most common intra prediction mode for the set is used for the luma block. If not, the set can be reordered to select the mode from the mode index to reconstruct the luma block.
- the term “most common set of intra prediction modes” in this disclosure means the set of intra prediction modes of the multiple sets whose intra prediction modes are most often selected in coding some or all of one or more video sequences.
- the term “most common intra prediction mode” in this disclosure means the intra prediction mode of the available intra prediction modes that is most often selected in coding some or all of one or more video sequences.
- the word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion.
- the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances.
- Implementations of the transmitting station 102 and/or the receiving station 106 can be realized in hardware, software, or any combination thereof.
- the hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors or any other suitable circuit.
- the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms and/or instructions described herein.
- a special purpose computer/processor can be utilized that contains other hardware for carrying out any of the methods, algorithms, or instructions described herein.
- the transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system.
- the transmitting station 102 can be implemented on a server and the receiving station 106 can be implemented on a device separate from the server, such as a hand-held communications device.
- the transmitting station 102 can encode content using an encoder 400 into an encoded video signal and transmit the encoded video signal to the communications device.
- the communications device can then decode the encoded video signal using a decoder 500.
- the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102.
- the receiving station 106 can be a generally stationary personal computer rather than a portable communications device and/or a device including an encoder 400 may also include a decoder 500.
- implementations of the present disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium.
- a computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor.
- the medium can be, for example, an electronic, magnetic, optical, electromagnetic, or a semiconductor device. Other suitable mediums are also available.
Abstract
An encoded block corresponding to a current block to be decoded is received from an encoded bitstream. A (e.g., binary) flag is used to determine if the chroma block of the current block was encoded by an intra prediction mode that uses the luma block of the current block for coding the chroma block, e.g., a chroma-from-luma (CFL) mode. Where the chroma block was encoded by the CFL mode, the chroma block is decoded using pixel values of the luma block according to the CFL mode. Where the chroma block was not encoded by the CFL mode, an ordered list of intra prediction modes is generated, an identifier of an intra prediction mode for the chroma block is determined, and the chroma block is decoded using the intra prediction mode of the ordered list that is identified by the identifier. The current block is reconstructed using the chroma block.
Description
CHROMA PREDICTION MODE SIGNALING
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims priority to and the benefit of U.S. Provisional Application Patent Serial No. 63/466,433, filed May 15, 2023, the entire disclosure of which is hereby incorporated by reference.
BACKGROUND
[0002] Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques.
[0003] Video coding can exploit spatial and temporal correlations in video signals to achieve good compression efficiency. In brief, pixels of the current frame and/or a reference frame can be used to generate a prediction block that corresponds to a current block to be encoded. Differences between the prediction block and the current block can be encoded, instead of the values of the current block themselves, to reduce the amount of data encoded.
SUMMARY
[0004] This disclosure relates generally to encoding and decoding video data and more particularly relates to techniques for signaling a chroma prediction mode.
[0005] An aspect of the teachings herein is a method for reconstructing a current block. The method includes receiving, from an encoded bitstream, an encoded block corresponding to the current block, where the encoded block includes at least a luma block and a chroma block, determining, from the encoded bitstream, that the chroma block was encoded by an intra prediction mode that uses the luma block for coding the chroma block, decoding the chroma block using pixel values of the luma block without generating an ordered list of intra prediction modes, and reconstructing the current block using the chroma block.
[0006] In some implementations of this method, determining that the chroma block was encoded by the intra prediction mode that uses the luma block includes entropy decoding a flag from the encoded bitstream that indicates that the chroma block was encoded by the intra prediction mode that uses the luma block.
[0007] In some implementations of this method, the flag is a binary flag.
[0008] An aspect of the teachings herein is another method for reconstructing a current block. The method can include receiving, from an encoded bitstream, an encoded block corresponding to the current block that includes at least a luma block and a chroma block, determining, from the encoded bitstream, that the chroma block was not encoded by an intra prediction mode that uses the luma block for coding the chroma block, generating, in a defined order, a list of intra prediction modes for decoding the chroma block, determining, from the encoded bitstream, an identifier of an intra prediction mode for the chroma block, decoding the chroma block using the intra prediction mode of the list of intra prediction modes identified by the identifier, and reconstructing the current block using the chroma block.
[0009] In some implementations of this method, determining that the chroma block was not encoded by the intra prediction mode that uses the luma block includes entropy decoding a flag from the encoded bitstream that indicates that the chroma block was not encoded by the intra prediction mode that uses the luma block. Entropy decoding the flag can include entropy decoding a binary flag using a first probability model.
[0010] In some implementations of this method, determining the identifier comprises entropy decoding a mode index from the encoded bitstream using a second probability model.
[0011] An aspect of the teachings herein is an apparatus for reconstructing a current block according to any one of the methods described herein.
[0012] An aspect of the teachings herein is a non-transitory, computer-readable storage medium storing instructions that, when executed, cause a processor to perform any one of the methods described herein.
[0013] An aspect of the teachings herein is another apparatus for reconstructing a current block. The apparatus includes a processor. The processor is configured to receive, from an encoded bitstream, an encoded block corresponding to the current block, where the encoded block includes at least a luma block and a chroma block, determine, from the encoded bitstream, if the chroma block was encoded by an intra prediction mode that uses the luma block for coding the chroma block, and in response to determining that the chroma block was encoded by the intra prediction mode that uses the luma block, decode the chroma block using pixel values of the luma block. The processor is further configured to, in response to
determining that the chroma block was not encoded by the intra prediction mode that uses the luma block, generate a list of intra prediction modes in a defined order, determine, from the encoded bitstream, an identifier of an intra prediction mode for the chroma block, decode the chroma block using the intra prediction mode of the list of intra prediction modes that is identified by the identifier, and reconstruct the current block using the chroma block.
[0014] In some implementations of this apparatus, to determine if the chroma block was encoded by the intra prediction mode that uses the luma block includes to decode a symbol that identifies whether the chroma block was encoded by the intra prediction mode that uses the luma block.
[0015] In some implementations of this apparatus, to determine if the chroma block was encoded by the intra prediction mode that uses the luma block includes to entropy decode a binary symbol from the encoded bitstream using a first probability model, wherein the binary symbol identifies whether the chroma block was encoded by the intra prediction mode that uses the luma block.
[0016] In some implementations of this apparatus, to determine the identifier includes to entropy decode a mode index using a probability model.
[0017] An aspect of the teachings herein is an apparatus for encoding a current block. The apparatus includes a processor configured to determine whether to encode a chroma block of the current block by an intra prediction mode that uses a luma block of the current block for coding the chroma block, in response to determining to encode the chroma block by the intra prediction mode that uses the luma block, encode the chroma block into an encoded bitstream using pixel values of the luma block, and encode a binary symbol into the encoded bitstream indicating that the chroma block was encoded by the intra prediction mode that uses the luma block. The processor is configured to, in response to determining to encode the chroma block by an intra prediction mode other than the intra prediction mode that uses the luma block, generate a list of intra prediction modes in a defined order, determine an index that identifies the intra prediction mode for the chroma block, encode the index into the encoded bitstream, encode the chroma block into the encoded bitstream using the intra prediction mode, and encode the luma block into the encoded bitstream.
[0018] In some implementations of this apparatus, the list excludes the intra prediction mode that uses the luma block for coding the chroma block.
[0019] An aspect of the teachings herein is a non-transitory, computer-readable storage medium storing an encoded bitstream. The encoded bitstream includes an encoded block including a luma block and a chroma block, a binary flag indicating whether the chroma block
was encoded using an intra prediction mode that uses the luma block to encode the chroma block, and only when the binary flag indicates that the chroma block was not encoded using the intra prediction mode that uses the luma block, a mode index identifying an intra prediction mode used to encode the chroma block from an ordered list of intra prediction modes.
[0020] These and other aspects of the present disclosure are disclosed in the following detailed description of the embodiments, the appended claims, and the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The description herein refers to the accompanying drawings described below wherein like reference numerals refer to like parts throughout the several views unless otherwise noted.
[0022] FIG. 1 is a schematic of a video encoding and decoding system.
[0023] FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
[0024] FIG. 3 is a diagram of a typical video stream to be encoded and subsequently decoded.
[0025] FIG. 4 is a block diagram of an encoder according to implementations of this disclosure.
[0026] FIG. 5 is a block diagram of a decoder according to implementations of this disclosure.
[0027] FIG. 6 is a diagram of a representation of a portion of a block in accordance with implementations of this disclosure.
[0028] FIG. 7 is a flowchart diagram of a process for reconstructing a current block of an image using chroma prediction mode signaling.
DETAILED DESCRIPTION
[0029] A video stream can be compressed by a variety of techniques to reduce the bandwidth required to transmit or store the video stream. Encoding a video stream into a bitstream (i.e., a compressed bitstream) involves compression. The compressed bitstream can then be transmitted to a decoder that can decode (decompress, reconstruct, etc.) the compressed bitstream to prepare it for viewing or further processing. Compression of the video stream often exploits spatial and temporal correlation of video signals through spatial and/or motion-compensated prediction.
[0030] Spatial prediction may also be referred to as intra prediction. Intra prediction uses previously encoded and decoded pixels from at least one block of the current image or frame other than a current block to be encoded to generate a block (also called a prediction block) that resembles the current block. The particular previously encoded and decoded pixels used and how they are combined to form the prediction block are dictated by the intra prediction mode. By encoding the intra prediction mode and the difference between the two blocks (i.e., the current block and the prediction block), a decoder receiving the encoded signal can recreate the current block.
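As an illustration of how an intra prediction mode combines previously decoded neighboring pixels into a prediction block, the following sketch implements a simple DC-style predictor. This is a hypothetical simplification for illustration only; real codecs define many directional and smooth modes in addition to DC prediction.

```python
def dc_predict(above, left):
    """DC-style intra prediction (simplified sketch): fill the prediction
    block with the mean of the previously reconstructed pixels in the row
    above and the column to the left of the current block."""
    neighbors = list(above) + list(left)
    dc = sum(neighbors) // len(neighbors)
    # The prediction block has len(left) rows and len(above) columns.
    return [[dc] * len(above) for _ in range(len(left))]
```

The encoder would subtract such a prediction block from the current block and encode only the residual, together with an identifier of the mode used.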
[0031] Multiple intra prediction modes are available to both luma blocks and chroma blocks forming a block of an image or frame. For example, there may be up to 14 possible intra prediction modes. Unique to a chroma block is a so-called chroma-from-luma (CFL) mode. The CFL mode uses pixel values of the corresponding luma block to generate a prediction block for the chroma block.
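A minimal sketch of the chroma-from-luma idea follows the common CfL formulation, in which each chroma sample is predicted as a chroma DC value plus a scaled zero-mean luma term. The parameter names and this particular scaling model are illustrative assumptions, not necessarily the exact formulation used in any given codec.

```python
def cfl_predict(luma, chroma_dc, alpha):
    """Chroma-from-luma prediction sketch: each chroma sample is predicted
    as the chroma DC value plus alpha times the zero-mean co-located luma
    sample. luma is the (subsampled) reconstructed luma block; alpha is a
    signaled scaling parameter (illustrative)."""
    count = len(luma) * len(luma[0])
    avg = sum(sum(row) for row in luma) / count
    return [[chroma_dc + alpha * (v - avg) for v in row] for row in luma]
```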
[0032] One way to signal the intra prediction mode of a luma block or a chroma block is to signal a mode index that can be used to identify the prediction mode from a set or list of ordered intra prediction modes. However, when one intra prediction mode is used more frequently than the others, for example the CFL mode for intra prediction of chroma blocks, expressly signaling whether that prediction mode is used can be more efficient than identifying that prediction mode from the list. Further details of this improved signaling are described hereinbelow after a description of the environment in which the teachings herein may be implemented.
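The signaling scheme just described can be sketched as decoder-side logic. The helper names here (read_bit, read_mode_index, build_ordered_modes) are hypothetical stand-ins for entropy-decoding calls and the list-construction step; the point is that the ordered list is only generated when the CFL flag indicates a non-CFL mode.

```python
def decode_chroma_mode(read_bit, read_mode_index, build_ordered_modes):
    """Sketch of chroma prediction mode signaling. A binary flag indicates
    whether the chroma block was coded with the chroma-from-luma (CFL)
    mode; only when it was not is the ordered mode list generated and a
    mode index decoded (helper names are illustrative)."""
    if read_bit():                 # flag set: CFL was used
        return "CFL"               # no ordered list needs to be generated
    modes = build_ordered_modes()  # list generated in a defined order
    return modes[read_mode_index()]
```

Because the frequently used CFL mode is resolved by a single flag, the cost of building and indexing into the ordered mode list is avoided in the common case.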
[0033] FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.
[0034] A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102 and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.
[0035] The receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.
[0036] Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having a non-transitory storage medium or memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding. In an example implementation, a real-time transport protocol (RTP) is used for transmission of the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used, e.g., a video streaming protocol based on the Hypertext Transfer Protocol (HTTP).
[0037] When used in a video conferencing system, for example, the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, the receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view and further encodes and transmits its own video bitstream to the video conference server for decoding and viewing by other participants.
[0038] FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of one computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.
[0039] A CPU 202 in the computing device 200 can be a central processing unit. Alternatively, the CPU 202 can be any other type of device, or multiple devices, capable of manipulating or processing information now existing or hereafter developed. Although the disclosed implementations can be practiced with one processor as shown, e.g., the CPU 202, advantages in speed and efficiency can be achieved using more than one processor.
[0040] A memory 204 in computing device 200 can be a read only memory (ROM) device or a random-access memory (RAM) device in an implementation. Any other suitable type of storage device or non-transitory storage medium can be used as the memory 204. The memory
204 can include code and data 206 that is accessed by the CPU 202 using a bus 212. The memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits the CPU 202 to perform the methods described here. For example, the application programs 210 can include applications 1 through N, which further include a video coding application that performs the methods described here. Computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device. Because the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing.
[0041] The computing device 200 can also include one or more output devices, such as a display 218. The display 218 may be, in one example, a touch sensitive display that combines a display with a touch sensitive element that is operable to sense touch inputs. The display 218 can be coupled to the CPU 202 via the bus 212. Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218. When the output device is or includes a display, the display can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT) display or light emitting diode (LED) display, such as an organic LED (OLED) display.
[0042] The computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image such as the image of a user operating the computing device 200. The image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200. In an example, the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.
[0043] The computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200. The sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates the computing device 200.
[0044] Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into a single unit, other configurations can be utilized. The operations
of the CPU 202 can be distributed across multiple machines (wherein individual machines can have one or more processors) that can be coupled directly or across a local area or other network. The memory 204 can be distributed across multiple machines such as a network-based memory or memory in multiple machines performing the operations of the computing device 200. Although depicted here as one bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations.
[0045] FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes multiple adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, e.g., a frame 306. At the next level, the frame 306 can be divided into a series of planes or segments 308. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, a frame 306 of color video data can include a luminance plane and two chrominance planes. The segments 308 may be sampled at different resolutions.
[0046] Whether or not the frame 306 is divided into segments 308, the frame 306 may be further subdivided into blocks 310, which can contain data corresponding to, for example, 16x16 pixels in the frame 306. The blocks 310 can also be arranged to include data from one or more segments 308 of pixel data. The blocks 310 can also be of any other suitable size such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger. Unless otherwise noted, the terms block and macroblock are used interchangeably herein.
[0047] FIG. 4 is a block diagram of an encoder 400 according to implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102 such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 may be a hardware encoder.
[0048] The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300.
[0049] When the video stream 300 is presented for encoding, respective frames 304, such as the frame 306, can be processed in units of blocks. At the intra/inter prediction stage 402, respective blocks can be encoded using intra-frame prediction (also called intra-prediction) or inter-frame prediction (also called inter-prediction). In any case, a prediction block can be formed. In the case of intra-prediction, a prediction block may be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter-prediction, a prediction block may be formed from samples in one or more previously constructed reference frames.
[0050] Next, still referring to FIG. 4, the prediction block can be subtracted from the current block at the intra/inter prediction stage 402 to produce a residual block (also called a residual). The transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. The quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated. The quantized transform coefficients are then entropy encoded by the entropy encoding stage 408. The entropy-encoded coefficients, together with other information used to decode the block, which may include for example the type of prediction used, transform type, motion vectors and quantizer value, are then output to the compressed bitstream 420. The compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms will be used interchangeably herein.
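The quantization step described above (dividing transform coefficients by the quantizer value and truncating) and its decoder-side inverse (multiplying back by the quantizer value, as performed by the dequantization stages) can be sketched as follows. This is a simplified illustration; real codecs use more elaborate quantization matrices and rounding rules.

```python
def quantize(coefficients, quantizer):
    """Quantize transform coefficients by dividing by the quantizer value
    and truncating toward zero, as described above (simplified sketch)."""
    return [int(c / quantizer) for c in coefficients]

def dequantize(levels, quantizer):
    """Decoder-side inverse: multiply the quantized levels back by the
    quantizer value. The truncation makes the round trip lossy."""
    return [level * quantizer for level in levels]
```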
[0051] The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs
similar functions to those functions that take place during the decoding process that are discussed in more detail below, including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative residual block (also called a derivative residual). At the reconstruction stage 414, the prediction block that was predicted at the intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed block. The loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.
[0052] Other variations of the encoder 400 can be used to encode the compressed bitstream 420. For example, a non-transform-based encoder can quantize the residual signal directly without the transform stage 404 for certain blocks or frames. In another implementation, an encoder can have the quantization stage 406 and the dequantization stage 410 combined in a common stage.
[0053] FIG. 5 is a block diagram of a decoder 500 according to implementations of this disclosure. The decoder 500 can be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the receiving station 106 to decode video data in the manner described herein. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106. The decoder 500 may be a hardware decoder.
[0054] The decoder 500, like the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512 and a post filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.
[0055] When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients. The dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that can be identical to that created by the inverse transform stage 412 in the encoder 400. Using header information decoded from the compressed bitstream 420, the decoder 500 can use the intra/inter prediction stage 508 to
create the same prediction block as was created in the encoder 400, e.g., at the intra/inter prediction stage 402. At the reconstruction stage 510, the prediction block can be added to the derivative residual to create a reconstructed block. The loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts.
[0056] Other filtering can be applied to the reconstructed block. In this example, the post filtering stage 514 can be a deblocking filter that is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516. The output video stream 516 can also be referred to as a decoded video stream, and the terms will be used interchangeably herein. Other variations of the decoder 500 can be used to decode the compressed bitstream 420. For example, the decoder 500 can produce the output video stream 516 without the post filtering stage 514.
[0057] FIG. 6 is a block diagram of a representation of a portion 600 of a frame, such as the frame 306 shown in FIG. 3, in accordance with implementations of this disclosure. As shown, the portion 600 of the frame includes four 64x64 blocks 610, in two rows and two columns in a matrix or Cartesian plane. In some implementations, a 64x64 block may be a maximum coding unit, N=64. Each 64x64 block may include four 32x32 blocks 620. Each 32x32 block may include four 16x16 blocks 630. Each 16x16 block may include four 8x8 blocks 640. Each 8x8 block 640 may include four 4x4 blocks 650. Each 4x4 block 650 may include 16 pixels, which may be represented in four rows and four columns in each respective block in the Cartesian plane or matrix. The pixels may include information representing an image captured in the frame, such as luminance information, color information, and location information. In some implementations, a block, such as a 16x16 pixel block as shown, may include a luminance block 660, which may include luminance pixels 662; and two chrominance blocks 670, 680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670, 680 may include chrominance pixels 690. For example, the luminance block 660 may include 16x16 luminance pixels 662 and each chrominance block 670, 680 may include 8x8 chrominance pixels 690 as shown. Although one arrangement of blocks is shown, any arrangement may be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM blocks may be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks may be used. In some implementations, Nx2N blocks, 2NxN blocks, or a combination thereof may be used.
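The luma/chroma sampling relationship shown in FIG. 6 (a 16x16 luminance block paired with two 8x8 chrominance blocks) corresponds to 4:2:0 chroma subsampling, which can be sketched as:

```python
def plane_dimensions(width, height):
    """Plane sizes under 4:2:0 chroma subsampling: a full-resolution luma
    (Y) plane and two half-resolution chroma (U/Cb and V/Cr) planes."""
    return {
        "Y": (width, height),
        "U": (width // 2, height // 2),
        "V": (width // 2, height // 2),
    }
```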
[0058] In some implementations, video coding may include ordered block-level coding. Ordered block-level coding may include coding blocks of a frame in an order, such as rasterscan order, wherein blocks may be identified and processed starting with a block in the upper
left corner of the frame, or portion of the frame, and proceeding along rows from left to right and from the top row to the bottom row, identifying each block in turn for processing. For example, the 64x64 block in the top row and left column of a frame may be the first block coded and the 64x64 block immediately to the right of the first block may be the second block coded. The second row from the top may be the second row coded, such that the 64x64 block in the left column of the second row may be coded after the 64x64 block in the rightmost column of the first row.
[0059] In some implementations, coding a block may include using quad-tree coding, which may include coding smaller block units within a block in raster-scan order. For example, the 64x64 block shown in the bottom left corner of the portion of the frame shown in FIG. 6, may be coded using quad-tree coding wherein the top left 32x32 block may be coded, then the top right 32x32 block may be coded, then the bottom left 32x32 block may be coded, and then the bottom right 32x32 block may be coded. Each 32x32 block may be coded using quad-tree coding wherein the top left 16x16 block may be coded, then the top right 16x16 block may be coded, then the bottom left 16x16 block may be coded, and then the bottom right 16x16 block may be coded. Each 16x16 block may be coded using quad-tree coding wherein the top left 8x8 block may be coded, then the top right 8x8 block may be coded, then the bottom left 8x8 block may be coded, and then the bottom right 8x8 block may be coded. Each 8x8 block may be coded using quad-tree coding wherein the top left 4x4 block may be coded, then the top right 4x4 block may be coded, then the bottom left 4x4 block may be coded, and then the bottom right 4x4 block may be coded. In some implementations, 8x8 blocks may be omitted for a 16x16 block, and the 16x16 block may be coded using quad-tree coding wherein the top left 4x4 block may be coded, then the other 4x4 blocks in the 16x16 block may be coded in raster-scan order.
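The recursive coding order described above can be sketched as a generator that yields block positions, each level emitting its four sub-blocks in raster order. This sketch always splits down to a fixed minimum size, whereas a real coder decides per block whether to split further.

```python
def quadtree_order(x, y, size, min_size):
    """Yield (x, y, size) for blocks in quad-tree coding order: top-left,
    top-right, bottom-left, bottom-right, recursing until min_size."""
    if size <= min_size:
        yield (x, y, size)
        return
    half = size // 2
    for dy in (0, half):        # top row of sub-blocks first
        for dx in (0, half):    # left sub-block before right
            yield from quadtree_order(x + dx, y + dy, half, min_size)
```

For a 16x16 block split into 8x8 units, for example, this yields the top left, top right, bottom left, and bottom right 8x8 blocks in that order, matching the description above.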
[0060] In some implementations, video coding may include compressing the information included in an original, or input, frame by, for example, omitting some of the information in the original frame from a corresponding encoded frame. For example, coding may include reducing spectral redundancy, reducing spatial redundancy, reducing temporal redundancy, or a combination thereof.
[0061] In some implementations, reducing spectral redundancy may include using a color model based on a luminance component (Y) and two chrominance components (U and V or Cb and Cr), which may be referred to as the YUV or YCbCr color model, or color space. Using the YUV color model may include using a relatively large amount of information to represent the luminance component of a portion of a frame and using a relatively small amount of
information to represent each corresponding chrominance component for the portion of the frame. For example, a portion of a frame may be represented by a high-resolution luminance component, which may include a 16x16 block of pixels, and by two lower resolution chrominance components, each of which represents the portion of the frame as an 8x8 block of pixels. A pixel may indicate a value, for example, a value in the range from 0 to 255, and may be stored or transmitted using, for example, eight bits. Although this disclosure is described in reference to the YUV color model, any color model may be used.
[0062] In some implementations, reducing spatial redundancy may include transforming a block into the frequency domain using, for example, a discrete cosine transform (DCT). For example, a unit of an encoder, such as the transform stage 404 shown in FIG. 4, may perform a DCT using transform coefficient values based on spatial frequency.
[0063] In some implementations, reducing temporal redundancy may include using similarities between frames to encode a frame using a relatively small amount of data based on one or more reference frames, which may be previously encoded, decoded, and reconstructed frames of the video stream. For example, a block or pixel of a current frame may be similar to a spatially corresponding block or pixel of a reference frame. In some implementations, a block or pixel of a current frame may be similar to a block or pixel of a reference frame at a different spatial location, and reducing temporal redundancy may include generating motion information indicating the spatial difference, or translation, between the location of the block or pixel in the current frame and the corresponding location of the block or pixel in the reference frame.
[0064] In some implementations, reducing temporal redundancy may include identifying a portion of a reference frame that corresponds to a current block or pixel of a current frame. For example, a reference frame, or a portion of a reference frame, which may be stored in memory, may be searched to identify a portion for generating a prediction to use for encoding a current block or pixel of the current frame with maximal efficiency. For example, the search may identify a portion of the reference frame for which the difference in pixel values between the current block and a prediction block generated based on the portion of the reference frame is minimized; this search may be referred to as motion searching. In some implementations, the portion of the reference frame searched may be limited. For example, the portion of the reference frame searched, which may be referred to as the search area, may include a limited number of rows of the reference frame. In an example, identifying the portion of the reference frame for generating a prediction may include calculating a cost function, such as a sum of absolute differences (SAD), between the pixels of portions of the search area and the pixels of the current block.
[0065] In some implementations, the spatial difference between the location of the portion of the reference frame for generating a prediction in the reference frame and the current block in the current frame may be represented as a motion vector. The difference in pixel values between the prediction block and the current block may be referred to as differential data, residual data, a prediction error, or as a residual block. In some implementations, generating motion vectors may be referred to as motion estimation, and a pixel of a current block may be indicated based on location using Cartesian coordinates as fx,y. Similarly, a pixel of the search area of the reference frame may be indicated based on location using Cartesian coordinates as rx,y. A motion vector (MV) for the current block may be determined based on, for example, a SAD between the pixels of the current frame and the corresponding pixels of the reference frame.
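A minimal full-search sketch of the SAD-based motion search described above follows. The helper names are hypothetical, and a real search area would be limited and use early termination rather than exhaustively scanning the reference frame.

```python
def sad(cur, ref, rx, ry, size):
    """Sum of absolute differences between the current block and the
    size x size portion of the reference frame at offset (rx, ry)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))

def motion_search(cur, ref, size):
    """Full search: return the (rx, ry) position in the reference
    frame whose prediction block minimizes the SAD cost function."""
    best_cost, best_pos = None, None
    for ry in range(len(ref) - size + 1):
        for rx in range(len(ref[0]) - size + 1):
            cost = sad(cur, ref, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (rx, ry)
    return best_pos
```

The motion vector is then the displacement between the returned reference position and the current block's position in the current frame.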
[0066] Although described herein with reference to matrix or Cartesian representation of a frame for clarity, a frame may be stored, transmitted, processed, or any combination thereof, in any data structure such that pixel values may be efficiently represented for a frame or image. For example, a frame may be stored, transmitted, processed, or any combination thereof, in a two-dimensional data structure such as a matrix as shown, or in a one-dimensional data structure, such as a vector array. In an implementation, a representation of the frame, such as a two-dimensional representation as shown, may correspond to a physical location in a rendering of the frame as an image. For example, a location in the top left corner of a block in the top left corner of the frame may correspond with a physical location in the top left corner of a rendering of the frame as an image.
[0067] In some implementations, block-based coding efficiency may be improved by partitioning input blocks into one or more prediction partitions, which may be rectangular, including square, partitions for prediction coding. In some implementations, video coding using prediction partitioning may include selecting a prediction partitioning scheme from among multiple candidate prediction partitioning schemes. For example, in some implementations, candidate prediction partitioning schemes for a 64x64 coding unit may include rectangular size prediction partitions ranging in sizes from 4x4 to 64x64, such as 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32, 32x64, 64x32, or 64x64. In some implementations, video coding using prediction partitioning may include a full prediction partition search, which may include selecting a prediction partitioning scheme by encoding the coding unit using each available candidate prediction partitioning scheme and selecting the best scheme, such as the scheme that produces the least rate-distortion error.
[0068] In some implementations, encoding a video frame may include identifying a prediction partitioning scheme for encoding a current block, such as block 610. In some implementations, identifying a prediction partitioning scheme may include determining whether to encode the block as a single prediction partition of maximum coding unit size, which may be 64x64 as shown, or to partition the block into multiple prediction partitions, which may correspond with the sub-blocks, such as the 32x32 blocks 620, the 16x16 blocks 630, or the 8x8 blocks 640, as shown, and may include determining whether to partition into one or more smaller prediction partitions. For example, a 64x64 block may be partitioned into four 32x32 prediction partitions. Three of the four 32x32 prediction partitions may be encoded as 32x32 prediction partitions and the fourth 32x32 prediction partition may be further partitioned into four 16x16 prediction partitions. Three of the four 16x16 prediction partitions may be encoded as 16x16 prediction partitions and the fourth 16x16 prediction partition may be further partitioned into four 8x8 prediction partitions, each of which may be encoded as an 8x8 prediction partition. In some implementations, identifying the prediction partitioning scheme may include using a prediction partitioning decision tree.
[0069] In some implementations, video coding for a current block may include identifying an optimal prediction coding mode from multiple candidate prediction coding modes, which may provide flexibility in handling video signals with various statistical properties and may improve the compression efficiency. For example, a video coder may evaluate each candidate prediction coding mode to identify the optimal prediction coding mode, which may be, for example, the prediction coding mode that minimizes an error metric, such as a rate-distortion cost, for the current block. In some implementations, the complexity of searching the candidate prediction coding modes may be reduced by limiting the set of available candidate prediction coding modes based on similarities between the current block and a corresponding prediction block. In some implementations, the complexity of searching each candidate prediction coding mode may be reduced by performing a directed refinement mode search. For example, metrics may be generated for a limited set of candidate block sizes, such as 16x16, 8x8, and 4x4, the block sizes may be ranked in descending order of the associated error metric, and additional candidate block sizes, such as 4x8 and 8x4 block sizes, may be evaluated.
[0070] In some implementations, block-based coding efficiency may be improved by partitioning a current residual block into one or more transform partitions, which may be rectangular, including square, partitions for transform coding. In some implementations, video coding using transform partitioning may include selecting a uniform transform partitioning
scheme. For example, a current residual block, such as block 610, may be a 64x64 block and may be transformed without partitioning using a 64x64 transform.
[0071] Although not expressly shown in FIG. 6, a residual block may be transform partitioned using a uniform transform partitioning scheme. For example, a 64x64 residual block may be transform partitioned using a uniform transform partitioning scheme including four 32x32 transform blocks, using a uniform transform partitioning scheme including sixteen 16x16 transform blocks, using a uniform transform partitioning scheme including sixty-four 8x8 transform blocks, or using a uniform transform partitioning scheme including 256 4x4 transform blocks.
[0072] In some implementations, video coding using transform partitioning may include identifying multiple transform block sizes for a residual block using multiform transform partition coding. In some implementations, multiform transform partition coding may include recursively determining whether to transform a current block using a current block size transform or by partitioning the current block and multiform transform partition coding each partition. For example, the bottom left block 610 shown in FIG. 6 may be a 64x64 residual block, and multiform transform partition coding may include determining whether to code the current 64x64 residual block using a 64x64 transform or to code the 64x64 residual block by partitioning the 64x64 residual block into partitions, such as four 32x32 blocks 620, and multiform transform partition coding each partition. In some implementations, determining whether to transform partition the current block may be based on comparing a cost for encoding the current block using a current block size transform to a sum of costs for encoding each partition using partition size transforms.
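The recursive cost comparison for multiform transform partitioning can be sketched as follows, where `cost_fn` is a hypothetical callback standing in for the encoder's cost of coding a block at a given size without further splitting.

```python
def best_partition_cost(cost_fn, x, y, size, min_size=4):
    """Recursively decide whether to transform the block at (x, y)
    whole or to split it into four quadrants, returning the lower
    total cost, as in multiform transform partition coding."""
    whole = cost_fn(x, y, size)        # cost of a size x size transform
    if size == min_size:
        return whole                   # smallest transform; cannot split
    half = size // 2
    split = sum(best_partition_cost(cost_fn, x + dx, y + dy, half, min_size)
                for dy in (0, half) for dx in (0, half))
    return min(whole, split)
```

A production encoder would also record which choice won at each level so the decision tree can be signaled, not just the cost.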
[0073] As described initially, one way to identify an intra prediction mode used to code a luma block or a chroma block is to signal a mode index that can be used to identify the intra prediction mode from a set or list of ordered prediction modes. In general, separate lists are generated for a luma block as compared to a chroma block. The intra prediction modes are ordered based on intra prediction modes of neighboring blocks, co-located blocks, or some combination thereof. Instead of signaling the prediction mode itself, the mode index can be used to infer the prediction mode after the ordering. For luma blocks, the available intra prediction modes may be ordered in separate sets, and a set index may also be signaled to infer the set from which to select the prediction mode using the mode index.
[0074] To signal a chroma intra prediction mode, a defined cardinality of intra prediction modes may be adaptively selected and included in only one intra mode set. In an example, there are fourteen (14) intra prediction modes. For a respective chroma block encoded using an
intra prediction mode, the ordering process is defined (e.g., predefined at each of the encoder and decoder). An example of the ordering process is as follows. First, if the co-located luma block prediction mode is an intra prediction directional mode, the directional mode is added to the top of the list (e.g., index 0). Thereafter, available intra prediction non-directional modes may be added in a defined order. In some implementations, the following non-directional modes may be added to the list in order: UV_DC_PRED, UV_SMOOTH_PRED,
UV_SMOOTH_V_PRED, UV_SMOOTH_H_PRED, UV_PAETH_PRED. Thereafter, the list may be filled with directional modes (e.g., based on those used for neighboring/co-located chroma blocks), while excluding the one already added to the top of the list, if any. Finally, the CFL mode may be added to the bottom of the list (e.g., at the lowest rank).
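The ordering process above can be sketched as follows. The non-directional mode names mirror those in the description; the directional mode names and the `is_directional` callback are illustrative assumptions.

```python
# Non-directional chroma modes in the defined insertion order.
NON_DIRECTIONAL = ["UV_DC_PRED", "UV_SMOOTH_PRED", "UV_SMOOTH_V_PRED",
                   "UV_SMOOTH_H_PRED", "UV_PAETH_PRED"]

def order_chroma_modes(colocated_luma_mode, directional_modes, is_directional):
    """Build the ordered chroma intra mode list: the co-located luma
    directional mode first (if directional), then the non-directional
    modes in the defined order, then the remaining directional modes,
    and finally the CFL mode at the lowest rank."""
    modes = []
    if is_directional(colocated_luma_mode):
        modes.append(colocated_luma_mode)        # index 0
    modes.extend(NON_DIRECTIONAL)
    modes.extend(m for m in directional_modes if m not in modes)
    modes.append("UV_CFL_PRED")                  # bottom of the list
    return modes
```

The signaled mode index is then simply a position in this list.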
[0075] An index (the mode index) is signaled in the bitstream using entropy coding with a defined cumulative distribution function (CDF) table, also referred to herein as a probability model. The list maps the index to the actual chroma prediction mode.
[0076] An encoder, such as the encoder 400, selects a prediction mode using a tradeoff between the number of bits (e.g., the rate) required for coding a block using the prediction mode and the quality (e.g., the degradation or distortion level) of the image. The prediction modes (or a subset thereof) are used to code the block, and a prediction mode that best meets the requirements of the tradeoff is selected. In some implementations, this is referred to as a rate-distortion optimization (RDO) decision process or simply as the RDO.
[0077] In practice, a single prediction mode is often selected at a higher rate than the remaining prediction modes in the RDO. For chroma intra prediction modes, the CFL mode may be frequently selected. In the ordering process above, the low rank in the list means that the CFL mode has a relatively high mode index, which conventionally results in a longer signal within the bitstream. One way to address this issue is to give the CFL mode a higher rank in the ordering (also called a reordering) process, thus resulting in a smaller mode index (closer to 0) to be coded. Another way to address this issue is to modify the entropy coding of the CFL mode.
[0078] According to the teachings herein, when one intra prediction mode is used more frequently than the others, signaling whether that prediction mode is used is more efficient than identifying that intra prediction mode from a set of ordered intra prediction modes.
[0079] FIG. 7 is a flowchart diagram of a method or process 700 for reconstructing a current block of an image using chroma prediction mode signaling. The process 700 can be implemented, for example, as a software program that may be executed by computing devices such as the receiving station 106 of FIG. 1. The software program can include machine-
readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214, and that when executed by a processor, such as CPU 202, may cause the computing device to perform the process 700. The process may be implemented using specialized hardware or firmware. Multiple processors, memories, or both, may be used. In some implementations, the process 700 may be performed at a decoder, such as at the decoder 500.
[0080] At operation 702, an encoded block corresponding to a current block is received. The encoded block is received from an encoded bitstream, such as the compressed bitstream 420 from FIG. 4. The encoded block includes a luma block and a chroma block that are encoded into the bitstream. Note that although the teachings herein are discussed with regards to a chroma block of the encoded block, in general the encoded block would include two chroma blocks. Accordingly, the teachings may be applied to each of the chroma blocks. Alternatively, the teachings may be applied to one chroma block to signal a single intra prediction mode.
[0081] At operation 704, the process 700 determines if the chroma block was encoded using an intra prediction mode that uses (e.g., pixels of) the luma block, more conventionally known as a CFL mode. Determining if the chroma block was encoded using an intra prediction mode that uses the luma block is accomplished by reading a signal or indicator from the encoded bitstream. For example, if the chroma block was encoded in this way, the signal (or flag) in the encoded bitstream may be set to 1. The signal may be decoded from the header for the block in some implementations.
[0082] The signal may be decoded at an entropy decoding stage, such as the entropy decoding stage 502. To do so, the symbol representing the signal may be entropy decoded using a dedicated CDF table with a context size of 3 (e.g., two bits). The context of the symbol may be the sum of the count of CFL modes of above and left neighbor blocks of the current block. For example, the prediction modes of the top block adjacent to the current block, the left block adjacent to the current block, and the top-left corner block adjacent to the current block are considered. For each block using the CFL mode, the count is increased for the context, and the CDF table associated with the total count is used to encode and to decode the flag or signal.
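The context derivation for the CFL flag can be sketched as below. This is a simplified illustration counting only the above and left neighbors, which yields the three contexts mentioned above; the exact neighbor set and derivation are defined by the bitstream format.

```python
def cfl_flag_context(above_mode, left_mode):
    """Context index for entropy coding the CFL flag: the count of
    neighbor blocks that used the CFL mode, giving a context index
    in {0, 1, 2}. A CDF table is selected per context."""
    return sum(1 for m in (above_mode, left_mode) if m == "UV_CFL_PRED")
```

The same context is computed at the encoder and decoder, so both sides select the same probability model for the flag.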
[0083] At operation 706, in response to determining that the chroma block was encoded using the intra prediction mode that uses the luma block (also called the CFL mode herein), the process 700 decodes the chroma block using that intra prediction mode. In an example, if the process determines that the chroma block was encoded using this method, the process will set
the intra prediction mode to a CFL mode (such as UV_CFL_PRED) and optionally set the angle that indicates the directional prediction angle (angle_delta) to 0. With this mode, the decoder 500 decodes the chroma block by modifying the pixel values of the luma block, after the luma block is decoded and reconstructed, to form a prediction block for the chroma block. Modifying the pixel values of the reconstructed luma block may be performed by any known technique for a CFL mode. In some implementations, the resulting chroma prediction block may be added to the residual for the chroma block decoded from the bitstream to generate the chroma block.
[0084] In an example, a predicted chroma pixel value Predc of a chroma-from-luma prediction block may be represented by the following equation.
[0085] Predc = DCc + a × (RecY − DCY)
[0086] In this equation, DCY may be the average of the current reconstructed luma block samples (e.g., pixel values downsampled if the chroma blocks are subsampled), and DCc is the average of the chroma samples (pixel values) neighboring the current chroma pixel. Alternatively, DCY can be calculated based on neighboring (e.g., downsampled) reconstructed luma samples to match the samples for the calculation of DCc. Corresponding (e.g., co-located) luma samples RecY (e.g., downsampled luma values) are used, and a is a scaling factor that can be explicitly signaled in (and decoded from) the bitstream or can be implicitly derived by the decoder. For example, the scaling factor may be derived implicitly by using neighboring reconstructed chroma samples Recc (e.g., chroma pixel values from an adjacent chroma block) and their corresponding luma samples and using a least square error or some other technique to minimize the difference between the pixel values. For example, the least square error may be represented by the following equation, which can be solved for the scaling factor a by minimizing the function Sum.
[0087] Sum = Σ (Recc − (DCc + a × (RecY − DCY)))²
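The chroma-from-luma prediction and the implicit derivation of the scaling factor can be sketched as follows. One-dimensional neighbor arrays are used for brevity, and the closed-form expression is the standard least-squares minimizer of the function Sum; real implementations use downsampled luma samples and fixed-point arithmetic.

```python
def cfl_predict(rec_y, dc_y, dc_c, alpha):
    """Chroma-from-luma prediction for a block of reconstructed luma
    samples: Predc = DCc + alpha * (RecY - DCY)."""
    return [[dc_c + alpha * (v - dc_y) for v in row] for row in rec_y]

def derive_alpha(rec_c, rec_y, dc_y, dc_c):
    """Implicitly derive the scaling factor from neighboring
    reconstructed chroma samples and their co-located luma samples
    by minimizing Sum = sum((Recc - (DCc + a*(RecY - DCY)))**2)."""
    num = sum((c - dc_c) * (y - dc_y) for c, y in zip(rec_c, rec_y))
    den = sum((y - dc_y) ** 2 for y in rec_y)
    return num / den if den else 0.0
```

When the neighbors follow the linear model exactly, the derived factor reproduces the chroma samples from the luma samples without error.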
[0088] At operation 708, in response to determining that the chroma block was not encoded using the intra prediction mode that uses the luma block, the process 700 generates a list of prediction modes in a defined sequence. For example, an ordered list of intra prediction modes as described above may be generated by first including the associated intra prediction mode of the co-located luma block if the prediction mode is a directional intra prediction mode. Then the following non-directional modes may be added to the list in order: UV_DC_PRED, UV_SMOOTH_PRED, UV_SMOOTH_V_PRED, UV_SMOOTH_H_PRED, UV_PAETH_PRED. Then the list may be filled with the remaining
directional modes (e.g., based on angle or adjacent block intra prediction modes). The list preferably omits the CFL mode (UV_CFL_PRED).
[0089] At operation 710, the process 700 determines an identifier of an intra prediction mode for the chroma block from the encoded bitstream. The identifier may be determined by reading the mode index from the encoded bitstream. The mode index is the index of the intra prediction mode in the list of intra prediction modes generated by operation 708. If the intra prediction mode identified is a directional intra prediction mode, then the angle_delta is set accordingly for the identified intra prediction mode. If the selected intra prediction mode is a non-directional intra prediction mode, the angle may be set to 0. That is, if the current intra prediction mode is determined to not be UV_CFL_PRED, then the mode index may be read from the encoded bitstream, and the ordered list of intra prediction modes is generated. The mode index may be encoded using a separate CDF table.
[0090] At operation 712, the process 700 decodes the chroma block using the identified intra prediction mode from the list of generated intra prediction modes. In an example, the intra prediction mode may be a non-directional intra prediction mode such as UV_DC_PRED, in which case the angle_delta may be set to 0, and the decoder will decode the chroma block using UV_DC_PRED as the identified intra prediction mode. Alternatively, the intra prediction mode may be determined to be a directional intra prediction mode. In this case, the angle_delta is set according to the identified intra prediction mode and the decoder decodes the chroma block using the set parameters. In either case, the resulting prediction block may be added to the residual for the chroma block that is decoded from the bitstream to reconstruct the chroma block.
[0091] At operation 714, the process 700 reconstructs the current block using the chroma and luma block as decoded.
[0092] It is worth noting that in addition to coding gains achieved in the peak-signal-to-noise measures for the two chroma (U and V) planes over existing techniques for signaling the chroma mode, the techniques described herein can reduce computing time at the decoder. For example, if it is determined that the chroma block was encoded by the intra prediction mode that uses the luma block for encoding the chroma block (e.g., the CFL mode), the ordered list of intra prediction modes does not have to be generated by the decoder. That is, decoding the chroma block using the luma block may be done without selecting an intra prediction mode from an ordered list of intra prediction modes.
[0093] In the encoding process, a conventional rate-distortion optimization (RDO) technique may be used to select the prediction mode for the chroma block except that it is
desirable that the encoder search the CFL mode first. When the CFL mode is selected by the encoder, the signal (e.g., a binary symbol) for the CFL mode may be encoded into the bitstream (e.g., in the header for the block or for the chroma block), along with the chroma residual, as applicable. When the CFL mode is not selected by the encoder and the encoding mode is an intra prediction mode (not an inter prediction mode), the encoder generates the same ordered list that the decoder will generate and encodes the mode index for the decoder to receive and use in decoding the chroma block.
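The encoder-side selection can be sketched as below, where `rd_cost` is a hypothetical callback returning the rate-distortion cost of coding the chroma block with a given mode.

```python
def select_chroma_mode(rd_cost, ordered_modes):
    """RDO sketch: evaluate the CFL mode first, then the remaining
    candidate modes, keeping the mode with the lowest RD cost."""
    best_mode = "UV_CFL_PRED"
    best_cost = rd_cost(best_mode)        # search the CFL mode first
    for mode in ordered_modes:
        if mode == "UV_CFL_PRED":
            continue                      # already evaluated
        cost = rd_cost(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode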
[0094] The above describes a technique whereby the most common intra prediction mode selected for a chroma block is a CFL mode. However, the techniques can be used where the most common intra prediction mode is other than the CFL mode by modifying the query at operation 704 to determine whether the chroma block was encoded using the most common intra prediction mode, reordering the remaining intra prediction modes when applicable, and, optionally, performing RDO at the encoder by testing the most common intra prediction mode first. The probability models (e.g., the CDF tables) for entropy coding (e.g., encoding and decoding) may be different than those used where the most common intra prediction mode is the CFL mode.
[0095] The above describes a technique whereby the most common intra prediction mode selected for a chroma block is a CFL mode. However, the teachings can be used for signaling the most common intra prediction mode selected for a luma block and/or signaling the most common intra prediction set from which the most common intra prediction mode for a luma block is selected. In such an example, the query at operation 704 may be modified to determine whether the luma block was encoded using the most common set of intra prediction modes, reorder only that set of intra prediction modes, and decode the mode index from the encoded bitstream to determine the intra prediction mode for reconstructing the luma block. Alternatively, an additional, subsequent inquiry may be made to determine whether the luma block was encoded using the most common intra prediction mode within the set. If the luma block was not encoded using the most common set of intra prediction modes, the remaining sets may be reordered, the mode index decoded, and the luma block reconstructed. In another example using the technique with luma blocks, the set index for a luma block can be decoded. Each set of intra prediction modes may have a most common intra prediction mode. Then, the query at operation 704 checks to see whether the most common intra prediction mode for the set is used for the luma block. If not, the set can be reordered to select the mode from the mode index to reconstruct the luma block.
[0096] When referring to a most common set of intra prediction modes for a luma block, this disclosure means the set of intra prediction modes of the multiple sets whose intra prediction modes are most often selected in coding some or all of one or more video sequences. When referring to a most common intra prediction mode for a luma or chroma block, this disclosure means the intra prediction mode of the available intra prediction modes that is most often selected in coding some or all of one or more video sequences.
[0097] The aspects of encoding and decoding described above illustrate some examples of encoding and decoding techniques. However, it is to be understood that encoding and decoding, as those terms are used in the claims, could mean compression, decompression, transformation, or any other processing or change of data.
[0098] The word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an implementation” or “one implementation” throughout is not intended to mean the same embodiment or implementation unless described as such.
[0099] Implementations of the transmitting station 102 and/or the receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by the encoder 400 and the decoder 500) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors or any other suitable circuit. In the claims, the term “processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms “signal” and “data” are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.
[00100] Further, in one aspect, for example, the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms and/or instructions described herein. In addition, or alternatively, for example, a special purpose computer/processor can be utilized that contains other hardware for carrying out any of the methods, algorithms, or instructions described herein.
[00101] The transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, the transmitting station 102 can be implemented on a server and the receiving station 106 can be implemented on a device separate from the server, such as a hand-held communications device. In this instance, the transmitting station 102 can encode content using an encoder 400 into an encoded video signal and transmit the encoded video signal to the communications device. In turn, the communications device can then decode the encoded video signal using a decoder 500. Alternatively, the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102. Other suitable transmitting and receiving implementation schemes are available. For example, the receiving station 106 can be a generally stationary personal computer rather than a portable communications device and/or a device including an encoder 400 may also include a decoder 500.
[00102] Further, all or a portion of implementations of the present disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or a semiconductor device. Other suitable mediums are also available.
[00103] The above-described embodiments, implementations and aspects have been described to allow easy understanding of the present invention and do not limit the present invention. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation to encompass all such modifications and equivalent structure as is permitted under the law.
Claims
1. An apparatus for reconstructing a current block, comprising:
   a processor configured to:
      receive, from an encoded bitstream, an encoded block corresponding to the current block, the encoded block comprising at least a luma block and a chroma block;
      determine, from the encoded bitstream, if the chroma block was encoded by an intra prediction mode that uses the luma block for coding the chroma block;
      in response to determining that the chroma block was encoded by the intra prediction mode that uses the luma block, decode the chroma block using pixel values of the luma block; and
      in response to determining that the chroma block was not encoded by the intra prediction mode that uses the luma block:
         generate a list of intra prediction modes in a defined order;
         determine, from the encoded bitstream, an identifier of an intra prediction mode for the chroma block;
         decode the chroma block using the intra prediction mode of the list of intra prediction modes identified by the identifier; and
      reconstruct the current block using the chroma block.
2. The apparatus of claim 1, wherein to determine if the chroma block was encoded by the intra prediction mode that uses the luma block comprises to decode a symbol that identifies whether the chroma block was encoded by the intra prediction mode that uses the luma block.
3. The apparatus of claim 1, wherein to determine if the chroma block was encoded by the intra prediction mode that uses the luma block comprises to entropy decode a binary flag from the encoded bitstream using a first probability model, wherein the binary flag identifies whether the chroma block was encoded by the intra prediction mode that uses the luma block.
4. The apparatus of any one of claims 1 to 3, wherein to determine the identifier comprises to entropy decode a mode index using a probability model.
5. A method for reconstructing a current block, comprising:
   receiving, from an encoded bitstream, an encoded block corresponding to the current block, the encoded block comprising at least a luma block and a chroma block;
   determining, from the encoded bitstream, that the chroma block was encoded by an intra prediction mode that uses the luma block for coding the chroma block;
   decoding the chroma block using pixel values of the luma block without generating an ordered list of intra prediction modes; and
   reconstructing the current block using the chroma block.
6. The method of claim 5, wherein determining that the chroma block was encoded by the intra prediction mode that uses the luma block comprises: entropy decoding a flag from the encoded bitstream that indicates that the chroma block was encoded by the intra prediction mode that uses the luma block.
7. The method of claim 6, wherein the flag is a binary flag.
8. A method for reconstructing a current block, comprising:
   receiving, from an encoded bitstream, an encoded block corresponding to the current block comprising at least a luma block and a chroma block;
   determining, from the encoded bitstream, that the chroma block was not encoded by an intra prediction mode that uses the luma block for coding the chroma block;
   generating, in a defined order, a list of intra prediction modes for decoding the chroma block;
   determining, from the encoded bitstream, an identifier of an intra prediction mode for the chroma block;
   decoding the chroma block using the intra prediction mode of the list of intra prediction modes identified by the identifier; and
   reconstructing the current block using the chroma block.
9. The method of claim 8, wherein determining that the chroma block was not encoded by the intra prediction mode that uses the luma block comprises: entropy decoding a flag from the encoded bitstream that indicates that the chroma block was not encoded by the intra prediction mode that uses the luma block.
10. The method of claim 9, wherein entropy decoding the flag comprises entropy decoding a binary flag using a first probability model, and determining the identifier comprises entropy decoding a mode index from the encoded bitstream using a second probability model.
11. An apparatus for reconstructing a current block according to the method of any one of claims 5 to 10.
12. A non-transitory, computer-readable storage medium storing instructions that, when executed, cause a processor to perform the method of any one of claims 5 to 10.
13. An apparatus for encoding a current block, comprising:
   a processor configured to:
      determine whether to encode a chroma block of the current block by an intra prediction mode that uses a luma block of the current block for coding the chroma block;
      in response to determining to encode the chroma block by the intra prediction mode that uses the luma block:
         encode the chroma block into an encoded bitstream using pixel values of the luma block; and
         encode a binary symbol into the encoded bitstream indicating that the chroma block was encoded by the intra prediction mode that uses the luma block;
      in response to determining to encode the chroma block by an intra prediction mode other than the intra prediction mode that uses the luma block:
         generate a list of intra prediction modes in a defined sequence;
         determine an index that identifies the intra prediction mode for the chroma block;
         encode the index into the encoded bitstream; and
         encode the chroma block into the encoded bitstream using the intra prediction mode; and
      encode the luma block into the encoded bitstream.
14. The apparatus of claim 13, wherein the list excludes the intra prediction mode that uses the luma block for coding the chroma block.
15. A non-transitory, computer-readable storage medium storing an encoded bitstream, the encoded bitstream comprising an encoded block including a luma block and a chroma block, a binary flag indicating whether the chroma block was encoded using an intra prediction mode that uses the luma block to encode the chroma block, and, only when the binary flag indicates that the chroma block was not encoded using the intra prediction mode that uses the luma block, a mode index identifying an intra prediction mode used to encode the chroma block from an ordered list of intra prediction modes.
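As a companion to the decoder sketch above, the encoder-side signaling recited in claim 13 can be sketched as follows. This is an illustrative sketch only; the mode names and list contents are hypothetical, and the output symbols stand in for entropy-coded bits:

```python
# Hypothetical encoder-side sketch of the claimed signaling: a binary
# flag indicates whether the luma-based (CFL) mode is used, and a mode
# index is written only when some other intra prediction mode is
# chosen. The list excludes the CFL mode (cf. claim 14).

CFL_MODE = "CFL"
# Illustrative list of intra prediction modes in a defined sequence.
CHROMA_MODE_LIST = ["DC", "V", "H", "D45", "D135", "SMOOTH", "PAETH"]

def encode_chroma_mode(mode, out):
    """Append the symbols signaling `mode` for a chroma block to `out`.

    `out` is a plain list standing in for the encoded bitstream.
    """
    if mode == CFL_MODE:
        out.append(1)  # binary symbol: CFL mode used; nothing else needed
    else:
        out.append(0)  # binary symbol: CFL mode not used
        out.append(CHROMA_MODE_LIST.index(mode))  # index into the list
    return out
```

Note that when the CFL mode is selected, the flag is the only mode-related symbol in the bitstream, which is the bit saving the scheme targets for the most common mode.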
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363466433P | 2023-05-15 | 2023-05-15 | |
| US63/466,433 | 2023-05-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024238680A1 true WO2024238680A1 (en) | 2024-11-21 |
Family
ID=91433137
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/029503 (WO2024238680A1, pending) | Chroma prediction mode signaling | 2023-05-15 | 2024-05-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024238680A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3941065A1 (en) * | 2019-03-15 | 2022-01-19 | Lg Electronics Inc. | Method and device for signaling information on chroma format |
| US20220159263A1 (en) * | 2019-08-01 | 2022-05-19 | Huawei Technologies Co., Ltd. | Encoder, a decoder and corresponding methods of chroma intra mode derivation |
Non-Patent Citations (1)
| Title |
|---|
| RAY (QUALCOMM) B ET AL: "On the coding of cclm_mode_flag", no. JVET-Q0392 ; m51987, 8 January 2020 (2020-01-08), XP030223401, Retrieved from the Internet <URL:http://phenix.int-evry.fr/jvet/doc_end_user/documents/17_Brussels/wg11/JVET-Q0392-v2.zip JVET-Q0392-v2/JVET-Q0392-v2.docx> [retrieved on 20200108] * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24731784; Country of ref document: EP; Kind code of ref document: A1 |