WO2025218691A1 - Methods and apparatus for adaptively determining a selected transform type in image and video coding systems - Google Patents
Methods and apparatus for adaptively determining a selected transform type in image and video coding systems
- Publication number
- WO2025218691A1 (PCT/CN2025/089241)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mode
- candidate list
- intra prediction
- current block
- transform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
Definitions
- the present application is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/634,520, filed on April 16, 2024.
- the U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
- the present invention relates to video coding systems.
- the present invention relates to determining whether to insert Planar mode, DC mode, or both into a candidate list for deriving the transform set associated with the current block.
- VVC Versatile video coding
- JVET Joint Video Experts Team
- MPEG ISO/IEC Moving Picture Experts Group
- ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021.
- VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
- HEVC High Efficiency Video Coding
- Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing.
- Intra Prediction 110 the prediction data is derived based on previously coded video data in the current picture.
- Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data.
- Switch 114 selects Intra Prediction 110 or Inter Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues.
- the prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120.
- T Transform
- Q Quantization
- the transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data.
- the bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area.
- the side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well.
- the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues.
- the residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data.
- the reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
- incoming video data undergoes a series of processing in the encoding system.
- the reconstructed video data from REC 128 may be subject to various impairments due to a series of processing.
- in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality.
- deblocking filter (DF) may be used.
- SAO Sample Adaptive Offset
- ALF Adaptive Loop Filter
- the loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream.
- DF deblocking filter
- SAO Sample Adaptive Offset
- ALF Adaptive Loop Filter
- Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134.
- the system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H. 264 or VVC.
- HEVC High Efficiency Video Coding
- the decoder can use some of the functional blocks as the encoder.
- the decoder can reuse Inverse Quantization 124 and Inverse Transform 126; however, Transform 118 and Quantization 120 are not needed at the decoder.
- instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information).
- the Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
- the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS) contain high-level syntax elements that apply to entire coded video sequences and pictures, respectively.
- the Picture Header (PH) and Slice Header (SH) contain high-level syntax elements that apply to a current coded picture and a current coded slice, respectively.
- a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs) .
- a coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order.
- a bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block.
- a predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block.
- An intra (I) slice is decoded using intra prediction only.
- each CTU can be partitioned into one or multiple non-overlapped coding units (CUs) using a Quaternary Tree (QT) with nested Multi-Type-Tree (MTT) structure.
- QT Quaternary Tree
- MTT Multi-Type-Tree
- the partitioning information is signalled by a coding tree syntax structure, where each CTU is treated as the root of a coding tree.
- the CTUs may be first partitioned by the quaternary tree (a. k. a. quadtree) structure, as shown in Fig. 2A. Then the quaternary tree leaf nodes can be further partitioned by a MTT structure, as shown in Figs. 2B-E.
- Each quadtree child node may be further split into smaller coding tree nodes using any one of five split types in Fig. 2. However, each multi-type-tree child node is only allowed to be further split by one of four MTT split types.
- the coding tree leaf nodes correspond to the coding units (CUs) .
- Fig. 3 provides an example of a CTU recursively partitioned by QT with the nested MTT, where the bold block edges represent quadtree partitioning and the remaining edges represent multi-type tree partitioning.
- Each CU contains one or more Prediction Units (PUs) .
- the prediction unit together with the associated CU syntax, works as a basic unit for signalling the predictor information.
- the specified prediction process is employed to predict the values of the associated pixel samples inside the PU.
- Each CU may contain one or more transform units (TUs) for representing the prediction residual blocks.
- a transform unit (TU) is comprised of one transform block (TB) of luma samples and two corresponding transform blocks of chroma samples. Each TB corresponds to one residual block of samples from one colour component.
- An integer transform is applied to a transform block.
- the level values of quantized coefficients together with other side information are entropy coded in the bitstream.
- CTB coding tree block
- CB coding block
- PB prediction block
- TB transform block
- Matrix weighted intra prediction (MIP) method is a newly added intra prediction technique in VVC. For predicting the samples of a rectangular block of width W and height H, matrix weighted intra prediction (MIP) takes one line of H reconstructed neighbouring boundary samples left of the block and one line of W reconstructed neighbouring boundary samples above the block as input. If the reconstructed samples are unavailable, they are generated as it is done in the conventional intra prediction. The generation of the prediction signal is based on the following three steps, i.e., averaging, matrix vector multiplication and linear interpolation as shown in Fig. 4.
- One line of H reconstructed neighbouring boundary samples 412 left of the block and one line of W reconstructed neighbouring boundary samples 410 above the block are shown as dot-filled small squares.
- the boundary samples are down-sampled to top boundary line 414 and left boundary line 424.
- the down-sampled samples are provided to the matrix-vector multiplication unit 420 to generate the down-sampled prediction block 430.
- An interpolation process is then applied to generate the prediction block 440.
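- As a rough illustration of the three-step MIP flow above (averaging, matrix-vector multiplication, linear interpolation), the following Python sketch uses random placeholder weights rather than the standardized MIP matrices, and assumes W and H are divisible by the reduced size:

```python
import numpy as np

def mip_predict(top, left, W, H, A, b, red_size=4):
    """Illustrative three-step MIP flow. A and b are placeholder weights,
    not the normative VVC MIP matrices; W and H must be divisible by red_size."""
    # Step 1: averaging -- down-sample the boundary samples to a short vector.
    top_red = top.reshape(red_size, -1).mean(axis=1)
    left_red = left.reshape(red_size, -1).mean(axis=1)
    bdry = np.concatenate([top_red, left_red])            # length 2 * red_size

    # Step 2: matrix-vector multiplication gives a down-sampled prediction.
    pred_red = (A @ bdry + b).reshape(red_size, red_size)

    # Step 3: bilinear interpolation up to the full W x H block.
    ys = np.linspace(0, red_size - 1, H)
    xs = np.linspace(0, red_size - 1, W)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, red_size - 1)
    x1 = np.minimum(x0 + 1, red_size - 1)
    fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
    p00 = pred_red[np.ix_(y0, x0)]; p01 = pred_red[np.ix_(y0, x1)]
    p10 = pred_red[np.ix_(y1, x0)]; p11 = pred_red[np.ix_(y1, x1)]
    return ((1 - fy) * (1 - fx) * p00 + (1 - fy) * fx * p01
            + fy * (1 - fx) * p10 + fy * fx * p11)

# toy usage with random placeholder weights
rng = np.random.default_rng(0)
W = H = 8
A, b = rng.normal(size=(16, 8)), rng.normal(size=16)
top = rng.integers(0, 256, W).astype(float)
left = rng.integers(0, 256, H).astype(float)
print(mip_predict(top, left, W, H, A, b).shape)   # (8, 8)
```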
- LFNST is applied between forward primary transform and quantization (at the encoder side) and between de-quantization and inverse primary transform (at the decoder side) as shown in Fig. 5.
- after Forward Primary Transform 510, Forward Low-Frequency Non-Separable Transform (LFNST) 520 is applied to top-left region 522 of the Forward Primary Transform output, for example, 16 coefficients for 4x4 forward LFNST and/or 64 coefficients for 8x8 forward LFNST.
- 4x4 non-separable transform or 8x8 non-separable transform is applied according to block size.
- 4x4 LFNST is applied for small blocks (i.e., min (width, height) < 8) and 8x8 LFNST is applied for larger blocks (i.e., min (width, height) > 4).
- the transform coefficients are quantized by Quantization 530.
- the quantized transform coefficients are de-quantized using De-Quantization 540 to obtain the de-quantized transform coefficients.
- Inverse LFNST 550 is applied to the top-left region 552 (8 coefficients for 4x4 inverse LFNST or 16 coefficients for 8x8 inverse LFNST) .
- inverse Primary Transform 560 is applied to recover the input signal.
- the selected transform set for a current block is determined by the intra prediction mode associated with the current block, where a look-up table is pre-defined for mapping each intra prediction mode to the associated LFNST transform set.
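- A minimal sketch of this secondary-transform pipeline and the mode-to-set lookup is shown below; the mode-to-set mapping and the 16x16 kernel are illustrative placeholders (not the normative LFNST tables), and quantization as well as coefficient zero-out are omitted:

```python
import numpy as np

def lfnst_set_from_intra_mode(mode: int) -> int:
    # Hypothetical mapping of an intra mode to a transform set;
    # the real VVC/ECM look-up table differs.
    return min(mode, 67) // 2 % 35

def forward_lfnst(primary_coeffs, kernel):
    """Apply a 16-point non-separable secondary transform to the top-left
    4x4 region of the primary-transform output (4x4 forward LFNST case)."""
    top_left = primary_coeffs[:4, :4].reshape(16)
    out = primary_coeffs.copy()
    out[:4, :4] = (kernel @ top_left).reshape(4, 4)
    return out

def inverse_lfnst(coeffs, kernel):
    # With an orthogonal kernel, the inverse is simply the transpose.
    top_left = coeffs[:4, :4].reshape(16)
    out = coeffs.copy()
    out[:4, :4] = (kernel.T @ top_left).reshape(4, 4)
    return out

# toy round trip with a random orthogonal placeholder kernel
rng = np.random.default_rng(1)
kernel, _ = np.linalg.qr(rng.normal(size=(16, 16)))
block = rng.normal(size=(8, 8))                 # pretend primary-transform output
rec = inverse_lfnst(forward_lfnst(block, kernel), kernel)
print(lfnst_set_from_intra_mode(18), np.allclose(rec, block))   # 9 True
```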
- JVET Joint Video Expert Team
- ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 are currently in the process of exploring the next-generation video coding standard.
- Some promising new coding tools have been adopted into Enhanced Compression Model 12 (ECM 12) (M. Coban, F. Le Léannec, R. -L. Liao, K. Naser, J. L. Zhang “Algorithm description of Enhanced Compression Model 12 (ECM 12) , ” Joint Video Expert Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, Doc. JVET-AG2025, 33rd Meeting, by teleconference, 17–26 January 2024) to further improve VVC.
- ECM 12 Enhanced Compression Model 12
- a template-based intra mode derivation (TIMD) coding tool is utilised for intra coding a coding unit.
- a CU syntax flag is coded to indicate whether the TIMD tool is enabled for coding a current CU or not.
- a template-matching cost based on the sum of absolute transformed differences (SATD) between the prediction samples (612 and 614) and reconstruction samples (620 and 622) of the template for the current CU 610 is calculated, as illustrated in Fig. 6.
- the two intra prediction modes with the lowest SATD and the second lowest SATD are referred to as the primary TIMD intra prediction mode and the secondary TIMD intra prediction mode, respectively.
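- The selection of the primary and secondary TIMD modes can be sketched as below; the 4x4 Hadamard SATD and the flat stand-in predictor are simplifications for illustration, not the ECM template prediction process:

```python
import numpy as np

H4 = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]], float)

def satd(diff):
    # Sum of absolute transformed differences over 4x4 sub-blocks.
    cost = 0.0
    for y in range(0, diff.shape[0] - 3, 4):
        for x in range(0, diff.shape[1] - 3, 4):
            cost += np.abs(H4 @ diff[y:y + 4, x:x + 4] @ H4.T).sum()
    return cost

def timd_select(template_rec, candidate_modes, predict_fn):
    """Return the (primary, secondary) modes with the lowest and the second
    lowest SATD between template prediction and template reconstruction."""
    costs = sorted((satd(predict_fn(m, template_rec.shape) - template_rec), m)
                   for m in candidate_modes)
    return costs[0][1], costs[1][1]

# toy usage: the "predictor" is just a flat block whose level depends on the mode
rng = np.random.default_rng(2)
template = rng.integers(0, 256, (4, 16)).astype(float)
predict_fn = lambda m, shape: np.full(shape, 4.0 * m)
print(timd_select(template, range(0, 67, 4), predict_fn))
```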
- a Decoder Side Intra Mode Derivation (DIMD) tool is utilised to determine the intra angular prediction mode of a block.
- DIMD Decoder Side Intra Mode Derivation
- a horizontal gradient and a vertical gradient are calculated for each of the reconstructed samples in a template region of the current block to build a histogram of gradients (HoG) with 65 bins, corresponding to the 65 angular modes in VVC.
- the final prediction signal is generated by a fusion of up to six intra predictors, where one predictor corresponds to Planar mode and the others correspond to the angular modes with the largest amplitude values in the HoG.
- the fusion weight for Planar is set to a fixed value equal to 1/4 when the number of fused intra predictors is greater than 2.
- the fusion weights for angular modes are determined by the corresponding amplitude values in the HoG.
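- A simplified sketch of the HoG accumulation and of the fusion weighting (Planar fixed at 1/4, the remaining 3/4 split among the strongest angular modes in proportion to their amplitudes) follows; the gradient operator and the angle-to-bin mapping are rough stand-ins for the normative DIMD derivation:

```python
import numpy as np

def build_hog(samples, num_bins=65):
    """Accumulate gradient amplitudes into a 65-bin histogram of gradients.
    The angle-to-bin mapping is a simplification, not the DIMD table."""
    gy, gx = np.gradient(samples.astype(float))
    amp = np.abs(gx) + np.abs(gy)
    ang = np.arctan2(gy, gx)                               # range [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * (num_bins - 1)).astype(int)
    hog = np.zeros(num_bins)
    np.add.at(hog, bins.ravel(), amp.ravel())
    return hog

def dimd_fusion_weights(hog, num_fused=3):
    # Planar keeps a fixed 1/4 weight when more than two predictors are fused.
    top = np.argsort(hog)[::-1][:num_fused - 1]            # strongest angular bins
    amps = hog[top]
    angular_w = 0.75 * amps / max(amps.sum(), 1e-9)
    return {"Planar": 0.25,
            **{f"mode_{int(m)}": float(w) for m, w in zip(top, angular_w)}}

rng = np.random.default_rng(3)
template_rec = rng.integers(0, 256, (8, 8))
print(dimd_fusion_weights(build_hog(template_rec)))
```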
- Three different kernel types, LFNST4, LFNST8, and LFNST16, are applied to 4xN/Nx4 (N ≥ 4), 8xN/Nx8 (N ≥ 8), and MxN (M, N ≥ 16), respectively.
- the separable DCT-II plus LFNST transform combinations can be replaced with non-separable primary transform (NSPT) for the block shapes 4x4, 4x8, 8x4 and 8x8, 4x16, 16x4, 8x16 and 16x8.
- NSPT non-separable primary transform
- all NSPTs consist of 35 sets and each set further consists of 3 candidates (like the current LFNST in ECM-12.0) .
- the selected transform set for a block is determined by the intra prediction mode associated with the block, where a look-up table is pre-defined for mapping each intra prediction mode to one associated LFNST or NSPT transform set out of 35 LFNST or NSPT sets.
- a syntax element is further signalled to indicate a selected kernel from 3 candidate kernels in the associated LFNST or NSPT transform set.
- the DIMD tool is further utilised to derive the intra prediction mode (DIMD) 710 associated with a current block based on the prediction signal of the current block as shown in Fig. 7. Specifically, a horizontal gradient and a vertical gradient are calculated for each predicted sample of the current block to build a histogram of gradient (HoG) 720.
- the derived intra prediction mode is set equal to the angular mode with the largest histogram amplitude value in the HoG.
- This scheme has been adopted for deriving an intra prediction mode that is used to determine the LFNST/NSPT transform set and LFNST/NSPT transpose flag associated with a coded block in MIP, intra template matching prediction (IntraTMP) , extrapolation filter-based intra prediction (EIP) , or Inter mode.
- IntraTMP intra template matching prediction
- EIP extrapolation filter-based intra prediction
- for MIP, this is done before up-sampling and is illustrated in Fig. 7.
- the scheme has also been adopted for deriving an intra prediction mode that is used to determine the multiple transform selection (MTS) set associated with a coded block in IntraTMP, or EIP mode.
- matrix vector multiplication module 730 is used to generate MIP prediction 740, which is further processed by MIP prediction upsampling module 750 to generate the upsampled output 760.
- a method and apparatus for video coding comprising selecting LFNST or NSPT are disclosed.
- input data is received, wherein the input data comprises residual data for a current block at an encoder side or coded transformed residual data for the current block at a decoder side.
- a first candidate list comprising a plurality of member intra prediction modes for the current block is derived. Whether to insert Planar mode, DC mode, or both into the first candidate list to form an updated first candidate list for deriving a transform set associated with the current block is determined.
- a target intra prediction mode from the updated first candidate list is selected.
- a target set of LFNST (Low-Frequency Non-Separable Transform) or NSPT (Non-Separable Primary Transform) is derived according to the target intra prediction mode.
- Forward transform is applied to the residual data according to a selected transform kernel of the target set of LFNST or NSPT at the encoder side to generate transformed data at the encoder side, or inverse transform is applied to the coded transformed residual data according to the selected transform kernel of the target set of LFNST or NSPT to derive reconstructed residual data.
- the transformed data at the encoder side or the reconstructed residual data at the decoder side is provided.
- whether to insert the Planar mode, the DC mode, or both into the first candidate list is dependent on block dimension of the current block.
- the block dimension of the current block comprises width, height, area size, or a combination thereof.
- whether to insert the Planar mode, the DC mode, or both into the first candidate list is dependent on DIMD (Decoder Side Intra Mode Derivation) HoG (Histogram of Gradients) generated by predicted samples of the current block or reconstructed samples in a template region of the current block.
- DIMD Decoder Side Intra Mode Derivation
- HoG Histogram of Gradients
- a position in the first candidate list for insertion of a non-angular intra prediction candidate is fixed or determined adaptively according to contextual information for coding the current block.
- the first candidate list comprises a first mode and a second mode selected from the Planar mode and a derived prediction mode corresponding to largest HoG (Histogram of Gradients) amplitude.
- whether to insert the Planar mode or the DC mode into the first candidate list is determined by comparing an average amplitude value of HoG with a first threshold and a second threshold.
- In one embodiment, where the first threshold is smaller than the second threshold and mode 0 corresponds to a first intra prediction mode in an original first candidate list used to derive the first candidate list: if the average amplitude value of HoG is smaller than the first threshold, the Planar mode is inserted at a first mode position in the first candidate list and mode 0 is moved to a second mode position in the first candidate list; otherwise, if the average amplitude value of HoG is between the first threshold and the second threshold, mode 0 remains at the first mode position in the first candidate list and the Planar mode is inserted at the second mode position in the first candidate list; otherwise, if the average amplitude value of HoG is larger than the second threshold, mode 0 remains at the first mode position in the first candidate list and mode 1 remains at the second mode position in the first candidate list.
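- A minimal sketch of this two-threshold insertion rule, with placeholder threshold values, is given below (cand_list[0] plays the role of mode 0 in the description above):

```python
def insert_planar_by_avg_hog(cand_list, hog, t_low=2.0, t_high=8.0):
    """Place Planar in the transform-set candidate list according to the
    average HoG amplitude; t_low < t_high and both are illustrative values."""
    avg = sum(hog) / max(len(hog), 1)
    out = list(cand_list)
    if avg < t_low:
        out = ["Planar"] + out                   # Planar first, mode 0 moved to second
    elif avg <= t_high:
        out = out[:1] + ["Planar"] + out[1:]     # mode 0 stays first, Planar second
    # avg > t_high: strong directional evidence, the list is left unchanged.
    return out

print(insert_planar_by_avg_hog([34, 18, 50], [0.5] * 65))  # ['Planar', 34, 18, 50]
print(insert_planar_by_avg_hog([34, 18, 50], [5.0] * 65))  # [34, 'Planar', 18, 50]
print(insert_planar_by_avg_hog([34, 18, 50], [9.0] * 65))  # [34, 18, 50]
```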
- a number of the plurality of member intra prediction modes in the first candidate list or the updated first candidate list is determined adaptively by considering context information for coding the current block. In one embodiment, the number of the plurality of member intra prediction modes in the first candidate list or the updated first candidate list is determined according to block width, block height, block area, quantization parameter (QP) , and/or coding mode of the current block, and/or intra prediction modes of neighboring blocks.
- QP quantization parameter
- the number of the plurality of member intra prediction modes in the first candidate list or the updated first candidate list is signalled in one or more high-level syntax sets comprising SPS (Sequence Parameter Set) , PPS (Picture Parameter Set) , PH (Picture Header) , SH (Slice Header) , or a combination thereof.
- Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing.
- Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
- Fig. 2A-Fig. 2E illustrates examples of a multi-type tree structure corresponding to quadtree splitting (Fig. 2A) vertical binary splitting (SPLIT_BT_VER, Fig. 2B) , horizontal binary splitting (SPLIT_BT_HOR, Fig. 2C) , vertical ternary splitting (SPLIT_TT_VER, Fig. 2D) , and horizontal ternary splitting (SPLIT_TT_HOR, Fig. 2E) .
- Fig. 3 shows an example of a CTU divided into multiple CUs with a quadtree and nested multi-type tree coding block structure, where the bold block edges represent quadtree partitioning and the remaining edges represent multi-type tree partitioning.
- Fig. 4 illustrates an example of processing flow for Matrix weighted intra prediction (MIP) .
- Fig. 5 illustrates an example of Low-Frequency Non-Separable Transform (LFNST) process.
- Fig. 6 illustrates an example of template and its reference samples used in TIMD.
- Fig. 7 illustrates an example of LFNST modification for MIP coded blocks, which utilizes DIMD to derive the LFNST transform set and determine LFNST transpose flag.
- Fig. 8 illustrates a flowchart of an exemplary video coding system that determines whether to insert Planar mode or DC mode into a first candidate list to form an updated first candidate list for deriving a transform set associated with the current block according to an embodiment of the present invention.
- DIMD is utilised to determine the current intra prediction mode for deriving the transform set of MTS, LFNST or NSPT associated with a current block.
- DIMD can only be used to classify a block into one of 65 angular modes and cannot detect an intra prediction mode corresponding to one of DC and Planar modes, which are commonly referred to as non-angular or non-directional modes.
- new methods are disclosed to improve the current scheme for determining the intra prediction mode associated with a current block based on DIMD in an image or video coding system. Particularly, methods are proposed to determine if the current prediction mode may belong to Planar or DC for determining the transform set associated with a current block.
- a video coder may determine whether to set the intra prediction mode of a current block equal to one of Planar and DC for deriving the transform set associated with the current block considering the block dimension (e.g. width, height, and/or area size) of the current block. For example, when one or both of block width and height of a current block are greater than a specified threshold T_wh, a video coder may set the current intra prediction mode equal to Planar for deriving the selected transform set. Otherwise, the current intra prediction mode may be determined by DIMD or other specified methods. In some preferred embodiments, T_wh may be set equal to 8, 16, or 32.
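- As a sketch of this dimension-based rule (using T_wh = 16, one of the example values above; mode numbering follows the VVC convention where Planar is mode 0):

```python
PLANAR = 0   # Planar is intra mode 0 in VVC/ECM numbering

def mode_for_transform_set(width, height, dimd_mode, t_wh=16):
    """Fall back to Planar for transform-set selection when either block
    dimension exceeds T_wh; otherwise keep the DIMD-derived mode."""
    if width > t_wh or height > t_wh:
        return PLANAR
    return dimd_mode

print(mode_for_transform_set(32, 8, dimd_mode=18))  # 0 (Planar)
print(mode_for_transform_set(8, 8, dimd_mode=18))   # 18
```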
- a video coder may determine whether to set the current intra prediction mode of a current block equal to one of Planar and DC for deriving the transform set associated with the current block dependent on distribution of the HoG built up by DIMD, wherein the DIMD HoG can be generated from the predicted samples of the current block or the reconstructed samples in the template region of the current block.
- a video coder may determine whether to set the current intra prediction mode equal to Planar or DC for deriving the associated transform set considering the ratio of the largest amplitude value or the sum of a plurality of largest amplitude values of the HoG to the sum of all amplitude values of the HoG.
- for example, when the ratio of the largest amplitude value (or the sum of the plurality of largest amplitude values) of the HoG to the sum of all amplitude values of the HoG is less than a specified threshold T_1, a video coder may set the intra prediction mode of a current block equal to Planar for deriving the associated transform set. Otherwise, the current intra prediction mode may be set equal to the prediction mode selected by DIMD or other specified methods.
- T_1 may be set equal to 0.0625, 0.125, 0.1875, 0.25, 0.375, or 0.5.
- the said plurality of largest amplitude values of the HoG may correspond to two or more intra prediction modes that are utilised for generating a fused intra prediction for predicting a current block such as the selected intra prediction modes in TIMD or DIMD mode.
- a video coder may determine whether to set the current intra prediction mode of a current block equal to Planar or DC for deriving the associated transform set considering the average HoG amplitude values. For example, when the sum of HoG amplitude values is less than a specified threshold T_a scaled by the number of samples used for calculating the HoG, a video coder may set the intra prediction mode of a current block equal to Planar. Otherwise, the current intra prediction mode may be set equal to the prediction mode selected by DIMD or other specified methods.
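- The ratio-based and average-amplitude-based tests can be sketched together as follows; T_1 = 0.25 is one of the listed options, while T_a = 4.0 and the top-k choice are placeholders for illustration:

```python
def is_non_angular_by_hog(hog, num_samples, t1=0.25, ta=4.0, k=1):
    """Return True (i.e. use Planar for the transform set) when the HoG shows
    no dominant direction: the top-k amplitude share is below T_1, or the
    total amplitude is below T_a scaled by the number of samples."""
    total = sum(hog)
    if total <= 0:
        return True
    top_k = sum(sorted(hog, reverse=True)[:k])
    if top_k / total < t1:              # ratio test
        return True
    if total < ta * num_samples:        # average-amplitude test
        return True
    return False

flat_hog = [1.0] * 65                          # energy spread over all bins
peaky_hog = [0.0] * 65; peaky_hog[18] = 500.0  # one dominant direction
print(is_non_angular_by_hog(flat_hog, num_samples=64))   # True
print(is_non_angular_by_hog(peaky_hog, num_samples=64))  # False
```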
- a video coder may determine whether to set the current intra prediction mode equal to one of Planar and DC for deriving the transform set associated with the current block dependent on the prediction modes of neighbouring blocks. For example, a video coder may set the intra prediction mode of a current block equal to Planar or DC for deriving the associated transform set when both pre-defined above and left neighbouring blocks of the current block are predicted in Planar or DC mode or when all specified neighbouring blocks of the current block are predicted in Planar or DC mode.
- a video coder may determine whether to set the current intra prediction mode equal to one of Planar and DC for deriving the transform set associated with the current block by considering the template-matching costs associated with Planar and/or DC, wherein the template matching cost may be calculated as the SATD or the sum of absolute differences (SAD) between the prediction and reconstruction samples of the template along the coded block boundaries. For example, a video coder may calculate the template matching costs for predictors corresponding to Planar, DC and the angular mode with the largest histogram amplitude value in the DIMD HoG, respectively. The video coder may then determine the selected intra prediction mode for deriving the transform set associated with a current block by comparing the template-matching costs associated with the three intra predictors. In one embodiment, the video coder may simply set the intra prediction mode of a current block equal to Planar or DC for deriving the associated transform set when Planar or DC corresponds to the lowest template matching cost.
- the template matching cost may be calculated as the SATD or the sum of absolute differences (SAD)
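- A minimal sketch of this cost comparison, using SAD and flat stand-in template predictors (the actual Planar, DC and angular prediction processes are not reproduced here):

```python
import numpy as np

def pick_mode_by_template_cost(template_rec, predictors):
    """Return the mode whose template prediction has the lowest SAD against
    the template reconstruction, together with all the costs."""
    costs = {m: float(np.abs(p - template_rec).sum()) for m, p in predictors.items()}
    return min(costs, key=costs.get), costs

rng = np.random.default_rng(4)
tmpl = rng.integers(0, 256, (4, 16)).astype(float)
predictors = {
    "Planar": np.full(tmpl.shape, tmpl.mean()),        # smooth stand-in signal
    "DC": np.full(tmpl.shape, float(np.median(tmpl))),
    "angular_18": rng.integers(0, 256, tmpl.shape).astype(float),
}
best, costs = pick_mode_by_template_cost(tmpl, predictors)
print(best, {k: round(v) for k, v in costs.items()})
```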
- the proposed methods may further comprise signalling one or more syntax elements in one or more high-level syntax sets to indicate the selected method in a current video data unit, wherein the high-level syntax sets may comprise SPS, PPS, PH, SH, or a combination thereof.
- the proposed methods may further comprise signalling one or more syntax elements to explicitly specify threshold values used in the proposed methods.
- a video coder may signal separate sets of syntax elements for the proposed methods applied to blocks coded in different CU modes.
- a video coder may determine whether to set the current intra prediction mode of a current block equal to one of Planar and DC for deriving the transform set associated with the current block considering the block dimension of the current block and distribution of the HoG built by DIMD jointly.
- for example, when the block dimension and the HoG distribution jointly satisfy one or more specified conditions, a video coder may set the current intra prediction mode equal to Planar. Otherwise, the current intra prediction mode may be set equal to the prediction mode selected by the DIMD HoG or other specified methods.
- the said HoG built by DIMD can be generated by the predicted samples in the current block or the reconstructed samples in the template region of the current block.
- any one of the proposed methods for determining whether to set the current intra prediction mode of a current block equal to one of Planar and DC for deriving the transform set associated with the current block can be utilised as a general method for detecting or predicting the presence of a non-angular intra prediction mode for a block in a video coding system.
- a video coder may use a plurality of intra prediction mode candidates for deriving the transform set associated with a current block.
- the video coder may further comprise creating a candidate list of intra prediction modes for deriving the transform set associated with the current block, wherein the intra prediction mode candidates may be selected based on the DIMD HoG or other specified methods dependent on the coding mode of the current block. For example, when a current block is coded in DIMD or TIMD mode, the candidate list of intra prediction modes for deriving the transform set associated with a current block may comprise two or more selected intra prediction modes used for generating a fused intra prediction signal.
- the candidate list of intra prediction modes for deriving the transform set associated with a current block may comprise two or more angular prediction modes corresponding to two or more largest amplitude values of the HoGs built by the DIMD.
- the video coder may further comprise encoding or decoding one or more syntax elements to indicate the selected intra prediction mode from the candidate list for deriving the associated transform set.
- the video coder may further comprise determining whether to insert Planar or DC to the candidate list of intra prediction modes for the current block by a specified method.
- the video coder may further comprise reordering the intra prediction modes in the candidate list for the current block by a specified method and encoding or decoding syntax information indicating the selected intra prediction mode based on the reordered indices.
- our previous methods proposed for determining whether to set the current intra prediction mode of a current block equal to Planar or DC for deriving the associated transform set may be further applied to determining whether to insert Planar or DC to the candidate list of intra prediction modes and/or applied to reordering the intra prediction modes in the candidate list for the current block. If it is determined that the intra prediction mode of a current block shall be set equal to Planar or DC for deriving the associated transform set by one of our previous methods, then a video coder may further add Planar or DC to the candidate list and/or prioritize Planar or DC in the candidate list for deriving the associated transform set for the current block.
- a video coder may determine whether to further add a non-angular mode, such as Planar or DC, to a candidate list of intra prediction modes for deriving the transform set associated with a current block by comparing the block dimension (e.g. width, height, and/or area size) of the current block with a specified threshold T_wh or by considering distribution of the HoG built up by DIMD using the predicted samples for the current block or the reconstructed samples in the template region of the current block.
- a non-angular mode such as Planar or DC
- a video coder may further add a Planar mode to a candidate list of intra prediction modes for deriving the transform set associated with a current block when one of the following conditions is met: the block dimension of the current block is greater than a specified threshold T_wh; the average amplitude value of the HoG is less than a specified threshold T_a; or the ratio of the largest amplitude value of the HoG to the sum of all amplitude values of the HoG is less than a specified threshold T_1.
- the HoG is built up by DIMD using the predicted samples for the current block or the reconstructed samples in the template region of the current block.
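- A sketch combining the three conditions above with placeholder thresholds T_wh, T_a and T_1:

```python
def should_add_planar(width, height, hog, t_wh=16, ta=4.0, t1=0.25):
    """Add Planar to the transform-set candidate list when ANY of the three
    conditions listed above holds; all threshold values are illustrative."""
    total = sum(hog)
    avg = total / max(len(hog), 1)
    ratio = (max(hog) / total) if total > 0 else 0.0
    return (max(width, height) > t_wh) or (avg < ta) or (ratio < t1)

peaky = [0.0] * 65; peaky[18] = 500.0
print(should_add_planar(32, 8, peaky))   # True  (large block dimension)
print(should_add_planar(8, 8, peaky))    # False (strong single direction)
```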
- when it is determined that a Planar or DC prediction mode shall be additionally inserted into a candidate list of the intra prediction modes, it is inserted at a fixed index position in the candidate list for a particular coding mode.
- the candidate list of intra prediction modes for deriving the transform set associated with the current block may comprise the Planar mode as the first mode and the prediction mode corresponding to the largest HoG amplitude as the second mode.
- the candidate list may comprise the prediction mode corresponding to the largest HoG amplitude as the first mode and the Planar mode as the second mode.
- the HoG is built by DIMD using the predicted samples for the current block or the reconstructed samples in the template region of the current block.
- the index position for the additional Planar or DC prediction mode in the candidate list of the intra prediction modes for a particular coding mode can be adaptively determined for a current block depending on contextual information related to the current block, such as the coding mode, block dimension, predicted samples, and reconstructed neighbouring samples associated with the current block.
- a video coder may determine the index position for the additional Planar or DC prediction mode in a candidate list for deriving the transform set associated with a current block in a particular coding mode by comparing the block dimension (e.g. width, height, and/or area size) of the current block with one or more sets of specified threshold values, or comparing the average amplitude value of the HoG with another set of specified threshold values.
- the index position is determined by comparing the ratio of the largest amplitude value of the HoG to the sum of all amplitude values of the HoG with another set of specified threshold values.
- the HoG is built up by DIMD using the predicted samples for the current block or the reconstructed samples in the template region of the current block.
- a video coder may adaptively determine the number of the intra prediction modes in a candidate list for deriving the transform set associated with a current block.
- a video coder may determine the number of intra prediction mode candidates adaptively considering context information for coding a current block, such as block dimension (e.g. width, height, and/or area size) , quantization parameter (QP) , the coding mode of the current block and/or the intra prediction modes of the neighbouring blocks associated with the current block.
- context information for coding a current block such as block dimension (e.g. width, height, and/or area size) , quantization parameter (QP) , the coding mode of the current block and/or the intra prediction modes of the neighbouring blocks associated with the current block.
- QP quantization parameter
- a video coder may determine the number of intra prediction mode candidates for a current block by comparing the block area size (in number of pixels) with a plurality of threshold values.
- a video coder may determine the number of intra prediction mode candidates for a current block by considering syntax information related to signalling the position of the last significant coefficient in the current block.
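- A sketch of such an adaptive candidate-count rule is given below; the area thresholds, the resulting list sizes, and the way the last-significant-coefficient position is used are illustrative assumptions rather than normative values:

```python
def num_transform_mode_candidates(width, height, last_sig_pos=None,
                                  area_thresholds=(64, 256, 1024)):
    """Derive the candidate-list size from the block area, optionally capped
    when very few transform coefficients are signalled."""
    area = width * height
    n = 2 + sum(area >= t for t in area_thresholds)    # 2..5 candidates
    if last_sig_pos is not None and last_sig_pos < 2:
        n = min(n, 2)        # almost no coefficients: keep the list short
    return n

print(num_transform_mode_candidates(8, 8))                     # 3
print(num_transform_mode_candidates(32, 32, last_sig_pos=10))  # 5
print(num_transform_mode_candidates(32, 32, last_sig_pos=1))   # 2
```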
- a video coder may signal the number of intra prediction mode candidates in one or more high-level syntax sets such as SPS, PPS, PH, and SH.
- the proposed methods can be applied to different coding modes to determine the current intra prediction mode of a current block for deriving the associated transform set of LFNST, NSPT, or MTS, such as the intraTMP mode, the cross-component linear model (CCLM) mode, the template-based intra mode derivation (TIMD) mode, the decoder side intra mode derivation (DIMD) mode, extrapolation filter-based intra prediction (EIP) , the Inter mode, and the intra block copy (IBC) mode in ECM 12.
- CCLM cross-component linear model
- TIMD template-based intra mode derivation
- DIMD decoder side intra mode derivation
- EIP extrapolation filter-based intra prediction
- IBC intra block copy
- a video coder may further adjust the value of the fusion weight for the intra non-angular predictor adaptively when the number of fused intra predictors is greater than 2 in DIMD mode.
- a video coder may further comprise determining the value of the fusion weight for the intra non-angular predictor for a current block in DIMD mode, wherein the fusion weight for the intra Planar predictor can be adaptively determined considering the distribution of the HoG built by DIMD and other coding conditions, such as the quantization parameter, block dimension, and intra prediction modes of the neighbouring blocks associated with the current block.
- the video coder may further comprise detecting or predicting the presence of a non-angular prediction mode for the current block. Specifically, our methods proposed for determining whether to set the current intra prediction mode of a current block equal to Planar or DC for deriving the associated transform set can be further applied to determining the fusion weight associated with the intra non-angular predictor used in the DIMD mode. If it is determined by one of our previous methods that the intra prediction mode of a current block shall be set equal to a non-angular mode for deriving the associated transform set, the video coder may further increase the fusion weight value for the non-angular predictor in the DIMD mode.
- a video coder may determine the value of the fusion weight for the non-angular intra predictor by comparing the current block size (e.g. width, height, or area size) with one or more threshold values. In some embodiments, a video coder may determine the value of the fusion weight for the intra non-angular predictor by comparing an average amplitude of HoG built by DIMD with another set of threshold values. In some embodiments, a video coder may determine the value of the fusion weight for the intra Planar predictor considering one or more largest amplitude values in comparison with the sum of all histogram amplitude values.
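- A sketch of such an adaptive fusion-weight rule follows; the base weight of 1/4 matches the fixed DIMD value mentioned earlier, while the boost amounts, the cap and the thresholds are placeholders:

```python
def planar_fusion_weight(width, height, hog, base=0.25, t_wh=16, t1=0.25):
    """Raise the Planar fusion weight when the block looks non-angular:
    large dimensions and/or no dominant HoG peak."""
    total = sum(hog)
    ratio = (max(hog) / total) if total > 0 else 0.0
    w = base
    if max(width, height) > t_wh:
        w += 0.125
    if ratio < t1:
        w += 0.125
    return min(w, 0.5)

peaky = [0.0] * 65; peaky[30] = 400.0
print(planar_fusion_weight(8, 8, peaky))         # 0.25 (no boost)
print(planar_fusion_weight(32, 32, [1.0] * 65))  # 0.5  (both boosts)
```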
- any of the foregoing proposed methods of adaptively inserting Planar or DC mode into a candidate list for deriving a transform set for the current block can be implemented in encoders and/or decoders.
- any of the proposed methods can be implemented in an intra prediction module of an encoder, and/or an intra prediction module of a decoder.
- any of the proposed methods can be implemented as a circuit integrated to the intra prediction module of the encoder and/or the intra prediction module of the decoder.
- the proposed aspects, methods and related embodiments can be implemented individually or jointly in an image and video coding system.
- the proposed methods as described above can be implemented in an encoder side or a decoder side with reference to Fig. 1A and Fig. 1B.
- any of the proposed methods can be implemented in an Intra coding module (e.g. Intra Pred. 150 in Fig. 1B) in a decoder or an Intra coding module in an encoder (e.g. Intra Pred. 110 in Fig. 1A) .
- Any of the proposed candidate derivation method can also be implemented as circuits coupled to the intra coding module at the decoder or the encoder.
- the decoder or encoder may also use additional processing units to implement the required processing.
- Intra Pred. units e.g. unit 110 in Fig. 1A and unit 150 in Fig. 1B
- a media such as hard disk or flash memory
- a CPU Central Processing Unit
- programmable devices e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array) .
- Fig. 8 illustrates a flowchart of an exemplary video coding system that determines whether to insert Planar mode or DC mode into a first candidate list to form an updated first candidate list for deriving a transform set associated with the current block according to an embodiment of the present invention.
- the steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side.
- the steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
- input data is received in step 810, wherein the input data comprises residual data for a current block at an encoder side or coded transformed residual data for the current block at a decoder side.
- a first candidate list comprising a plurality of member intra prediction modes for the current block is derived in step 820. Whether to insert Planar mode, DC mode, or both into the first candidate list to form an updated first candidate list for deriving a transform set associated with the current block is determined in step 830.
- a target intra prediction mode from the updated first candidate list is selected in step 840.
- a target set of LFNST (Low-Frequency Non-Separable Transform) or NSPT (Non-Separable Primary Transform) is derived according to the target intra prediction mode in step 850.
- Forward transform is applied to the residual data according to a selected transform kernel of the target set of LFNST or NSPT at the encoder side to generate transformed data at the encoder side, or inverse transform is applied to the coded transformed residual data according to the selected transform kernel of the target set of LFNST or NSPT to derive reconstructed residual data in step 860.
- the transformed data at the encoder side or the reconstructed residual data at the decoder side is provided in step 870.
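- The steps 810 to 870 can be strung together as in the following sketch, which reuses the kind of placeholder helpers shown earlier (the mode-to-set mapping, the kernels and the insertion rule are illustrative, not the normative LFNST/NSPT derivation):

```python
import numpy as np

def code_block_transform(data, ctx, is_encoder, t_wh=16):
    """End-to-end sketch of steps 810-870: build the candidate list, decide
    whether to insert Planar, pick a target mode, map it to a transform set,
    and run the forward or inverse non-separable transform."""
    cand = list(ctx["dimd_modes"])                      # 820: initial candidate list
    if max(ctx["width"], ctx["height"]) > t_wh:         # 830: conditional Planar insertion
        cand.insert(0, 0)                               #      (Planar is mode 0)
    target_mode = cand[ctx.get("signalled_idx", 0)]     # 840: target mode selection
    transform_set = target_mode % 35                    # 850: placeholder set mapping
    kernel = ctx["kernels"][transform_set]
    out = kernel @ data if is_encoder else kernel.T @ data   # 860: forward / inverse
    return out, target_mode, transform_set              # 870: provide the result

# toy usage with random orthogonal placeholder kernels
rng = np.random.default_rng(5)
kernels = [np.linalg.qr(rng.normal(size=(16, 16)))[0] for _ in range(35)]
ctx = {"width": 32, "height": 8, "dimd_modes": [18, 50], "kernels": kernels}
residual = rng.normal(size=16)
coeffs, mode, tset = code_block_transform(residual, ctx, is_encoder=True)
rec, _, _ = code_block_transform(coeffs, ctx, is_encoder=False)
print(mode, tset, np.allclose(rec, residual))   # 0 0 True
```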
- the term "forward transform" has been used to distinguish it from "inverse transform", and is not intended to impose any additional limitation on the term "transform" in this patent.
- Embodiment of the present invention as described above may be implemented in various hardware, software codes, or a combination of both.
- an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein.
- An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein.
- DSP Digital Signal Processor
- the invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA).
- These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
- the software code or firmware code may be developed in different programming languages and different formats or styles.
- the software code may also be compiled for different target platforms.
- different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A method and apparatus for video coding comprising selecting LFNST or NSPT are disclosed. According to the method, whether to insert a Planar mode or a DC mode into a candidate list is determined. A target intra prediction mode is selected from the updated candidate list. A target set of LFNST or NSPT is derived according to the target intra prediction mode. A forward transform is applied to the residual data according to a selected transform kernel of the target set of LFNST or NSPT at the encoder side to generate transformed data at the encoder side, or an inverse transform is applied to the coded transformed residual data according to the selected transform kernel of the target set of LFNST or NSPT to derive reconstructed residual data. The transformed data at the encoder side or the reconstructed residual data at the decoder side is provided.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463634520P | 2024-04-16 | 2024-04-16 | |
| US63/634520 | 2024-04-16 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025218691A1 true WO2025218691A1 (fr) | 2025-10-23 |
Family
ID=97402943
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2025/089241 Pending WO2025218691A1 (fr) | 2024-04-16 | 2025-04-16 | Procédés et appareil destinés à déterminer de manière adaptative un type de transformée sélectionné dans des systèmes de codage d'image et de vidéo |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025218691A1 (fr) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180020218A1 (en) * | 2016-07-15 | 2018-01-18 | Qualcomm Incorporated | Look-up table for enhanced multiple transform |
| US20220329819A1 (en) * | 2021-04-12 | 2022-10-13 | Qualcomm Incorporated | Low frequency non-separable transform for video coding |
| WO2023043885A1 (fr) * | 2021-09-15 | 2023-03-23 | Beijing Dajia Internet Information Technology Co., Ltd., | Prédiction de signes pour codage vidéo basé sur un bloc |
| US20230100043A1 (en) * | 2021-09-30 | 2023-03-30 | Tencent America LLC | Adaptive Transforms for Compound Inter-Intra Prediction Modes |
| WO2024017187A1 (fr) * | 2022-07-22 | 2024-01-25 | Mediatek Inc. | Procédé et appareil de nouvelle prédiction intra avec des combinaisons de lignes de référence et de modes de prédiction intra dans un système de codage vidéo |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25789936; Country of ref document: EP; Kind code of ref document: A1 |