
WO2024027784A1 - Method and apparatus of subblock-based temporal motion vector prediction with reordering and refinement in video coding

Method and apparatus of subblock-based temporal motion vector prediction with reordering and refinement in video coding

Info

Publication number
WO2024027784A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
subblock
current block
shift
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/110903
Other languages
English (en)
Inventor
Yu-Ling Hsiao
Chih-Wei Hsu
Ching-Yeh Chen
Tzu-Der Chuang
Yu-Wen Huang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MediaTek Inc filed Critical MediaTek Inc
Priority to CN202380056453.3A (published as CN119654869A)
Publication of WO2024027784A1
Current legal status: Ceased

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Definitions

  • the present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/370,508, filed on August 5, 2022.
  • the U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
  • the present invention relates to video coding system using SbTMVP (Subblock-based Temporal Motion Vector Prediction) .
  • the present invention relates to techniques to improve the coding efficiency for SbTMVP.
  • Versatile Video Coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021.
  • VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
  • Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • for Intra Prediction 110, the prediction data is derived based on previously coded video data in the current picture.
  • for Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture(s) and motion data.
  • Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues.
  • the prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120.
  • the transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data.
  • the bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area.
  • the side information associated with Intra Prediction 110, Inter Prediction 112 and in-loop filter 130 is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well.
  • the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues.
  • the residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data.
  • the reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
  • incoming video data undergoes a series of processing in the encoding system.
  • the reconstructed video data from REC 128 may be subject to various impairments due to a series of processing.
  • in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality.
  • for example, deblocking filter (DF), Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used.
  • the loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream.
  • Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134.
  • the system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
  • the decoder can use similar or the same functional blocks as the encoder except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126.
  • the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) .
  • the Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140.
  • the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
  • an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC.
  • Each CTU can be partitioned into one or multiple smaller size coding units (CUs) .
  • the resulting CU partitions can be in square or rectangular shapes.
  • VVC divides a CTU into prediction units (PUs) as a unit to apply prediction process, such as Inter prediction, Intra prediction, etc.
  • the VVC standard incorporates various new coding tools to further improve the coding efficiency over the HEVC standard.
  • among the various new coding tools, some coding tools relevant to the present invention are reviewed as follows.
  • VVC supports the subblock-based temporal motion vector prediction (SbTMVP) method. Similar to the temporal motion vector prediction (TMVP) in HEVC, SbTMVP uses the motion field in the collocated picture to improve motion vector prediction and merge mode for CUs in the current picture. The same collocated picture used by TMVP is used for SbTMVP. SbTMVP differs from TMVP in the following two main aspects:
  • TMVP predicts motion at CU level but SbTMVP predicts motion at sub-CU level;
  • TMVP fetches the temporal motion vectors from the collocated block in the collocated picture (i.e., the collocated block is the bottom-right or centre block relative to the current CU), whereas SbTMVP applies a motion shift before fetching the temporal motion information from the collocated picture, where the motion shift is obtained from the motion vector of one of the spatial neighbouring blocks of the current CU.
  • the SbTMVP process is illustrated in Figs. 2A-B.
  • SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps.
  • in the first step, the spatial neighbour A1 in Fig. 2A is examined. If A1 has a motion vector that uses the collocated picture as its reference picture, this motion vector is selected as the motion shift to be applied. If no such motion is identified, the motion shift is set to (0, 0).
  • in the second step, the motion shift identified in Step 1 is applied (i.e. added to the current block's coordinates) to obtain sub-CU-level motion information (motion vectors and reference indices) from the collocated picture, as shown in Fig. 2B.
  • the example in Fig. 2B assumes the motion shift is set to block A1’s motion, where frame 220 corresponds to the current picture and frame 230 corresponds to a reference picture (i.e., a collocated picture) .
  • for each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the centre sample) in the collocated picture is used to derive the motion information for the sub-CU.
  • after the motion information of the collocated sub-CU is identified, it is converted to the motion vectors and reference indices of the current sub-CU in a similar way as the TMVP process of HEVC, where temporal motion scaling is applied to align the reference pictures of the temporal motion vectors to those of the current CU.
  • the arrow(s) in each subblock of the collocated picture 230 correspond(s) to the motion vector(s) of a collocated subblock (thick-lined arrow for L0 MV and thin-lined arrow for L1 MV).
  • the arrow(s) in each subblock of the current picture correspond(s) to the scaled motion vector(s) of a current subblock (thick-lined arrow for L0 MV and thin-lined arrow for L1 MV). If no motion information of the collocated sub-CU is available (e.g. an intra-coded subblock), a default motion is used.
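  • As an illustration only (not part of the VVC specification text), the following Python sketch mirrors the two-step derivation described above; MV is a simple container, and subblock_positions, motion_at, default_motion and the scale_mv callback are assumed accessors:

```python
from dataclasses import dataclass

@dataclass
class MV:
    x: int  # horizontal component in 1/16-pel units
    y: int  # vertical component in 1/16-pel units

def derive_sbtmvp_motion(cur_block, a1_motion, col_pic, scale_mv):
    """Two-step SbTMVP sketch: (1) derive the motion shift from spatial
    neighbour A1, (2) fetch and scale sub-CU motion from the collocated
    picture."""
    # Step 1: A1's MV is used as the motion shift only when it uses the
    # collocated picture as its reference picture.
    if a1_motion is not None and a1_motion.ref_poc == col_pic.poc:
        shift = a1_motion.mv
    else:
        shift = MV(0, 0)

    sub_motion = {}
    for (sx, sy) in cur_block.subblock_positions():  # 8x8 sub-CUs
        # Step 2: add the motion shift to the sub-CU centre to locate
        # the corresponding motion grid in the collocated picture.
        cx = sx + 4 + (shift.x >> 4)  # convert 1/16-pel shift to pel
        cy = sy + 4 + (shift.y >> 4)
        col = col_pic.motion_at(cx, cy)  # smallest covering motion grid
        if col is None:                  # e.g. intra-coded area
            col = cur_block.default_motion
        # Temporal scaling aligns the collocated MV with the current
        # CU's reference picture, as in HEVC TMVP.
        sub_motion[(sx, sy)] = scale_mv(col)
    return sub_motion
```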
  • a combined subblock based merge list which contains both SbTMVP candidate and affine merge candidates, is used for the signalling of subblock based merge mode.
  • the SbTMVP mode is enabled/disabled by a sequence parameter set (SPS) flag. If the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the list of subblock-based merge candidates, followed by the affine merge candidates.
  • the SbTMVP mode is only applicable to CUs with both width and height larger than or equal to 8.
  • the encoding processing flow of the additional SbTMVP merge candidate is the same as for the other merge candidates, that is, for each CU in P or B slice, an additional RD check is performed to decide whether to use the SbTMVP candidate.
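  • A minimal sketch of this list construction, assuming a derive_sbtmvp candidate-derivation callback and simple SPS/CU attribute accessors (these names are illustrative, not from the standard):

```python
def build_subblock_merge_list(cu, sps, affine_cands, derive_sbtmvp,
                              max_cands=5):
    """Sketch of the combined subblock merge list: the SbTMVP candidate,
    when enabled and available, occupies the first entry and is followed
    by the affine merge candidates."""
    cands = []
    # SbTMVP is gated by the SPS flag and applies only to CUs whose
    # width and height are both >= 8.
    if sps.sbtmvp_enabled_flag and cu.width >= 8 and cu.height >= 8:
        sbtmvp = derive_sbtmvp(cu)
        if sbtmvp is not None:
            cands.append(sbtmvp)
    cands.extend(affine_cands)
    return cands[:max_cands]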
  • Non-Adjacent Motion Vector Prediction (NAMVP)
  • in JVET-L0399 (Y. Han, et al., "CE4.4.6: Improvement on Merge/Skip mode", Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Macao, CN, 3–12 Oct. 2018, Document: JVET-L0399), a coding tool referred to as Non-Adjacent Motion Vector Prediction (NAMVP) is proposed.
  • the non-adjacent spatial merge candidates are inserted after the TMVP (i.e., the temporal MVP) in the regular merge candidate list.
  • the pattern of the non-adjacent spatial merge candidates is shown in Fig. 3, where each small numbered box corresponds to a NAMVP candidate and the candidates are ordered (as shown by the number inside the square) according to their distance from the current block.
  • Multi-Pass Decoder-Side Motion Vector Refinement (MP-DMVR)
  • a multi-pass decoder-side motion vector refinement is applied. In the first pass, bilateral matching (BM) is applied to the coding block. In the second pass, BM is applied to each 16x16 subblock within the coding block. In the third pass, the MV in each 8x8 subblock is refined by applying bi-directional optical flow (BDOF).
  • in the first pass, a refined MV is derived by applying BM to the coding block. Similar to decoder-side motion vector refinement (DMVR), in the bi-prediction operation, a refined MV is searched around the two initial MVs (i.e., MV0 and MV1) in the reference picture lists L0 and L1. The refined MVs (i.e., MV0_pass1 and MV1_pass1) are derived around the initial MVs based on the minimum bilateral matching cost between the two reference blocks in L0 and L1.
  • BM performs local search to derive integer sample precision intDeltaMV.
  • the local search applies a 3×3 square search pattern to loop through the search range [–sHor, sHor] in the horizontal direction and [–sVer, sVer] in the vertical direction, where the values of sHor and sVer are determined by the block dimension, and the maximum value of sHor and sVer is 8.
  • MRSAD cost function is applied to remove the DC effect of distortion between reference blocks.
  • when the bilateral matching cost at the centre point of the 3×3 search pattern has the minimum cost, the intDeltaMV local search is terminated. Otherwise, the current minimum cost search point becomes the new centre point of the 3×3 search pattern and the search continues for the minimum cost, until it reaches the end of the search range.
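  • A minimal sketch of this local search, assuming a bil_cost callback that evaluates the bilateral matching cost (e.g. MRSAD) at a given integer offset applied as +offset in L0 and -offset in L1:

```python
def bm_local_search(bil_cost, s_hor=8, s_ver=8):
    """First-pass sketch: a 3x3 square pattern walks toward the minimum
    bilateral matching cost; the search stops when the centre of the
    pattern is already the minimum."""
    centre, best_cost = (0, 0), bil_cost((0, 0))
    while True:
        best = centre
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cand = (centre[0] + dx, centre[1] + dy)
                if abs(cand[0]) > s_hor or abs(cand[1]) > s_ver:
                    continue            # stay inside the search range
                cost = bil_cost(cand)
                if cost < best_cost:
                    best_cost, best = cost, cand
        if best == centre:              # centre has the minimum cost
            return centre               # integer-precision intDeltaMV
        centre = best                   # re-centre the 3x3 pattern
```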
  • in the second pass, a refined MV is derived by applying BM to a 16×16 grid subblock. For each subblock, a refined MV is searched around the two MVs (i.e., MV0_pass1 and MV1_pass1), obtained during the first pass, in the reference picture lists L0 and L1. The refined MVs (i.e., MV0_pass2(sbIdx2) and MV1_pass2(sbIdx2)) are derived based on the minimum bilateral matching cost between the two reference subblocks in L0 and L1.
  • for each subblock, BM performs a full search to derive integer sample precision intDeltaMV. The full search has a search range [–sHor, sHor] in the horizontal direction and [–sVer, sVer] in the vertical direction, where the values of sHor and sVer are determined by the block dimension, and the maximum value of sHor and sVer is 8.
  • the search area (2*sHor + 1) × (2*sVer + 1) is divided into up to 5 diamond-shaped search regions shown in Fig. 4, where the 5 search regions are shown in 5 different shades.
  • Each search region is assigned a costFactor, which is determined by the distance (intDeltaMV) between each search point and the starting MV, and each diamond region is processed in the order starting from the centre of the search area. In each region, the search points are processed in the raster scan order starting from the top left going to the bottom right corner of the region.
  • if the minimum bilateral matching cost within the current search region is less than a threshold equal to the subblock area, the int-pel full search is terminated; otherwise, the int-pel full search continues to the next search region until all search points are examined. Additionally, if the difference between the previous minimum cost and the current minimum cost in the iteration is less than a threshold that is equal to the area of the block, the search process terminates.
  • the existing VVC DMVR fractional sample refinement is further applied to derive the final deltaMV (sbIdx2) .
  • the refined MVs at the second pass are then derived as:
    MV0_pass2(sbIdx2) = MV0_pass1 + deltaMV(sbIdx2)
    MV1_pass2(sbIdx2) = MV1_pass1 – deltaMV(sbIdx2)
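  • For illustration, a sketch of the per-subblock second-pass update under the above equations, with the integer full search and the fractional refinement abstracted as callbacks (names are illustrative):

```python
def refine_subblock_pass2(mv0_pass1, mv1_pass1, full_search, frac_refine):
    """Second-pass sketch for one 16x16 subblock: the diamond-region
    integer full search yields intDeltaMV, the VVC-style fractional
    refinement yields the final deltaMV(sbIdx2), and the delta is
    applied with opposite signs to the L0 and L1 MVs (the mirrored,
    bilateral update shown above)."""
    int_delta = full_search()           # integer-pel search, see Fig. 4
    dx, dy = frac_refine(int_delta)     # fractional sample refinement
    mv0 = (mv0_pass1[0] + dx, mv0_pass1[1] + dy)   # MV0_pass2(sbIdx2)
    mv1 = (mv1_pass1[0] - dx, mv1_pass1[1] - dy)   # MV1_pass2(sbIdx2)
    return mv0, mv1
```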
  • in the third pass, a refined MV is derived by applying BDOF to an 8×8 grid subblock. For each 8×8 subblock, BDOF refinement is applied to derive scaled Vx and Vy without clipping, starting from the refined MV of the parent subblock of the second pass.
  • the derived bioMv (Vx, Vy) is rounded to 1/16 sample precision and clipped between -32 and 32.
  • the refined MVs (e.g. MV0_pass3(sbIdx3) and MV1_pass3(sbIdx3)) at the third pass are derived as:
    MV0_pass3(sbIdx3) = MV0_pass2(sbIdx2) + bioMv
    MV1_pass3(sbIdx3) = MV1_pass2(sbIdx2) – bioMv
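  • A small sketch of the third-pass update; interpreting the [-32, 32] clipping range as 1/16-pel units is an assumption here, as is holding the MVs in 1/16-pel units:

```python
def apply_bdof_pass3(mv0_pass2, mv1_pass2, vx, vy):
    """Third-pass sketch: bioMv is the BDOF motion rounded to 1/16
    sample precision and clipped to [-32, 32], then applied with
    opposite signs to the two prediction directions."""
    def clip(v):
        return max(-32, min(32, v))
    bio = (clip(round(vx * 16)), clip(round(vy * 16)))    # 1/16-pel
    mv0 = (mv0_pass2[0] + bio[0], mv0_pass2[1] + bio[1])  # MV0_pass3
    mv1 = (mv1_pass2[0] - bio[0], mv1_pass2[1] - bio[1])  # MV1_pass3
    return mv0, mv1
```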
  • Adaptive decoder-side motion vector refinement is an extension of multi-pass DMVR, which consists of two new merge modes that refine the MV only in one direction, either L0 or L1, of the bi-prediction, for the merge candidates that meet the DMVR conditions.
  • the multi-pass DMVR process is applied for the selected merge candidate to refine the motion vectors; however, either MVD0 or MVD1 is set to zero in the first-pass (i.e. PU-level) DMVR.
  • the merge candidates for the new merge modes are derived from spatial neighbouring coded blocks, TMVPs, non-adjacent blocks, HMVPs and the pair-wise candidate, similar to the regular merge mode. The difference is that only those that meet the DMVR conditions are added into the candidate list. The same merge candidate list is used by the two new merge modes. If the list of BM candidates contains inherited BCW weights, the DMVR process is unchanged except that the computation of the distortion is made using MRSAD or MRSATD when the weights are non-equal and the bi-prediction is weighted with the BCW weights. The merge index is coded as in the regular merge mode.
  • Template matching is a decoder-side MV derivation method to refine the motion information of the current CU by finding the closest match between a template (i.e., top 514 and/or left 516 neighbouring blocks of the current CU 512) in the current picture 510 and a block of the same size as the template (i.e., blocks 524 and 526) in a reference picture 520, as shown in Fig. 5.
  • a better MV is searched around the initial motion 530 of the current CU 512 of the current picture 510, within a [–8, +8]-pel search range 522 around location 528 in the reference picture 520, as pointed to by the initial MV 530.
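  • For illustration, a simplified sketch of the matching cost and the search follows (exhaustive here, whereas the actual refinement uses iterative diamond/cross patterns); ref_template_at is an assumed accessor returning the reference-side template region for a candidate MV:

```python
import numpy as np

def tm_cost(cur_template, ref_template_at, mv):
    """SAD between the current CU's template (top/left reconstructed
    samples) and the same-shaped region in the reference picture
    displaced by the candidate MV."""
    ref = ref_template_at(mv)
    return int(np.abs(cur_template.astype(np.int64) -
                      ref.astype(np.int64)).sum())

def tm_refine(cur_template, ref_template_at, init_mv, rng=8):
    """Test every integer offset in [-8, +8] around the initial MV and
    keep the best-matching one."""
    best_mv = init_mv
    best = tm_cost(cur_template, ref_template_at, init_mv)
    for dx in range(-rng, rng + 1):
        for dy in range(-rng, rng + 1):
            cand = (init_mv[0] + dx, init_mv[1] + dy)
            cost = tm_cost(cur_template, ref_template_at, cand)
            if cost < best:
                best, best_mv = cost, cand
    return best_mv
```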
  • the template matching method in JVET-J0021 (Yi-Wen Chen, et al., "Description of SDR, HDR and 360° video coding technology proposal by Qualcomm and Technicolor – low and high complexity versions", Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG11, 10th Meeting: San Diego, US, 10–20 Apr. 2018, Document: JVET-J0021) is used with the following modifications: the search step size is determined based on the AMVR mode, and TM can be cascaded with the bilateral matching process in merge modes.
  • in AMVP mode, an MVP candidate is determined based on the template matching error, to select the one which reaches the minimum difference between the current block template and the reference block template.
  • TM is then performed only for this particular MVP candidate for MV refinement.
  • TM refines this MVP candidate by using iterative diamond search starting from full-pel MVD precision (or 4-pel for 4-pel AMVR mode) within a [–8, +8] -pel search range.
  • the AMVP candidate may be further refined by using cross search with full-pel MVD precision (or 4-pel for 4-pel AMVR mode) , followed sequentially by half-pel and quarter-pel ones depending on AMVR mode as specified in Table 1. This search process ensures that the MVP candidate still keeps the same MV precision as indicated by the AMVR mode after the TM process. In the search process, if the difference between the previous minimum cost and the current minimum cost in the iteration is less than a threshold that is equal to the area of the block, the search process terminates.
  • in merge mode, TM may be performed all the way down to 1/8-pel MVD precision, or skip those beyond half-pel MVD precision, depending on whether the alternative interpolation filter (used when AMVR is the half-pel mode) is used according to the merged motion information.
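  • Since Table 1 is not reproduced in this extract, the following sketch encodes an assumed, typical precision progression consistent with the description above; it is illustrative rather than normative:

```python
def tm_precision_steps(amvr_mode, merge=False, alt_half_pel_if=False):
    """Assumed search-precision progression (in pel units) for TM
    refinement; the normative mapping is given by Table 1 of the
    description."""
    if merge:
        # merge mode: down to 1/8-pel, unless the alternative half-pel
        # interpolation filter applies, stopping the search at 1/2-pel
        return [1, 1/2] if alt_half_pel_if else [1, 1/2, 1/4, 1/8]
    return {                      # AMVP: keep the AMVR precision
        '4pel':       [4],
        'fullpel':    [1],
        'halfpel':    [1, 1/2],
        'quarterpel': [1, 1/2, 1/4],
    }[amvr_mode]
```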
  • template matching may work as an independent process or an extra MV refinement process between block-based and subblock-based bilateral matching (BM) methods, depending on whether BM can be enabled or not according to its enabling condition check.
  • the merge candidates are adaptively reordered according to costs evaluated using template matching (TM) .
  • the reordering method can be applied to the regular merge mode, template matching (TM) merge mode, and affine merge mode (excluding the SbTMVP candidate) .
  • for the TM merge mode, merge candidates are reordered before the refinement process.
  • merge candidates are divided into multiple subgroups.
  • the subgroup size is set to 5 for the regular merge mode and TM merge mode.
  • the subgroup size is set to 3 for the affine merge mode.
  • Merge candidates in each subgroup are reordered ascendingly according to cost values based on template matching. For simplification, merge candidates in the last subgroup, when it is not the first subgroup, are not reordered.
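  • A minimal sketch of this subgroup-wise reordering, with tm_cost_of as an assumed per-candidate cost function:

```python
def armc_tm_reorder(cands, tm_cost_of, subgroup_size):
    """ARMC-TM sketch: split the candidate list into subgroups (size 5
    for regular/TM merge, 3 for affine merge) and sort each subgroup by
    ascending template matching cost; the last subgroup is left
    unordered unless it is also the first one."""
    groups = [cands[i:i + subgroup_size]
              for i in range(0, len(cands), subgroup_size)]
    out = []
    for gi, group in enumerate(groups):
        last, first = (gi == len(groups) - 1), (gi == 0)
        if last and not first:
            out.extend(group)                 # skipped for simplification
        else:
            out.extend(sorted(group, key=tm_cost_of))
    return out
```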
  • the template matching cost of a merge candidate is measured as the sum of absolute differences (SAD) between samples of a template of the current block and their corresponding reference samples.
  • the template comprises a set of reconstructed samples neighbouring to the current block. Reference samples of the template are located by the motion information of the merge candidate.
  • when a merge candidate utilizes bi-directional prediction, the reference samples of the template of the merge candidate are also generated by bi-prediction, as shown in Fig. 6.
  • block 612 corresponds to a current block in current picture 610
  • blocks 622 and 632 correspond to reference blocks in reference pictures 620 and 630 in list 0 and list 1 respectively.
  • Templates 614 and 616 are for current block 612
  • templates 624 and 626 are for reference block 622
  • templates 634 and 636 are for reference block 632.
  • Motion vectors 640, 642 and 644 are merge candidates in list 0 and motion vectors 660, 662 and 664 are merge candidates in list 1.
  • for subblock-based merge candidates with subblock size equal to Wsub × Hsub, the above template comprises several sub-templates with the size of Wsub × 1, and the left template comprises several sub-templates with the size of 1 × Hsub. As shown in Fig. 7, the motion information of the subblocks in the first row and the first column of the current block is used to derive the reference samples of each sub-template.
  • block 712 corresponds to a current block in current picture 710
  • block 722 corresponds to a collocated block in reference picture 720.
  • Each small square in the current block and the collocated block corresponds to a subblock.
  • the dot-filled areas on the left and top of the current block correspond to template for the current block.
  • the boundary subblocks are labelled from A to G.
  • the arrow associated with each subblock corresponds to the motion vector of the subblock.
  • the reference subblocks (labelled as Aref to Gref) are located according to the motion vectors associated with the boundary subblocks.
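  • For illustration, a sketch of the sub-template cost computation for the arrangement of Fig. 7; the accessors on cur and ref_pic, and the sad callback, are all assumptions:

```python
def subblock_template_cost(cur, ref_pic, sad, sub=4):
    """Sketch of the template cost for a subblock merge candidate: the
    top template is split into Wsub x 1 sub-templates and the left
    template into 1 x Hsub sub-templates, each compared against
    reference samples fetched with the MV of the adjacent first-row or
    first-column subblock (A..G versus Aref..Gref in Fig. 7)."""
    cost = 0
    for i, mv in enumerate(cur.first_row_mvs):        # top template
        t = cur.top_template(i, w=sub)                # Wsub x 1 piece
        r = ref_pic.fetch(cur.x + i * sub, cur.y - 1, w=sub, h=1, mv=mv)
        cost += sad(t, r)
    for j, mv in enumerate(cur.first_col_mvs):        # left template
        t = cur.left_template(j, h=sub)               # 1 x Hsub piece
        r = ref_pic.fetch(cur.x - 1, cur.y + j * sub, w=1, h=sub, mv=mv)
        cost += sad(t, r)
    return cost
```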
  • a method and apparatus for video coding using SbTMVP are disclosed.
  • input data associated with a current block are received, where the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side.
  • a target motion shift is determined from one or more neighbouring blocks comprising at least one non-adjacent spatial neighbouring block of the current block.
  • a collocated block in a reference picture is determined based on the target motion shift.
  • Subblock motion information is derived for subblocks of the current block based on motion information of corresponding subblocks of the collocated block.
  • An SbTMVP candidate is generated for the current block based on the subblock motion information for the subblocks of the current block.
  • the current block is encoded or decoded by using a motion prediction set comprising the SbTMVP candidate.
  • the SbTMVP candidate is used for Skip mode, Merge mode, Direct mode, Intra mode, Inter mode, IBC mode or a combination thereof.
  • the target motion shift is derived from one or more motion shift candidates from said one or more neighbouring blocks based on one or more schemes involved with TM (Template Matching) , ARMC-TM (Adaptive Reordering of Merge Candidates with Template Matching) or both.
  • the target motion shift is derived by refining one motion shift candidate based on the TM with one or more templates of the current block.
  • two or more motion shift candidates from one or more neighbouring adjacent blocks of the current block, one or more neighbouring non-adjacent blocks of the current block or both are determined, and wherein said two or more motion shift candidates are reordered according to the ARMC-TM.
  • a first motion shift candidate in reordered two or more motion shift candidates is selected as the target motion shift.
  • a first motion shift candidate in reordered two or more motion shift candidates is refined by the TM and used as the target motion shift.
  • two or more motion shift candidates from one or more neighbouring adjacent blocks of the current block, one or more neighbouring non-adjacent blocks of the current block or both are determined, and wherein said two or more motion shift candidates are refined according to the TM and then reordered according to the ARMC-TM.
  • a first motion shift candidate in reordered two or more motion shift candidates is selected as the target motion shift.
  • in the process of the TM or the ARMC-TM, if an SbTMVP candidate associated with one motion shift candidate is not available, a TM cost or ARMC-TM cost for said one motion shift candidate is set to a large value so that said one motion shift candidate is skipped from being selected as the target motion shift.
  • a default motion is refined by the TM or BM (Bilateral Matching) and used as the subblock motion information for the target subblock.
  • the default motion is refined by the TM with one or more templates of the current block.
  • the default motion is refined by the BM.
  • the default motion is refined by the TM and then by the BM.
  • the default motion is refined by the BM and then by the TM.
  • Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
  • Fig. 2A illustrates an example of subblock-based Temporal Motion Vector Prediction (SbTMVP) in VVC, where the spatial neighbouring blocks are checked for availability of motion information.
  • Fig. 2B illustrates an example of SbTMVP for deriving sub-CU motion field by applying a motion shift from spatial neighbour and scaling the motion information from the corresponding collocated sub-CUs.
  • Fig. 3 illustrates an exemplary pattern of the non-adjacent spatial merge candidates.
  • Fig. 4 illustrates the 5 diamond shape search regions used for multi-pass decoder-side motion vector refinement.
  • Fig. 5 illustrates an example of template matching used to refine an initial MV by searching an area around the initial MV.
  • Fig. 6 illustrates an example of templates used for the current block and corresponding reference blocks to measure matching costs associated with merge candidates.
  • Fig. 7 illustrates an example of the template and reference samples of the template for a block with sub-block motion, using the motion information of the subblocks of the current block.
  • Fig. 8 illustrates a flowchart of an exemplary video coding system that utilizes SbTMVP with motion shift derived from one or more neighbouring blocks including at least one non-adjacent neighbouring block according to an embodiment of the present invention.
  • the motion shift used to locate a temporal collocated block is then used to derive the motion information for the subblocks of the current block.
  • the motion shift is determined from a spatial neighbouring block.
  • according to the present invention, a motion vector of one or more non-adjacent spatial neighbouring blocks is used as the motion shift.
  • in the conventional approach, only adjacent neighbouring blocks are used for deriving the motion shift. If no motion information is available from the neighbouring blocks (e.g. all of the spatial neighbouring blocks being intra coded), the motion shift is set to (0, 0), which may not provide a good prediction. Therefore, the present invention proposes to allow motion information from one or more non-adjacent neighbouring blocks to be used for deriving the motion shift. The motion shift based on non-adjacent spatial neighbouring blocks is expected to provide a better prediction.
  • the motion shift derivation is related to TM and/or ARMC-TM.
  • a motion shift is refined by TM with the CU template.
  • Two or more motion vectors of adjacent and/or non-adjacent spatial neighbouring blocks are reordered by ARMC-TM, and then the first one or more are selected as the motion shift.
  • the reordering is performed according to the template matching cost associated with said two or more motion vectors.
  • the candidates are reordered with the smallest template matching cost first. In other words, the one with the smallest template matching cost is selected.
  • Two or more motion vectors of adjacent and/or non-adjacent spatial neighbouring blocks are refined by TM, then reordered by ARMC-TM, and then the first one or more are selected as the motion shift.
  • Two or more motion vectors of adjacent and/or non-adjacent spatial neighbouring blocks are reordered by ARMC-TM; the first one or more are then selected as the motion shift and further refined by TM.
  • in the process of TM or ARMC-TM, the TM or ARMC-TM cost associated with a motion vector is set to a large value, or the motion vector is skipped, if treating the motion vector as a motion shift would make the SbTMVP candidate unavailable (i.e., the default motion is not available).
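  • For illustration, the following sketch combines the schemes above (TM refinement, ARMC-TM-style reordering, and the large-cost penalty for unavailable candidates); all callbacks are assumptions:

```python
LARGE_COST = 1 << 30    # effectively excludes a candidate

def derive_target_motion_shift(shift_cands, tm_refine, tm_cost,
                               sbtmvp_available):
    """Sketch of one disclosed scheme: refine each motion-shift
    candidate (from adjacent and/or non-adjacent neighbours) by TM,
    reorder the refined candidates by their TM/ARMC-TM cost, and
    penalize any candidate whose resulting SbTMVP candidate would be
    unavailable so that it cannot be selected."""
    scored = []
    for mv in shift_cands:
        refined = tm_refine(mv)            # TM with the CU template
        if not sbtmvp_available(refined):  # default motion unavailable
            scored.append((LARGE_COST, refined))
        else:
            scored.append((tm_cost(refined), refined))
    scored.sort(key=lambda e: e[0])        # ARMC-TM-style reordering
    return scored[0][1]                    # first candidate -> shift
```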
  • the default motion is further refined by TM and/or BM:
  • a default motion is refined by TM with CU template.
  • a default motion is refined by BM.
  • a default motion is refined by TM and then BM.
  • a default motion is refined by BM and then TM.
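  • A minimal sketch of these four refinement orderings, with tm_refine and bm_refine as assumed refinement functions:

```python
def refine_default_motion(default_mv, scheme, tm_refine, bm_refine):
    """Sketch of the four disclosed options for refining the default
    motion: TM only, BM only, TM then BM, or BM then TM."""
    pipeline = {
        'tm':    (tm_refine,),
        'bm':    (bm_refine,),
        'tm_bm': (tm_refine, bm_refine),
        'bm_tm': (bm_refine, tm_refine),
    }[scheme]
    mv = default_mv
    for stage in pipeline:
        mv = stage(mv)
    return mv
```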
  • any of the SbTMVP methods with motion shift derived from one or more neighbouring blocks including at least one non-adjacent neighbouring block, as described above, can be implemented in encoders and/or decoders.
  • any of the proposed SbTMVP methods can be implemented in an inter coding module and/or a merge/AMVP candidate derivation module of an encoder (e.g. Inter Pred. 112 in Fig. 1A) , or a motion compensation module (e.g., MC 152 in Fig. 1B) and/or a merge/AMVP candidate derivation module of a decoder.
  • any of the proposed methods can be implemented as a circuit coupled to the inter coding module and/or a merge/AMVP candidate derivation module of an encoder and/or motion compensation module and/or a merge/AMVP candidate derivation module of the decoder.
  • while the Inter-Pred. 112 and MC 152 are shown as individual processing units to support the SbTMVP methods, they may also correspond to executable software or firmware codes stored on a media, such as hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
  • Fig. 8 illustrates a flowchart of an exemplary video coding system that utilizes SbTMVP with motion shift derived from one or more neighbouring blocks including at least one non-adjacent neighbouring block according to an embodiment of the present invention.
  • the steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side.
  • the steps shown in the flowchart may also be implemented in hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
  • input data associated with a current block are received in step 810, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or encoded data associated with the current block to be decoded at a decoder side.
  • a target motion shift is determined from one or more neighbouring blocks comprising at least one non-adjacent spatial neighbouring block of the current block in step 820.
  • a collocated block in a reference picture is determined based on the target motion shift in step 830.
  • Subblock motion information is derived for subblocks of the current block based on motion information of corresponding subblocks of the collocated block in step 840.
  • An SbTMVP candidate is generated for the current block based on the subblock motion information for the subblocks of the current block in step 850.
  • the current block is encoded or decoded by using a motion prediction set comprising the SbTMVP candidate in step 860.
  • Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both.
  • an embodiment of the present invention can be one or more circuits integrated into a video compression chip, or program code integrated into video compression software, to perform the processing described herein.
  • An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein.
  • the invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) .
  • These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
  • the software code or firmware code may be developed in different programming languages and different formats or styles.
  • the software code may also be compiled for different target platforms.
  • different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus for video coding using subblock-based temporal motion vector prediction (SbTMVP) with reordering and refinement are disclosed. According to the method, a target motion shift is determined from one or more neighbouring blocks comprising at least one non-adjacent spatial neighbouring block of the current block. A collocated block in a reference picture is determined based on the target motion shift. Subblock motion information is derived for subblocks of the current block based on motion information of corresponding subblocks of the collocated block. An SbTMVP candidate is generated for the current block based on the subblock motion information for the subblocks of the current block. The current block is encoded or decoded using a motion prediction set comprising the SbTMVP candidate.
PCT/CN2023/110903 2022-08-05 2023-08-03 Method and apparatus of subblock-based temporal motion vector prediction with reordering and refinement in video coding Ceased WO2024027784A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202380056453.3A CN119654869A (zh) 2022-08-05 2023-08-03 视频编解码中基于子块的时间运动向量预测及其重新排序和细化的方法和装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263370508P 2022-08-05 2022-08-05
US63/370,508 2022-08-05

Publications (1)

Publication Number Publication Date
WO2024027784A1 (fr) 2024-02-08

Family

ID=89848534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110903 Ceased WO2024027784A1 (fr) 2022-08-05 2023-08-03 Method and apparatus of subblock-based temporal motion vector prediction with reordering and refinement in video coding

Country Status (2)

Country Link
CN (1) CN119654869A (fr)
WO (1) WO2024027784A1 (fr)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200221108A1 (en) * 2019-01-05 2020-07-09 Tencent America LLC Method and apparatus for video coding
CN113574869A (zh) * 2019-03-17 2021-10-29 Beijing Bytedance Network Technology Co., Ltd. Optical flow-based prediction refinement
CN114503564A (zh) * 2019-08-05 2022-05-13 LG Electronics Inc. Video encoding/decoding method and apparatus using motion information candidates, and method of transmitting bitstream
US20210227206A1 (en) * 2020-01-12 2021-07-22 Mediatek Inc. Video Processing Methods and Apparatuses of Merge Number Signaling in Video Coding Systems
US20210314596A1 (en) * 2020-03-29 2021-10-07 Alibaba Group Holding Limited Enhanced decoder side motion vector refinement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Y. Han, W.-J. Chien, H. Huang, M. Karczewicz (Qualcomm), "CE4.4.6: Improvement on Merge/Skip mode", 12th JVET Meeting, Macao, CN, 3–12 Oct. 2018, Document JVET-L0399, XP030194210 *

Also Published As

Publication number Publication date
CN119654869A (zh) 2025-03-18

Similar Documents

Publication Publication Date Title
US11956462B2 (en) Video processing methods and apparatuses for sub-block motion compensation in video coding systems
TW201944781A (zh) Method and apparatus of video processing with overlapped block motion compensation in video coding systems
US20230328278A1 (en) Method and Apparatus of Overlapped Block Motion Compensation in Video Coding System
WO2023221993A1 (fr) Method and apparatus of decoder-side motion vector refinement and bi-directional optical flow for video coding
WO2023208224A1 (fr) Method and apparatus for complexity reduction of video coding using merge with MVD mode
WO2023134564A1 (fr) Method and apparatus of deriving merge candidates from affine coded blocks for video coding
US20240357081A1 (en) Method and Apparatus for Hardware-Friendly Template Matching in Video Coding System
US20240357084A1 (en) Method and Apparatus for Low-Latency Template Matching in Video Coding System
WO2024027784A1 (fr) Method and apparatus of subblock-based temporal motion vector prediction with reordering and refinement in video coding
WO2024078331A1 (fr) Method and apparatus of subblock-based motion vector prediction with reordering and refinement in video coding
WO2025007931A1 (fr) Methods and apparatus of video coding improvement using multiple models
US12501066B2 (en) Video processing methods and apparatuses for sub-block motion compensation in video coding systems
WO2025007972A1 (fr) Methods and apparatus of deriving cross-component models from temporal and historical neighbours for chroma inter coding
WO2024016844A1 (fr) Method and apparatus of using affine motion estimation with control-point motion vector refinement
WO2025077512A1 (fr) Methods and apparatus of geometric partition mode with subblock modes
WO2023208189A1 (fr) Method and apparatus for improvement of video coding using merge with MVD mode with template matching
WO2024149035A1 (fr) Methods and apparatus of affine motion compensation for block boundaries and motion refinement in video coding
WO2025153050A1 (fr) Methods and apparatus of filter-based intra prediction with multiple hypotheses in video coding systems
US12501026B2 (en) Method and apparatus for low-latency template matching in video coding system
WO2025167844A1 (fr) Methods and apparatus of local illumination compensation model derivation and inheritance for video coding
WO2024141071A1 (fr) Method, apparatus and medium for video processing
WO2025026397A1 (fr) Methods and apparatus of video coding using multi-hypothesis cross-component prediction for chroma coding
WO2025218694A1 (fr) Methods and apparatus of selecting the number of MVD candidates in AMVP mode with SbTMVP for video coding
WO2025152945A1 (fr) Methods and apparatus of cross-component model inheritance based on a cascaded vector for chroma inter video coding improvement
WO2025152710A1 (fr) Methods and apparatus of using template-object matching for MV refinement or candidate reordering for video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (ref document number: 23849488; country of ref document: EP; kind code of ref document: A1)
WWE Wipo information: entry into national phase (ref document number: 202380056453.3; country of ref document: CN)
NENP Non-entry into the national phase (ref country code: DE)
WWP Wipo information: published in national office (ref document number: 202380056453.3; country of ref document: CN)
122 Ep: pct application non-entry in european phase (ref document number: 23849488; country of ref document: EP; kind code of ref document: A1)