
WO2021018081A1 - Configurable coding tree unit size in video coding - Google Patents


Info

Publication number
WO2021018081A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
block
size
dimensions
bitstream representation
Prior art date
Legal status
Ceased
Application number
PCT/CN2020/104784
Other languages
French (fr)
Inventor
Zhipin DENG
Li Zhang
Kai Zhang
Hongbin Liu
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202080053833.8A priority Critical patent/CN114175649A/en
Publication of WO2021018081A1 publication Critical patent/WO2021018081A1/en


Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/96: Tree coding, e.g. quad-tree coding

Definitions

  • This document is related to video and image coding and decoding technologies.
  • Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
  • the disclosed techniques may be used by a video or image decoder or encoder to perform coding or decoding of video in which a configurable coding tree unit size is used.
  • a method of video processing includes performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion conforms to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
  • a method of video processing includes determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.
  • a method of video processing includes determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.
  • a method of video processing includes determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.
  • a method of video processing includes performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an affine model parameters calculation, and wherein the affine model parameters calculation is based on dimensions of the video block.
  • a method of video processing includes performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an application of an intra block copy (IBC) tool, and wherein a size of an IBC buffer is based on maximum configurable and/or allowable dimensions of the video block.
  • a method of video processing includes performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion is performed according to a rule that specifies a relationship between an indication of a size of a video block of the one or more video blocks and an indication of a maximum size of a transform block (TB) used for the video block.
  • the above-described method may be implemented by a video encoder apparatus that comprises a processor.
  • these methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
  • FIG. 1 is a block diagram of an example of a hardware platform used for implementing techniques described in the present document.
  • FIG. 2 is a block diagram of an example video processing system in which disclosed techniques may be implemented.
  • FIG. 3 is a flowchart for an example method of video processing.
  • FIG. 4 is a flowchart for another example method of video processing.
  • FIG. 5 is a flowchart for yet another example method of video processing.
  • FIG. 6 is a flowchart for yet another example method of video processing.
  • FIG. 7 is a flowchart for yet another example method of video processing.
  • FIG. 8 is a flowchart for yet another example method of video processing.
  • FIG. 9 is a flowchart for yet another example method of video processing.
  • the present document provides various techniques that can be used by a decoder of image or video bitstreams to improve the quality of decompressed or decoded digital video or images.
  • video is used herein to include both a sequence of pictures (traditionally called video) and individual images.
  • a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding.
  • Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
  • This document is related to video coding technologies. Specifically, it is directed to configurable coding tree units (CTUs) in video coding and decoding. It may be applied to existing video coding standards such as HEVC, or to the Versatile Video Coding (VVC) standard to be finalized. It may also be applicable to future video coding standards or video codecs.
  • Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards.
  • the ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards.
  • the video coding standards are based on the hybrid video coding structure wherein temporal prediction plus transform coding are utilized.
  • the Joint Video Exploration Team (JVET) was founded jointly by VCEG and MPEG in 2015.
  • JVET's exploration reference software is named the Joint Exploration Model (JEM).
  • VTM-5.0 software allows 4 different CTU sizes: 16x16, 32x32, 64x64 and 128x128.
  • the minimum CTU size was redefined to 32x32 due to the adoption of JVET-O0526.
  • the CTU size in VVC working draft 6 is encoded in the SPS header in a UE-coded syntax element called log2_ctu_size_minus5.
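As an illustrative, non-normative sketch, the CTU size can be recovered from this syntax element as follows (the function name is hypothetical; the bitstream parsing itself is omitted):

```python
def ctb_size_from_sps(log2_ctu_size_minus5: int) -> int:
    # CtbLog2SizeY = log2_ctu_size_minus5 + 5 per VVC working draft 6
    ctb_log2_size_y = log2_ctu_size_minus5 + 5
    return 1 << ctb_log2_size_y

# Values 0..2 give the allowed CTU sizes 32, 64, and 128.
print([ctb_size_from_sps(v) for v in range(3)])  # [32, 64, 128]
```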
  • VVC draft 6 defines virtual pipeline data units (VPDUs) and adopts JVET-O0526.
  • log2_ctu_size_minus5 plus 5 specifies the luma coding tree block size of each CTU. It is a requirement of bitstream conformance that the value of log2_ctu_size_minus5 be less than or equal to 2.
  • log2_min_luma_coding_block_size_minus2 plus 2 specifies the minimum luma coding block size.
  • MinCbLog2SizeY = log2_min_luma_coding_block_size_minus2 + 2 (7-17)
  • MinCbSizeY = 1 << MinCbLog2SizeY (7-18)
  • IbcBufWidthY = 128 * 128 / CtbSizeY (7-19)
  • IbcBufWidthC = IbcBufWidthY / SubWidthC (7-20)
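A minimal sketch of the derivations (7-17) to (7-20) above (the function and variable names are illustrative, not from the draft text; SubWidthC defaults to 2 as in 4:2:0):

```python
def derive_min_cb_and_ibc_buffer(ctb_size_y: int,
                                 log2_min_luma_coding_block_size_minus2: int,
                                 sub_width_c: int = 2):
    # (7-17)/(7-18): minimum luma coding block size
    min_cb_log2_size_y = log2_min_luma_coding_block_size_minus2 + 2
    min_cb_size_y = 1 << min_cb_log2_size_y
    # (7-19)/(7-20): the IBC buffer width shrinks as the CTU grows,
    # keeping the luma buffer area at 128*128 samples
    ibc_buf_width_y = 128 * 128 // ctb_size_y
    ibc_buf_width_c = ibc_buf_width_y // sub_width_c
    return min_cb_size_y, ibc_buf_width_y, ibc_buf_width_c

print(derive_min_cb_and_ibc_buffer(128, 0))  # (4, 128, 64)
print(derive_min_cb_and_ibc_buffer(32, 0))   # (4, 512, 256)
```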
  • CtbWidthC and CtbHeightC which specify the width and height, respectively, of the array for each chroma CTB, are derived as follows:
  • If chroma_format_idc is equal to 0 (monochrome) or separate_colour_plane_flag is equal to 1, CtbWidthC and CtbHeightC are both set equal to 0.
  • Otherwise, CtbWidthC and CtbHeightC are derived as CtbWidthC = CtbSizeY / SubWidthC and CtbHeightC = CtbSizeY / SubHeightC.
  • slice_log2_diff_max_bt_min_qt_luma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a luma coding block that can be split using a binary split and the minimum size (width or height) in luma samples of a luma leaf block resulting from quadtree splitting of a CTU in the current slice.
  • the value of slice_log2_diff_max_bt_min_qt_luma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeY, inclusive.
  • When not present, the value of slice_log2_diff_max_bt_min_qt_luma is inferred as follows:
  • If slice_type is equal to 2 (I), slice_log2_diff_max_bt_min_qt_luma is inferred to be equal to sps_log2_diff_max_bt_min_qt_intra_slice_luma.
  • Otherwise, slice_log2_diff_max_bt_min_qt_luma is inferred to be equal to sps_log2_diff_max_bt_min_qt_inter_slice.
  • slice_log2_diff_max_tt_min_qt_luma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a luma coding block that can be split using a ternary split and the minimum size (width or height) in luma samples of a luma leaf block resulting from quadtree splitting of a CTU in the current slice.
  • the value of slice_log2_diff_max_tt_min_qt_luma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeY, inclusive.
  • When not present, the value of slice_log2_diff_max_tt_min_qt_luma is inferred as follows:
  • If slice_type is equal to 2 (I), slice_log2_diff_max_tt_min_qt_luma is inferred to be equal to sps_log2_diff_max_tt_min_qt_intra_slice_luma.
  • Otherwise, slice_log2_diff_max_tt_min_qt_luma is inferred to be equal to sps_log2_diff_max_tt_min_qt_inter_slice.
  • slice_log2_diff_min_qt_min_cb_chroma specifies the difference between the base 2 logarithm of the minimum size in luma samples of a chroma leaf block resulting from quadtree splitting of a chroma CTU with treeType equal to DUAL_TREE_CHROMA and the base 2 logarithm of the minimum coding block size in luma samples for chroma CUs with treeType equal to DUAL_TREE_CHROMA in the current slice.
  • the value of slice_log2_diff_min_qt_min_cb_chroma shall be in the range of 0 to CtbLog2SizeY-MinCbLog2SizeY, inclusive.
  • When not present, the value of slice_log2_diff_min_qt_min_cb_chroma is inferred to be equal to sps_log2_diff_min_qt_min_cb_intra_slice_chroma.
  • slice_max_mtt_hierarchy_depth_chroma specifies the maximum hierarchy depth for coding units resulting from multi-type tree splitting of a quadtree leaf with treeType equal to DUAL_TREE_CHROMA in the current slice.
  • the value of slice_max_mtt_hierarchy_depth_chroma shall be in the range of 0 to CtbLog2SizeY-MinCbLog2SizeY, inclusive.
  • When not present, the value of slice_max_mtt_hierarchy_depth_chroma is inferred to be equal to sps_max_mtt_hierarchy_depth_intra_slices_chroma.
  • slice_log2_diff_max_bt_min_qt_chroma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a chroma coding block that can be split using a binary split and the minimum size (width or height) in luma samples of a chroma leaf block resulting from quadtree splitting of a chroma CTU with treeType equal to DUAL_TREE_CHROMA in the current slice.
  • the value of slice_log2_diff_max_bt_min_qt_chroma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeC, inclusive.
  • When not present, the value of slice_log2_diff_max_bt_min_qt_chroma is inferred to be equal to sps_log2_diff_max_bt_min_qt_intra_slice_chroma.
  • slice_log2_diff_max_tt_min_qt_chroma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a chroma coding block that can be split using a ternary split and the minimum size (width or height) in luma samples of a chroma leaf block resulting from quadtree splitting of a chroma CTU with treeType equal to DUAL_TREE_CHROMA in the current slice.
  • the value of slice_log2_diff_max_tt_min_qt_chroma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeC, inclusive.
  • When not present, the value of slice_log2_diff_max_tt_min_qt_chroma is inferred to be equal to sps_log2_diff_max_tt_min_qt_intra_slice_chroma.
  • MinQtLog2SizeY, MinQtLog2SizeC, MinQtSizeY, MinQtSizeC, MaxBtSizeY, MaxBtSizeC, MinBtSizeY, MaxTtSizeY, MaxTtSizeC, MinTtSizeY, MaxMttDepthY and MaxMttDepthC are derived as follows:
  • MinQtLog2SizeY = MinCbLog2SizeY + slice_log2_diff_min_qt_min_cb_luma (7-86)
  • MinQtLog2SizeC = MinCbLog2SizeY + slice_log2_diff_min_qt_min_cb_chroma (7-87)
  • MinQtSizeY = 1 << MinQtLog2SizeY (7-88)
  • MinQtSizeC = 1 << MinQtLog2SizeC (7-89)
  • MaxBtSizeY = 1 << (MinQtLog2SizeY + slice_log2_diff_max_bt_min_qt_luma) (7-90)
  • MaxBtSizeC = 1 << (MinQtLog2SizeC + slice_log2_diff_max_bt_min_qt_chroma) (7-91)
  • MinBtSizeY = 1 << MinCbLog2SizeY (7-92)
  • MaxTtSizeY = 1 << (MinQtLog2SizeY + slice_log2_diff_max_tt_min_qt_luma) (7-93)
  • MaxTtSizeC = 1 << (MinQtLog2SizeC + slice_log2_diff_max_tt_min_qt_chroma) (7-94)
  • MinTtSizeY = 1 << MinCbLog2SizeY (7-95)
  • MaxMttDepthY = slice_max_mtt_hierarchy_depth_luma (7-96)
  • MaxMttDepthC = slice_max_mtt_hierarchy_depth_chroma (7-97)
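The slice-level derivations (7-86) to (7-97) can be sketched as follows (a non-normative illustration; the shortened key names and the example syntax-element values are assumptions, not draft text):

```python
def derive_partition_limits(min_cb_log2_size_y: int, s: dict) -> dict:
    # (7-86)/(7-87): minimum quadtree leaf sizes in log2
    min_qt_log2_y = min_cb_log2_size_y + s["log2_diff_min_qt_min_cb_luma"]
    min_qt_log2_c = min_cb_log2_size_y + s["log2_diff_min_qt_min_cb_chroma"]
    return {
        "MinQtSizeY": 1 << min_qt_log2_y,                                        # (7-88)
        "MinQtSizeC": 1 << min_qt_log2_c,                                        # (7-89)
        "MaxBtSizeY": 1 << (min_qt_log2_y + s["log2_diff_max_bt_min_qt_luma"]),  # (7-90)
        "MaxBtSizeC": 1 << (min_qt_log2_c + s["log2_diff_max_bt_min_qt_chroma"]),# (7-91)
        "MinBtSizeY": 1 << min_cb_log2_size_y,                                   # (7-92)
        "MaxTtSizeY": 1 << (min_qt_log2_y + s["log2_diff_max_tt_min_qt_luma"]),  # (7-93)
        "MaxTtSizeC": 1 << (min_qt_log2_c + s["log2_diff_max_tt_min_qt_chroma"]),# (7-94)
        "MinTtSizeY": 1 << min_cb_log2_size_y,                                   # (7-95)
        "MaxMttDepthY": s["max_mtt_hierarchy_depth_luma"],                       # (7-96)
        "MaxMttDepthC": s["max_mtt_hierarchy_depth_chroma"],                     # (7-97)
    }

limits = derive_partition_limits(2, {
    "log2_diff_min_qt_min_cb_luma": 2, "log2_diff_min_qt_min_cb_chroma": 2,
    "log2_diff_max_bt_min_qt_luma": 3, "log2_diff_max_bt_min_qt_chroma": 2,
    "log2_diff_max_tt_min_qt_luma": 2, "log2_diff_max_tt_min_qt_chroma": 2,
    "max_mtt_hierarchy_depth_luma": 3, "max_mtt_hierarchy_depth_chroma": 3,
})
print(limits["MinQtSizeY"], limits["MaxBtSizeY"], limits["MaxTtSizeY"])  # 16 128 64
```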
  • Max chroma transform size is derived from the chroma sampling ratio relative to the max luma transform size.
  • sps_max_luma_transform_size_64_flag equal to 1 specifies that the maximum transform size in luma samples is equal to 64.
  • sps_max_luma_transform_size_64_flag equal to 0 specifies that the maximum transform size in luma samples is equal to 32.
  • The variables MinTbLog2SizeY, MaxTbLog2SizeY, MinTbSizeY, and MaxTbSizeY are derived as follows:
  • MaxTbLog2SizeY = sps_max_luma_transform_size_64_flag ? 6 : 5 (7-28)
  • MinTbSizeY = 1 << MinTbLog2SizeY (7-29)
  • MaxTbSizeY = 1 << MaxTbLog2SizeY (7-30)
  • sps_sbt_max_size_64_flag equal to 0 specifies that the maximum CU width and height for allowing subblock transform is 32 luma samples.
  • sps_sbt_max_size_64_flag equal to 1 specifies that the maximum CU width and height for allowing subblock transform is 64 luma samples.
  • MaxSbtSize = Min(MaxTbSizeY, sps_sbt_max_size_64_flag ? 64 : 32) (7-31)
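The two flags above combine as in (7-28), (7-30), and (7-31); a small sketch (function name is illustrative):

```python
def derive_transform_limits(sps_max_luma_transform_size_64_flag: bool,
                            sps_sbt_max_size_64_flag: bool):
    max_tb_log2_size_y = 6 if sps_max_luma_transform_size_64_flag else 5  # (7-28)
    max_tb_size_y = 1 << max_tb_log2_size_y                               # (7-30)
    # (7-31): the subblock-transform limit can never exceed the maximum transform size
    max_sbt_size = min(max_tb_size_y, 64 if sps_sbt_max_size_64_flag else 32)
    return max_tb_size_y, max_sbt_size

print(derive_transform_limits(True, True))   # (64, 64)
print(derive_transform_limits(False, True))  # (32, 32)
```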
  • the maximum transform size and CTU size are defined independently.
  • For example, the CTU size could be 32 while the transform size could be 64. It is desirable that the maximum transform size be equal to or smaller than the CTU size.
  • the block partition process depends on the maximum transform block size rather than the VPDU size. Therefore, if the maximum transform block size is 32x32 then, in addition to prohibiting the 128x128 TT split, the 64x128 vertical BT split, and the 128x64 horizontal BT split to obey the VPDU rule, the process further prohibits the TT split for 64x64 blocks, the vertical BT split for 32x64/16x64/8x64 coding blocks, and the horizontal BT split for 64x8/64x16/64x32 coding blocks, which may hurt coding efficiency.
  • the CTU size is signaled at SPS level.
  • the adoption of reference picture resampling (a.k.a. adaptive resolution change) allows pictures to be coded with different resolutions in one bitstream, so the CTU size may be different across multiple layers.
  • the video unit size/dimension may be either the height or width of a video unit (e.g., width or height of a picture/sub-picture/slice/brick/tile/CTU/CU/CB/TU/TB) . If a video unit size is denoted by MxN, then M denotes the width and N denotes the height of the video unit.
  • a coding block may be a luma coding block, and/or a chroma coding block.
  • the size/dimension in luma samples for a coding block may be used in this invention to represent the size/dimension measured in luma samples.
  • a 128x128 coding block (or a coding block size 128x128 in luma samples) may indicate a 128x128 luma coding block and/or a 64x64 chroma coding block for the 4:2:0 color format.
  • For the 4:2:2 color format, it may refer to a 128x128 luma coding block and/or a 64x128 chroma coding block.
  • For the 4:4:4 color format, it may refer to a 128x128 luma coding block and/or a 128x128 chroma coding block.
  • one or multiple sets of CTU dimensions may be explicitly signaled at a video unit level such as VPS/DPS/SPS/PPS/APS/Picture/Subpicture/Slice/Slice header/Tile/Brick level.
  • the CTU dimensions may be different across different layers.
  • the CTU dimensions of an inter-layer picture may be implicitly derived according to the downsample/upsample scaling factor.
  • the CTU dimensions in the inter-layer coded picture may be derived as (M×S)×(N×T) or (M/S)×(N/T).
  • different CTU dimensions may be explicitly signalled for multiple layers at video unit level, e.g., for inter-layer resampling pictures/subpictures, the CTU dimensions may be signaled at
  • TT or BT split may be dependent on VPDU dimensions (such as width and/or height) .
  • For example, the VPDU has dimension VSize in luma samples and the coding tree block has dimension CtbSizeY in luma samples.
  • VSize = min(M, CtbSizeY).
  • M is an integer value such as 64.
  • whether TT or BT split is allowed or not may be independent of the maximum transform size.
  • TT split may be disabled when a coding block width or height in luma samples is greater than min (VSize, maxTtSize) .
  • TT split may be disabled for 128x128/128x64/64x128 coding block.
  • TT split may be allowed for 64x64 coding block.
  • vertical BT split may be disabled when a coding block width in luma samples is less than or equal to VSize, but its height in luma samples is greater than VSize.
  • vertical BT split may be disabled for 64x128 coding block.
  • vertical BT split may be allowed for 32x64/16x64/8x64 coding block.
  • vertical BT split may be disabled when a coding block exceeds the Picture/Subpicture width in luma samples, but its height in luma samples is greater than VSize.
  • horizontal BT split may be allowed when a coding block exceeds the Picture/Subpicture width in luma samples.
  • horizontal BT split may be disabled when a coding block width in luma samples is greater than VSize, but its height in luma samples is less than or equal to VSize.
  • vertical BT split may be disabled for 128x64 coding block.
  • horizontal BT split may be allowed for 64x8/64x16/64x32 coding block.
  • horizontal BT split may be disabled when a coding block exceeds the Picture/Subpicture height in luma samples, but its width in luma samples is greater than VSize.
  • vertical BT split may be allowed when a coding block exceeds the Picture/Subpicture height in luma samples.
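The VPDU-based TT/BT restrictions in the bullets above can be sketched as a predicate (an illustration, not the normative allowTtSplit/allowBtSplit derivation; M defaults to 64 as the bullets suggest, and the picture-boundary cases are omitted):

```python
def vpdu_split_allowances(cb_width, cb_height, ctb_size_y, max_tt_size, m=64):
    v_size = min(m, ctb_size_y)
    # TT is disabled when width or height exceeds min(VSize, maxTtSize)
    tt_limit = min(v_size, max_tt_size)
    allow_tt = cb_width <= tt_limit and cb_height <= tt_limit
    # Vertical BT is disabled when width <= VSize but height > VSize
    allow_bt_ver = not (cb_width <= v_size < cb_height)
    # Horizontal BT is disabled when width > VSize but height <= VSize
    allow_bt_hor = not (cb_height <= v_size < cb_width)
    return allow_tt, allow_bt_ver, allow_bt_hor

# With a 128x128 CTU (VSize = 64): TT disabled for 128x128, allowed for 64x64;
# vertical BT disabled for 64x128; horizontal BT disabled for 128x64.
print(vpdu_split_allowances(128, 128, 128, 64))  # (False, True, True)
print(vpdu_split_allowances(64, 128, 128, 64))   # (False, False, True)
print(vpdu_split_allowances(32, 64, 128, 64))    # (True, True, True)
```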
  • the TT or BT split flag may not be signaled and may be implicitly derived to be zero.
  • the TT and/or BT split flag may be explicitly signaled in the bitstream.
  • the TT or BT split flag may be signaled but ignored by the decoder.
  • the TT or BT split flag may be signaled but it must be zero in a conformance bitstream.
  • the CTU dimensions (such as width and/or height) may be larger than 128.
  • the signaled CTU dimensions may be 256 or even larger (e.g., log2_ctu_size_minus5 may be equal to 3 or larger) .
  • the derived CTU dimensions may be 256 or even larger.
  • the derived CTU dimensions for resampling pictures/subpictures may be larger than 128.
  • the QT split flag may be inferred to be true, and the QT split may be recursively applied until the dimensions of the split coding blocks reach a specified value (e.g., the maximum transform block size, or 128, or 64, or 32).
  • the recursive QT split may be conducted implicitly, without signaling, until the split coding block size reaches the maximum transform block size.
  • the QT split flag may not be signalled for a coding block larger than the maximum transform block size, and the QT split may be forced for the coding block until the split coding block size reaches the maximum transform block size.
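The implicit recursive QT split described above can be sketched as follows (an illustrative helper, not draft text):

```python
def implicit_qt_split(width: int, height: int, max_tb_size: int):
    """Quad-split a block, with no flags signaled, until each leaf
    fits within the maximum transform block size."""
    if width <= max_tb_size and height <= max_tb_size:
        return [(width, height)]
    leaves = []
    for _ in range(4):
        leaves += implicit_qt_split(width // 2, height // 2, max_tb_size)
    return leaves

# A 256x256 CTU with a 64x64 maximum transform block splits into 16 leaves.
leaves = implicit_qt_split(256, 256, 64)
print(len(leaves), leaves[0])  # 16 (64, 64)
```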
  • TT split flag may be conditionally signalled for CU/PU dimensions (width and/or height) larger than 128.
  • both horizontal and vertical TT split flags may be signalled for a 256x256 CU.
  • vertical TT split but not horizontal TT split may be signalled for a 256x128/256x64 CU/PU.
  • horizontal TT split but not vertical TT split may be signalled for a 128x256/64x256 CU/PU.
  • when TT split is prohibited for CU dimensions larger than 128, the TT split flag may not be signalled and is implicitly derived as zero.
  • horizontal TT split may be prohibited for 256x128/256x64 CU/PU.
  • vertical TT split may be prohibited for 128x256/64x256 CU/PU.
  • BT split flag may be conditionally signalled for CU/PU dimensions (width and/or height) larger than 128.
  • both horizontal and vertical BT split flags may be signalled for 256x256/256x128/128x256 CU/PU.
  • horizontal BT split flag may be signaled for 64x256 CU/PU.
  • vertical BT split flag may be signaled for 256x64 CU/PU.
  • when BT split is prohibited for CU dimensions larger than 128, the BT split flag may not be signalled and is implicitly derived as zero.
  • vertical BT split may be prohibited for a Kx256 CU/PU (e.g., with K equal to or smaller than 64 in luma samples), and the vertical BT split flag may not be signaled and is derived as zero.
  • vertical BT split may be prohibited for 64x256 CU/PU.
  • vertical BT split may be prohibited to avoid 32x256 CU/PU at picture/subpicture boundaries.
  • horizontal BT split may be prohibited for a 256xK coding block (e.g., with K equal to or smaller than 64 in luma samples), and the horizontal BT split flag may not be signaled and is derived as zero.
  • horizontal BT split may be prohibited for 256x64 coding block.
  • horizontal BT split may be prohibited to avoid 256x32 coding block at picture/subpicture boundaries.
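The conditional TT/BT signalling rules for CU/PU dimensions above 128 can be sketched as follows (a non-normative reading of the examples above; the threshold K is assumed to be 64, and the function name is hypothetical):

```python
def large_cu_signalable_split_flags(cu_w: int, cu_h: int, k: int = 64):
    """Which TT/BT split flags may still be signalled for CU/PU
    dimensions above 128, per the examples above."""
    flags = {"tt_hor": True, "tt_ver": True, "bt_hor": True, "bt_ver": True}
    if cu_w > 128 and cu_h <= 128:   # e.g. 256x128, 256x64: horizontal TT prohibited
        flags["tt_hor"] = False
    if cu_h > 128 and cu_w <= 128:   # e.g. 128x256, 64x256: vertical TT prohibited
        flags["tt_ver"] = False
    if cu_h > 128 and cu_w <= k:     # Kx256 with K <= 64: vertical BT prohibited
        flags["bt_ver"] = False
    if cu_w > 128 and cu_h <= k:     # 256xK with K <= 64: horizontal BT prohibited
        flags["bt_hor"] = False
    return flags

print(large_cu_signalable_split_flags(256, 256))  # all four flags True
print(large_cu_signalable_split_flags(64, 256))   # tt_ver and bt_ver False
```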
  • affine model parameters calculation may be dependent on the CTU dimensions.
  • the derivation of scaled motion vectors, and/or control point motion vectors in affine prediction may be dependent on the CTU dimensions.
  • the intra block copy (IBC) buffer may depend on the maximum configurable/allowable CTU dimensions.
  • the above-mentioned specified coding tool may be palette, and/or intra block copy (IBC), and/or intra skip mode, and/or triangle prediction mode, and/or CIIP mode, and/or regular merge mode, and/or decoder-side motion derivation, and/or bi-directional optical flow, and/or prediction refinement based optical flow, and/or affine prediction, and/or sub-block based TMVP, etc.
  • screen content coding tool such as palette and/or intra block copy (IBC) mode may be applied to large CU/PU.
  • a syntax constraint may be explicitly used for disabling the specified coding tool(s) for a large CU/PU.
  • the palette/IBC flag may be explicitly signalled for a CU/PU which is not a large CU/PU.
  • alternatively, a bitstream constraint may be used for disabling the specified coding tool(s) for a large CU/PU.
  • the maximum TU size may be dependent on CTU dimensions (width and/or height) , or CTU dimensions may be dependent on the maximum TU size
  • a bitstream constraint may be used that the maximum TU size shall be smaller than or equal to the CTU dimensions.
  • the signaling of maximum TU size may depend on the CTU dimensions.
  • the signaled maximum TU size must be smaller than N.
  • the indication of whether the maximum luma transform size is 64 or 32 may not be signaled and the maximum luma transform size may be derived as 32 implicitly.
  • Newly added parts are enclosed in bolded double braces, e.g., {{a}} denotes that “a” has been added, whereas the deleted parts from the VVC working draft are enclosed in bolded double brackets, e.g., [[b]] denotes that “b” has been deleted.
  • the modifications are based on the latest VVC working draft (JVET-O2001-v11)
  • the embodiment below is for the invented method that makes the maximum TU size dependent on the CTU size.
  • sps_max_luma_transform_size_64_flag equal to 1 specifies that the maximum transform size in luma samples is equal to 64.
  • sps_max_luma_transform_size_64_flag equal to 0 specifies that the maximum transform size in luma samples is equal to 32.
  • The variables MinTbLog2SizeY, MaxTbLog2SizeY, MinTbSizeY, and MaxTbSizeY are derived as follows:
  • MaxTbLog2SizeY = sps_max_luma_transform_size_64_flag ? 6 : 5 (7-28)
  • MinTbSizeY = 1 << MinTbLog2SizeY (7-29)
  • MaxTbSizeY = {{min(CtbSizeY, 1 << MaxTbLog2SizeY)}} (7-30)
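The effect of the modified equation (7-30), clipping the maximum transform size to the CTU size, can be sketched as (illustrative helper name):

```python
def max_tb_size_y(ctb_size_y: int, sps_max_luma_transform_size_64_flag: bool) -> int:
    # Modified (7-30): the maximum transform size never exceeds the CTU size
    max_tb_log2_size_y = 6 if sps_max_luma_transform_size_64_flag else 5
    return min(ctb_size_y, 1 << max_tb_log2_size_y)

# With a 32x32 CTU, a signaled 64-sample maximum transform size is clipped to 32.
print(max_tb_size_y(32, True))   # 32
print(max_tb_size_y(128, True))  # 64
```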
  • the embodiment below is for the invented method that makes the TT and BT split process dependent on the VPDU size.
  • variable allowBtSplit is derived as follows:
  • allowBtSplit is set equal to FALSE
  • allowBtSplit is set equal to FALSE
  • allowBtSplit is set equal to FALSE
  • allowBtSplit is set equal to FALSE
  • variable allowTtSplit is derived as follows:
  • allowTtSplit is set equal to FALSE:
  • treeType is equal to DUAL_TREE_CHROMA and (cbWidth/SubWidthC) * (cbHeight/SubHeightC) is less than or equal to 32
  • treeType is equal to DUAL_TREE_CHROMA and modeType is equal to INTRA
  • allowTtSplit is set equal to TRUE.
  • log2_ctu_size_minus5 plus 5 specifies the luma coding tree block size of each CTU. It is a requirement of bitstream conformance that the value of log2_ctu_size_minus5 be less than or equal to [[2]] {{3 (could be larger per specified)}}.
  • {{CtbLog2SizeY is used to indicate the CTU size in luma samples of the current video unit.
  • CtbLog2SizeY is calculated by the above equation. Otherwise, CtbLog2SizeY may depend on the actual CTU size, which may be explicitly signalled or implicitly derived for the current video unit. (an example)}}
  • variable availableFlagLX is derived as follows:
  • availableFlagLX is set equal to TRUE:
  • refIdxLXCorner [0] is equal to refIdxLXCorner [2]
  • availableFlagLX is set equal to FALSE.
  • the second control point motion vector cpMvLXCorner [1] is derived as follows:
  • cpMvLXCorner[1][0] = (cpMvLXCorner[0][0] << [[7]] {{CtbLog2SizeY}}) + ((cpMvLXCorner[2][1] - cpMvLXCorner[0][1]) (8-606)
  • cpMvLXCorner[1][1] = (cpMvLXCorner[0][1] << [[7]] {{CtbLog2SizeY}}) + ((cpMvLXCorner[2][0] - cpMvLXCorner[0][0]) (8-607)
  • FIG. 1 is a block diagram of a video processing apparatus 1300.
  • the apparatus 1300 may be used to implement one or more of the methods described herein.
  • the apparatus 1300 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on.
  • the apparatus 1300 may include one or more processors 1302, one or more memories 1304 and video processing hardware 1306.
  • the processor(s) 1302 may be configured to implement one or more methods described in the present document.
  • the memory (memories) 1304 may be used for storing data and code used for implementing the methods and techniques described herein.
  • the video processing hardware 1306 may be used to implement, in hardware circuitry, some techniques described in the present document. In some embodiments, the hardware 1306 may be at least partially internal to the processors 1302, e.g., a graphics co-processor.
  • the video coding methods may be implemented using an apparatus that is implemented on a hardware platform as described with respect to FIG. 1.
  • Some embodiments of the disclosed technology include making a decision or determination to enable a video processing tool or mode.
  • when the video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of a block of video, but may not necessarily modify the resulting bitstream based on the usage of the tool or mode. That is, a conversion from the block of video to the bitstream representation of the video will use the video processing tool or mode when it is enabled based on the decision or determination.
  • when the video processing tool or mode is enabled, the decoder will process the bitstream with the knowledge that the bitstream has been modified based on the video processing tool or mode. That is, a conversion from the bitstream representation of the video to the block of video will be performed using the video processing tool or mode that was enabled based on the decision or determination.
  • Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode.
  • the encoder will not use the tool or mode in the conversion of the block of video to the bitstream representation of the video.
  • the decoder will process the bitstream with the knowledge that the bitstream has not been modified using the video processing tool or mode that was enabled based on the decision or determination.
  • FIG. 2 is a block diagram showing an example video processing system 200 in which various techniques disclosed herein may be implemented.
  • the system 200 may include input 202 for receiving video content.
  • the video content may be received in a raw or uncompressed format, e.g., 8 or 10 bit multi-component pixel values, or may be in a compressed or encoded format.
  • the input 202 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interface include wired interfaces such as Ethernet, passive optical network (PON) , etc. and wireless interfaces such as Wi-Fi or cellular interfaces.
  • the system 200 may include a coding component 204 that may implement the various coding or encoding methods described in the present document.
  • the coding component 204 may reduce the average bitrate of video from the input 202 to the output of the coding component 204 to produce a coded representation of the video.
  • the coding techniques are therefore sometimes called video compression or video transcoding techniques.
  • the output of the coding component 204 may be either stored, or transmitted via a communication connection, as represented by the component 206.
  • the stored or communicated bitstream (or coded) representation of the video received at the input 202 may be used by the component 208 for generating pixel values or displayable video that is sent to a display interface 210.
  • the process of generating user-viewable video from the bitstream representation is sometimes called video decompression.
  • while certain video processing operations are referred to as “coding” operations or tools, it will be appreciated that the coding tools or operations are used at an encoder, and corresponding decoding tools or operations that reverse the results of the coding will be performed by a decoder
  • peripheral bus interface or a display interface may include universal serial bus (USB) or high definition multimedia interface (HDMI) or Displayport, and so on.
  • storage interfaces include SATA (serial advanced technology attachment) , PCI, IDE interface, and the like.
  • FIG. 3 is a flowchart for a method 300 of video processing.
  • the method 300 includes, at operation 310, performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, the conversion conforming to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
  • FIG. 4 is a flowchart for a method 400 of video processing.
  • the method 400 includes, at operation 410, determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video.
  • the method 400 includes, at operation 420, performing, based on the determining, a conversion between the video and the bitstream representation.
  • FIG. 5 is a flowchart for a method 500 of video processing.
  • the method 500 includes, at operation 510, determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video.
  • the method 500 includes, at operation 520, performing, based on the determining, a conversion between the video and the bitstream representation.
  • FIG. 6 is a flowchart for a method 600 of video processing.
  • the method 600 includes, at operation 610, determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video.
  • the method 600 includes, at operation 620, performing, based on the determining, a conversion between the video and the bitstream representation.
  • FIG. 7 is a flowchart for a method 700 of video processing.
  • the method 700 includes, at operation 710, performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, the conversion comprising an affine model parameters calculation, and the affine model parameters calculation being based on dimensions of the video block.
  • FIG. 8 is a flowchart for a method 800 of video processing.
  • the method 800 includes, at operation 810, performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, the conversion comprising an application of an intra block copy (IBC) tool, and a size of an IBC buffer being based on maximum configurable and/or allowable dimensions of the video block.
  • FIG. 9 is a flowchart for a method 900 of video processing.
  • the method 900 includes, at operation 910, performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, the conversion being performed according to a rule that specifies a relationship between an indication of a size of a video block of the one or more video blocks and an indication of a maximum size of a transform block (TB) used for the video block.
  • a method of video processing comprising performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion conforms to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
  • A5 The method of solution A2, wherein the syntax element is included in a video parameter set (VPS) , a decoding parameter set (DPS) , an adaptation parameter set (APS) , a picture header, a subpicture header, a slice header, a tile header, or a brick header.
  • A6 The method of solution A1, wherein the one or more video regions correspond to video layers, and wherein the one or more video blocks correspond to coding tree units (CTUs) representing logical partitions used for coding the video into the bitstream representation.
  • A12 The method of solution A8, wherein a size of a video block of the one or more video blocks of an inter-layer picture or an intra-layer picture is M×N, wherein the inter-layer picture or the intra-layer picture is resampled by a first scale factor (S) in a width dimension and by a second scale factor (T) in a height dimension, wherein the dimensions of video blocks for inter-layer referencing or intra-layer referencing are (M×S) × (N×T) or (M/S) × (N/T), and wherein M, N, S, and T are positive integers.
  • A14 The method of solution A13, wherein the different sizes are signaled in a sequence parameter set (SPS) or a picture parameter set (PPS) .
  • a method of video processing comprising determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video; and performing, based on the determining, a conversion between the video and the bitstream representation.
  • a method of video processing comprising determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video; and performing, based on the determining, a conversion between the video and the bitstream representation.
  • A27 The method of any of solutions A22 to A26, wherein the video block is a coding unit (CU) or a prediction unit (PU) .
  • a method of video processing comprising determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video; and performing, based on the determining, a conversion between the video and the bitstream representation.
  • A33 The method of any of solutions A28 to A32, wherein the video block is a coding unit (CU) or a prediction unit (PU) .
  • a method of video processing comprising performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an affine model parameters calculation, and wherein the affine model parameters calculation is based on dimensions of the video block.
  • A36 The method of solution A34 or A35, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
  • a method of video processing comprising performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an application of an intra block copy (IBC) tool, and wherein a size of an IBC buffer is based on maximum configurable and/or allowable dimensions of the video block.
  • A38 The method of solution A37, wherein a width of the IBC buffer in luma samples is equal to N×N divided by a width or a height of the video block, wherein N×N is the maximum configurable dimensions of the video block in luma samples, and wherein N is an integer.
  • A40 The method of any of solutions A37 to A39, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
  • An apparatus in a video system comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to implement the method in any one of solutions A1 to A42.
  • a computer program product stored on a non-transitory computer readable media, the computer program product including program code for carrying out the method in any one of solutions A1 to A42.
  • a method of video processing comprising performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion is performed according to a rule that specifies a relationship between an indication of a size of a video block of the one or more video blocks and an indication of a maximum size of a transform block (TB) used for the video block.
  • bitstream representation excludes an indication of the maximum size of the luma transform block when at least one of the dimensions of the video block is smaller than N, wherein the maximum size of the luma transform block is implicitly derived as 32, and wherein N is a positive integer.
  • An apparatus in a video system comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to implement the method in any one of solutions B1 to B19.
  • a computer program product stored on a non-transitory computer readable media including program code for carrying out the method in any one of solutions B1 to B19.
  • the disclosed and other solutions, examples, embodiments, modules and the functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them.
  • the disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
  • data processing apparatus encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document) , in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code) .
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit) .
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random-access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
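The quadtree-splitting behaviour summarized above (method 400) can be modelled with a small sketch. This is illustrative only, not the normative splitting process; the helper name and the use of a maximum transform-block size as the threshold are assumptions made for the example:

```python
def implicit_qt_split_sizes(cb_size, max_tb_size=64):
    """Model of an inferred quadtree split: while the coding block is
    larger than the threshold, it is split in half in each dimension,
    and no split flag needs to be signalled in the bitstream."""
    sizes = []
    while cb_size > max_tb_size:
        cb_size //= 2          # a quadtree split halves width and height
        sizes.append(cb_size)
    return sizes

# A 256x256 block with a 64-sample threshold is implicitly split twice:
# 256 -> 128 -> 64, with no split indication in the bitstream.
```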
Abstract

Methods, systems, and devices for coding or decoding video which include configurable coding tree units (CTUs) are described. An example method of video processing includes performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion conforms to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion. Another example method of video processing includes determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.

Description

CONFIGURABLE CODING TREE UNIT SIZE IN VIDEO CODING
CROSS-REFERENCE TO RELATED APPLICATION
Under the applicable patent law and/or rules pursuant to the Paris Convention, this application is made to timely claim the priority to and benefits of International Patent Application No. PCT/CN2019/097926 filed on July 26, 2019. For all purposes under the law, the entire disclosures of the aforementioned applications are incorporated by reference as part of the disclosure of this application.
TECHNICAL FIELD
This document is related to video and image coding and decoding technologies.
BACKGROUND
Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
SUMMARY
The disclosed techniques may be used by video or image decoders or encoders to perform coding or decoding of video in which a configurable coding tree unit size is used.
In an example aspect a method of video processing is disclosed. The method includes performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion conforms to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
In another example aspect a method of video processing is disclosed. The method includes determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.
In yet another example aspect a method of video processing is disclosed. The method includes determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.
In yet another example aspect a method of video processing is disclosed. The method includes determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video, and performing, based on the determining, a conversion between the video and the bitstream representation.
In yet another example aspect a method of video processing is disclosed. The method includes performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an affine model parameters calculation, and wherein the affine model parameters calculation is based on dimensions of the video block.
In yet another example aspect a method of video processing is disclosed. The method includes performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an application of an intra block copy (IBC) tool, and wherein a size of an IBC buffer is based on maximum configurable and/or allowable dimensions of the video block.
In yet another example aspect a method of video processing is disclosed. The method includes performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion is performed according to a rule that specifies a relationship between an indication of a size of a video block of the one or more video blocks and an indication of a maximum size of a transform block (TB) used for the video block.
In another example aspect, the above-described method may be implemented by a video encoder apparatus that comprises a processor.
In yet another example aspect, these methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These, and other, aspects are further described in the present document.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an example of a hardware platform used for implementing techniques described in the present document.
FIG. 2 is a block diagram of an example video processing system in which disclosed techniques may be implemented.
FIG. 3 is a flowchart for an example method of video processing.
FIG. 4 is a flowchart for another example method of video processing.
FIG. 5 is a flowchart for yet another example method of video processing.
FIG. 6 is a flowchart for yet another example method of video processing.
FIG. 7 is a flowchart for yet another example method of video processing.
FIG. 8 is a flowchart for yet another example method of video processing.
FIG. 9 is a flowchart for yet another example method of video processing.
DETAILED DESCRIPTION
The present document provides various techniques that can be used by a decoder of image or video bitstreams to improve the quality of decompressed or decoded digital video or images. For brevity, the term “video” is used herein to include both a sequence of pictures (traditionally called video) and individual images. Furthermore, a video encoder may also implement these techniques during the process of encoding in order to reconstruct decoded frames used for further encoding.
Section headings are used in the present document for ease of understanding and do not limit the embodiments and techniques to the corresponding sections. As such, embodiments from one section can be combined with embodiments from other sections.
1. Summary
This document is related to video coding technologies. Specifically, it is directed to configurable coding tree units (CTUs) in video coding and decoding. It may be applied to an existing video coding standard like HEVC, or to the standard to be finalized (Versatile Video Coding). It may also be applicable to future video coding standards or video codecs.
2. Initial discussion
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named Joint Exploration Model (JEM). The JVET meeting is held once every quarter, and the new coding standard targets a 50% bitrate reduction as compared to HEVC. The new video coding standard was officially named Versatile Video Coding (VVC) at the April 2018 JVET meeting, and the first version of the VVC test model (VTM) was released at that time. As there is a continuous effort contributing to VVC standardization, new coding techniques are adopted into the VVC standard at every JVET meeting. The VVC working draft and test model VTM are then updated after every meeting. The VVC project is now aiming for technical completion (FDIS) at the July 2020 meeting.
2.1 CTU size in VVC
VTM-5.0 software allows 4 different CTU sizes: 16x16, 32x32, 64x64 and 128x128. However, at the July 2019 JVET meeting, the minimum CTU size was redefined to 32x32 due to the adoption of JVET-O0526. The CTU size in VVC working draft 6 is encoded in the SPS header in a ue(v)-coded syntax element called log2_ctu_size_minus5.
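The ue(v) (unsigned Exp-Golomb) coding of log2_ctu_size_minus5 can be sketched as follows. This is an illustrative decoder operating on a string of bits, not the normative parsing process:

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb (ue(v)) codeword from a string of
    '0'/'1' characters, returning (value, next_position)."""
    leading_zeros = 0
    while bits[pos + leading_zeros] == '0':
        leading_zeros += 1
    # codeword layout: leading_zeros zeros, a '1', then leading_zeros info bits
    end = pos + 2 * leading_zeros + 1
    value = int(bits[pos + leading_zeros:end], 2) - 1
    return value, end

# "011" decodes to log2_ctu_size_minus5 = 2, i.e. a 128x128 CTU.
value, _ = decode_ue("011")
ctu_size = 1 << (value + 5)
```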
Below are the corresponding spec modifications in VVC draft 6 with the definition of Virtual pipeline data units (VPDUs) and the adoption of JVET-O0526.
7.3.2.3. Sequence parameter set RBSP syntax
[Sequence parameter set RBSP syntax table, shown as an image in the original publication]
7.4.3.3. Sequence parameter set RBSP semantics
log2_ctu_size_minus5 plus 5 specifies the luma coding tree block size of each CTU. It is a requirement of bitstream conformance that the value of log2_ctu_size_minus5 be less than or equal to 2.
log2_min_luma_coding_block_size_minus2 plus 2 specifies the minimum luma coding block size.
The variables CtbLog2SizeY, CtbSizeY, MinCbLog2SizeY, MinCbSizeY, IbcBufWidthY, IbcBufWidthC and Vsize are derived as follows:
CtbLog2SizeY = log2_ctu_size_minus5+5             (7-15)
CtbSizeY = 1<<CtbLog2SizeY         (7-16)
MinCbLog2SizeY = log2_min_luma_coding_block_size_minus2+2      (7-17)
MinCbSizeY = 1<<MinCbLog2SizeY       (7-18)
IbcBufWidthY = 128*128/CtbSizeY       (7-19)
IbcBufWidthC = IbcBufWidthY/SubWidthC            (7-20)
VSize = Min (64, CtbSizeY)         (7-21)
The variables CtbWidthC and CtbHeightC, which specify the width and height, respectively, of the array for each chroma CTB, are derived as follows:
– If chroma_format_idc is equal to 0 (monochrome) or separate_colour_plane_flag is equal to 1, CtbWidthC and CtbHeightC are both equal to 0.
– Otherwise, CtbWidthC and CtbHeightC are derived as follows:
CtbWidthC = CtbSizeY /SubWidthC        (7-22)
CtbHeightC = CtbSizeY /SubHeightC       (7-23)
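The derivations in equations (7-15) through (7-23) can be expressed directly in code. A minimal sketch, assuming 4:2:0 chroma (i.e. SubWidthC = SubHeightC = 2) as the default; the function name and dictionary keys are chosen for the example:

```python
def derive_ctb_variables(log2_ctu_size_minus5, log2_min_luma_cb_size_minus2,
                         sub_width_c=2, sub_height_c=2):
    ctb_log2_size_y = log2_ctu_size_minus5 + 5             # (7-15)
    ctb_size_y = 1 << ctb_log2_size_y                      # (7-16)
    min_cb_log2_size_y = log2_min_luma_cb_size_minus2 + 2  # (7-17)
    min_cb_size_y = 1 << min_cb_log2_size_y                # (7-18)
    ibc_buf_width_y = 128 * 128 // ctb_size_y              # (7-19)
    ibc_buf_width_c = ibc_buf_width_y // sub_width_c       # (7-20)
    v_size = min(64, ctb_size_y)                           # (7-21)
    ctb_width_c = ctb_size_y // sub_width_c                # (7-22)
    ctb_height_c = ctb_size_y // sub_height_c              # (7-23)
    return {"CtbSizeY": ctb_size_y, "MinCbSizeY": min_cb_size_y,
            "IbcBufWidthY": ibc_buf_width_y, "IbcBufWidthC": ibc_buf_width_c,
            "VSize": v_size, "CtbWidthC": ctb_width_c,
            "CtbHeightC": ctb_height_c}
```

For example, log2_ctu_size_minus5 = 0 (a 32x32 CTU) gives IbcBufWidthY = 512 and VSize = 32, while log2_ctu_size_minus5 = 2 (a 128x128 CTU) gives IbcBufWidthY = 128 and VSize = 64.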
For log2BlockWidth ranging from 0 to 4 and for log2BlockHeight ranging from 0 to 4, inclusive, the up-right diagonal and raster scan order array initialization process as specified in clause 6.5.2 is invoked with 1<<log2BlockWidth and 1<<log2BlockHeight as inputs, and the output is assigned to DiagScanOrder [log2BlockWidth] [log2BlockHeight] and RasterScanOrder [log2BlockWidth] [log2BlockHeight] .
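The up-right diagonal scan order referenced above can be sketched as follows. This is an illustrative re-implementation of the idea behind clause 6.5.2, not the normative text:

```python
def up_right_diag_scan(blk_width, blk_height):
    """Return the (x, y) positions of a blk_width x blk_height block in
    up-right diagonal scan order: each anti-diagonal is scanned from its
    bottom-left end towards its top-right end."""
    order = []
    x, y = 0, 0
    while len(order) < blk_width * blk_height:
        while y >= 0:
            if x < blk_width and y < blk_height:
                order.append((x, y))
            x, y = x + 1, y - 1   # move up-right along the diagonal
        x, y = 0, x               # start the next diagonal at the left edge
    return order
```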
slice_log2_diff_max_bt_min_qt_luma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a luma coding block that can be split using a binary split and the minimum size (width or height) in luma samples of a luma leaf block resulting from quadtree splitting of a CTU in the current slice. The value of slice_log2_diff_max_bt_min_qt_luma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeY, inclusive. When not present, the value of slice_log2_diff_max_bt_min_qt_luma is inferred as follows:
– If slice_type equal to 2 (I) , the value of slice_log2_diff_max_bt_min_qt_luma is inferred to be equal to sps_log2_diff_max_bt_min_qt_intra_slice_luma
– Otherwise (slice_type equal to 0 (B) or 1 (P) ) , the value of slice_log2_diff_max_bt_min_qt_luma is inferred to be equal to sps_log2_diff_max_bt_min_qt_inter_slice.
slice_log2_diff_max_tt_min_qt_luma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a luma coding block that can be split using a ternary split and the minimum size (width or height) in luma samples of a luma leaf block resulting from quadtree splitting of a CTU in the current slice. The value of slice_log2_diff_max_tt_min_qt_luma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeY, inclusive. When not present, the value of slice_log2_diff_max_tt_min_qt_luma is inferred as follows:
– If slice_type equal to 2 (I) , the value of slice_log2_diff_max_tt_min_qt_luma is inferred to be equal to sps_log2_diff_max_tt_min_qt_intra_slice_luma
– Otherwise (slice_type equal to 0 (B) or 1 (P) ) , the value of slice_log2_diff_max_tt_min_qt_luma is inferred to be equal to sps_log2_diff_max_tt_min_qt_inter_slice.
slice_log2_diff_min_qt_min_cb_chroma specifies the difference between the base 2 logarithm of the minimum size in luma samples of a chroma leaf block resulting from quadtree splitting of a chroma CTU with treeType equal to DUAL_TREE_CHROMA and the base 2 logarithm of the minimum coding block size in luma samples for chroma CUs with treeType equal to DUAL_TREE_CHROMA in the current slice. The value of slice_log2_diff_min_qt_min_cb_chroma shall be in the range of 0 to CtbLog2SizeY-MinCbLog2SizeY, inclusive. When not present, the value of slice_log2_diff_min_qt_min_cb_chroma is inferred to be equal to sps_log2_diff_min_qt_min_cb_intra_slice_chroma.
slice_max_mtt_hierarchy_depth_chroma specifies the maximum hierarchy depth for coding units resulting from multi-type tree splitting of a quadtree leaf with treeType equal to DUAL_TREE_CHROMA in the current slice. The value of slice_max_mtt_hierarchy_depth_chroma shall be in the range of 0 to CtbLog2SizeY-MinCbLog2SizeY, inclusive. When not present, the value of slice_max_mtt_hierarchy_depth_chroma is inferred to be equal to sps_max_mtt_hierarchy_depth_intra_slices_chroma.
slice_log2_diff_max_bt_min_qt_chroma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a chroma coding block that can be split using a binary split and the minimum size (width or height) in luma samples of a chroma leaf block resulting from quadtree splitting of a chroma CTU with treeType equal to DUAL_TREE_CHROMA in the current slice. The value of slice_log2_diff_max_bt_min_qt_chroma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeC, inclusive. When not present, the value of slice_log2_diff_max_bt_min_qt_chroma is inferred to be equal to sps_log2_diff_max_bt_min_qt_intra_slice_chroma.
slice_log2_diff_max_tt_min_qt_chroma specifies the difference between the base 2 logarithm of the maximum size (width or height) in luma samples of a chroma coding block that can be split using a ternary split and the minimum size (width or height) in luma samples of a chroma leaf block resulting from quadtree splitting of a chroma CTU with treeType equal to DUAL_TREE_CHROMA in the current slice. The value of slice_log2_diff_max_tt_min_qt_chroma shall be in the range of 0 to CtbLog2SizeY-MinQtLog2SizeC, inclusive. When not present, the value of slice_log2_diff_max_tt_min_qt_chroma is inferred to be equal to sps_log2_diff_max_tt_min_qt_intra_slice_chroma.
The variables MinQtLog2SizeY, MinQtLog2SizeC, MinQtSizeY, MinQtSizeC, MaxBtSizeY, MaxBtSizeC, MinBtSizeY, MaxTtSizeY, MaxTtSizeC, MinTtSizeY, MaxMttDepthY and MaxMttDepthC are derived as follows:
MinQtLog2SizeY = MinCbLog2SizeY+slice_log2_diff_min_qt_min_cb_luma   (7-86)
MinQtLog2SizeC = MinCbLog2SizeY+slice_log2_diff_min_qt_min_cb_chroma (7-87)
MinQtSizeY = 1<<MinQtLog2SizeY        (7-88)
MinQtSizeC = 1<<MinQtLog2SizeC         (7-89)
MaxBtSizeY = 1<< (MinQtLog2SizeY+slice_log2_diff_max_bt_min_qt_luma) (7-90)
MaxBtSizeC = 1<< (MinQtLog2SizeC+slice_log2_diff_max_bt_min_qt_chroma) (7-91)
MinBtSizeY = 1<<MinCbLog2SizeY        (7-92)
MaxTtSizeY = 1<< (MinQtLog2SizeY+slice_log2_diff_max_tt_min_qt_luma) (7-93)
MaxTtSizeC = 1<< (MinQtLog2SizeC+slice_log2_diff_max_tt_min_qt_chroma) (7-94)
MinTtSizeY = 1<<MinCbLog2SizeY          (7-95)
MaxMttDepthY = slice_max_mtt_hierarchy_depth_luma        (7-96)
MaxMttDepthC = slice_max_mtt_hierarchy_depth_chroma      (7-97)
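The derivation in equations (7-86) to (7-97) can be sketched in Python as follows; the function name and the example input values are illustrative only, not part of the draft text.

```python
# Sketch of the slice-level partition-variable derivation (eqs. 7-86 to 7-97).
# Inputs are MinCbLog2SizeY and the slice-header syntax-element values;
# the concrete example numbers below are illustrative, not normative.

def derive_partition_vars(min_cb_log2_size_y,
                          slice_log2_diff_min_qt_min_cb_luma,
                          slice_log2_diff_min_qt_min_cb_chroma,
                          slice_log2_diff_max_bt_min_qt_luma,
                          slice_log2_diff_max_bt_min_qt_chroma,
                          slice_log2_diff_max_tt_min_qt_luma,
                          slice_log2_diff_max_tt_min_qt_chroma,
                          slice_max_mtt_hierarchy_depth_luma,
                          slice_max_mtt_hierarchy_depth_chroma):
    min_qt_log2_size_y = min_cb_log2_size_y + slice_log2_diff_min_qt_min_cb_luma    # (7-86)
    min_qt_log2_size_c = min_cb_log2_size_y + slice_log2_diff_min_qt_min_cb_chroma  # (7-87)
    return {
        "MinQtSizeY": 1 << min_qt_log2_size_y,                                          # (7-88)
        "MinQtSizeC": 1 << min_qt_log2_size_c,                                          # (7-89)
        "MaxBtSizeY": 1 << (min_qt_log2_size_y + slice_log2_diff_max_bt_min_qt_luma),   # (7-90)
        "MaxBtSizeC": 1 << (min_qt_log2_size_c + slice_log2_diff_max_bt_min_qt_chroma), # (7-91)
        "MinBtSizeY": 1 << min_cb_log2_size_y,                                          # (7-92)
        "MaxTtSizeY": 1 << (min_qt_log2_size_y + slice_log2_diff_max_tt_min_qt_luma),   # (7-93)
        "MaxTtSizeC": 1 << (min_qt_log2_size_c + slice_log2_diff_max_tt_min_qt_chroma), # (7-94)
        "MinTtSizeY": 1 << min_cb_log2_size_y,                                          # (7-95)
        "MaxMttDepthY": slice_max_mtt_hierarchy_depth_luma,                             # (7-96)
        "MaxMttDepthC": slice_max_mtt_hierarchy_depth_chroma,                           # (7-97)
    }

# Example: MinCbLog2SizeY = 2 (4x4 minimum CU) with hypothetical slice values.
vars_ = derive_partition_vars(2, 3, 2, 0, 1, 1, 1, 2, 2)
```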
2.2 Maximum transform size in VVC
In VVC Draft 5, the maximum transform size is signalled in the SPS, but it is fixed at 64 and is not configurable. However, at the July 2019 JVET meeting, it was decided to allow the maximum luma transform size to be either 64 or 32, selected by a flag at the SPS level. The maximum chroma transform size is derived from the maximum luma transform size using the chroma sampling ratio.
Below are the corresponding spec modifications in VVC draft 6 with the adoption of JVET-O05xxx.
7.3.2.3. Sequence parameter set RBSP syntax
Figure PCTCN2020104784-appb-000002
Figure PCTCN2020104784-appb-000003
7.4.3.3. Sequence parameter set RBSP semantics
sps_max_luma_transform_size_64_flag equal to 1 specifies that the maximum transform size in luma samples is equal to 64. sps_max_luma_transform_size_64_flag equal to 0 specifies that the maximum transform size in luma samples is equal to 32.
When CtbSizeY is less than 64, the value of sps_max_luma_transform_size_64_flag shall be equal to 0. The variables MinTbLog2SizeY, MaxTbLog2SizeY, MinTbSizeY, and MaxTbSizeY are derived as follows:
MinTbLog2SizeY = 2       (7-27)
MaxTbLog2SizeY = sps_max_luma_transform_size_64_flag? 6: 5     (7-28)
MinTbSizeY = 1<<MinTbLog2SizeY       (7-29)
MaxTbSizeY = 1<<MaxTbLog2SizeY          (7-30)
sps_sbt_max_size_64_flag equal to 0 specifies that the maximum CU width and height for allowing subblock transform is 32 luma samples. sps_sbt_max_size_64_flag equal to 1 specifies that the maximum CU width and height for allowing subblock transform is 64 luma samples.
MaxSbtSize = Min (MaxTbSizeY, sps_sbt_max_size_64_flag ? 64: 32) (7-31)
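The derivation in equations (7-27) to (7-31) can be sketched in Python as follows (the function name is illustrative, not from the draft):

```python
# Sketch of the max-transform-size derivation (eqs. 7-27 to 7-31).
def derive_max_transform_sizes(sps_max_luma_transform_size_64_flag,
                               sps_sbt_max_size_64_flag):
    min_tb_log2_size_y = 2                                                # (7-27)
    max_tb_log2_size_y = 6 if sps_max_luma_transform_size_64_flag else 5  # (7-28)
    min_tb_size_y = 1 << min_tb_log2_size_y                               # (7-29)
    max_tb_size_y = 1 << max_tb_log2_size_y                               # (7-30)
    # Subblock-transform limit is additionally clipped to MaxTbSizeY.
    max_sbt_size = min(max_tb_size_y,
                       64 if sps_sbt_max_size_64_flag else 32)            # (7-31)
    return min_tb_size_y, max_tb_size_y, max_sbt_size
```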
3. Examples of technical problems addressed by the disclosed technical solutions
There are several problems in the latest VVC working draft JVET-O2001-v11, which are described below.
1) In current VVC draft 6, the maximum transform size and the CTU size are defined independently. E.g., the CTU size could be 32, whereas the transform size could be 64. It is desirable that the maximum transform size be equal to or smaller than the CTU size.
2) In current VVC draft 6, the block partition process depends on the maximum transform block size rather than the VPDU size. Therefore, if the maximum transform block size is 32x32, then in addition to prohibiting the 128x128 TT split, the 64x128 vertical BT split, and the 128x64 horizontal BT split to obey the VPDU rule, it further prohibits the TT split for 64x64 blocks, the vertical BT split for 32x64/16x64/8x64 coding blocks, and the horizontal BT split for 64x8/64x16/64x32 coding blocks, which may not be efficient in terms of coding efficiency.
3) Current VVC draft 6 allows CTU size equal to 32, 64, and 128. However, it is possible that the CTU size could be larger than 128. Thus some syntax elements need to be modified.
a) If larger CTU size is allowed, the block partition structure and the signaling of block split flags may be redesigned.
b) If larger CTU size is allowed, then some of the current designs (e.g., affine parameter derivation, IBC prediction, IBC buffer size, merge triangle prediction, CIIP, regular merge mode, etc.) may be redesigned.
4) In current VVC draft 6, the CTU size is signaled at the SPS level. However, since the adoption of reference picture resampling (a.k.a. adaptive resolution change) allows pictures to be coded with different resolutions in one bitstream, the CTU size may be different across multiple layers.
4. Example embodiments and techniques
The listing of solutions below should be considered as examples to explain some concepts. These items should not be interpreted in a narrow way. Furthermore, these items can be combined in any manner.
In this document, C=min (a, b) indicates that C is equal to the minimum value between a and b.
In this document, the video unit size/dimension may be either the height or width of a video unit (e.g., width or height of a picture/sub-picture/slice/brick/tile/CTU/CU/CB/TU/TB) . If a video unit size is denoted by MxN, then M denotes the width and N denotes the height of the video unit.
In this document, “a coding block” may be a luma coding block and/or a chroma coding block. In this invention, the size/dimension in luma samples for a coding block represents the size/dimension measured in luma samples. For example, a 128x128 coding block (or a coding block of size 128x128 in luma samples) may indicate a 128x128 luma coding block and/or a 64x64 chroma coding block for the 4:2:0 color format. Similarly, for the 4:2:2 color format, it may refer to a 128x128 luma coding block and/or a 64x128 chroma coding block. For the 4:4:4 color format, it may refer to a 128x128 luma coding block and/or a 128x128 chroma coding block.
Configurable CTU size related
1. It is proposed that different CTU dimensions (such as width and/or height) may be allowed for different video units such as Layers/Pictures/Subpictures/Slices/Tiles/Bricks.
a) In one example, one or multiple sets of CTU dimensions may be explicitly signaled at a video unit level such as VPS/DPS/SPS/PPS/APS/Picture/Subpicture/Slice/Slice header/Tile/Brick level.
b) In one example, when the reference picture resampling (a.k.a. Adaptive Resolution Change) is allowed, the CTU dimensions may be different across different layers.
i. For example, the CTU dimensions of an inter-layer picture may be implicitly derived according to the downsample/upsample scaling factor.
1. For example, if the signaled CTU dimensions for a base layer are M×N (such as M=128 and N=128) and the inter-layer coded picture is resampled by a scaling factor S in width and a scaling factor T in height, each of which may be larger or smaller than 1 (such as S=1/4 and T=1/2, denoting that the inter-layer coded picture is downsampled by a factor of 4 in width and a factor of 2 in height), then the CTU dimensions in the inter-layer coded picture may be derived as (M×S) × (N×T), or (M/S) × (N/T).
ii. For example, different CTU dimensions may be explicitly signalled for multiple layers at the video unit level; e.g., for inter-layer resampling pictures/subpictures, the CTU dimensions may be signaled at the VPS/DPS/SPS/PPS/APS/picture/subpicture/slice/slice header/tile/brick level and may be different from the base-layer CTU size.
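As a non-normative illustration of bullet b) above, the inter-layer CTU dimensions might be derived from the base-layer dimensions and the resampling scale factors roughly as follows. The function name is invented, and the (M×S) × (N×T) alternative is chosen here purely for illustration.

```python
# Hypothetical sketch: derive inter-layer CTU dimensions from base-layer CTU
# dimensions (M x N) and resampling scale factors S (width) and T (height),
# following the (M*S) x (N*T) alternative described above.
from fractions import Fraction

def derive_inter_layer_ctu_dims(base_w, base_h, scale_w, scale_h):
    # scale_w / scale_h may be larger or smaller than 1 (e.g. "1/4", "1/2").
    return int(base_w * Fraction(scale_w)), int(base_h * Fraction(scale_h))

# Base layer 128x128; picture downsampled by 4 in width and 2 in height.
```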
2. It is proposed that whether TT or BT split is allowed or not may be dependent on the VPDU dimensions (such as width and/or height). Suppose the VPDU has dimension VSize in luma samples, and the coding tree block has dimension CtbSizeY in luma samples.
a) In one example, VSize = min (M, CtbSizeY) . M is an integer value such as 64.
b) In one example, whether TT or BT split is allowed or not may be independent of the maximum transform size.
c) In one example, TT split may be disabled when a coding block width or height in luma samples is greater than min (VSize, maxTtSize) .
i. In one example, when maximum transform size is equal to 32x32 but VSize is equal to 64x64, TT split may be disabled for 128x128/128x64/64x128 coding block.
ii. In one example, when maximum transform size is equal to 32x32 but VSize is equal to 64x64, TT split may be allowed for 64x64 coding block.
d) In one example, vertical BT split may be disabled when a coding block width in luma samples is less than or equal to VSize, but its height in luma samples is greater than VSize.
i. In one example, in case of maximum transform size 32x32 but VPDU size equal to 64x64, vertical BT split may be disabled for 64x128 coding block.
ii. In one example, in case of maximum transform size 32x32 but VPDU size equal to 64x64, vertical BT split may be allowed for 32x64/16x64/8x64 coding block.
e) In one example, vertical BT split may be disabled when a coding block exceeds the Picture/Subpicture width in luma samples, but its height in luma samples is greater than VSize.
i. Alternatively, horizontal BT split may be allowed when a coding block exceeds the Picture/Subpicture width in luma samples.
f) In one example, horizontal BT split may be disabled when a coding block width in luma samples is greater than VSize, but its height in luma samples is less than or equal to VSize.
i. In one example, in case of maximum transform size 32x32 but VPDU size equal to 64x64, horizontal BT split may be disabled for a 128x64 coding block.
ii. In one example, in case of maximum transform size 32x32 but VPDU size equal to 64x64, horizontal BT split may be allowed for 64x8/64x16/64x32 coding block.
g) In one example, horizontal BT split may be disabled when a coding block exceeds the Picture/Subpicture height in luma samples, but its width in luma samples is greater than VSize.
i. Alternatively, vertical BT split may be allowed when a coding block exceeds the Picture/Subpicture height in luma samples.
h) In one example, when TT or BT split is disabled, the TT or BT split flag may not be signaled and may be implicitly derived to be zero.
i. Alternatively, when TT and/or BT split is enabled, the TT and/or BT split flag may be explicitly signaled in the bitstream.
ii. Alternatively, when TT or BT split is disabled, the TT or BT split flag may be signaled but ignored by the decoder.
iii. Alternatively, when TT or BT split is disabled, the TT or BT split flag may be signaled but it must be zero in a conformance bitstream.
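The VPDU-based restrictions of item 2 can be sketched as follows. This is an illustrative, non-normative Python sketch covering only the rules in bullets a), c), d), and f) above; all function names are invented, and the full draft derivations contain additional conditions not shown here.

```python
# Sketch of the VPDU-based split restrictions in item 2, for a coding block
# of cb_w x cb_h luma samples. Non-normative illustration only.

def vpdu_size(ctb_size_y, m=64):
    # 2.a): VSize = min(M, CtbSizeY), with M an integer value such as 64.
    return min(m, ctb_size_y)

def allow_tt(cb_w, cb_h, vsize, max_tt_size):
    # 2.c): TT disabled when width or height exceeds min(VSize, maxTtSize).
    return max(cb_w, cb_h) <= min(vsize, max_tt_size)

def allow_bt_ver(cb_w, cb_h, vsize):
    # 2.d): vertical BT disabled when width <= VSize but height > VSize.
    return not (cb_w <= vsize and cb_h > vsize)

def allow_bt_hor(cb_w, cb_h, vsize):
    # 2.f): horizontal BT disabled when width > VSize but height <= VSize.
    return not (cb_w > vsize and cb_h <= vsize)
```

With a 64x64 VPDU and maxTtSize 64, this reproduces the examples above: TT is disabled for 128x128 but allowed for 64x64; vertical BT is disabled for 64x128 but allowed for 32x64.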
3. It is proposed that the CTU dimensions (such as width and/or height) may be larger than 128.
a) In one example, the signaled CTU dimensions may be 256 or even larger (e.g., log2_ctu_size_minus5 may be equal to 3 or larger) .
b) In one example, the derived CTU dimensions may be 256 or even larger.
i. For example, the derived CTU dimensions for resampling pictures/subpictures may be larger than 128.
4. It is proposed that when larger CTU dimensions are allowed (such as CTU width and/or height larger than 128), the QT split flag may be inferred to be true and the QT split may be recursively applied until the dimensions of the split coding block reach a specified value (e.g., the specified value may be set to the maximum transform block size, or 128, or 64, or 32).
a) In one example, the recursive QT split may be implicitly conducted without signaling, until the split coding block size reaches the maximum transform block size.
b) In one example, when a 256x256 CTU is applied to dual tree, the QT split flag may not be signalled for a coding block larger than the maximum transform block size, and the QT split may be forced for the coding block until the split coding block size reaches the maximum transform block size.
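Item 4 can be illustrated with a small recursive sketch (non-normative; the function name is invented). It enumerates the leaf blocks produced when the QT split is inferred without signalling until the blocks reach the specified limit.

```python
# Sketch of item 4: for CTUs larger than a specified limit, the QT split is
# inferred (no flag signalled) and applied recursively until the block
# dimensions reach the limit (e.g. the maximum transform block size).
def implicit_qt_leaves(x, y, w, h, limit):
    """Return the leaf blocks produced by the inferred quadtree splits."""
    if w <= limit and h <= limit:
        return [(x, y, w, h)]  # explicit split signalling may resume here
    leaves = []
    for dy in (0, h // 2):
        for dx in (0, w // 2):
            leaves += implicit_qt_leaves(x + dx, y + dy, w // 2, h // 2, limit)
    return leaves

# A 256x256 CTU with a 64-sample limit yields 16 implicit 64x64 leaves.
```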
5. It is proposed that TT split flag may be conditionally signalled for CU/PU dimensions (width and/or height) larger than 128.
a) In one example, both horizontal and vertical TT split flags may be signalled for a 256x256 CU.
b) In one example, vertical TT split but not horizontal TT split may be signalled for a 256x128/256x64 CU/PU.
c) In one example, horizontal TT split but not vertical TT split may be signalled for a 128x256/64x256 CU/PU.
d) In one example, when the TT split flag is prohibited for CU dimensions larger than 128, it may not be signalled and may be implicitly derived as zero.
i. In one example, horizontal TT split may be prohibited for 256x128/256x64 CU/PU.
ii. In one example, vertical TT split may be prohibited for 128x256/64x256 CU/PU.
6. It is proposed that BT split flag may be conditionally signalled for CU/PU dimensions (width and/or height) larger than 128.
a) In one example, both horizontal and vertical BT split flags may be signalled for 256x256/256x128/128x256 CU/PU.
b) In one example, horizontal BT split flag may be signaled for 64x256 CU/PU.
c) In one example, vertical BT split flag may be signaled for 256x64 CU/PU.
d) In one example, when the BT split flag is prohibited for CU dimensions larger than 128, it may not be signalled and may be implicitly derived as zero.
i. In one example, vertical BT split may be prohibited for a Kx256 CU/PU (such as K equal to or smaller than 64 in luma samples), and the vertical BT split flag may not be signaled and may be derived as zero.
1. For example, in the above case, vertical BT split may be prohibited for 64x256 CU/PU.
2. For example, in the above case, vertical BT split may be prohibited to avoid 32x256 CU/PU at picture/subpicture boundaries.
ii. In one example, vertical BT split may be prohibited when a coding block exceeds the Picture/Subpicture width in luma samples, but its height in luma samples is greater than M (such as M=64 in luma samples) .
iii. In one example, horizontal BT split may be prohibited for a 256xK coding block (such as K equal to or smaller than 64 in luma samples), and the horizontal BT split flag may not be signaled and may be derived as zero.
1. For example, in the above case, horizontal BT split may be prohibited for 256x64 coding block.
2. For example, in the above case, horizontal BT split may be prohibited to avoid 256x32 coding block at picture/subpicture boundaries.
iv. In one example, horizontal BT split may be prohibited when a coding block exceeds the Picture/Subpicture height in luma samples, but its width in luma samples is greater than M (such as M=64 in luma samples) .
7. It is proposed that the affine model parameters calculation may be dependent on the CTU dimensions.
a) In one example, the derivation of scaled motion vectors, and/or control point motion vectors in affine prediction may be dependent on the CTU dimensions.
8. It is proposed that the intra block copy (IBC) buffer may depend on the maximum configurable/allowable CTU dimensions.
a) For example, the IBC buffer width in luma samples may be equal to NxN divided by CTU width (or height) in luma samples, wherein N may be the maximum configurable CTU size in luma samples, such as N = 1 << (log2_ctu_size_minus5 + 5) .
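Bullet a) of item 8 can be illustrated with a short sketch. The function name is invented, and log2_ctu_size_minus5_max stands for the maximum configurable value of log2_ctu_size_minus5.

```python
# Sketch of item 8.a): the IBC buffer width in luma samples, derived as
# N*N divided by the actual CTU size, where N is the maximum configurable
# CTU size, N = 1 << (log2_ctu_size_minus5 + 5).
def ibc_buffer_width(log2_ctu_size_minus5_max, ctb_size_y):
    n = 1 << (log2_ctu_size_minus5_max + 5)  # max configurable CTU size
    return (n * n) // ctb_size_y

# With max CTU size 128 (log2_ctu_size_minus5 = 2) and a 64-sample CTU,
# the buffer holds 256 luma samples per row.
```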
9. It is proposed that a set of specified coding tool (s) may be disabled for a large CU/PU, where the large CU/PU refers to a CU/PU where either the CU/PU width or CU/PU height is larger than N (such as N=64 or 128) .
a) In one example, the above-mentioned specified coding tool (s) may be palette, and/or intra block copy (IBC) , and/or intra skip mode, and/or triangle prediction mode, and/or CIIP mode, and/or regular merge mode, and/or decoder side motion derivation, and/or bi-directional optical flow, and/or prediction refinement based optical flow, and/or affine prediction, and/or sub-block based TMVP, etc.
i. Alternatively, screen content coding tool (s) such as palette and/or intra block copy (IBC) mode may be applied to large CU/PU.
b) In one example, a syntax constraint may be explicitly used to disable the specified coding tool (s) for a large CU/PU.
i. For example, the palette/IBC flag may be explicitly signaled for a CU/PU which is not a large CU/PU.
c) In one example, a bitstream constraint may be used to disable the specified coding tool (s) for a large CU/PU.
Configurable maximum transform size related
10. It is proposed that the maximum TU size may be dependent on the CTU dimensions (width and/or height) , or the CTU dimensions may be dependent on the maximum TU size.
a) In one example, a bitstream constraint may be used such that the maximum TU size shall be smaller than or equal to the CTU dimensions.
b) In one example, the signaling of maximum TU size may depend on the CTU dimensions.
i. For example, when the CTU dimensions are smaller than N (e.g. N=64) , the signaled maximum TU size must be smaller than N.
ii. For example, when the CTU dimensions are smaller than N (e.g. N=64) , the indication of whether the maximum luma transform size is 64 or 32 (e.g., sps_max_luma_transform_size_64_flag) may not be signaled and the maximum luma transform size may be derived as 32 implicitly.
5. Embodiments
Newly added parts are enclosed in bolded double braces, e.g., { {a} } denotes that “a” has been added, whereas the deleted parts from the VVC working draft are enclosed in bolded double brackets, e.g., [ [b] ] denotes that “b” has been deleted. The modifications are based on the latest VVC working draft (JVET-O2001-v11) .
5.1 An example embodiment#1
The embodiment below is for the invented method of making the maximum TU size dependent on the CTU size.
7.4.3.3. Sequence parameter set RBSP semantics
sps_max_luma_transform_size_64_flag equal to 1 specifies that the maximum transform size in luma samples is equal to 64. sps_max_luma_transform_size_64_flag equal to 0 specifies that the maximum transform size in luma samples is equal to 32.
When CtbSizeY is less than 64, the value of sps_max_luma_transform_size_64_flag shall be equal to 0. The variables MinTbLog2SizeY, MaxTbLog2SizeY, MinTbSizeY, and MaxTbSizeY are derived as follows:
MinTbLog2SizeY = 2      (7-27)
MaxTbLog2SizeY = sps_max_luma_transform_size_64_flag? 6: 5     (7-28)
MinTbSizeY = 1<<MinTbLog2SizeY      (7-29)
MaxTbSizeY = { {min (CtbSizeY, 1<<MaxTbLog2SizeY) } }       (7-30)
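A minimal sketch of the modified derivation (the function name is illustrative): with the change to eq. (7-30), the maximum transform size can never exceed the CTU size.

```python
# Sketch of embodiment #1: eq. (7-30) modified so that MaxTbSizeY is
# clipped to the CTU size, MaxTbSizeY = min(CtbSizeY, 1 << MaxTbLog2SizeY).
def max_tb_size_y(ctb_size_y, sps_max_luma_transform_size_64_flag):
    max_tb_log2_size_y = 6 if sps_max_luma_transform_size_64_flag else 5  # (7-28)
    return min(ctb_size_y, 1 << max_tb_log2_size_y)                       # (7-30), modified
```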
5.2 An example embodiment#2
The embodiment below is for the invented method of making the TT and BT split processes dependent on the VPDU size.
6.4.2 Allowed binary split process
The variable allowBtSplit is derived as follows:
….
– Otherwise, if all of the following conditions are true, allowBtSplit is set equal to FALSE
– btSplit is equal to SPLIT_BT_VER
– cbHeight is greater than [ [MaxTbSizeY] ] { {VSize} }
– x0+cbWidth is greater than pic_width_in_luma_samples
– Otherwise, if all of the following conditions are true, allowBtSplit is set equal to FALSE
– btSplit is equal to SPLIT_BT_HOR
– cbWidth is greater than [ [MaxTbSizeY] ] { {VSize} }
– y0+cbHeight is greater than pic_height_in_luma_samples
– Otherwise if all of the following conditions are true, allowBtSplit is set equal to FALSE
– btSplit is equal to SPLIT_BT_VER
– cbWidth is less than or equal to [ [MaxTbSizeY] ] { {VSize} }
– cbHeight is greater than [ [MaxTbSizeY] ] { {VSize} }
– Otherwise if all of the following conditions are true, allowBtSplit is set equal to FALSE
– btSplit is equal to SPLIT_BT_HOR
– cbWidth is greater than [ [MaxTbSizeY] ] { {VSize} }
– cbHeight is less than or equal to [ [MaxTbSizeY] ] { {VSize} }
6.4.3 Allowed ternary split process
The variable allowTtSplit is derived as follows:
– If one or more of the following conditions are true, allowTtSplit is set equal to FALSE:
– cbSize is less than or equal to 2*MinTtSizeY
– cbWidth is greater than Min ( [ [MaxTbSizeY] ] { {VSize} } , maxTtSize)
– cbHeight is greater than Min ( [ [MaxTbSizeY] ] { {VSize} } , maxTtSize)
– mttDepth is greater than or equal to maxMttDepth
– x0+cbWidth is greater than pic_width_in_luma_samples
– y0+cbHeight is greater than pic_height_in_luma_samples
– treeType is equal to DUAL_TREE_CHROMA and (cbWidth/SubWidthC) * (cbHeight/SubHeightC) is less than or equal to 32
– treeType is equal to DUAL_TREE_CHROMA and modeType is equal to INTRA
– Otherwise, allowTtSplit is set equal to TRUE.
5.3 An example embodiment#3
The embodiment below is for the invented method of making the affine model parameters calculation dependent on the CTU size.
7.4.3.3. Sequence parameter set RBSP semantics
log2_ctu_size_minus5 plus 5 specifies the luma coding tree block size of each CTU. It is a requirement of bitstream conformance that the value of log2_ctu_size_minus5 be less than or equal to [ [2] ] { {3 (could be larger if so specified) } } .
CtbLog2SizeY = log2_ctu_size_minus5+5
{ {CtbLog2SizeY is used to indicate the CTU size in luma samples of the current video unit. When a single CTU size is used for the current video unit, CtbLog2SizeY is calculated by the above equation. Otherwise, CtbLog2SizeY may depend on the actual CTU size, which may be explicitly signalled or implicitly derived for the current video unit. (an example) } }
8.5.5.5 Derivation process for luma affine control point motion vectors from a neighbouring block
The variables mvScaleHor, mvScaleVer, dHorX and dVerX are derived as follows:
– If isCTUboundary is equal to TRUE, the following applies:
mvScaleHor=MvLX [xNb] [yNb+nNbH-1] [0] << [ [7] ] { {CtbLog2SizeY} }    (8-533)
mvScaleVer=MvLX [xNb] [yNb+nNbH-1] [1] << [ [7] ] { {CtbLog2SizeY} }     (8-534)
– Otherwise (isCTUboundary is equal to FALSE) , the following applies:
mvScaleHor=CpMvLX [xNb] [yNb] [0] [0] << [ [7] ] { {CtbLog2SizeY} }    (8-537)
mvScaleVer=CpMvLX [xNb] [yNb] [0] [1] << [ [7] ] { {CtbLog2SizeY} }                           (8-538)
8.5.5.6 Derivation process for constructed affine control point motion vector merging candidates
When availableFlagCorner [0] is equal to TRUE and availableFlagCorner [2] is equal to TRUE, the following applies:
– For X being replaced by 0 or 1, the following applies:
– The variable availableFlagLX is derived as follows:
– If all of following conditions are TRUE, availableFlagLX is set equal to TRUE:
– predFlagLXCorner [0] is equal to 1
– predFlagLXCorner [2] is equal to 1
– refIdxLXCorner [0] is equal to refIdxLXCorner [2]
– Otherwise, availableFlagLX is set equal to FALSE.
– When availableFlagLX is equal to TRUE, the following applies:
– The second control point motion vector cpMvLXCorner [1] is derived as follows:
cpMvLXCorner [1] [0] = (cpMvLXCorner [0] [0] << [ [7] ] { {CtbLog2SizeY} } ) + ( (cpMvLXCorner [2] [1] -cpMvLXCorner [0] [1] )                          (8-606)
<<( [ [7] ] { {CtbLog2SizeY} } +Log2 (cbHeight/cbWidth) ) )
cpMvLXCorner [1] [1] = (cpMvLXCorner [0] [1] << [ [7] ] { {CtbLog2SizeY} } ) + ( (cpMvLXCorner [2] [0] -cpMvLXCorner [0] [0] )                          (8-607)
<<( [ [7] ] { {CtbLog2SizeY} } +Log2 (cbHeight/cbWidth) ) )
8.5.5.9 Derivation process for motion vector arrays from affine control point motion vectors
The variables mvScaleHor, mvScaleVer, dHorX and dVerX are derived as follows:
mvScaleHor=cpMvLX [0] [0] << [ [7] ] { {CtbLog2SizeY} }                        (8-665)
mvScaleVer=cpMvLX [0] [1] << [ [7] ] { {CtbLog2SizeY} }                          (8-666)
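A minimal sketch of the modified shift in embodiment #3 (the helper name is invented): the fixed left shift of 7, which is tied to the 128 = 2^7 maximum CTU size of draft 6, is replaced by CtbLog2SizeY so that the scaled motion-vector precision tracks the actual CTU size.

```python
# Sketch of embodiment #3: in the affine motion-vector derivations, the
# constant shift "<< 7" is replaced by "<< CtbLog2SizeY".
def affine_mv_scale(cp_mv, ctb_log2_size_y):
    # cp_mv: (horizontal, vertical) control-point motion-vector components.
    return cp_mv[0] << ctb_log2_size_y, cp_mv[1] << ctb_log2_size_y

# With a 128-sample CTU (CtbLog2SizeY = 7) this matches the original "<< 7".
```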
FIG. 1 is a block diagram of a video processing apparatus 1300. The apparatus 1300 may be used to implement one or more of the methods described herein. The apparatus 1300 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and so on. The apparatus 1300 may include one or more processors 1302, one or more memories 1304 and video processing hardware 1306. The processor (s) 1302 may be configured to implement one or more methods described in the present document. The memory (memories) 1304 may be used for storing data and code used for implementing the methods and techniques described herein. The video processing hardware 1306 may be used to implement, in hardware circuitry, some techniques described in the present document. In some embodiments, the hardware 1306 may be at least partially internal to the processors 1302, e.g., a graphics co-processor.
In some embodiments, the video coding methods may be implemented using an apparatus that is implemented on a hardware platform as described with respect to FIG. 1.
Some embodiments of the disclosed technology include making a decision or determination to enable a video processing tool or mode. In an example, when the video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of a block of video, but may not necessarily modify the resulting bitstream based on the usage of the tool or mode. That is, a conversion from the block of video to the bitstream representation of the video will use the video processing tool or mode when it is enabled based on the decision or determination. In another example, when the video processing tool or mode is enabled, the decoder will process the bitstream with the knowledge that the bitstream has  been modified based on the video processing tool or mode. That is, a conversion from the bitstream representation of the video to the block of video will be performed using the video processing tool or mode that was enabled based on the decision or determination.
Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In an example, when the video processing tool or mode is disabled, the encoder will not use the tool or mode in the conversion of the block of video to the bitstream representation of the video. In another example, when the video processing tool or mode is disabled, the decoder will process the bitstream with the knowledge that the bitstream has not been modified using the video processing tool or mode that was enabled based on the decision or determination.
FIG. 2 is a block diagram showing an example video processing system 200 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of the system 200. The system 200 may include input 202 for receiving video content. The video content may be received in a raw or uncompressed format, e.g., 8 or 10 bit multi-component pixel values, or may be in a compressed or encoded format. The input 202 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interface include wired interfaces such as Ethernet, passive optical network (PON) , etc. and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 200 may include a coding component 204 that may implement the various coding or encoding methods described in the present document. The coding component 204 may reduce the average bitrate of video from the input 202 to the output of the coding component 204 to produce a coded representation of the video. The coding techniques are therefore sometimes called video compression or video transcoding techniques. The output of the coding component 204 may be either stored, or transmitted via a communication connection, as represented by the component 206. The stored or communicated bitstream (or coded) representation of the video received at the input 202 may be used by the component 208 for generating pixel values or displayable video that is sent to a display interface 210. The process of generating user-viewable video from the bitstream representation is sometimes called video decompression. Furthermore, while certain video processing operations are referred to as “coding” operations or tools, it will be appreciated that the coding tools or operations are used at an encoder and corresponding decoding tools or operations that reverse the results of the coding will be performed by a decoder.
Examples of a peripheral bus interface or a display interface may include universal serial bus (USB) or high definition multimedia interface (HDMI) or Displayport, and so on. Examples of storage interfaces include SATA (serial advanced technology attachment) , PCI, IDE interface, and the like. The techniques described in the present document may be embodied in various electronic devices such as mobile phones, laptops, smartphones or other devices that are capable of performing digital data processing and/or video display.
FIG. 3 is a flowchart for a method 300 of video processing. The method 300 includes, at operation 310, performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, the conversion conforming to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
FIG. 4 is a flowchart for a method 400 of video processing. The method 400 includes, at operation 410, determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video.
The method 400 includes, at operation 420, performing, based on the determining, a conversion between the video and the bitstream representation.
FIG. 5 is a flowchart for a method 500 of video processing. The method 500 includes, at operation 510, determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video.
The method 500 includes, at operation 520, performing, based on the determining, a conversion between the video and the bitstream representation.
FIG. 6 is a flowchart for a method 600 of video processing. The method 600 includes, at operation 610, determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video.
The method 600 includes, at operation 620, performing, based on the determining, a conversion between the video and the bitstream representation.
FIG. 7 is a flowchart for a method 700 of video processing. The method 700 includes, at operation 710, performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, the conversion comprising an affine  model parameters calculation, and the affine model parameters calculation being based on dimensions of the video block.
FIG. 8 is a flowchart for a method 800 of video processing. The method 800 includes, at operation 810, performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, the conversion comprising an application of an intra block copy (IBC) tool, and a size of an IBC buffer being based on maximum configurable and/or allowable dimensions of the video block.
FIG. 9 is a flowchart for a method 900 of video processing. The method 900 includes, at operation 910, performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, the conversion being performed according to a rule that specifies a relationship between an indication of a size of a video block of the one or more video blocks and an indication of a maximum size of a transform block (TB) used for the video block.
In some embodiments, the following technical solutions may be implemented:
A1. A method of video processing, comprising performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion conforms to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
A2. The method of solution A1, wherein the rule further specifies that a syntax element is included in the bitstream representation indicative of one or more sizes of video blocks permitted in the bitstream representation.
A3. The method of solution A2, wherein the syntax element is included in a sequence parameter set (SPS) .
A4. The method of solution A2, wherein the syntax element is included in a picture parameter set (PPS) .
A5. The method of solution A2, wherein the syntax element is included in a video parameter set (VPS) , a decoding parameter set (DPS) , an adaptation parameter set (APS) , a picture header, a subpicture header, a slice header, a tile header, or a brick header.
A6. The method of solution A1, wherein the one or more video regions correspond to video layers, and wherein the one or more video blocks correspond to coding tree units (CTUs) representing logical partitions used for coding the video into the bitstream representation.
A7. The method of solution A6, wherein the different sizes for the one or more video blocks are used in the video layers when a reference picture resampling tool is enabled for at least one of the one or more video regions.
A8. The method of solution A6, wherein at least one of the one or more video regions comprises an inter-layer picture or an intra-layer picture, and wherein the dimensions of the one or more video blocks for inter-layer referencing or intra-layer referencing are implicitly based on a scale factor.
A9. The method of solution A8, wherein the scale factor comprises an upsample scale factor or a downsample scale factor.
A10. The method of solution A8, wherein the scale factor is derived from a size of a current picture comprising the one or more blocks and a size of a reference picture associated with the current picture.
A11. The method of solution A8, wherein the scale factor is derived from one or more syntax elements in the bitstream representation.
A12. The method of solution A8, wherein a size of a video block of the one or more video blocks of an inter-layer picture or an intra-layer picture is M×N, wherein the inter-layer picture or the intra-layer picture is resampled by a first scale factor (S) in a width dimension and by a second scale factor (T) in a height dimension, wherein the dimensions of video blocks for inter-layer referencing or intra-layer referencing are (M×S) × (N×T) or (M/S) × (N/T) , and wherein M, N, S, and T are positive integers.
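As an illustrative, non-normative sketch of the dimension derivation in solution A12 (the function and parameter names are assumptions, not specification syntax):

```python
def scaled_ctu_dims(m, n, s, t, upsample=True):
    """Dimensions of video blocks used for inter-layer or intra-layer
    referencing when an M x N block is resampled by a first scale factor S
    (width) and a second scale factor T (height): (M*S) x (N*T) when
    upsampling, (M/S) x (N/T) when downsampling."""
    if upsample:
        return m * s, n * t
    return m // s, n // t

# A 128x128 CTU resampled by a factor of 2 in each dimension
# references 256x256 blocks when upsampling, 64x64 when downsampling.
assert scaled_ctu_dims(128, 128, 2, 2) == (256, 256)
assert scaled_ctu_dims(128, 128, 2, 2, upsample=False) == (64, 64)
```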
A13. The method of solution A8, wherein the different sizes for the one or more video blocks used in the video layers are signaled in the bitstream representation.
A14. The method of solution A13, wherein the different sizes are signaled in a sequence parameter set (SPS) or a picture parameter set (PPS) .
A15. The method of solution A14, wherein each of the different sizes is different from a size of a base-layer CTU.
A16. The method of solution A6, wherein the dimensions of the CTUs comprise a height and a width, and wherein the height and/or the width is greater than 128.
A17. The method of solution A6, wherein the dimensions of the CTUs comprise a height and a width, and wherein the height and/or the width is greater than or equal to 256.
A18. A method of video processing, comprising determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video; and performing, based on the determining, a conversion between the video and the bitstream representation.
A19. The method of solution A18, wherein the threshold is 128.
A20. The method of solution A18 or A19, wherein the size condition corresponds to a maximum transform block size of 64 or 32.
A21. The method of any of solutions A18 to A20, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
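A minimal sketch of the implicit splitting rule in solutions A18 to A20, assuming square CTUs, a threshold of 128, and a hypothetical helper name:

```python
def implicit_qt_splits(ctu_size, threshold=128, max_tb_size=64):
    """When the CTU size exceeds the threshold, apply quadtree splitting
    without signaling any split indication until the leaf size satisfies
    the maximum transform-block size condition. Returns (depth, leaf_size)."""
    depth, size = 0, ctu_size
    if size <= threshold:
        return depth, size  # normal signaling applies; no implicit split
    while size > max_tb_size:
        size //= 2          # a quadtree split halves each dimension
        depth += 1
    return depth, size

assert implicit_qt_splits(256) == (2, 64)                  # 256 -> 128 -> 64
assert implicit_qt_splits(256, max_tb_size=32) == (3, 32)
assert implicit_qt_splits(128) == (0, 128)                 # below threshold
```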
A22. A method of video processing, comprising determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video; and performing, based on the determining, a conversion between the video and the bitstream representation.
A23. The method of solution A22, wherein the threshold is 128.
A24. The method of solution A22 or A23, wherein the indication comprises a horizontal TT flag and a vertical TT flag when the dimensions of the video block are 256×256.
A25. The method of solution A22 or A23, wherein the indication consists of a vertical TT flag when the dimensions of the video block are 256×128 or 256×64.
A26. The method of solution A22 or A23, wherein the indication consists of a horizontal TT flag when the dimensions of the video block are 128×256 or 64×256.
A27. The method of any of solutions A22 to A26, wherein the video block is a coding unit (CU) or a prediction unit (PU) .
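The signaling conditions in solutions A24 to A26 can be summarized as follows (a sketch with a threshold of 128; the names are not normative syntax):

```python
def tt_flags_signaled(width, height, threshold=128):
    """Which ternary-tree split indications are signaled for a block with
    at least one dimension exceeding the threshold."""
    if width > threshold and height > threshold:
        return {"horizontal_tt", "vertical_tt"}   # e.g. 256x256
    if width > threshold:
        return {"vertical_tt"}                    # e.g. 256x128, 256x64
    if height > threshold:
        return {"horizontal_tt"}                  # e.g. 128x256, 64x256
    return set()                                  # regular signaling applies

assert tt_flags_signaled(256, 256) == {"horizontal_tt", "vertical_tt"}
assert tt_flags_signaled(256, 64) == {"vertical_tt"}
assert tt_flags_signaled(64, 256) == {"horizontal_tt"}
```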
A28. A method of video processing, comprising determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video; and performing, based on the determining, a conversion between the video and the bitstream representation.
A29. The method of solution A28, wherein the threshold is 128.
A30. The method of solution A28 or A29, wherein the indication comprises a horizontal BT flag and a vertical BT flag when the dimensions of the video block are 256×256, 256×128 or 128×256.
A31. The method of solution A28 or A29, wherein the indication consists of a horizontal BT flag when the dimensions of the video block are 64×256.
A32. The method of solution A28 or A29, wherein the indication consists of a vertical BT flag when the dimensions of the video block are 256×64.
A33. The method of any of solutions A28 to A32, wherein the video block is a coding unit (CU) or a prediction unit (PU) .
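Reading the split flags in solutions A30 to A32 as binary-tree (BT) flags, the BT conditions differ from the TT case in that both flags remain available as long as neither dimension drops below 128. A non-normative sketch:

```python
def bt_flags_signaled(width, height, threshold=128):
    """Which binary-tree split indications are signaled for a block with
    at least one dimension exceeding the threshold."""
    if max(width, height) <= threshold:
        return set()                  # regular signaling applies
    flags = set()
    if width >= threshold:
        flags.add("vertical_bt")      # wide enough for a vertical split
    if height >= threshold:
        flags.add("horizontal_bt")    # tall enough for a horizontal split
    return flags

assert bt_flags_signaled(256, 128) == {"horizontal_bt", "vertical_bt"}
assert bt_flags_signaled(64, 256) == {"horizontal_bt"}
assert bt_flags_signaled(256, 64) == {"vertical_bt"}
```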
A34. A method of video processing, comprising performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an affine model parameters calculation, and wherein the affine model parameters calculation is based on dimensions of the video block.
A35. The method of solution A34, wherein the affine model parameters calculation is part of an affine prediction process that further comprises a derivation of scaled motion vectors and/or control point motion vectors, and wherein the derivation is based on the dimensions of the video block.
A36. The method of solution A34 or A35, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
A37. A method of video processing, comprising performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an application of an intra block copy (IBC) tool, and wherein a size of an IBC buffer is based on maximum configurable and/or allowable dimensions of the video block.
A38. The method of solution A37, wherein a width of the IBC buffer in luma samples is equal to N×N divided by a width or a height of the video block, wherein N×N is the maximum configurable dimensions of the video block in luma samples, and wherein N is an integer.
A39. The method of solution A38, wherein N = 1 << (log2_ctu_size_minus5 + 5) , wherein log2_ctu_size_minus5 denotes an indication of a coding tree unit (CTU) size.
A40. The method of any of solutions A37 to A39, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
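Solutions A38 and A39 amount to the following derivation (a sketch; the syntax element name log2_ctu_size_minus5 is taken from the text, the function name is not):

```python
def ibc_buffer_width(log2_ctu_size_minus5, ctu_size):
    """Width of the IBC reference buffer in luma samples: N*N divided by
    the CTU width (or height), where N is the maximum configurable CTU
    dimension derived from the signaled syntax element."""
    n = 1 << (log2_ctu_size_minus5 + 5)
    return (n * n) // ctu_size

# With log2_ctu_size_minus5 = 2, N = 128; a smaller actual CTU yields
# a proportionally wider buffer of the same total area.
assert ibc_buffer_width(2, 128) == 128
assert ibc_buffer_width(2, 64) == 256
assert ibc_buffer_width(3, 256) == 256   # N = 256
```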
A41. The method of any of solutions A1 to A40, wherein performing the conversion comprises generating the bitstream representation from the video.
A42. The method of any of solutions A1 to A40, wherein performing the conversion comprises generating the video from the bitstream representation.
A43. An apparatus in a video system comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to implement the method in any one of solutions A1 to A42.
A44. A computer program product stored on a non-transitory computer-readable medium, the computer program product including program code for carrying out the method in any one of solutions A1 to A42.
In some embodiments, the following technical solutions may be implemented:
B1. A method of video processing, comprising performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video, wherein the conversion is performed according to a rule that specifies a relationship between an indication of a size of a video block of the one or more video blocks and an indication of a maximum size of a transform block (TB) used for the video block.
B2. The method of solution B1, wherein the relationship specifies that the maximum size of the TB is based on the size of the video block.
B3. The method of solution B1, wherein the relationship specifies that the size of the video block is based on the maximum size of the TB.
B4. The method of solution B2 or B3, wherein the maximum size of the TB is smaller than or equal to the dimensions of the video block.
B5. The method of solution B2 or B3, wherein an inclusion of an indication of the maximum size of the TB in the bitstream representation is based on the dimensions of the video block.
B6. The method of solution B5, wherein at least one of the dimensions of the video block is smaller than N, wherein the indication of the maximum size of the TB indicates that the maximum size of the TB is smaller than N, and wherein N is a positive integer.
B7. The method of solution B6, wherein N = 64.
B8. The method of solution B5, wherein a maximum size of a luma transform block associated with the video region is 64 or 32.
B9. The method of solution B8, wherein the bitstream representation excludes an indication of the maximum size of the luma transform block when at least one of the dimensions of the video block is smaller than N, wherein the maximum size of the luma transform block is implicitly derived as 32, and wherein N is a positive integer.
B10. The method of solution B9, wherein N = 64.
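The conditional signaling in solutions B5 to B10 can be sketched as follows (hypothetical names; in an actual bitstream the flag would only be parsed from the SPS when present):

```python
def max_luma_tb_size(ctb_size, max_tb_64_flag=None):
    """Maximum luma transform-block size as a function of the luma CTB
    size. When the CTB is smaller than 64, the flag is absent from the
    bitstream and the maximum TB size is implicitly derived as 32;
    otherwise the flag selects between 64 and 32."""
    if ctb_size < 64:
        return 32                      # indication excluded; implicit value
    return 64 if max_tb_64_flag else 32

assert max_luma_tb_size(32) == 32      # flag not signaled, derived as 32
assert max_luma_tb_size(128, max_tb_64_flag=True) == 64
assert max_luma_tb_size(128, max_tb_64_flag=False) == 32
```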
B11. The method of any of solutions B1 to B10, wherein the video block corresponds to a coding tree block (CTB) representing a logical partition used for coding the video into the bitstream representation.
B12. The method of any of solutions B1 to B10, wherein the video block corresponds to a luma coding tree block (CTB) representing a logical partition used for coding a luma component of the video into the bitstream representation.
B13. The method of any of solutions B1 to B10, wherein the indication of the size of the video block corresponds to a syntax element or a variable representing whether a size of the luma coding tree block (CTB) is greater than 32.
B14. The method of any of solutions B1 to B10, wherein the indication of the size of the video block corresponds to a syntax element or a variable representing whether a size of the luma coding tree block (CTB) is greater than or equal to 64.
B15. The method of any of solutions B1 to B10, wherein the maximum size of the transform block corresponds to a maximum size of a luma transform block.
B16. The method of any of solutions B1 to B10, wherein the indication of the maximum size of the transform block corresponds to a syntax element or a variable representing whether a maximum size of a luma transform block is equal to 64.
B17. The method of solution B16, wherein the syntax element is a flag.
B18. The method of any of solutions B1 to B17, wherein performing the conversion comprises generating the bitstream representation from the video region.
B19. The method of any of solutions B1 to B17, wherein performing the conversion comprises generating the video region from the bitstream representation.
B20. An apparatus in a video system comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to implement the method in any one of solutions B1 to B19.
B21. A computer program product stored on a non-transitory computer-readable medium, the computer program product including program code for carrying out the method in any one of solutions B1 to B19.
The disclosed and other solutions, examples, embodiments, modules and the functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document) , in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code) . A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit) .
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random-access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular techniques. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (44)

  1. A method of video processing, comprising:
    performing a conversion between a video comprising one or more video regions comprising one or more video blocks and a bitstream representation of the video,
    wherein the conversion conforms to a rule that allows use of different sizes for the one or more video blocks in different video regions of the one or more video regions for performing the conversion.
  2. The method of claim 1, wherein the rule further specifies that a syntax element is included in the bitstream representation indicative of one or more sizes of video blocks permitted in the bitstream representation.
  3. The method of claim 2, wherein the syntax element is included in a sequence parameter set (SPS) .
  4. The method of claim 2, wherein the syntax element is included in a picture parameter set (PPS) .
  5. The method of claim 2, wherein the syntax element is included in a video parameter set (VPS) , a decoding parameter set (DPS) , an adaptation parameter set (APS) , a picture header, a subpicture header, a slice header, a tile header, or a brick header.
  6. The method of claim 1, wherein the one or more video regions correspond to video layers, and wherein the one or more video blocks correspond to coding tree units (CTUs) representing logical partitions used for coding the video into the bitstream representation.
  7. The method of claim 6, wherein the different sizes for the one or more video blocks are used in the video layers when a reference picture resampling tool is enabled for at least one of the one or more video regions.
  8. The method of claim 6, wherein at least one of the one or more video regions comprises an inter-layer picture or an intra-layer picture, and wherein the dimensions of the one or more video blocks for inter-layer referencing or intra-layer referencing are implicitly based on a scale factor.
  9. The method of claim 8, wherein the scale factor comprises an upsample scale factor or a downsample scale factor.
  10. The method of claim 8, wherein the scale factor is derived from a size of a current picture comprising the one or more blocks and a size of a reference picture associated with the current picture.
  11. The method of claim 8, wherein the scale factor is derived from one or more syntax elements in the bitstream representation.
  12. The method of claim 8, wherein a size of a video block of the one or more video blocks of an inter-layer picture or an intra-layer picture is M×N, wherein the inter-layer picture or the intra-layer picture is resampled by a first scale factor (S) in a width dimension and by a second scale factor (T) in a height dimension, wherein the dimensions of video blocks for inter-layer referencing or intra-layer referencing are (M×S) × (N×T) or (M/S) × (N/T) , and wherein M, N, S, and T are positive integers.
  13. The method of claim 8, wherein the different sizes for the one or more video blocks used in the video layers are signaled in the bitstream representation.
  14. The method of claim 13, wherein the different sizes are signaled in a sequence parameter set (SPS) or a picture parameter set (PPS) .
  15. The method of claim 14, wherein each of the different sizes is different from a size of a base-layer CTU.
  16. The method of claim 6, wherein the dimensions of the CTUs comprise a height and a width, and wherein the height and/or the width is greater than 128.
  17. The method of claim 6, wherein the dimensions of the CTUs comprise a height and a width, and wherein the height and/or the width is greater than or equal to 256.
  18. A method of video processing, comprising:
    determining, based on a size of a video block of a video region of a video exceeding a threshold, that the video block is split using a quadtree-based splitting until a size condition is met and an indication of the quadtree-based splitting is excluded from a bitstream representation of the video; and
    performing, based on the determining, a conversion between the video and the bitstream representation.
  19. The method of claim 18, wherein the threshold is 128.
  20. The method of claim 18 or 19, wherein the size condition corresponds to a maximum transform block size of 64 or 32.
  21. The method of any of claims 18 to 20, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
  22. A method of video processing, comprising:
    determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for ternary-tree (TT) splitting of the video block is signaled in a bitstream representation of the video; and
    performing, based on the determining, a conversion between the video and the bitstream representation.
  23. The method of claim 22, wherein the threshold is 128.
  24. The method of claim 22 or 23, wherein the indication comprises a horizontal TT flag and a vertical TT flag when the dimensions of the video block are 256×256.
  25. The method of claim 22 or 23, wherein the indication consists of a vertical TT flag when the dimensions of the video block are 256×128 or 256×64.
  26. The method of claim 22 or 23, wherein the indication consists of a horizontal TT flag when the dimensions of the video block are 128×256 or 64×256.
  27. The method of any of claims 22 to 26, wherein the video block is a coding unit (CU) or a prediction unit (PU) .
  28. A method of video processing, comprising:
    determining, based on dimensions of a video block of a video region of a video exceeding a threshold, whether an indication for binary-tree (BT) splitting of the video block is signaled in a bitstream representation of the video; and
    performing, based on the determining, a conversion between the video and the bitstream representation.
  29. The method of claim 28, wherein the threshold is 128.
  30. The method of claim 28 or 29, wherein the indication comprises a horizontal BT flag and a vertical BT flag when the dimensions of the video block are 256×256, 256×128 or 128×256.
  31. The method of claim 28 or 29, wherein the indication consists of a horizontal BT flag when the dimensions of the video block are 64×256.
  32. The method of claim 28 or 29, wherein the indication consists of a vertical BT flag when the dimensions of the video block are 256×64.
  33. The method of any of claims 28 to 32, wherein the video block is a coding unit (CU) or a prediction unit (PU) .
  34. A method of video processing, comprising:
    performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an affine model parameters calculation, and wherein the affine model parameters calculation is based on dimensions of the video block.
  35. The method of claim 34, wherein the affine model parameters calculation is part of an affine prediction process that further comprises a derivation of scaled motion vectors and/or control point motion vectors, and wherein the derivation is based on the dimensions of the video block.
  36. The method of claim 34 or 35, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
  37. A method of video processing, comprising:
    performing a conversion between a video comprising a video region comprising a video block and a bitstream representation of the video, wherein the conversion comprises an application of an intra block copy (IBC) tool, and wherein a size of an IBC buffer is based on maximum configurable and/or allowable dimensions of the video block.
  38. The method of claim 37, wherein a width of the IBC buffer in luma samples is equal to N×N divided by a width or a height of the video block, wherein N×N is the maximum configurable dimensions of the video block in luma samples, and wherein N is an integer.
  39. The method of claim 38, wherein N = 1 << (log2_ctu_size_minus5 + 5) , wherein log2_ctu_size_minus5 denotes an indication of a coding tree unit (CTU) size.
  40. The method of any of claims 37 to 39, wherein the video block corresponds to a coding tree unit (CTU) representing a logical partition used for coding the video into the bitstream representation.
  41. The method of any of claims 1 to 40, wherein performing the conversion comprises generating the bitstream representation from the video.
  42. The method of any of claims 1 to 40, wherein performing the conversion comprises generating the video from the bitstream representation.
  43. An apparatus in a video system comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to implement the method recited in one or more of claims 1 to 42.
  44. A computer program product stored on a non-transitory computer-readable medium, the computer program product including program code for carrying out the method recited in one or more of claims 1 to 42.
PCT/CN2020/104784 2019-07-26 2020-07-27 Configurable coding tree unit size in video coding Ceased WO2021018081A1 (en)

Priority Applications (1): CN202080053833.8A — Configurable coding tree unit size in video coding (priority 2019-07-26, filed 2020-07-27).
Applications Claiming Priority (2): CN2019097926 and CNPCT/CN2019/097926 (priority 2019-07-26).

Publication: WO2021018081A1 (en), published 2021-02-04.

Family

ID=74229220

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2020/104790 Ceased WO2021018084A1 (en) 2019-07-26 2020-07-27 Interdependence of transform size and coding tree unit size in video coding
PCT/CN2020/104784 Ceased WO2021018081A1 (en) 2019-07-26 2020-07-27 Configurable coding tree unit size in video coding

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104790 Ceased WO2021018084A1 (en) 2019-07-26 2020-07-27 Interdependence of transform size and coding tree unit size in video coding

Country Status (2)

Country Link
CN (2) CN114175649A (en)
WO (2) WO2021018084A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023034748A1 (en) * 2021-08-31 2023-03-09 Bytedance Inc. Method, apparatus, and medium for video processing

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025003032A1 (en) * 2023-06-30 2025-01-02 Interdigital Ce Patent Holdings, Sas Intra sub-partitions (isp) combination with intra block copy (ibc)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017123980A1 (en) * 2016-01-15 2017-07-20 Qualcomm Incorporated Multi-type-tree framework for video coding
US20180146205A1 (en) * 2014-04-14 2018-05-24 Avago Technologies General Ip (Singapore) Pte. Ltd. Pipelined video decoder system
WO2018131523A1 (en) * 2017-01-12 2018-07-19 ソニー株式会社 Image processing device and image processing method
EP3381186A1 (en) * 2015-11-25 2018-10-03 Qualcomm Incorporated(1/3) Flexible transform tree structure in video coding
WO2018217024A1 (en) * 2017-05-26 2018-11-29 에스케이텔레콤 주식회사 Apparatus and method for image encoding or decoding supporting various block sizes
US20190075328A1 (en) * 2016-03-16 2019-03-07 Mediatek Inc. Method and apparatus of video data processing with restricted block size in video coding
WO2019059676A1 (en) * 2017-09-20 2019-03-28 한국전자통신연구원 Method and device for encoding/decoding image, and recording medium having stored bitstream

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2015203103B2 (en) * 2010-08-17 2016-06-30 Samsung Electronics Co., Ltd. Video encoding method and apparatus using transformation unit of variable tree structure, and video decoding method and apparatus
SG191845A1 (en) * 2011-01-12 2013-08-30 Mitsubishi Electric Corp Moving image encoding device, moving image decoding device, moving image encoding method, and moving image decoding method
US9788019B2 (en) * 2011-03-09 2017-10-10 Hfi Innovation Inc. Method and apparatus of transform unit partition with reduced complexity
PL3849186T3 (en) * 2011-06-24 2024-01-03 Mitsubishi Electric Corporation Moving image encoding apparatus, moving image decoding apparatus, moving image encoding method and moving image decoding method
JP5810700B2 (en) * 2011-07-19 2015-11-11 ソニー株式会社 Image processing apparatus and image processing method
WO2013139212A1 (en) * 2012-03-21 2013-09-26 Mediatek Singapore Pte. Ltd. Method and apparatus for intra mode derivation and coding in scalable video coding
US9467701B2 (en) * 2012-04-05 2016-10-11 Qualcomm Incorporated Coded block flag coding
JP2014045434A (en) * 2012-08-28 2014-03-13 Nippon Hoso Kyokai <Nhk> Image encoder, image decoder and programs thereof
JP6341426B2 (en) * 2012-09-10 2018-06-13 サン パテント トラスト Image decoding method and image decoding apparatus
US9648335B2 (en) * 2013-07-12 2017-05-09 Qualcomm Incorporated Bitstream restrictions on picture partitions across layers
KR101709775B1 (en) * 2013-07-23 2017-02-23 인텔렉추얼디스커버리 주식회사 Method and apparatus for image encoding/decoding
WO2015012600A1 (en) * 2013-07-23 2015-01-29 성균관대학교 산학협력단 Method and apparatus for encoding/decoding image
US20150078457A1 (en) * 2013-09-13 2015-03-19 Qualcomm Incorporated Representation format signaling in multi-layer video coding
EP3120561B1 (en) * 2014-03-16 2023-09-06 VID SCALE, Inc. Method and apparatus for the signaling of lossless video coding
WO2016074147A1 (en) * 2014-11-11 2016-05-19 Mediatek Singapore Pte. Ltd. Separated coding tree for luma and chroma
US20180091810A1 (en) * 2015-03-23 2018-03-29 Lg Electronics Inc. Method for processing video signal and device therefor
JP6704932B2 (en) * 2015-03-31 2020-06-03 リアルネットワークス,インコーポレーテッド Residual transform and inverse transform method in video coding system
KR102199463B1 (en) * 2015-08-31 2021-01-06 삼성전자주식회사 Method and apparatus for image transform, and method and apparatus for image inverse transform based on scan order
JP6559337B2 (en) * 2015-09-23 2019-08-14 Nokia Technologies Oy 360-degree panoramic video encoding method, encoding apparatus, and computer program
US10284845B2 (en) * 2016-05-25 2019-05-07 Arris Enterprises Llc JVET quadtree plus binary tree (QTBT) structure with multiple asymmetrical partitioning
US10779007B2 (en) * 2017-03-23 2020-09-15 Mediatek Inc. Transform coding of video data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180146205A1 (en) * 2014-04-14 2018-05-24 Avago Technologies General Ip (Singapore) Pte. Ltd. Pipelined video decoder system
EP3381186A1 (en) * 2015-11-25 2018-10-03 Qualcomm Incorporated Flexible transform tree structure in video coding
WO2017123980A1 (en) * 2016-01-15 2017-07-20 Qualcomm Incorporated Multi-type-tree framework for video coding
US20190075328A1 (en) * 2016-03-16 2019-03-07 Mediatek Inc. Method and apparatus of video data processing with restricted block size in video coding
WO2018131523A1 (en) * 2017-01-12 2018-07-19 Sony Corporation Image processing device and image processing method
WO2018217024A1 (en) * 2017-05-26 2018-11-29 SK Telecom Co., Ltd. Apparatus and method for image encoding or decoding supporting various block sizes
WO2019059676A1 (en) * 2017-09-20 2019-03-28 Electronics and Telecommunications Research Institute Method and device for encoding/decoding image, and recording medium having stored bitstream

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MENG WANG ET AL.: "Extended Quad-tree Partitioning for Future Video Coding", 2019 Data Compression Conference (DCC), 29 March 2019 (2019-03-29), pages 300–309, XP033548541, ISSN: 2375-0359 *
SHIH-TA HSIANG ET AL.: "CE1.7.0.1: Signaling maximum CU size for BT/TT split", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Ljubljana, SI, 10–18 July 2018, JVET-K0229, 10 July 2018 (2018-07-10) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023034748A1 (en) * 2021-08-31 2023-03-09 Bytedance Inc. Method, apparatus, and medium for video processing

Also Published As

Publication number Publication date
CN114175650A (en) 2022-03-11
WO2021018084A1 (en) 2021-02-04
CN114175649A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
US11553179B2 (en) Sample determination for adaptive loop filtering
CN113711604B (en) Signaling of chroma and luma syntax elements in video codecs
US12003712B2 (en) Handling video unit boundaries and virtual boundaries
US12439044B2 (en) Block size dependent use of video coding mode
US20230156189A1 (en) Sample padding in adaptive loop filtering
US11490082B2 (en) Handling video unit boundaries and virtual boundaries based on color format
US12184872B2 (en) Cross-component adaptive loop filter
US11539946B2 (en) Sample padding for cross-component adaptive loop filtering
CN113853798B (en) Signaling syntax elements based on chroma format
WO2021018081A1 (en) Configurable coding tree unit size in video coding
HK40063730A (en) Determination of picture partition mode based on block size
CN114902684A (en) Controlling cross-boundary filtering in video coding and decoding
HK40063730B (en) Determination of picture partition mode based on block size

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20847494

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20847494

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26-08-22)