
WO2025218682A1 - Video decoder and methods for decoding and signaling a video bit stream for performing a dual-tree partitioning technology - Google Patents

Video decoder and methods for decoding and signaling a video bit stream for performing a dual-tree partitioning technology

Info

Publication number
WO2025218682A1
Authority
WO
WIPO (PCT)
Prior art keywords: syntax element, video, bit stream, tree, node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2025/089194
Other languages
French (fr)
Inventor
Chia-Ming Tsai
Tzu-Der Chuang
Ching-Yeh Chen
Chih-Wei Hsu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MediaTek Inc filed Critical MediaTek Inc
Publication of WO2025218682A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component

Definitions

  • the present invention presents a method for decoding a video bit stream, a method for signaling a syntax element into the video bit stream, and a video decoder, and more particularly, a decoding method, a signaling method, and a video decoder capable of adaptively partitioning tree structure-based blocks.
  • a picture is typically partitioned into coding tree units (CTUs), and each CTU can be recursively split into smaller coding units (CUs).
  • the partitioning structure is represented by a tree structure denoted as a coding tree to adapt to various local characteristics of the picture.
  • for a picture that has three sample arrays, a CTU consists of an N×N block of luma samples together with two corresponding blocks of chroma samples.
  • the decision of whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the leaf CU level.
  • Each leaf CU can be further split into one, two, or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis.
  • after obtaining the residual block by applying the prediction process based on the PU splitting type, a leaf CU can be partitioned into transform units (TUs) according to another quaternary-tree structure similar to the coding tree for the CU.
  • HEVC: High-Efficiency Video Coding
  • VVC: Versatile Video Coding
  • a quadtree with nested multi-type tree (MTT) segmentation structure using binary and ternary splits replaces the concepts of multiple partition unit types, i.e. it removes the separation of the CU, PU, and TU concepts except as needed for CUs that have a size too large for the maximum transform length, and supports more flexibility for CU partition shapes.
  • a CU can have either a square or rectangular shape.
  • a coding tree unit (CTU) is first partitioned by a quaternary tree (a.k.a. quadtree, QT) structure. Then the quaternary tree blocks can be further partitioned by a multi-type tree structure.
  • there are four splitting types in the multi-type tree structure: vertical binary splitting (SPLIT_BT_VER), horizontal binary splitting (SPLIT_BT_HOR), vertical ternary splitting (SPLIT_TT_VER), and horizontal ternary splitting (SPLIT_TT_HOR).
  • the multi-type tree blocks are called coding units (CUs), and unless the CU is too large for the maximum transform length, this segmentation is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU, and TU have the same block size in the quadtree with nested multi-type tree coding block structure.
  • FIG. 9 illustrates the signaling mechanism of the partition splitting information in quadtree with nested multi-type tree coding tree structure.
  • a coding tree unit (CTU) is treated as the root of a quaternary tree and is first partitioned by a quaternary tree structure. Each quaternary tree block (when sufficiently large to allow it) is then further partitioned by a multi-type tree structure.
  • a first flag (split_cu_flag) is signaled to indicate whether the node is further partitioned.
  • a second flag (split_qt_flag) indicates whether the further partitioning is a quaternary tree split or a multi-type tree split.
  • a third flag (mtt_split_cu_vertical_flag) indicates the splitting direction (vertical or horizontal) of a multi-type tree split.
  • a fourth flag (mtt_split_cu_binary_flag) indicates whether the multi-type tree split is a binary split or a ternary split.
  • the multi-type tree splitting mode (MttSplitMode) of a CU is derived as shown in Table 1.
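Table 1 itself is not reproduced in this text; a minimal sketch of the derivation, following the mapping used in VVC (function name and string labels are illustrative, not bit stream syntax):

```python
def mtt_split_mode(vertical_flag: int, binary_flag: int) -> str:
    """Map (mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag)
    to the multi-type tree splitting mode of a CU."""
    table = {
        (0, 0): "SPLIT_TT_HOR",  # horizontal ternary split
        (0, 1): "SPLIT_BT_HOR",  # horizontal binary split
        (1, 0): "SPLIT_TT_VER",  # vertical ternary split
        (1, 1): "SPLIT_BT_VER",  # vertical binary split
    }
    return table[(vertical_flag, binary_flag)]
```

The direction flag selects the row and the binary flag selects binary versus ternary splitting, so the four combinations cover exactly the four splitting types listed above.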
  • FIG. 10 shows a CTU divided into multiple CUs with a quadtree and nested multi-type tree coding block structure, where the bold block edges represent quadtree partitioning and the remaining edges represent multi-type tree partitioning.
  • the quadtree with nested multi-type tree partition provides a content-adaptive coding tree structure comprised of CUs.
  • the size of the CU may be as large as the CTU or as small as 4×4 in units of luma samples. For the case of the 4:2:0 chroma format, the maximum chroma CB size is 64×64 and the minimum size chroma CB consists of 16 chroma samples.
  • the maximum supported luma transform size is 64×64 and the maximum supported chroma transform size is 32×32.
  • when the width or height of the CB is larger than the maximum transform width or height, the CB is automatically split in the horizontal and/or vertical direction to meet the transform size restriction in that direction.
  • the following parameters are defined for the quadtree with nested multi-type tree coding tree scheme. These parameters are specified by SPS syntax elements and can be further refined by picture header syntax elements.
  • CTU size: the root node size of a quaternary tree
  • MinQTSize: the minimum allowed quaternary tree block size
  • MaxBtSize: the maximum allowed binary tree root node size
  • MaxTtSize: the maximum allowed ternary tree root node size
  • MaxMttDepth: the maximum allowed hierarchy depth of multi-type tree splitting from a quadtree leaf
  • MinCbSize: the minimum allowed coding block node size
  • the CTU size is set as 128×128 luma samples with two corresponding 64×64 blocks of 4:2:0 chroma samples
  • the MinQTSize is set as 16×16
  • the MaxBtSize is set as 128×128
  • the MaxTtSize is set as 64×64
  • the MinCbSize (for both width and height) is set as 4×4
  • the MaxMttDepth is set as 4.
  • the quaternary tree blocks may have a size from 16×16 (i.e., the MinQTSize) to 128×128 (i.e., the CTU size). If the leaf QT node is 128×128, it will not be further split by the binary tree since the size exceeds the MaxBtSize and MaxTtSize (i.e., 64×64). Otherwise, the leaf QT node could be further partitioned by the multi-type tree. Therefore, the quaternary tree block is also the root node for the multi-type tree and it has multi-type tree depth (mttDepth) 0. When the multi-type tree depth reaches MaxMttDepth (i.e., 4), no further splitting is considered. When the multi-type tree node has a width equal to MinCbSize, no further horizontal splitting is considered. Similarly, when the multi-type tree node has a height equal to MinCbSize, no further vertical splitting is considered.
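The restrictions above can be sketched as a small predicate. This mirrors the text literally (including its convention that a width equal to MinCbSize rules out further horizontal splitting and a height equal to MinCbSize rules out further vertical splitting); the default parameter values are the example values listed above, and the function name is illustrative:

```python
def allowed_mtt_splits(width, height, mtt_depth,
                       max_bt_size=128, max_tt_size=64,
                       max_mtt_depth=4, min_cb_size=4):
    """Return the set of further multi-type tree splits considered
    for a block of the given size at the given MTT depth."""
    if mtt_depth >= max_mtt_depth:
        return set()                       # depth limit reached: no further splitting
    splits = {"SPLIT_BT_HOR", "SPLIT_TT_HOR", "SPLIT_BT_VER", "SPLIT_TT_VER"}
    if width > max_bt_size or height > max_bt_size:
        splits -= {"SPLIT_BT_HOR", "SPLIT_BT_VER"}   # too large for binary tree
    if width > max_tt_size or height > max_tt_size:
        splits -= {"SPLIT_TT_HOR", "SPLIT_TT_VER"}   # too large for ternary tree
    if width == min_cb_size:
        splits -= {"SPLIT_BT_HOR", "SPLIT_TT_HOR"}   # no further horizontal splitting
    if height == min_cb_size:
        splits -= {"SPLIT_BT_VER", "SPLIT_TT_VER"}   # no further vertical splitting
    return splits
```

For example, a 4×64 block at mttDepth 0 would only be considered for the vertical splits, and any block at mttDepth 4 is not split further.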
  • the coding tree scheme supports the ability of the luma and chroma to have separate block tree structures under specific conditions.
  • the luma and chroma CTBs in one CTU have to share the same coding tree structure.
  • the luma and chroma can have separate block tree structures.
  • luma CTB is partitioned into CBs by one coding tree structure
  • the chroma CTBs are partitioned into chroma CBs by another coding tree structure.
  • a CU in an I slice may consist of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice always consists of coding blocks of all three color components unless the video is monochrome.
  • both luma and chroma components of a CU are partitioned identically. That is, the luma tree structure and the chroma tree structure are the same.
  • luma and chroma components can exhibit different characteristics. For example, chroma components tend to demonstrate less variation than luma components. Consequently, it may be more efficient to employ different tree structures for luma and chroma components.
  • Current VVC technology introduces dual-tree coding to enable separated tree structures for luma and chroma components.
  • the dual-tree coding is applied to the entire CU according to respective single flags (one for the luma CB and another for the chroma CBs), which may lack sufficient flexibility.
  • the traditional dual-tree signaling methods necessitate that the decoder receives all the signaling syntax elements before it can proceed with configuration, potentially reducing efficiency. As a result, a delay occurs when the decoder is compelled to wait for all the syntax information before initiating the configuration process, which may decelerate the overall decoding process.
  • a method for decoding a video bit stream associated with video data by a video decoder comprises receiving the video bit stream associated with a coding tree unit (CTU) comprising a coding unit (CU) at a k-th depth node, determining a first syntax element from the video bit stream, determining a second syntax element from the video bit stream, configuring the video decoder based on at least the first syntax element and the second syntax element, and decoding the CTU to generate the video data.
  • the CU comprises a plurality of coding blocks (CBs), wherein k is an integer greater than or equal to 0.
  • the first syntax element indicates if a CU at a (k+1)th depth node is to be further split by using a dual-tree partitioning method.
  • the second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1)th depth node is to be further split and a second CB of the CU at the (k+1)th depth node is not to be further split.
  • a method for signaling a syntax element into a video bit stream associated with video data by a video encoder comprises receiving the video data associated with a CTU comprising a CU at a k-th depth node, determining a first syntax element for indicating if a CU at a (k+1)th depth node is to be further split by using a dual-tree partitioning method, determining a single-bit flag as a second syntax element for indicating that a first CB of the CU at the (k+1)th depth node is to be further split and a second CB of the CU at the (k+1)th depth node is not to be further split, and signaling the first syntax element and the second syntax element into the video bit stream.
  • the CU comprises CBs, wherein k is an integer greater than or equal to 0.
  • a video decoder for decoding a video bit stream associated with video data.
  • the video decoder comprises a parsing stage module and a reconstructive stage module.
  • the parsing stage module is configured to analyze syntax elements in the video bit stream.
  • the reconstructive stage module is linked to the parsing stage module.
  • the parsing stage module receives the video bit stream associated with a CTU comprising a CU at a k-th depth node.
  • the CU comprises a plurality of CBs, wherein k is an integer greater than or equal to 0.
  • the parsing stage module determines a first syntax element from the video bit stream.
  • the first syntax element indicates if a CU at a (k+1)th depth node is to be further split by using a dual-tree partitioning method.
  • the parsing stage module determines a second syntax element from the video bit stream.
  • the second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1)th depth node is to be further split and a second CB of the CU at the (k+1)th depth node is not to be further split.
  • the parsing stage module configures the reconstructive stage module based on at least the first syntax element and the second syntax element.
  • the reconstructive stage module decodes the CTU to generate the video data.
  • FIG. 1 is a block diagram of a video decoder for decoding a video bit stream associated with video data according to an embodiment of the present invention.
  • FIG. 2A is a schematic diagram of splitting a coding unit from a k-th depth node to a (k+1)th depth node of the video decoder using a single syntax element in FIG. 1.
  • FIG. 2B is a schematic diagram of splitting the coding unit from the k-th depth node to a (k+1)th depth node of the video decoder using multiple syntax elements in FIG. 1.
  • FIG. 3 is a schematic diagram of splitting a chroma coding block of the coding unit from the (k+1)th depth node to a (k+2)th depth node of the video decoder in FIG. 1.
  • FIG. 4 is a flow chart of performing a method for signaling a syntax element into the video bit stream associated with the video data by a video encoder.
  • FIG. 5A is a flow chart of performing a method for decoding the video bit stream associated with the video data using the single syntax element by the video decoder in FIG. 1.
  • FIG. 5B is a flow chart of performing a method for decoding the video bit stream associated with the video data using the multiple syntax elements by the video decoder in FIG. 1.
  • FIG. 6 illustrates an embodiment of the video encoder that may implement the method for signaling block partitioning information into a video bit stream.
  • FIG. 7 illustrates an embodiment of the video decoder for decoding a video bit stream associated with video data according to an embodiment of the present invention.
  • FIG. 8 illustrates four splitting types in multi-type tree structure according to the prior art.
  • FIG. 9 illustrates the signaling mechanism of the partition splitting information in quadtree with nested multi-type tree coding tree structure according to the prior art.
  • FIG. 10 shows a CTU divided into multiple CUs with a quadtree and nested multi-type tree coding block structure according to the prior art.
  • the terms “signaling” and “signaled” may refer to embedding, inserting, receiving, or retrieving information within a bitstream about controlling a filter, such as enabling or disabling modes or other control parameters. It is also understood that a “class” denotes a category of elements, and an “active” flag signals that the corresponding filter/tool is in use.
  • when doing block partitioning, the single-tree partitioning method is first applied to partition a current CTU into many smaller CUs, and a syntax element can be used to indicate whether the current CU (i.e., a leaf block of the single-tree) is further split by using a dual-tree partitioning method or not.
  • however, the current application is not limited thereto; the dual-tree partitioning method can also be applied first to partition a CTU block into many smaller CUs.
  • the split by using the dual-tree partitioning method can depend on the prediction mode (e.g., intra mode, inter mode, IBC mode, or palette mode) of the current leaf block of the single-tree. If the luma or chroma components are further split by using the dual-tree partitioning method, the splitting type and splitting direction are further explicitly determined or implicitly derived. For example, if the current leaf block of the single-tree (or the CTU block) is coded by a target prediction mode, the dual-tree partitioning method can be used to split the current block into one or more smaller blocks, and the prediction mode of each smaller block is inferred as the target prediction mode, thus no additional syntax element is signaled to indicate the prediction mode of each smaller block.
  • if the current leaf block is coded by a non-target prediction mode, the further split by using the dual-tree partitioning method is not allowed.
  • for example, the target prediction mode is an intra mode, and the non-target prediction mode can be an inter mode, an IBC mode, or a palette mode.
  • the luma and chroma CB blocks can be split differently in both splitting type and splitting direction for flexibility.
  • the splitting type applied for the luma CB block may be different from that applied for the chroma CB blocks.
  • the splitting direction applied for the luma CB block may be different from that applied for the chroma CB blocks.
  • the splitting type includes vertical binary splitting (SPLIT_BT_VER), horizontal binary splitting (SPLIT_BT_HOR), vertical ternary splitting (SPLIT_TT_VER), and horizontal ternary splitting (SPLIT_TT_HOR), but is not limited thereto.
  • the splitting direction includes vertical and horizontal, but is not limited thereto.
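The mode-dependent rule above can be sketched as follows, assuming "intra" as the example target prediction mode (the helper name and mode labels are illustrative): dual-tree splitting of a single-tree leaf is only allowed when the leaf is coded in the target prediction mode, and every resulting sub-block inherits that mode, so no per-sub-block mode syntax is needed.

```python
TARGET_MODE = "intra"          # example target prediction mode

def dual_tree_split(leaf_mode: str, num_sub_blocks: int):
    """Return the inferred prediction modes of the sub-blocks, or None
    if dual-tree splitting is not allowed for this leaf block."""
    if leaf_mode != TARGET_MODE:
        return None            # non-target mode: further split not allowed
    # modes are inferred, not signaled, for each smaller block
    return [TARGET_MODE] * num_sub_blocks
```

A leaf coded in inter, IBC, or palette mode therefore yields no dual-tree split, while an intra leaf yields sub-blocks that are all intra without extra signaling.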
  • FIG. 1 is a block diagram of a video decoder 100 for decoding a video bit stream associated with video data according to an embodiment of the present invention.
  • the video decoder 100 addresses the issues associated with dual-tree coding in video processing, notably within a target slice that applies the target prediction mode (e.g., inter slices, such as B-slices and P-slices), to improve the signaling efficiency and flexibility.
  • One of the advantages of the video decoder 100 is its adaptive dual-tree usage in block partitioning.
  • the video decoder 100 identifies a syntax element to indicate whether the current block is to be further split using a dual-tree partitioning method.
  • the video decoder 100 identifies another syntax element for indicating whether either luma (luminance) or chroma (color) components are to be further split or not.
  • the decoder 100 improves bandwidth and decoding resource utilization.
  • the video decoder 100 can enhance decoder efficiency by enabling earlier configuration of hardware components, such as memory buffers and reconfiguration line buffers.
  • the video decoder 100 includes a processor 10 and a memory 11 coupled to the processor 10.
  • the processor 10 is configured to execute a parsing stage module 10a and a reconstructive stage module 10b.
  • the parsing stage module 10a is linked to the reconstructive stage module 10b.
  • the parsing stage module 10a is configured to analyze syntax elements signaled in the video bit stream VS.
  • the video bit stream VS is a signal format used for transmitting video data information from a video encoder to the video decoder 100.
  • the parsing stage module 10a is configured to process the incoming video bit stream (VS) . It extracts and analyzes syntax elements and information, such as flag values, and sends this information to the memory 11 for data buffering.
  • the parsing stage module 10a uses the syntax elements to execute part of decoding functions, such as to determine whether the current block is to be further split by using the dual-tree partitioning method or not.
  • the reconstructive stage module 10b is responsible for the actual reconstruction of the block contents.
  • the reconstructive stage module 10b utilizes the information parsed and prepared by the parsing stage module 10a, such as configuration data D1 (e.g., buffer configurations, register configurations...), to perform necessary operations for reconstructing the video data VD.
  • the parsing stage module 10a receives the video bit stream VS associated with a coding tree unit (CTU) including a coding unit (CU) at a k-th depth node.
  • Each CTU can be recursively split into smaller CUs. In other words, CTUs are partitioned into a plurality of CUs by using a tree structure. CUs are the smaller units derived from partitioning the CTU.
  • the k-th depth node refers to the depth of a node in the coding tree structure, where k is an integer greater than or equal to 0.
  • the k-th depth node signifies the level of partitioning the CU from the CTU within the tree structure, indicating how many times a CTU has been split to a particular CU node until it reaches a CU leaf node block.
  • the leaf node block will not be further split and is used as the basic block for coding (decoding and encoding) .
  • the CU includes a plurality of coding blocks (CBs). It should be understood that the CB is a constituent block of the CU.
  • a CU may include a coding block of the luma component and two coding blocks of two chroma components respectively.
  • the parsing stage module 10a determines a first syntax element signaled in the video bit stream VS.
  • the first syntax element indicates if a CU at a (k+1)th depth node is to be further split by using a dual-tree partitioning method.
  • the parsing stage module 10a determines a second syntax element signaled in the video bit stream VS.
  • the second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1)th depth node is to be further split and a second CB of the CU at the (k+1)th depth node is not to be further split.
  • the parsing stage module 10a configures the reconstructive stage module 10b by using the configuration data D1 based on at least the first syntax element and the second syntax element.
  • the reconstructive stage module 10b decodes the CTU to generate (or say, reconstruct) the video data VD.
  • the memory 11 is configured to buffer the configuration data D1 and the video bit stream VS for use by the reconstructive stage module 10b.
  • the configuration data D1 includes at least the buffer configurations and the register configurations. Details of the method for decoding the video bit stream VS by the video decoder 100 are illustrated below.
  • FIG. 2A is a schematic diagram of splitting the CU from the k-th depth node to the (k+1)th depth node of the video decoder 100 using a single syntax element.
  • the CTU represents the largest coding unit within a tree-based video coding framework and serves as the root node for the coding tree structure.
  • in FIG. 2A, the CTU includes a luma coding tree block (CTB) and two chroma coding tree blocks (CTBs).
  • Luma corresponds to the brightness or luminance information in the picture of the video bit stream VS.
  • the Chroma CTBs in FIG. 2A represent the coding tree blocks specifically designated for the color components of the video bit stream VS. For example, the two chroma CTBs are denoted as Cb and Cr, or U and V, in the respective colour domains.
  • the video encoder can set a size threshold of CU and compare the size threshold of CU with a size of the CU at the k-th depth node. If the size of the CU at the k-th depth node meets the size threshold of CU, the video encoder switches from the single-tree partitioning method to the dual-tree partitioning method for splitting the CU at the (k+1) th depth node.
  • the video encoder continues to apply the quaternary-tree (QT) structure in the single-tree coding at least down to a CU size of 64×64 (size threshold) luma samples. This implies that for CUs with a size greater than 64×64, the video encoder splits the CUs based on the quaternary-tree (QT) structure in the single-tree partitioning method.
  • the video encoder may determine switching from the single-tree partitioning method to the dual-tree partitioning method.
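The threshold-based switch described above can be sketched as a one-line decision, using the 64-luma-sample threshold from the example (the function name and return labels are illustrative):

```python
QT_SIZE_THRESHOLD = 64   # example size threshold in luma samples

def partitioning_method(cu_size: int) -> str:
    """Choose the partitioning method for a CU: CUs larger than the
    threshold keep the QT single-tree split; once the CU size meets
    the threshold, the encoder may switch to dual-tree partitioning."""
    if cu_size > QT_SIZE_THRESHOLD:
        return "single-tree-QT"
    return "dual-tree"
```

So a 128×128 CU is still split by the quaternary tree in single-tree coding, while a 64×64 CU becomes a candidate for the dual-tree partitioning method.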
  • the CU includes one luma component (4 luma CBs) and two chroma components (4 chroma CBs for each chroma component) .
  • the Luma CB is a fundamental coding block for encoding the luminance information within the video data at the k-th depth node.
  • the chroma CBs are fundamental coding blocks for encoding the color information within the video data.
  • the video encoder switches from the single-tree partitioning method to the dual-tree partitioning method for splitting the CU. Since the CU at the k-th depth node includes one luma component (luma CB) and two chroma components (chroma CBs), a single syntax element can be introduced for indicating if a CU at the (k+1)th depth node is to be further split by using a dual-tree partitioning method.
  • the luma CB of the CU at the (k+1)th depth node is further split (e.g. horizontally split) based on a first splitting type.
  • the chroma CBs of the CU at the (k+1)th depth node are further split (e.g. vertically split) based on a second splitting type.
  • the luma CB of the CU at the (k)th depth node is the leaf node block for encoding/decoding. Therefore, no additional splitting of the luma CB into smaller coding blocks is applied.
  • the chroma CBs of the CU at the (k)th depth node are leaf node blocks for encoding/decoding. Therefore, no additional splitting of the chroma CBs into smaller coding blocks is applied.
  • the single syntax element used to indicate whether a CU is to be further split in the video decoder 100 is similar to the concept used in the video encoder, and thus the details are not repeated. As a result, FIG. 2A can be applied to the video encoder and the video decoder 100.
  • FIG. 2B is a schematic diagram of splitting the coding unit from the k-th depth node to a (k+1) th depth node of the video decoder 100 using multiple syntax elements in FIG. 1.
  • the multiple syntax elements are introduced to indicate whether the luma CB or the chroma CB is to be further split.
  • the CU at the (k+1)th depth node is to be further split.
  • the chroma CBs of the CU at the (k+1)th depth node are further split (e.g. vertically split).
  • the luma CB of the CU at the (k+1)th depth node is not to be further split.
  • the chroma CB is referred to as the first coding block CB1.
  • the luma CB is referred to as the second coding block CB2.
  • the second syntax element B can be regarded as “splitting luma tree syntax” .
  • the implicated syntax element for the chroma tree is implicitly inferred to be true (i.e., the implicated syntax element for the chroma tree is not signaled) , resulting in only splitting the chroma CBs.
  • the embodiment is not limited thereto.
  • the second syntax element B may also be regarded as “splitting chroma tree syntax”. Under the condition where the “splitting chroma tree syntax” is true, it may be inferred that the implicated syntax element for the luma tree is “false”. Any reasonable technical modification falls within the scope of the embodiment.
  • the luma CB of the CU at the (k+1)th depth node is further split (e.g. horizontally split).
  • the chroma CBs of the CU at the (k+1)th depth node are not further split.
  • the luma CB is referred to as the first coding block CB1.
  • the chroma CB is referred to as the second coding block CB2.
  • the second syntax element B can be regarded as “splitting chroma tree syntax” .
  • the implicated syntax element for the luma tree is implicitly inferred to be true (i.e., the implicated syntax element for the luma tree is not signaled) , resulting in only splitting the luma CB.
  • the embodiment is not limited thereto.
  • the second syntax element B may also be regarded as “splitting luma tree syntax” .
  • when the “splitting luma tree syntax” is true, it may be inferred that the implicated syntax element for the chroma tree is “false”. Any reasonable technical modification falls within the scope of the embodiment.
  • the parsing stage module 10a learns that the CU at the (k+1)th depth node is required to be further split for decoding, and only the luma CB is to be further split.
  • the dual syntax elements used in the video decoder 100 are similar to the concept used in the video encoder, and thus the details are not repeated. As a result, FIG. 2B can be applied to the video encoder and the video decoder 100.
  • the reconstructive stage module 10b of the video decoder 100 can determine the hardware configuration and decoding parameters in advance, thereby significantly reducing the decoding delay.
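A minimal sketch of how the two syntax elements could be interpreted, assuming (as one of the conventions described above) that the second element B is the “splitting luma tree syntax” and the chroma decision is implicitly inferred as its complement; the function name and dictionary keys are illustrative, not part of any bit stream syntax:

```python
def interpret_syntax(first_element: bool, second_element: bool):
    """Map the first syntax element A and second syntax element B to
    per-component split decisions for the CU at the (k+1)th depth node."""
    if not first_element:
        # A false: the CU is not further split by the dual-tree method
        return {"luma": False, "chroma": False}
    split_luma = second_element          # "splitting luma tree syntax"
    split_chroma = not second_element    # inferred, not signaled
    return {"luma": split_luma, "chroma": split_chroma}
```

Because the chroma decision is inferred rather than signaled, a single bit suffices to indicate that exactly one of the two CBs is further split, which is what allows the decoder to configure itself early.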
  • the dual-tree partitioning method is applied for further splitting the CU at the (k+1)th depth node.
  • the first CB (CB1) is defined as the luma CB or the chroma CB to be split.
  • the video decoder 100 can determine a third syntax element and a fourth syntax element.
  • the third syntax element is used for indicating the splitting type of the first CB of the CU at the (k+1)th depth node.
  • the fourth syntax element is used for indicating the splitting direction of the first CB of the CU at the (k+1)th depth node.
  • the splitting type can include options such as a binary tree split or a ternary tree split, corresponding to the third syntax element.
  • this could be a horizontal direction or a vertical direction, corresponding to the fourth syntax element.
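Combining the third and fourth syntax elements yields one of the four splitting modes listed earlier. The sketch below assumes a hypothetical single-bit coding for each element (binary vs. ternary, horizontal vs. vertical); the actual binarization is not specified in this text:

```python
def split_of_first_cb(third_element: int, fourth_element: int) -> str:
    """Combine the third syntax element (splitting type: 0 = binary,
    1 = ternary) and the fourth syntax element (splitting direction:
    0 = horizontal, 1 = vertical) into one splitting mode label."""
    split_type = "BT" if third_element == 0 else "TT"
    direction = "HOR" if fourth_element == 0 else "VER"
    return f"SPLIT_{split_type}_{direction}"
```

Two independent bits are enough here because type and direction are orthogonal choices, matching the four-way split taxonomy of the multi-type tree.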
  • the first syntax element, the second syntax element, the third syntax element, and/or the fourth syntax element is specified respectively in a picture parameter set (PPS), a sequence parameter set (SPS), or a CU parameter set of the video bit stream VS.
  • each of these syntax elements can be individually embedded within specific components of the video bit stream VS structure.
  • these syntax elements can be embedded in the PPS.
  • the PPS is a syntax structure holding parameters that apply to an entire coded picture.
  • the PPS complements the SPS, with the SPS governing sequence-wide parameters and the PPS handling picture-specific ones.
  • these syntax elements can be embedded in the SPS.
  • the SPS includes parameters that apply to an entire sequence of coded video pictures. These parameters define the overall characteristics of the video stream, such as profile, level, picture dimensions, and other sequence-wide settings.
  • these syntax elements can be embedded in the CU parameter set. It should be understood that in video coding, particularly within advanced standards like HEVC or VVC, the parameter set refers to a collection of data elements that define how a coding unit is processed.
  • the CU parameter set includes CU partitioning information, prediction mode information, coding flags, transform coefficient information, and control information.
  • all leaf node blocks of the CU after the k-th depth node are coded by the same prediction mode.
  • all leaf node blocks of luma and chroma after the k-th depth node are coded by the same prediction mode, such as intra-prediction, inter-prediction, intra-block copying, or palette mode.
  • the prediction mode is determined before identifying the syntax elements of the dual-tree.
  • FIG. 3 is a schematic diagram of splitting the chroma coding block CB1 of the CU from the (k+1) th depth node to a (k+2) th depth node of the video decoder 100.
  • the parsing stage module 10a of the video decoder 100 determines the first syntax element A and the second syntax element B, for example, in the second splitting state.
  • the luma CB of the CU at the (k+1) th depth node is the leaf node block and is not to be further split.
  • the chroma CBs of the CU at the (k+1) th depth node are to be further split into smaller blocks, such as CB1’.
  • the parsing stage module 10a determines a fifth syntax element signaled in the video bit stream VS.
  • the fifth syntax element indicates if the chroma coding block CB1’ (marked with shadow line in FIG. 3) at a (k+2) th depth node is to be further split.
  • the parsing stage module 10a can further split the chroma coding block CB1’ at the (k+2)-th depth node into smaller blocks, such as CB1”, based on the fifth syntax element.
  • the fifth syntax element is specified in the PPS, the SPS, or the CU parameter set of the video bit stream VS.
  • the CU can be recursively split until the CU reaches the minimum-size block.
  • corresponding syntax elements can be introduced to indicate whether the CU is to be further split at the next depth node. Since this is a repetitive process, the details are omitted here.
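The recursive splitting described above can be sketched as follows. The quaternary split shape, the minimum size of 4, and the `read_split_flag` callback (standing in for parsing the per-depth split syntax element, such as the fifth syntax element) are illustrative assumptions:

```python
# Illustrative sketch only: a CB is split recursively until a split flag is
# absent or the minimum block size is reached.
MIN_SIZE = 4  # assumed minimum block dimension in samples

def split_recursively(width, height, depth, read_split_flag):
    """Return the leaf node blocks produced by recursive splitting."""
    if width <= MIN_SIZE or height <= MIN_SIZE or not read_split_flag(depth):
        return [(width, height, depth)]  # leaf node block, not further split
    leaves = []
    for _ in range(4):  # quaternary split, chosen for brevity
        leaves += split_recursively(width // 2, height // 2, depth + 1,
                                    read_split_flag)
    return leaves
```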
  • a video encoder’s primary role is to take raw video data and transform it into a compressed format suitable for storage or transmission.
  • the video encoder’s operation involves a series of decision-making and signaling processes. Initially, the video encoder receives video data, which is structured into CTUs and further split into CUs. The video encoder then embarks on a critical task: determining how these CUs should be partitioned. This process may involve employing techniques like dual-tree partitioning to achieve efficient compression. To effectively communicate these partitioning decisions and other relevant information, the video encoder determines various syntax elements and flags, which are meticulously embedded into the video bit stream VS.
  • FIG. 4 is a flow chart of performing a method for signaling syntax elements into the video bit stream VS associated with the video data by the video encoder.
  • the method for signaling the video bit stream VS by the video encoder includes steps S401 to S404. Any hardware or technology modification falls into the scope of the present invention. Steps S401 to S404 are described below.
  • Step S401: receiving the video data associated with the CTU including the CU at the k-th depth node, the CU including CBs, and k being an integer greater than or equal to 0;
  • Step S402: determining the first syntax element for indicating if the CU at the (k+1)-th depth node is to be further split by using the dual-tree partitioning method;
  • Step S403: determining the single-bit flag as the second syntax element for indicating that the first CB of the CU at the (k+1)-th depth node is to be further split and the second CB of the CU at the (k+1)-th depth node is not to be further split;
  • Step S404: signaling the first syntax element and the second syntax element into the video bit stream.
  • the method for signaling the video bit stream VS by the video encoder includes steps S401, S402 and S404 (i.e., step S403 is omitted).
  • in step S404, the second syntax element is also omitted and is not signaled into the video bit stream.
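The signaling of steps S401 to S404, including the variant in which step S403 and the second syntax element are omitted, can be sketched as follows. The bit-stream writer (a plain list of bits) and the function name are illustrative assumptions, not the normative encoding:

```python
# Hedged sketch of steps S401-S404: the encoder writes the first syntax
# element (dual-tree split decision) and, optionally, the second syntax
# element (single-bit flag identifying which CB is further split).
def signal_partitioning(bits, use_dual_tree, first_cb_split=None):
    bits.append(1 if use_dual_tree else 0)        # first syntax element
    if use_dual_tree and first_cb_split is not None:
        bits.append(1 if first_cb_split else 0)   # second syntax element
    return bits                                   # second element may be omitted
```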
  • FIG. 5A is a flow chart of performing a method for decoding the video bit stream associated with the video data using the single syntax element by the video decoder 100.
  • the method for decoding the video bit stream VS by the video decoder 100 using the single syntax element includes steps S5011 to S5013. Any hardware or technology modification falls into the scope of the present invention. Steps S5011 to S5013 are described below.
  • Step S5011: receiving the video bit stream associated with the CTU including the CU at the k-th depth node, the CU including a plurality of CBs, and k being an integer greater than or equal to 0;
  • Step S5012: identifying the first syntax element from the video bit stream VS, the first syntax element indicating if the CU at the (k+1)-th depth node is to be further split by using the dual-tree partitioning method;
  • Step S5013: configuring the video decoder based on the first syntax element, and decoding the CTU to generate the video data.
  • FIG. 5B is a flow chart of performing a method for decoding the video bit stream associated with the video data using the multiple syntax elements by the video decoder 100.
  • the method for decoding the video bit stream VS by the video decoder 100 using dual syntax elements includes steps S5021 to S5024. Any hardware or technology modification falls into the scope of the present invention. Steps S5021 to S5024 are described below.
  • Step S5021: receiving the video bit stream associated with the CTU including the CU at the k-th depth node, the CU including a plurality of CBs, and k being an integer greater than or equal to 0;
  • Step S5022: identifying the first syntax element from the video bit stream VS, the first syntax element indicating if the CU at the (k+1)-th depth node is to be further split by using a dual-tree partitioning method;
  • Step S5023: identifying the second syntax element from the video bit stream VS, the second syntax element being the single-bit flag for indicating that the first CB of the CU at the (k+1)-th depth node is to be further split and a second CB of the CU at the (k+1)-th depth node is not to be further split;
  • Step S5024: configuring the video decoder based on at least the first syntax element and the second syntax element, and decoding the CTU to generate the video data.
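The decoder-side counterpart of steps S5021 to S5024 can be sketched as follows. Modeling the bit stream as an iterator of bits and the returned configuration dictionary are illustrative assumptions:

```python
# Hedged sketch of steps S5021-S5024: the decoder parses the first and, when
# present, the second syntax element before configuring itself, which is what
# enables the early hardware configuration described above.
def parse_partitioning(bit_iter):
    first = next(bit_iter)                       # S5022: dual-tree decision
    second = next(bit_iter) if first else None   # S5023: which CB splits
    return {                                     # S5024: configuration input
        "dual_tree": bool(first),
        "first_cb_split": None if second is None else bool(second),
    }
```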
  • the video decoder 100 presents advantages in how it handles dual-tree coding.
  • the video decoder 100 employs two syntax elements to signal whether the current CU is to be further split using a dual-tree partitioning method, and whether luma or chroma components are further split or not.
  • Such adaptive dual-tree usage enhances signaling efficiency and flexibility.
  • the video decoder 100 enables earlier configuration of hardware components, such as memory buffers and reconfiguration buffers, thereby improving decoder efficiency.
  • FIG. 6 illustrates an example video encoder that may implement the method for signaling block partitioning information into a video bit stream.
  • the video encoder 2400 receives an input video signal from a video source 2401 and encodes the signal into bitstream 2495.
  • the video encoder 2400 has several components or modules for encoding the signal from the video source 2401, at least including some components selected from a block structure partitioning module 2405, a transform module 2410, a quantization module 2411, an inverse quantization module 2414, an inverse transform module 2415, an intra-picture estimation module 2424, an intra-prediction module 2425, a motion compensation module 2430, a motion estimation module 2435, an in-loop filter 2445, a reconstructed picture buffer 2450, a MV buffer 2465, a MV prediction module 2475, and an entropy encoder 2490.
  • the motion compensation module 2430 and the motion estimation module 2435 are part of an inter-prediction module 2440.
  • the intra-prediction module 2425 and the intra-picture estimation module 2424 are part of a current picture prediction module 2420, which uses current picture reconstructed samples as reference samples for the prediction of the current block.
  • the modules 2405–2490 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 2405–2490 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 2405–2490 are illustrated as being separate modules, some of the modules can be combined into a single module.
  • the video source 2401 provides a raw video signal that presents pixel data of each video frame without compression.
  • the Block Structure Partitioning Module 2405 receives input data of video pictures and determines a block partitioning structure according to the above-mentioned embodiments for each block in the video picture to be encoded.
  • a current video picture is first divided into non-overlapped CTU blocks, and each block is further divided by a recursive partitioning structure into leaf node blocks in the Block Structure Partitioning Module 2405. For example, the Block Structure Partitioning Module 2405 adaptively partitions each of the non-overlapping CTUs in the current video picture into leaf CUs for prediction processing.
  • the Block Structure Partitioning Module 2405 receives input data associated with a current block in the current video picture and checks whether one or more predefined splitting types are allowed for partitioning the current block.
  • the current block may be a current CTU or a CU split from the current CTU.
  • a predefined splitting type is allowed if each sub-block split from the current block satisfies at least one of first and second constraints, where the first constraint restricts each sub-block to be completely contained in one pipeline unit and the second constraint restricts each sub-block to contain one or more complete pipeline units.
  • the pipeline units are non-overlapping units in the current video picture designed for pipeline processing, and a pipeline unit size is predefined or implicitly defined based on a profile or level according to a video compression standard. For example, the pipeline unit size is set to a maximum Transform Block (TB) size.
  • the Block Structure Partitioning Module 2405 adaptively partitions the current block using an allowed splitting type.
  • the predefined splitting type is not allowed to split the current block if any sub-block partitioned by the predefined splitting type violates both the first and second constraints.
  • Some other embodiments of the Block Structure Partitioning Module 2405 check if a predefined splitting type is allowed to partition the current block by comparing a width, a height, or both the width and height of the current block with one or both of a threshold W and a threshold H.
  • An example value of the threshold W is 64, and an example value of the threshold H is also 64.
  • an embodiment of the Block Structure Partitioning Module 2405 systematically tests the allowed splitting types, determines whether to further split and selects the splitting type according to Rate Distortion Optimization (RDO) results.
  • Information corresponding to the selected splitting type for the current block such as the block partitioning structure may be signaled in the video bitstream for the decoders to decode the current block.
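Under the semantics described above, a predefined splitting type is allowed only if every sub-block satisfies at least one of the two pipeline-unit constraints. The following is a sketch under stated assumptions: the pipeline unit size of 64 (the example maximum TB size), the sub-block geometry as (x, y, width, height) tuples, and the helper names are all illustrative:

```python
PIPELINE_UNIT = 64  # assumed: pipeline unit size set to the maximum TB size

def subblock_ok(x, y, w, h, unit=PIPELINE_UNIT):
    # First constraint: the sub-block is completely contained in one unit.
    inside_one = (x // unit == (x + w - 1) // unit and
                  y // unit == (y + h - 1) // unit)
    # Second constraint: the sub-block contains one or more complete units.
    whole_units = (x % unit == 0 and y % unit == 0 and
                   w % unit == 0 and h % unit == 0)
    return inside_one or whole_units

def split_allowed(subblocks):
    """Disallow the split if any sub-block violates both constraints."""
    return all(subblock_ok(*sb) for sb in subblocks)
```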
  • a subtractor 2408 computes the difference between the raw video pixel data of the video source 2401 and the predicted pixel data 2413 from the motion compensation module 2430 or intra-prediction module 2425 as the prediction residual 2409.
  • the transform module 2410 converts the difference (or the residual pixel data, i.e., the prediction residual 2409) into transform coefficients (e.g., by performing a Discrete Cosine Transform, or DCT).
  • the quantization module 2411 quantizes the transform coefficients into quantized data (or quantized coefficients) 2412, which is encoded into the bitstream 2495 by the entropy encoder 2490.
  • the inverse quantization module 2414 de-quantizes the quantized data (or quantized coefficients) 2412 to obtain transform coefficients, and the inverse transform module 2415 performs inverse transform on the transform coefficients to produce reconstructed residual 2419.
  • the reconstructed residual 2419 is added with the predicted pixel data 2413 to produce reconstructed pixel data 2417.
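This reconstruction step can be sketched as a sample-wise addition with clipping. The list-based sample model and the 8-bit clipping range are illustrative assumptions:

```python
# Minimal sketch of the reconstruction above: the de-quantized,
# inverse-transformed residual is added to the predicted samples, and the
# result is clipped to the valid sample range (8-bit video assumed here).
def reconstruct(residual, predicted, bit_depth=8):
    hi = (1 << bit_depth) - 1
    return [min(max(r + p, 0), hi) for r, p in zip(residual, predicted)]
```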
  • the reconstructed pixel data 2417 is temporarily stored in a line buffer 2427 (or intra-prediction buffer) for intra-picture prediction and spatial MV prediction.
  • the reconstructed pixels are filtered by the in-loop filter 2445 and stored in the reconstructed picture buffer 2450.
  • in some embodiments, the reconstructed picture buffer 2450 is a storage external to the video encoder 2400.
  • in other embodiments, the reconstructed picture buffer 2450 is a storage internal to the video encoder 2400.
  • the intra-picture estimation module 2424 performs intra-prediction based on the reconstructed pixel data 2417 to produce intra prediction data.
  • the intra-prediction data is provided to the entropy encoder 2490 to be encoded into bitstream 2495.
  • the intra-prediction data is also used by the intra-prediction module 2425 to produce the predicted pixel data 2413.
  • the motion estimation module 2435 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 2450. These MVs are provided to the motion compensation module 2430 to produce predicted pixel data.
  • the video encoder 2400 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 2495.
  • the MV prediction module 2475 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation.
  • the MV prediction module 2475 retrieves reference MVs from previous video frames from the MV buffer 2465.
  • the video encoder 2400 stores the MVs generated for the current video frame in the MV buffer 2465 as reference MVs for generating predicted MVs.
  • the MV prediction module 2475 uses the reference MVs to create the predicted MVs.
  • the predicted MVs can be computed by spatial MV prediction or temporal MV prediction.
  • the difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (i.e., the residual motion data) is encoded into the bitstream 2495 by the entropy encoder 2490.
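The residual motion data described above is simply the component-wise difference between the motion compensation MV and the predicted MV; the decoder reverses the subtraction. A minimal sketch with MVs modeled as (x, y) tuples (an illustrative representation):

```python
# Encoder side: transmit only the MV difference (residual motion data).
def encode_mv(mc_mv, predicted_mv):
    return (mc_mv[0] - predicted_mv[0], mc_mv[1] - predicted_mv[1])

# Decoder side: recover the MC MV by adding the residual to the predicted MV.
def decode_mv(residual_mv, predicted_mv):
    return (residual_mv[0] + predicted_mv[0], residual_mv[1] + predicted_mv[1])
```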
  • the entropy encoder 2490 encodes various parameters and data into the bitstream 2495 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
  • the entropy encoder 2490 encodes various header elements and flags, along with the quantized transform coefficients 2412 and the residual motion data, as syntax elements into the bitstream 2495.
  • the bitstream 2495 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
  • the in-loop filter 2445 performs filtering or smoothing operations on the reconstructed pixel data 2417 to reduce the artifacts of coding, particularly at the boundaries of pixel blocks.
  • the filtering or smoothing operations performed by the in-loop filter 2445 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
  • FIG. 7 illustrates an example video decoder 2700 for decoding a video bit stream associated with video data according to an embodiment of the present invention.
  • the video decoder 2700 is an image-decoding or video-decoding circuit that receives a bitstream 2795 and decodes the content of the bitstream into pixel data of video frames for display.
  • the video decoder 2700 has several components or modules for decoding the bitstream 2795, including some components selected from an inverse quantization module 2711, an inverse transform module 2710, an intra-prediction module 2725, a motion compensation module 2730, an in-loop filter 2745, a decoded picture buffer 2750, a MV buffer 2765, a MV prediction module 2775, and a parser 2790.
  • the motion compensation module 2730 is part of an inter-prediction module 2740.
  • the intra-prediction module 2725 is part of a current picture prediction module 2720, which uses current picture reconstructed samples as reference samples for the prediction of the current block.
  • the modules 2710 –2790 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 2710 –2790 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 2710 –2790 are illustrated as being separate modules, some of the modules can be combined into a single module.
  • the parser 2790 (or entropy decoder) performs a similar function as the parsing stage module 10a of FIG. 1.
  • the parser 2790 (or entropy decoder) receives the bitstream 2795 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard.
  • the parsed syntax elements include various header elements and flags, as well as quantized data (or quantized coefficients) 2712.
  • the parser 2790 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
  • the inverse quantization module 2711 de-quantizes the quantized data (or quantized coefficients) 2712 to obtain transform coefficients, and the inverse transform module 2710 performs inverse transform on the transform coefficients 2716 to produce a reconstructed residual signal 2719.
  • the reconstructed residual signal 2719 is added with predicted pixel data 2713 from the intra-prediction module 2725 or the motion compensation module 2730 to produce decoded pixel data 2717.
  • the decoded pixel data are filtered by the in-loop filter 2745 and stored in the decoded picture buffer 2750.
  • in some embodiments, the decoded picture buffer 2750 is a storage external to the video decoder 2700.
  • in other embodiments, the decoded picture buffer 2750 is a storage internal to the video decoder 2700.
  • the intra-prediction module 2725 receives intra-prediction data from bitstream 2795 and according to which, produces the predicted pixel data 2713 from the decoded pixel data 2717 stored in the decoded picture buffer 2750.
  • the decoded pixel data 2717 is also stored in a line buffer 2727 (or intra-prediction buffer) for intra-picture prediction and spatial MV prediction.
  • the content of the decoded picture buffer 2750 is used for display.
  • a display device 2705 either retrieves the content of the decoded picture buffer 2750 directly for display or copies the content of the decoded picture buffer to a display buffer.
  • the display device receives pixel values from the decoded picture buffer 2750 through pixel transport.
  • the motion compensation module 2730 produces predicted pixel data 2713 from the decoded pixel data 2717 stored in the decoded picture buffer 2750 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bit stream 2795 with predicted MVs received from the MV prediction module 2775.
  • the MV prediction module 2775 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation.
  • the MV prediction module 2775 retrieves the reference MVs of previous video frames from the MV buffer 2765.
  • the video decoder 2700 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 2765 as reference MVs for producing predicted MVs.
  • the in-loop filter 2745 performs filtering or smoothing operations on the decoded pixel data 2717 to reduce the artifacts of coding, particularly at the boundaries of pixel blocks.
  • the filtering or smoothing operations performed by the in-loop filter 2745 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
  • the embodiments provide a method for decoding a video bit stream, a method for signaling syntax element into the video bit stream, and a video decoder for dual-tree partitioning technology.
  • the embodiments overcome the limitations of conventional dual-tree coding methods by introducing an innovative approach to signaling in video coding.
  • the core idea lies in its adaptive dual-tree usage and more efficient signaling mechanism. This is achieved through the introduction of specific syntax elements that indicate whether a CU at a certain depth node is to be further split using a dual-tree partitioning method, and whether luma or chroma components are further split. By doing so, the necessity for signaling every split for luma and chroma components can be reduced.
  • This approach offers a significant advantage over traditional methods that apply dual-tree coding to the entire CU with a single flag or require the decoder to receive all signaling syntax elements at the least depth node before proceeding with configuration.
  • the embodiments enable earlier configuration of hardware components in the decoder, such as memory buffers and reconfiguration buffers, thereby enhancing the overall decoding efficiency.
  • the embodiments are applicable in the field of video coding and decoding, particularly in scenarios involving inter-slices and dual-tree partitioning, offering improved performance in video compression and processing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method for decoding the video bit stream includes receiving the video bit stream associated with a coding tree unit (CTU) including a coding unit (CU) at a k-th depth node, identifying a first syntax element from the video bit stream, identifying a second syntax element from the video bit stream, and configuring the video decoder based on these two syntax elements, and decoding the CTU to generate the video data. The CU includes a plurality of coding blocks (CBs). The first syntax element indicates if a CU at a (k+1) th depth node is to be further split by using a dual-tree partitioning method. The second syntax element indicates that a first CB of the CU at the (k+1) th depth node is to be further split and a second CB of the CU at the (k+1) th depth node is not to be further split.

Description

VIDEO DECODER AND METHODS FOR DECODING AND SIGNALING A VIDEO BIT STREAM FOR PERFORMING A DUAL-TREE PARTITIONING TECHNOLOGY

BACKGROUND OF THE INVENTION

1. FIELD OF THE INVENTION
The present invention presents a method for decoding a video bit stream, a method for signaling syntax elements into the video bit stream, and a video decoder, and more particularly, a method for decoding a video bit stream, a method for signaling syntax elements into the video bit stream, and a video decoder capable of adaptively partitioning tree structure-based blocks.
2. DESCRIPTION OF THE PRIOR ART
In video coding, a picture is typically partitioned into coding tree units (CTUs) , and each CTU can be recursively split into smaller coding units (CUs) . The partitioning structure is represented by a tree structure denoted as a coding tree to adapt to various local characteristics of the picture. For a picture that has three sample arrays, a CTU consists of an N×N block of luma samples together with two corresponding blocks of chroma samples.
In HEVC, coding tree units (CTUs) are split into coding units (CUs) by using a quaternary-tree (QT) structure denoted as the coding tree to adapt to various local characteristics. The decision of whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the leaf CU level. Each leaf CU can be further split into one, two, or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU splitting type, a leaf CU can be partitioned into transform units (TUs) according to another quaternary-tree structure similar to the coding tree for the CU. One key feature of the High-Efficiency Video Coding (HEVC) structure is that it has multiple partition concepts, including the CU, PU, and TU.
In Versatile Video Coding (VVC), a quadtree with nested multi-type tree (MTT) segmentation structure using binary and ternary splits replaces the concept of multiple partition unit types, i.e., it removes the separation of the CU, PU, and TU concepts except as needed for CUs that have a size too large for the maximum transform length, and supports more flexibility for CU partition shapes. In the coding tree structure, a CU can have either a square or rectangular shape. A coding tree unit (CTU) is first partitioned by a quaternary tree (a.k.a. quadtree, QT) structure. Then the quaternary tree blocks can be further partitioned by a multi-type tree structure. As shown in FIG. 8, there are four splitting types in the multi-type tree structure: vertical binary splitting (SPLIT_BT_VER), horizontal binary splitting (SPLIT_BT_HOR), vertical ternary splitting (SPLIT_TT_VER), and horizontal ternary splitting (SPLIT_TT_HOR). The multi-type tree blocks are called coding units (CUs), and unless the CU is too large for the maximum transform length, this segmentation is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU, and TU have the same block size in the quadtree with nested multi-type tree coding block structure. The exception occurs when the maximum supported transform length is smaller than the width or height of the color component of the CU. FIG. 9 illustrates the signaling mechanism of the partition splitting information in the quadtree with nested multi-type tree coding tree structure. A coding tree unit (CTU) is treated as the root of a quaternary tree and is first partitioned by a quaternary tree structure. Each quaternary tree block (when sufficiently large to allow it) is then further partitioned by a multi-type tree structure. In the quadtree with nested multi-type tree coding tree structure, for each CU node, a flag (split_cu_flag) is signaled to indicate whether the node is further partitioned.
If the current CU node is a quadtree CU node, another flag (split_qt_flag) is signaled to indicate whether it is a QT partitioning or an MTT partitioning mode. When a node is partitioned with the MTT partitioning mode, yet another flag (mtt_split_cu_vertical_flag) is signaled to indicate the splitting direction, and then the other flag (mtt_split_cu_binary_flag) is signaled to indicate whether the split is a binary split or a ternary split. Based on the values of mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag, the multi-type tree splitting mode (MttSplitMode) of a CU is derived as shown in Table 1.
Table 1 – MttSplitMode derivation based on multi-type tree syntax elements.

MttSplitMode    mtt_split_cu_vertical_flag    mtt_split_cu_binary_flag
SPLIT_TT_HOR    0                             0
SPLIT_BT_HOR    0                             1
SPLIT_TT_VER    1                             0
SPLIT_BT_VER    1                             1
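The derivation summarized in Table 1 can be sketched as a direct mapping of the two flags, following the VVC specification's definition of MttSplitMode:

```python
# Sketch of the Table 1 derivation: mtt_split_cu_vertical_flag selects the
# split direction, mtt_split_cu_binary_flag selects binary vs. ternary.
def mtt_split_mode(mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag):
    if mtt_split_cu_vertical_flag:
        return "SPLIT_BT_VER" if mtt_split_cu_binary_flag else "SPLIT_TT_VER"
    return "SPLIT_BT_HOR" if mtt_split_cu_binary_flag else "SPLIT_TT_HOR"
```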
FIG. 10 shows a CTU divided into multiple CUs with a quadtree and nested multi-type tree coding block structure, where the bold block edges represent quadtree partitioning and the remaining edges represent multi-type tree partitioning. The quadtree with nested multi-type tree partition provides a content-adaptive coding tree structure comprised of CUs. The size of the CU may be as large as the CTU or as small as 4×4 in units of luma samples. For the case of the 4:2:0 chroma format, the maximum chroma CB size is 64×64 and the minimum size chroma CB consists of 16 chroma samples.
In VVC, the maximum supported luma transform size is 64×64 and the maximum supported chroma transform size is 32×32. When the width or height of the CB is larger than the maximum transform width or height, the CB is automatically split in the horizontal and/or vertical direction to meet the transform size restriction in that direction.
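This automatic splitting rule can be sketched as follows, assuming the maximum luma transform size of 64 stated above; the block representation as (width, height) pairs is an illustrative simplification:

```python
# Sketch of the implicit split: a CB wider or taller than the maximum
# transform size is halved in the oversized direction until every resulting
# block satisfies the transform size restriction.
def implicit_transform_split(width, height, max_tb=64):
    blocks = [(width, height)]
    while any(w > max_tb or h > max_tb for (w, h) in blocks):
        new_blocks = []
        for (w, h) in blocks:
            if w > max_tb:
                new_blocks += [(w // 2, h), (w // 2, h)]   # vertical split
            elif h > max_tb:
                new_blocks += [(w, h // 2), (w, h // 2)]   # horizontal split
            else:
                new_blocks.append((w, h))                  # already conforming
        blocks = new_blocks
    return blocks
```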
The following parameters are defined for the quadtree with nested multi-type tree coding tree scheme. These parameters are specified by SPS syntax elements and can be further refined by picture header syntax elements.
CTU size: the root node size of a quaternary tree
MinQTSize: the minimum allowed quaternary tree block size
MaxBtSize: the maximum allowed binary tree root node size
MaxTtSize: the maximum allowed ternary tree root node size
MaxMttDepth: the maximum allowed hierarchy depth of multi-type tree splitting from a quadtree leaf
MinCbSize: the minimum allowed coding block node size
In one example of the quadtree with nested multi-type tree coding tree structure, the CTU size is set as 128×128 luma samples with two corresponding 64×64 blocks of 4:2:0 chroma samples, the MinQTSize is set as 16×16, the MaxBtSize is set as 128×128 and the MaxTtSize is set as 64×64, the MinCbSize (for both width and height) is set as 4×4, and the MaxMttDepth is set as 4. The quaternary tree partitioning is applied to the CTU first to generate quaternary tree blocks. The quaternary tree blocks may have a size from 16×16 (i.e., the MinQTSize) to 128×128 (i.e., the CTU size). If the leaf QT node is 128×128, it will not be further split by the binary tree since the size exceeds the MaxBtSize and MaxTtSize (i.e., 64×64). Otherwise, the leaf QT node could be further partitioned by the multi-type tree. Therefore, the quaternary tree block is also the root node for the multi-type tree and it has a multi-type tree depth (mttDepth) of 0. When the multi-type tree depth reaches MaxMttDepth (i.e., 4), no further splitting is considered. When the multi-type tree node has a width equal to MinCbSize, no further horizontal splitting is considered. Similarly, when the multi-type tree node has a height equal to MinCbSize, no further vertical splitting is considered.
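The termination rules of this example can be sketched as follows. This is a simplified illustration that mirrors only the MaxMttDepth and MinCbSize conditions stated above (the MaxBtSize and MaxTtSize checks are omitted):

```python
# Sketch of the multi-type tree termination rules for the example parameters
# (MaxMttDepth = 4, MinCbSize = 4), following the text above.
MAX_MTT_DEPTH, MIN_CB_SIZE = 4, 4

def further_splitting(width, height, mtt_depth):
    """Which multi-type tree splitting is still considered for this node."""
    if mtt_depth >= MAX_MTT_DEPTH:
        return {"horizontal": False, "vertical": False}
    return {"horizontal": width > MIN_CB_SIZE,   # width == MinCbSize stops it
            "vertical": height > MIN_CB_SIZE}    # height == MinCbSize stops it
```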
In VVC, the coding tree scheme supports the ability for the luma and chroma to have separate block tree structures under specific conditions. For P and B slices, the luma and chroma CTBs in one CTU have to share the same coding tree structure. However, for I slices, the luma and chroma can have separate block tree structures. When the separate block tree mode is applied, the luma CTB is partitioned into CBs by one coding tree structure, and the chroma CTBs are partitioned into chroma CBs by another coding tree structure. This means that a CU in an I slice may consist of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice always consists of coding blocks of all three color components unless the video is monochrome.
In conventional single-tree coding, both the luma and chroma components of a CU are partitioned identically. That is, the luma tree structure and the chroma tree structure are the same. However, luma and chroma components can exhibit different characteristics. For example, chroma components tend to demonstrate less variation than luma components. Consequently, it may be more efficient to employ different tree structures for the luma and chroma components. Current VVC technology introduces dual-tree coding to enable separate tree structures for luma and chroma components. However, the dual-tree coding is applied to the entire CU according to respective single flags (one for the luma CB and another for the chroma CBs), which may lack sufficient flexibility. Further, the traditional dual-tree signaling methods necessitate that the decoder receive all the signaling syntax elements before it can proceed with configuration, potentially reducing efficiency. As a result, a delay occurs when the decoder is compelled to wait for all the syntax information before initiating the configuration process, which may decelerate the overall decoding process.
SUMMARY OF THE INVENTION
In an embodiment, a method for decoding a video bit stream associated with video data by a video decoder is disclosed. The method for decoding the video bit stream comprises receiving the video bit stream associated with a coding tree unit (CTU) comprising a coding unit (CU) at a k-th depth node, determining a first syntax element from the video bit stream, determining a second syntax element from the video bit stream, configuring the video decoder based on at least the first syntax element and the second syntax element, and decoding the CTU to generate the video data. The CU comprises a plurality of coding blocks (CBs), wherein k is an integer greater than or equal to 0. The first syntax element indicates if a CU at a (k+1)-th depth node is to be further split by using a dual-tree partitioning method. The second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1)-th depth node is to be further split and a second CB of the CU at the (k+1)-th depth node is not to be further split.
In another embodiment, a method for signaling syntax elements into a video bit stream associated with video data by a video encoder is disclosed. The method for signaling the video bit stream comprises receiving the video data associated with a CTU comprising a CU at a k-th depth node, determining a first syntax element for indicating if a CU at a (k+1)-th depth node is to be further split by using a dual-tree partitioning method, determining a single-bit flag as a second syntax element for indicating that a first CB of the CU at the (k+1)-th depth node is to be further split and a second CB of the CU at the (k+1)-th depth node is not to be further split, and signaling the first syntax element and the second syntax element into the video bit stream. The CU comprises CBs, wherein k is an integer greater than or equal to 0.
In another embodiment, a video decoder for decoding a video bit stream associated with video data is disclosed. The video decoder comprises a parsing stage module and a reconstructive stage module. The parsing stage module is configured to analyze syntax elements in the video bit stream. The reconstructive stage module is linked to the parsing stage module. The parsing stage module receives the video bit stream associated with a CTU comprising a CU at a k-th depth node. The CU comprises a plurality of CBs, wherein k is an integer greater than or equal to 0. The parsing stage module determines a first syntax element from the video bit stream. The first syntax element indicates if a CU at a (k+1)-th depth node is to be further split by using a dual-tree partitioning method. The parsing stage module determines a second syntax element from the video bit stream. The second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1)-th depth node is to be further split and a second CB of the CU at the (k+1)-th depth node is not to be further split. The parsing stage module configures the reconstructive stage module based on at least the first syntax element and the second syntax element. The reconstructive stage module decodes the CTU to generate the video data.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several exemplary embodiments and, together with the corresponding descriptions, provide examples for explaining the disclosed embodiments consistent with the present disclosure and related principles. It is appreciable that the drawings are not necessarily to scale, as some components may be shown out of proportion to their actual size in order to clearly illustrate the concepts of the present disclosure.
FIG. 1 is a block diagram of a video decoder for decoding a video bit stream associated with video data according to an embodiment of the present invention.
FIG. 2A is a schematic diagram of splitting a coding unit from a k-th depth node to a (k+1) th depth node of the video decoder using a single syntax element in FIG. 1.
FIG. 2B is a schematic diagram of splitting the coding unit from the k-th depth node to a (k+1) th depth node of the video decoder using multiple syntax elements in FIG. 1.
FIG. 3 is a schematic diagram of splitting a chroma coding block of the coding unit from the (k+1) th depth node to a (k+2) th depth node of the video decoder in FIG. 1.
FIG. 4 is a flow chart of performing a method for signaling syntax elements into the video bit stream associated with the video data by a video encoder.
FIG. 5A is a flow chart of performing a method for decoding the video bit stream associated with the video data using the single syntax element by the video decoder in FIG. 1.
FIG. 5B is a flow chart of performing a method for decoding the video bit stream associated with the video data using the multiple syntax elements by the video decoder in FIG. 1.
FIG. 6 illustrates an embodiment of the video encoder that may implement the method for signaling block partitioning information into a video bit stream.
FIG. 7 illustrates an embodiment of the video decoder for decoding a video bit stream associated with video data according to an embodiment of the present invention.
FIG. 8 illustrates four splitting types in multi-type tree structure according to the prior art.
FIG. 9 illustrates the signaling mechanism of the partition splitting information in quadtree with nested multi-type tree coding tree structure according to the prior art.
FIG. 10 shows a CTU divided into multiple CUs with a quadtree and nested multi-type tree coding block structure according to the prior art.
DETAILED DESCRIPTION
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Any variations, derivatives, and/or extensions based on teachings described herein are within the protective scope of the present disclosure. In some instances, well-known methods, procedures, components, and/or circuitry pertaining to one or more example implementations disclosed herein may be described at a relatively high level without detail, in order to avoid unnecessarily obscuring aspects of teachings of the present disclosure.
In this specification, "signaling" and "signaled" may refer to either embedding, inserting, receiving, or retrieving information within a bitstream about controlling a filter, such as enabling or disabling modes or other control parameters. It is also understood that a "class" denotes a category of elements, and an "active" flag signals that the corresponding filter/tool is in use.
In one embodiment, when performing block partitioning, a single-tree partitioning method is first applied to partition a current CTU into many smaller CUs, and a syntax element can be used to indicate whether the current CU (i.e., a leaf block of the single tree) is further split by using a dual-tree partitioning method or not. However, the present application is not limited thereto; the dual-tree partitioning method can alternatively be applied first to partition a CTU block into many smaller CUs.
In yet another embodiment, when performing block partitioning, the split by using the dual-tree partitioning method can depend on the prediction mode (e.g., intra mode, inter mode, IBC mode, or palette mode) of the current leaf block of the single tree. If the luma or chroma components are further split by using the dual-tree partitioning method, the splitting type and splitting direction are further explicitly determined or implicitly derived. For example, if the current leaf block of the single tree (or the CTU block) is coded by a target prediction mode, the dual-tree partitioning method can be used to split the current block into one or more smaller blocks, and the prediction mode of each smaller block is inferred as the target prediction mode; thus, no additional syntax element is signaled to indicate the prediction mode of each smaller block. For another example, if the current leaf block of the single tree (or the CTU block) is coded by a non-target prediction mode, the further split by using the dual-tree partitioning method is not allowed. For another example, if the target prediction mode is an intra mode, the non-target prediction mode can be an inter mode, an IBC mode, or a palette mode. A person skilled in the art would appreciate that various modes can be identified as the target mode(s), with the others being non-target modes, as needed.
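The prediction-mode gating described above may be sketched as follows. The mode names, the choice of `TARGET_MODE`, and the return convention are illustrative assumptions, not part of any normative specification.

```python
# Hypothetical sketch of the prediction-mode gating described above; the mode
# names, TARGET_MODE choice, and return convention are assumptions.
TARGET_MODE = "intra"   # example target prediction mode

def dual_tree_split_decision(leaf_mode, signaled_split):
    """Return (split_applied, inferred_mode_of_sub_blocks)."""
    if leaf_mode != TARGET_MODE:
        # Non-target mode: further split by the dual-tree method is not allowed.
        return False, leaf_mode
    if not signaled_split:
        return False, leaf_mode
    # Sub-blocks inherit the target mode; no per-block mode syntax is signaled.
    return True, TARGET_MODE
```

Because the sub-block mode is inferred from the target mode, no per-block prediction-mode syntax needs to be carried in the bit stream for the split blocks.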
Further, in an embodiment of the application, when the dual-tree partitioning method is applied, if a CU block is to be split, the luma and chroma CB blocks can be split differently in both splitting type and splitting direction for flexibility. In other words, the splitting type applied to the luma CB block may be different from that applied to the chroma CB blocks, and the splitting direction applied to the luma CB block may be different from that applied to the chroma CB blocks. The splitting types include, but are not limited to, vertical binary splitting (SPLIT_BT_VER), horizontal binary splitting (SPLIT_BT_HOR), vertical ternary splitting (SPLIT_TT_VER), and horizontal ternary splitting (SPLIT_TT_HOR). The splitting directions include, but are not limited to, vertical and horizontal.
FIG. 1 is a block diagram of a video decoder 100 for decoding a video bit stream associated with video data according to an embodiment of the present invention. The video decoder 100 addresses the issues associated with dual-tree coding in video processing, notably within a target slice that applies the target prediction mode (e.g., inter slices such as B-slices and P-slices), to improve signaling efficiency and flexibility. One of the advantages of the video decoder 100 is its adaptive dual-tree usage in block partitioning. The video decoder 100 identifies a syntax element to indicate whether the current block is to be further split using a dual-tree partitioning method. The video decoder 100 identifies another syntax element for indicating whether the luma (luminance) or chroma (color) components are to be further split. In the embodiment, by applying the dual-tree partitioning method, the decoder 100 improves bandwidth and decoding resource utilization. Furthermore, since two syntax elements are introduced for splitting the current block across the luma and chroma components, the video decoder 100 can enhance decoder efficiency by enabling earlier configuration of hardware components, such as memory buffers and reconfiguration line buffers.
In FIG. 1, the video decoder 100 includes a processor 10 and a memory 11 coupled to the processor 10. The processor 10 is configured to execute a parsing stage module 10a and a reconstructive stage module 10b. The parsing stage module 10a is linked to the reconstructive stage module 10b. The parsing stage module 10a is configured to analyze syntax elements signaled in the video bit stream VS. In the embodiment, the video bit stream VS is a signal format used for transmitting video data information from a video encoder to the video decoder 100. The parsing stage module 10a is configured to process the incoming video bit stream VS. It extracts and analyzes syntax elements and information, such as flag values, and sends this information to the memory 11 for data buffering. The parsing stage module 10a uses the syntax elements to execute part of the decoding functions, such as determining whether the current block is to be further split by using the dual-tree partitioning method. The reconstructive stage module 10b is responsible for the actual reconstruction of the block contents. The reconstructive stage module 10b utilizes the information parsed and prepared by the parsing stage module 10a, such as configuration data D1 (i.e., buffer configurations, register configurations, …), to perform the operations necessary for reconstructing the video data VD.
In the video decoder 100, the parsing stage module 10a receives the video bit stream VS associated with a coding tree unit (CTU) including a coding unit (CU) at a k-th depth node. It should be understood that the CTU is a root node (k=0) of a quaternary tree structure in video coding. It is the largest coding unit, and a picture is initially partitioned into CTUs. Each CTU can be recursively split into smaller CUs. In other words, CTUs are partitioned into a plurality of CUs by using a tree structure. CUs are the smaller units derived from partitioning the CTU. The k-th depth node refers to the depth of a node in the coding tree structure, where k is an integer greater than or equal to 0. For example, the k-th depth node signifies the level of partitioning the CU from the CTU within the tree structure, indicating how many times a CTU has been split to reach a particular CU node, until it reaches a CU leaf node block. In other words, the leaf node block will not be further split and is used as the basic block for coding (decoding and encoding). The CU includes a plurality of coding blocks (CBs). It should be understood that a CB is a constituent block of the CU. For example, a CU may include a coding block of the luma component and two coding blocks of the two chroma components, respectively. Further, the parsing stage module 10a determines a first syntax element signaled in the video bit stream VS. The first syntax element indicates if a CU at a (k+1)-th depth node is to be further split by using a dual-tree partitioning method. The parsing stage module 10a determines a second syntax element signaled in the video bit stream VS. The second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1)-th depth node is to be further split and a second CB of the CU at the (k+1)-th depth node is not to be further split.
The parsing stage module 10a configures the reconstructive stage module 10b by using the configuration data D1 based on at least the first syntax element and the second syntax element. The reconstructive stage module 10b decodes the CTU to generate (or, in other words, reconstruct) the video data VD. In another embodiment, the memory 11 is configured to buffer the configuration data D1 and the video bit stream VS for use by the reconstructive stage module 10b. In the embodiment, the configuration data D1 includes at least the buffer configurations and the register configurations. Details of the method for decoding the video bit stream VS by the video decoder 100 are illustrated below.
FIG. 2A is a schematic diagram of splitting the CU from the k-th depth node to the (k+1)-th depth node of the video decoder 100 using a single syntax element. As previously mentioned, the CTU represents the largest coding unit within a tree-based video coding framework and serves as the root node for the coding tree structure. As depicted, for a 4:2:0 chroma format, the CTU is composed of a luma coding tree block (CTB) and two chroma coding tree blocks (CTBs) at the k=0 depth node. The luma CTB in FIG. 2A represents the coding tree block specifically designated for the luma component of the video bit stream VS. Luma corresponds to the brightness or luminance information in the picture of the video bit stream VS. The chroma CTBs in FIG. 2A represent the coding tree blocks specifically designated for the color components of the video bit stream VS. For example, the two chroma CTBs are denoted as Cb and Cr, or U and V, in the respective color domains. The video encoder can set a CU size threshold and compare it with the size of the CU at the k-th depth node. If the size of the CU at the k-th depth node meets the size threshold, the video encoder switches from the single-tree partitioning method to the dual-tree partitioning method for splitting the CU at the (k+1)-th depth node.
For example, in the embodiment, the video encoder initially applies a quaternary-tree (QT) structure in single-tree coding as the partitioning method for splitting the CTU at the k=0 depth node into 4 CUs at the k=1 depth node for both the luma CTB and the chroma CTBs. The video encoder continues to apply the quaternary-tree (QT) structure in single-tree coding at least down to a CU size of 64×64 luma samples (the size threshold). This implies that for CUs with a size greater than 64×64, the video encoder splits the CUs based on the quaternary-tree (QT) structure in the single-tree partitioning method. If the CU is at most 64×64 luma samples and the CU corresponds to a leaf block of the single-tree structure at the k-th depth node, the video encoder may determine to switch from the single-tree partitioning method to the dual-tree partitioning method. At the k-th depth node, the CU includes one luma component (4 luma CBs) and two chroma components (4 chroma CBs for each chroma component). The luma CB is a fundamental coding block for encoding the luminance information within the video data at the k-th depth node. The chroma CBs are fundamental coding blocks for encoding the color information within the video data.
In the (k+1)-th depth node, as previously mentioned, the video encoder switches from the single-tree partitioning method to the dual-tree partitioning method for splitting the CU. Since the CU at the k-th depth node includes one luma component (luma CB) and two chroma components (chroma CBs), a single syntax element can be introduced for indicating if a CU at the (k+1)-th depth node is to be further split by using a dual-tree partitioning method. In FIG. 2A, for example, the video encoder determines a first syntax element A having a flag (A=1) to indicate that the CU at the (k+1)-th depth node is to be further split by using the dual-tree partitioning method. In one embodiment, the luma CB of the CU at the (k+1)-th depth node is further split (e.g., horizontally split) based on a first splitting type. The chroma CBs of the CU at the (k+1)-th depth node are further split (e.g., vertically split) based on a second splitting type. Since both the luma CB and the chroma CBs are further split at the (k+1)-th depth node, the video encoder only determines the first syntax element A having the flag (A=1) and signals it into the video bit stream VS. In other words, the first syntax element A having the flag (A=1) signifies that the CU at the (k+1)-th depth node is to be further split by using the dual-tree partitioning method.
For another example, the video encoder determines the first syntax element A having a flag (A=0) to indicate that the CU at the (k+1)-th depth node is “not” to be further split by using the dual-tree partitioning method. In one embodiment, the luma CB of the CU at the k-th depth node is the leaf node block for encoding/decoding. Therefore, no additional splitting of the luma CB into smaller coding blocks is applied. Similarly, the chroma CBs of the CU at the k-th depth node are leaf node blocks for encoding/decoding. Therefore, no additional splitting of the chroma CBs into smaller coding blocks is applied. Since neither the luma CB nor the chroma CBs are further split at the (k+1)-th depth node, the video encoder only signals the first syntax element having the flag (A=0) into the video bit stream VS for the CBs at the k-th depth node. In other words, the first syntax element A having the flag (A=0) signifies that the CU at the k-th depth node is not to be further split by using the dual-tree partitioning method.
Likewise, the parsing stage module 10a in the video decoder 100 parses the first syntax element A for the CU at the k-th depth node. Once the parsing stage module 10a determines the first syntax element A having a flag (A=0), the parsing stage module 10a learns that the CU at the k-th depth node is the leaf node block for decoding. Once the parsing stage module 10a determines the first syntax element A having a flag (A=1), the parsing stage module 10a learns that the CU at the (k+1)-th depth node is required to be further split for decoding. The single syntax element used to indicate whether a CU is to be further split in the video decoder 100 is similar in concept to that used in the video encoder, and thus the details are not repeated. As a result, FIG. 2A is applicable to both the video encoder and the video decoder 100.
FIG. 2B is a schematic diagram of splitting the coding unit from the k-th depth node to a (k+1)-th depth node of the video decoder 100 using multiple syntax elements in FIG. 1. In FIG. 2B, multiple syntax elements are introduced to indicate whether the luma CB or the chroma CB is to be further split. In FIG. 2B, for example, the CU at the (k+1)-th depth node is to be further split. The chroma CBs of the CU at the (k+1)-th depth node are further split (e.g., vertically split). The luma CB of the CU at the (k+1)-th depth node is not to be further split. In the embodiment, the chroma CB is referred to as the first coding block CB1. The luma CB is referred to as the second coding block CB2. The second syntax element B can be regarded as the “splitting luma tree syntax”. For example, if the second syntax element B (splitting luma tree syntax) is false (B=0, the luma tree is not split), the implied syntax element for the chroma tree is implicitly inferred to be true (i.e., the implied syntax element for the chroma tree is not signaled), resulting in only the chroma CBs being split. However, the embodiment is not limited thereto. For example, the second syntax element B may also be regarded as the “splitting chroma tree syntax”. Under the condition where the “splitting chroma tree syntax” is true, it may be inferred that the implied syntax element for the luma tree is “false”. Any reasonable technical modification falls within the scope of the embodiment.
In another embodiment, the video encoder determines or identifies the first syntax element having a flag (A=1) and a second syntax element having a flag (B=0) to split the chroma CBs (CB1) without splitting the luma CB (CB2) of the CU at the (k+1)-th depth node. Therefore, by utilizing the first syntax element A and the second syntax element B in combination, the video encoder can determine whether the CU at the (k+1)-th depth node is to be further split. Additionally, it can identify which of the chroma CB and the luma CB in the CU is to be further split, and recognize the other un-split CB block as a leaf node block.
For another example, the luma CB of the CU at the (k+1)-th depth node is further split (e.g., horizontally split). The chroma CBs of the CU at the (k+1)-th depth node are not further split. The luma CB is referred to as the first coding block CB1. The chroma CB is referred to as the second coding block CB2. The second syntax element B can be regarded as the “splitting chroma tree syntax”. For example, if the second syntax element B (splitting chroma tree syntax) is false (B=0, the chroma tree is not split), the implied syntax element for the luma tree is implicitly inferred to be true (i.e., the implied syntax element for the luma tree is not signaled), resulting in only the luma CB being split. However, the embodiment is not limited thereto. For example, the second syntax element B may also be regarded as the “splitting luma tree syntax”. When the “splitting luma tree syntax” is true, it may be inferred that the implied syntax element for the chroma tree is “false”. Any reasonable technical modification falls within the scope of the embodiment.
In another embodiment, the video encoder determines or identifies the first syntax element having a flag (A=1) and a second syntax element having a flag B=0 (or 1) to split the luma CB (CB1) without splitting the chroma CBs (CB2) of the CU at the (k+1) th depth node. It employs two syntax elements to signify that the CU is to be further split at the (k+1) th depth node, and also specifies that only the luma CB (CB1) is further to be split.
Likewise, the parsing stage module 10a in the video decoder 100 parses the first syntax element A and the second syntax element B for the CU at the k-th depth node. Once the parsing stage module 10a identifies that the first syntax element A has a flag (A=1) and the second syntax element B has a flag B=0 (or 1), the parsing stage module 10a learns that the CU at the (k+1)-th depth node is required to be further split for decoding, and only the chroma CB is to be further split. Once the parsing stage module 10a identifies that the first syntax element A has a flag (A=1) and the second syntax element B has a flag B=1 (or 0), the parsing stage module 10a learns that the CU at the (k+1)-th depth node is required to be further split for decoding, and only the luma CB is to be further split. The dual syntax elements used in the video decoder 100 are similar in concept to those used in the video encoder, and thus the details are not repeated. As a result, FIG. 2B is applicable to both the video encoder and the video decoder 100.
As a result, by employing the decoding mechanisms in FIG. 2B, the reconstructive stage module 10b of the video decoder 100 can determine the hardware configuration and decoding parameters in advance, thereby significantly reducing the decoding delay.
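The combined behavior of the first syntax element A and the second syntax element B in FIG. 2B may be sketched as follows. Treating B as the "splitting luma tree syntax" is only one of the conventions described above, and the function interface is an assumption for illustration.

```python
# Hypothetical sketch of the two-flag signaling of FIG. 2B; B is treated here
# as the "splitting luma tree syntax" (one convention among those described).
def parse_dual_tree_flags(flag_a, flag_b=None):
    """Infer per-component split decisions from syntax elements A and B."""
    if flag_a == 0:
        return {"split_luma": False, "split_chroma": False}  # leaf node block
    if flag_b is None:
        return {"split_luma": True, "split_chroma": True}    # both trees split
    if flag_b == 0:
        # Luma tree not split; the chroma-tree flag is implied true (not signaled).
        return {"split_luma": False, "split_chroma": True}
    return {"split_luma": True, "split_chroma": False}
```

Because the implied flag of the other component is never signaled, the decoder can derive both split decisions as soon as A (and, where present, B) is parsed, which is what permits the early hardware configuration noted above.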
In the video decoder 100, as previously mentioned, the dual-tree partitioning method is applied for further splitting the CU at the (k+1)-th depth node. The first CB (CB1) is defined as the luma CB or the chroma CB to be split. To identify the splitting type and the splitting direction, the video decoder 100 can determine a third syntax element and a fourth syntax element. The third syntax element is used for indicating the splitting type of the first CB of the CU at the (k+1)-th depth node. The fourth syntax element is used for indicating the splitting direction of the first CB of the CU at the (k+1)-th depth node. For example, the splitting type can include options such as a binary tree split or a ternary tree split, corresponding to the third syntax element. As for the splitting direction, this could be a horizontal direction or a vertical direction, corresponding to the fourth syntax element. Further, the first syntax element, the second syntax element, the third syntax element, and/or the fourth syntax element is specified respectively in a picture parameter set (PPS), a sequence parameter set (SPS), or a CU parameter set of the video bit stream VS. In other words, each of these syntax elements can be individually embedded within specific components of the video bit stream VS structure. In one embodiment, these syntax elements can be embedded in the PPS. The PPS is a syntax structure holding parameters that apply to an entire coded picture. It decouples picture-level information from slice headers, enhancing coding efficiency. The PPS complements the SPS, with the SPS governing sequence-wide parameters and the PPS handling picture-specific ones. In another embodiment, these syntax elements can be embedded in the SPS. The SPS includes parameters that apply to an entire sequence of coded video pictures. These parameters define the overall characteristics of the video stream, such as profile, level, picture dimensions, and other sequence-wide settings.
In another embodiment, these syntax elements can be embedded in the CU parameter set. It should be understood that in video coding, particularly within advanced standards like HEVC or VVC, the parameter set refers to a collection of data elements that define how a coding unit is processed. For example, the CU parameter set includes CU partitioning information, prediction mode information, coding flags, transform coefficient information, and control information.
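One possible mapping from the third syntax element (splitting type) and the fourth syntax element (splitting direction) to a multi-type tree split mode may be sketched as follows; the 0/1 code values are illustrative assumptions, not normative bitstream values.

```python
# Hypothetical mapping of the third syntax element (splitting type) and the
# fourth syntax element (splitting direction) to a split mode; the 0/1 code
# values here are illustrative only.
def mtt_split_mode(type_flag, dir_flag):
    """type_flag: 0 = binary, 1 = ternary; dir_flag: 0 = horizontal, 1 = vertical."""
    tree = "BT" if type_flag == 0 else "TT"
    direction = "HOR" if dir_flag == 0 else "VER"
    return f"SPLIT_{tree}_{direction}"
```

Under this convention, the pair of two single flags covers all four splitting types named earlier (SPLIT_BT_VER, SPLIT_BT_HOR, SPLIT_TT_VER, SPLIT_TT_HOR).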
Furthermore, when the dual-tree partitioning method is applied to the CU at the k-th depth node, all leaf node blocks of the CU after the k-th depth node are coded by the same prediction mode. For example, all leaf node blocks of luma and chroma after the k-th depth node are coded by the same prediction mode, such as intra-prediction, inter-prediction, intra-block copying, or palette mode. The prediction mode is determined before identifying the syntax elements of the dual-tree.
FIG. 3 is a schematic diagram of splitting the chroma coding block CB1 of the CU from the (k+1)-th depth node to a (k+2)-th depth node of the video decoder 100. As previously mentioned, after the parsing stage module 10a of the decoder 100 determines the first syntax element A and the second syntax element B, for example, in the second splitting state, the luma CB of the CU at the (k+1)-th depth node is the leaf node block and is not to be further split. The chroma CBs of the CU at the (k+1)-th depth node are to be further split into smaller blocks, such as CB1’. At the (k+2)-th depth node, the parsing stage module 10a determines a fifth syntax element signaled in the video bit stream VS. The fifth syntax element indicates if the chroma coding block CB1’ (marked with shadow lines in FIG. 3) at the (k+2)-th depth node is to be further split. For example, the parsing stage module 10a can further split the chroma coding block CB1’ at the (k+2)-th depth node into smaller blocks, such as CB1”, based on the fifth syntax element. Further, the fifth syntax element is specified in the PPS, the SPS, or the CU parameter set of the video bit stream VS. It should be understood that, if required, as the depth node advances, the CU can be recursively split until the CU reaches the minimum block size. In the process of recursively splitting the CU, corresponding syntax elements can be introduced to indicate whether the CU is to be further split at the next depth node. Since this is a repetitive process, the details are omitted here.
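The recursive splitting described above may be sketched as follows, assuming for illustration a per-depth single-bit split flag (the fifth syntax element and its counterparts at deeper nodes), binary splits, and a hypothetical minimum block size.

```python
# Illustrative recursion for per-depth split flags; binary splits, the
# list-based flag stream, and MIN_SIZE are assumptions made for this sketch.
MIN_SIZE = 4

def split_recursively(size, flags):
    """Return the leaf block sizes after recursively splitting one CB."""
    flag = flags.pop(0) if flags and size > MIN_SIZE else 0
    if flag == 0:
        return [size]                      # leaf node block, no further split
    half = size // 2                       # e.g., one binary split per flag
    return split_recursively(half, flags) + split_recursively(half, flags)
```

For instance, a size-16 block with the flag sequence 1, 0, 0 is split once into two size-8 leaf blocks, after which the recursion terminates.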
A video encoder's primary role is to take raw video data and transform it into a compressed format suitable for storage or transmission. The video encoder's operation involves a series of decision-making and signaling processes. Initially, the video encoder receives video data, which is structured into CTUs and further split into CUs. The video encoder then embarks on a critical task: determining how these CUs should be partitioned. This process may involve employing techniques like dual-tree partitioning to achieve efficient compression. To effectively communicate these partitioning decisions and other relevant information, the video encoder determines various syntax elements and flags, which are embedded into the video bit stream VS. These syntax elements act as a set of instructions, providing the decoder with the necessary guidance for reconstructing the video, such as the first syntax element to the fifth syntax element previously mentioned. In some embodiments, the video encoder may also handle additional tasks such as determining splitting types, splitting directions, and prediction modes. In essence, the video encoder analyzes the video content of the video data, makes critical decisions on how to encode it, and signals those decisions to the decoder. FIG. 4 is a flow chart of performing a method for signaling syntax elements into the video bit stream VS associated with the video data by the video encoder. The method for signaling the video bit stream VS by the video encoder includes steps S401 to S404. Any hardware or technology modification falls within the scope of the present invention. Steps S401 to S404 are described below.
Step S401: receiving the video data associated with the CTU including the CU at the k-th depth node, the CU including CBs, and k being an integer greater than or equal to 0;
Step S402: determining the first syntax element for indicating if the CU at the (k+1) th depth node is to be further split by using the dual-tree partitioning method;
Step S403: determining the single-bit flag as the second syntax element for indicating the first CB of the CU at the (k+1) th depth node is to be further split and the second CB of the CU at the (k+1) th depth node is not to be further split;
Step S404: signaling the first syntax element and the second syntax element into the video bit stream.
In yet another embodiment, the method for signaling the video bit stream VS by the video encoder includes steps S401, S402 and S404 (i.e. step S403 is omitted) . In such a circumstance, in step S404, the second syntax element is also omitted and is not signaled into the video bit stream.
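Steps S401 to S404, including the variant that omits step S403, can be sketched as below; the function name and bit ordering are hypothetical and do not reflect the actual bit-stream syntax.

```python
def signal_partitioning(use_dual_tree, split_first_cb_only=None):
    """Sketch of steps S402-S404: build the syntax elements to signal
    into the video bit stream VS. Passing split_first_cb_only=None
    models the embodiment in which step S403 (and hence the second
    syntax element) is omitted."""
    bits = [1 if use_dual_tree else 0]            # S402: first syntax element
    if split_first_cb_only is not None:           # S403: single-bit flag
        bits.append(1 if split_first_cb_only else 0)
    return bits                                   # S404: signaled into VS

signal_partitioning(True, True)   # [1, 1]: dual tree, first CB further split
signal_partitioning(True)         # [1]: second syntax element omitted
```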
FIG. 5A is a flow chart of performing a method for decoding the video bit stream associated with the video data using the single syntax element by the video decoder 100. The method for decoding the video bit stream VS by the video decoder 100 using the single syntax element includes steps S5011 to S5013. Any hardware or technology modification falls into the scope of the present invention. Steps S5011 to S5013 are described below.
Step S5011: receiving the video bit stream associated with the CTU including the CU at the k-th depth node, wherein the CU including a plurality of CBs, and k being an integer greater than or equal to 0;
Step S5012: identifying the first syntax element from the video bit stream VS, the first syntax element indicating if the CU at the (k+1) th depth node is to be further split by using the dual-tree partitioning method;
Step S5013: configuring the video decoder based on the first syntax element, and decoding the CTU to generate the video data.
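Steps S5011 to S5013 can be sketched as below; the callables `configure` and `decode_ctu` stand in for the decoder's configuration and reconstruction stages and are hypothetical names, not part of the described decoder.

```python
def decode_with_single_flag(bitreader, configure, decode_ctu):
    """Sketch of steps S5011-S5013: the first syntax element is parsed
    from the bit stream, the decoder is configured from it (e.g. its
    buffers), and only then is the CTU decoded into video data."""
    dual_tree = bool(next(bitreader))    # S5012: first syntax element
    configure(dual_tree=dual_tree)       # S5013: configure the decoder
    return decode_ctu(dual_tree)         # S5013: decode the CTU

# Stub usage: record the configuration call and return a dummy CTU result.
calls = []
result = decode_with_single_flag(iter([1]),
                                 lambda **kw: calls.append(kw),
                                 lambda dual: ("decoded_ctu", dual))
```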
FIG. 5B is a flow chart of performing a method for decoding the video bit stream associated with the video data using the multiple syntax elements by the video decoder 100. In one embodiment, the method for decoding the video bit stream VS by the video decoder 100 using dual syntax elements includes steps S5021 to S5024. Any hardware or technology modification falls into the scope of the present invention. Steps S5021 to S5024 are described below.
Step S5021: receiving the video bit stream associated with the CTU including the CU at the k-th depth node, the CU including a plurality of CBs, and k being an integer greater than or equal to 0;
Step S5022: identifying the first syntax element from the video bit stream VS, the first syntax element indicating if the CU at the (k+1) th depth node is to be further split by using a dual-tree partitioning method;
Step S5023: identifying the second syntax element from the video bit stream VS, the second syntax element being the single-bit flag for indicating that the first CB of the CU at the (k+1) th depth node is to be further split and a second CB of the CU at the (k+1) th depth node is not to be further split;
Step S5024: configuring the video decoder based on at least the first syntax element and the second syntax element, and decoding the CTU to generate the video data.
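Steps S5021 to S5024 can be sketched as below; the dictionary keys are illustrative names for the decoder configuration derived from the two syntax elements.

```python
def parse_dual_tree_flags(bitreader):
    """Sketch of steps S5022-S5023: read the first syntax element and,
    when dual-tree partitioning is used, the single-bit second syntax
    element indicating which component CB is further split."""
    cfg = {"dual_tree": bool(next(bitreader))}          # S5022
    if cfg["dual_tree"]:
        cfg["first_cb_split"] = bool(next(bitreader))   # S5023
    return cfg
```

The returned configuration would then drive step S5024, i.e. configuring the decoder before the CTU is decoded.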
Details of steps S5011 to S5013 and S5021 to S5024 are previously described. Thus, they are omitted here. The video decoder 100 presents advantages in how it handles dual-tree coding. The video decoder 100 employs two syntax elements to signal whether the current CU is to be further split using a dual-tree partitioning method, and whether the luma or chroma components are further split or not. Such adaptive dual-tree usage enhances signaling efficiency and flexibility. Further, by using the two syntax elements for splitting the current block across at least two depth nodes, the video decoder 100 enables earlier configuration of hardware components, such as memory buffers and reconfiguration buffers, thereby improving decoder efficiency.
FIG. 6 illustrates an example video encoder that may implement the method for signaling block partitioning information into a video bit stream. As illustrated, the video encoder 2400 receives an input video signal from a video source 2401 and encodes the signal into bitstream 2495. The video encoder 2400 has several components or modules for encoding the signal from the video source 2401, at least including some components selected from a block structure partitioning module 2405, a transform module 2410, a quantization module 2411, an inverse quantization module 2414, an inverse transform module 2415, an intra-picture estimation module 2424, an intra-prediction module 2425, a motion compensation module 2430, a motion estimation module 2435, an in-loop filter 2445, a reconstructed picture buffer 2450, a MV buffer 2465, a MV prediction module 2475, and an entropy encoder 2490. The motion compensation module 2430 and the motion estimation module 2435 are part of an inter-prediction module 2440. The intra-prediction module 2425 and the intra-picture estimation module 2424 are part of a current picture prediction module 2420, which uses current picture reconstructed samples as reference samples for the prediction of the current block.
In some embodiments, the modules 2405 –2490 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 2405 –2490 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 2405 –2490 are illustrated as being separate modules, some of the modules can be combined into a single module.
The video source 2401 provides a raw video signal that presents pixel data of each video frame without compression. The Block Structure Partitioning Module 2405 receives input data of video pictures and determines a block partitioning structure according to the above-mentioned embodiments for each block in the video picture to be encoded. A current video picture is first divided into non-overlapping CTUs, and each block is further divided by a recursive partitioning structure into leaf node blocks in the Block Structure Partitioning Module 2405. For example, the Block Structure Partitioning Module 2405 adaptively partitions each of the non-overlapping CTUs in the current video picture into leaf CUs for prediction processing. According to various embodiments of the present invention, the Block Structure Partitioning Module 2405 receives input data associated with a current block in the current video picture and checks whether one or more predefined splitting types are allowed for partitioning the current block. The current block may be a current CTU or a CU split from the current CTU.
In some embodiments, a predefined splitting type is allowed if each sub-block split from the current block satisfies at least one of first and second constraints, where the first constraint restricts each sub-block to be completely contained in one pipeline unit and the second constraint restricts each sub-block to contain one or more complete pipeline units. The pipeline units are non-overlapping units in the current video picture designed for pipeline processing, and a pipeline unit size is predefined or implicitly defined based on a profile or level according to a video compression standard. For example, the pipeline unit size is set to a maximum Transform Block (TB) size. The Block Structure Partitioning Module 2405 adaptively partitions the current block using an allowed splitting type. The predefined splitting type is not allowed to split the current block if any sub-block partitioned by the predefined splitting type violates both the first and second constraints.
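The two pipeline-unit constraints can be sketched as a geometric check; the (x, y, width, height) tuple layout and the 64-sample unit (following the maximum TB size example) are assumptions for illustration.

```python
def split_allowed(sub_blocks, unit=64):
    """A predefined splitting type is allowed if every sub-block either
    lies completely inside one pipeline unit (first constraint) or is
    grid-aligned and covers whole pipeline units (second constraint)."""
    def in_one_unit(x, y, w, h):
        return (x // unit == (x + w - 1) // unit and
                y // unit == (y + h - 1) // unit)
    def covers_units(x, y, w, h):
        return x % unit == 0 and y % unit == 0 and w % unit == 0 and h % unit == 0
    return all(in_one_unit(*b) or covers_units(*b) for b in sub_blocks)

# Binary vertical split of a 128x64 block: each 64x64 half is one whole unit.
split_allowed([(0, 0, 64, 64), (64, 0, 64, 64)])                   # True
# Ternary split into 32/64/32: the middle 64-wide block straddles two units,
# violating both constraints, so the splitting type is not allowed.
split_allowed([(0, 0, 32, 64), (32, 0, 64, 64), (96, 0, 32, 64)])  # False
```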
Some other embodiments of the Block Structure Partitioning Module 2405 check if a predefined splitting type is allowed to partition the current block by comparing a width, a height, or both the width and height of the current block with one or both of a threshold W and a threshold H. An example of the threshold W is 64, and an example of the threshold H is also 64. If there are two or more allowed splitting types that can be used to partition the current block, an embodiment of the Block Structure Partitioning Module 2405 systematically tests the allowed splitting types, determines whether to further split, and selects the splitting type according to Rate Distortion Optimization (RDO) results. Information corresponding to the selected splitting type for the current block, such as the block partitioning structure, may be signaled in the video bitstream for the decoders to decode the current block.
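The threshold check and RDO-based selection can be sketched as below; the splitting-type names and the `rd_cost` callable are hypothetical stand-ins for the encoder's candidate list and cost evaluation.

```python
def choose_split(width, height, rd_cost, thr_w=64, thr_h=64):
    """Gate candidate splitting types by the thresholds W and H, then
    keep the candidate with the lowest rate-distortion cost.
    rd_cost: callable mapping a splitting type to its RDO cost."""
    candidates = ["no_split"]
    if width > thr_w:
        candidates.append("vertical_binary")
    if height > thr_h:
        candidates.append("horizontal_binary")
    return min(candidates, key=rd_cost)

costs = {"no_split": 10.0, "vertical_binary": 5.0, "horizontal_binary": 7.0}
choose_split(128, 64, costs.get)  # "vertical_binary": cheapest allowed type
```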
A subtractor 2408 computes the difference between the raw video pixel data of the video source 2401 and the predicted pixel data 2413 from the motion compensation module 2430 or intra-prediction module 2425 as the prediction residual 2409. The transform module 2410 converts the difference (or the residual pixel data or residual signal 2409) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT) . The quantization module 2411 quantizes the transform coefficients into quantized data (or quantized coefficients) 2412, which is encoded into the bitstream 2495 by the entropy encoder 2490.
The inverse quantization module 2414 de-quantizes the quantized data (or quantized coefficients) 2412 to obtain transform coefficients, and the inverse transform module 2415 performs inverse transform on the transform coefficients to produce reconstructed residual 2419. The reconstructed residual 2419 is added with the predicted pixel data 2413 to produce reconstructed pixel data 2417. In some embodiments, the reconstructed pixel data 2417 is temporarily stored in a line buffer 2427 (or intra-prediction buffer) for intra-picture prediction and spatial MV prediction. The reconstructed pixels are filtered by the in-loop filter 2445 and stored in the reconstructed picture buffer 2450. In some embodiments, the reconstructed picture buffer 2450 is a storage external to the video encoder 2400. In some embodiments, the reconstructed picture buffer 2450 is a storage internal to the video encoder 2400.
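The addition of the reconstructed residual 2419 to the predicted pixel data 2413 is a clipped per-sample sum; the 8-bit sample depth below is an assumption for illustration.

```python
def reconstruct(residual, prediction, bit_depth=8):
    """Per-sample sketch of forming reconstructed pixel data 2417:
    clip(residual + prediction) to the valid sample range."""
    hi = (1 << bit_depth) - 1
    return [max(0, min(hi, r + p)) for r, p in zip(residual, prediction)]

reconstruct([-10, 5, 300], [20, 250, 10])  # [10, 255, 255]
```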
The intra-picture estimation module 2424 performs intra-prediction based on the reconstructed pixel data 2417 to produce intra prediction data. The intra-prediction data is provided to the entropy encoder 2490 to be encoded into bitstream 2495. The intra-prediction data is also used by the intra-prediction module 2425 to produce the predicted pixel data 2413.
The motion estimation module 2435 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 2450. These MVs are provided to the motion compensation module 2430 to produce predicted pixel data.
Instead of encoding the complete actual MVs in the bitstream, the video encoder 2400 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 2495.
The MV prediction module 2475 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 2475 retrieves reference MVs from previous video frames from the MV buffer 2465. The video encoder 2400 stores the MVs generated for the current video frame in the MV buffer 2465 as reference MVs for generating predicted MVs.
The MV prediction module 2475 uses the reference MVs to create the predicted MVs. The predicted MVs can be computed by spatial MV prediction or temporal MV prediction. The difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (residual motion data) are encoded into the bitstream 2495 by the entropy encoder 2490.
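The residual motion data relationship can be sketched as a per-component difference at the encoder and the inverse sum at the decoder; the tuple MV representation is illustrative, not the encoder's internal format.

```python
def encode_mvd(mc_mv, pred_mv):
    """Encoder side: residual motion data = MC MV minus predicted MV."""
    return (mc_mv[0] - pred_mv[0], mc_mv[1] - pred_mv[1])

def decode_mv(mvd, pred_mv):
    """Decoder side: MC MV = residual motion data plus predicted MV."""
    return (mvd[0] + pred_mv[0], mvd[1] + pred_mv[1])

mvd = encode_mvd((5, -3), (2, 1))   # (3, -4) is encoded into the bitstream
decode_mv(mvd, (2, 1))              # (5, -3): the original MC MV recovered
```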
The entropy encoder 2490 encodes various parameters and data into the bitstream 2495 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding. The entropy encoder 2490 encodes various header elements, and flags, along with the quantized transform coefficients 2412, and the residual motion data as syntax elements into the bitstream 2495. The bitstream 2495 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
The in-loop filter 2445 performs filtering or smoothing operations on the reconstructed pixel data 2417 to reduce the artifacts of coding, particularly at the boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 2445 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) . In some embodiments, luma mapping chroma scaling (LMCS) is performed before the loop filters.
FIG. 7 illustrates an example video decoder 2700 for decoding a video bit stream associated with video data according to an embodiment of the present invention. As illustrated, the video decoder 2700 is an image-decoding or video-decoding circuit that receives a bitstream 2795 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 2700 has several components or modules for decoding the bitstream 2795, including some components selected from an inverse quantization module 2711, an inverse transform module 2710, an intra-prediction module 2725, a motion compensation module 2730, an in-loop filter 2745, a decoded picture buffer 2750, a MV buffer 2765, a MV prediction module 2775, and a parser 2790. The motion compensation module 2730 is part of an inter-prediction module 2740. The intra-prediction module 2725 is part of a current picture prediction module 2720, which uses current picture reconstructed samples as reference samples for the prediction of the current block.
In some embodiments, the modules 2710 –2790 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 2710 –2790 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 2710 –2790 are illustrated as being separate modules, some of the modules can be combined into a single module.
The parser 2790 (or entropy decoder) performs a similar function as the parsing stage module 10a of FIG. 1. The parser 2790 receives the bitstream 2795 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard. The parsed syntax elements include various header elements and flags, as well as quantized data (or quantized coefficients) 2712. The parser 2790 parses out the various syntax elements by using entropy-decoding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman coding.
The inverse quantization module 2711 de-quantizes the quantized data (or quantized coefficients) 2712 to obtain transform coefficients, and the inverse transform module 2710 performs inverse transform on the transform coefficients 2716 to produce a reconstructed residual signal 2719. The reconstructed residual signal 2719 is added with predicted pixel data 2713 from the intra-prediction module 2725 or the motion compensation module 2730 to produce decoded pixel data 2717. The decoded pixel data are filtered by the in-loop filter 2745 and stored in the decoded picture buffer 2750. In some embodiments, the decoded picture buffer 2750 is a storage external to the video decoder 2700. In some embodiments, the decoded picture buffer 2750 is a storage internal to the video decoder 2700.
The intra-prediction module 2725 receives intra-prediction data from bitstream 2795 and according to which, produces the predicted pixel data 2713 from the decoded pixel data 2717 stored in the decoded picture buffer 2750. In some embodiments, the decoded pixel data 2717 is also stored in a line buffer 2727 (or intra-prediction buffer) for intra-picture prediction and spatial MV prediction.
In some embodiments, the content of the decoded picture buffer 2750 is used for display. A display device 2705 either retrieves the content of the decoded picture buffer 2750 for display directly or retrieves the content of the decoded picture buffer to a display buffer. In some embodiments, the display device receives pixel values from the decoded picture buffer 2750 through pixel transport.
The motion compensation module 2730 produces predicted pixel data 2713 from the decoded pixel data 2717 stored in the decoded picture buffer 2750 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bit stream 2795 with predicted MVs received from the MV prediction module 2775.
The MV prediction module 2775 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 2775 retrieves the reference MVs of previous video frames from the MV buffer 2765. The video decoder 2700 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 2765 as reference MVs for producing predicted MVs.
The in-loop filter 2745 performs filtering or smoothing operations on the decoded pixel data 2717 to reduce the artifacts of coding, particularly at the boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 2745 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) . In some embodiments, luma mapping chroma scaling (LMCS) is performed before the loop filters.
In summary, the embodiments provide a method for decoding a video bit stream, a method for signaling syntax element into the video bit stream, and a video decoder for dual-tree partitioning technology. The embodiments overcome the limitations of conventional dual-tree coding methods by introducing an innovative approach to signaling in video coding. The core idea lies in its adaptive dual-tree usage and more efficient signaling mechanism. This is achieved through the introduction of specific syntax elements that indicate whether a CU at a certain depth node is to be further split using a dual-tree partitioning method, and whether luma or chroma components are further split. By doing so, the necessity for signaling every split for luma and chroma components can be reduced. This approach offers a significant advantage over traditional methods that apply dual-tree coding to the entire CU with a single flag or require the decoder to receive all signaling syntax elements at the least depth node before proceeding with configuration. The embodiments enable earlier configuration of hardware components in the decoder, such as memory buffers and reconfiguration buffers, thereby enhancing the overall decoding efficiency. As a result, the embodiments are applicable in the field of video coding and decoding, particularly in scenarios involving inter-slices and dual-tree partitioning, offering improved performance in video compression and processing.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.

Claims (20)

  1. A method for decoding a video bit stream associated with video data by a video decoder, the method comprising:
    receiving the video bit stream associated with a coding tree unit (CTU) comprising a coding unit (CU) at a k-th depth node, wherein the CU comprises a plurality of coding blocks (CBs) , and k is an integer greater than or equal to 0;
    identifying a first syntax element from the video bit stream, wherein the first syntax element indicates if a CU at a (k+1) th depth node is to be further split by using a dual-tree partitioning method;
    identifying a second syntax element from the video bit stream, wherein the second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1) th depth node is to be further split and a second CB of the CU at the (k+1) th depth node is not to be further split; and
    configuring the video decoder based on at least the first syntax element and the second syntax element, and decoding the CTU to generate the video data.
  2. The method of claim 1, wherein the first CB comprises a luma block and the second CB comprises a chroma block.
  3. The method of claim 1, wherein the first CB comprises a chroma block and the second CB comprises a luma block.
  4. The method of claim 1, wherein the first syntax element and/or the second syntax element is specified in a picture parameter set (PPS) , a sequence parameter set (SPS) , or a CU parameter set of the video bit stream.
  5. The method of claim 1, further comprising:
    identifying a third syntax element and a fourth syntax element from the video bit stream, wherein the third syntax element indicates a splitting type of the first CB of the CU at the (k+1) th depth node, and the fourth syntax element indicates a splitting direction of the first CB of the CU at the (k+1) th depth node.
  6. The method of claim 5, wherein the third syntax element and/or the fourth syntax element is specified in a picture parameter set (PPS) , a sequence parameter set (SPS) or a CU parameter set of the video bit stream.
  7. The method of claim 5, wherein the splitting type comprises a binary tree split and a ternary tree split, and the splitting direction comprises a horizontal direction and a vertical direction.
  8. The method of claim 1, further comprising:
    identifying a fifth syntax element from the video bit stream, wherein the fifth syntax element indicates if a first CB at a (k+2) th depth node is to be further split.
  9. The method of claim 1, further comprising:
    setting a size threshold of CU;
    comparing the size threshold of CU with a size of the CU at the k-th depth node; and
    if the size of the CU at the k-th depth node satisfies the size threshold of CU, switching from a single-tree partitioning method to the dual-tree partitioning method for splitting the CU at the (k+1) th depth node.
  10. The method of claim 1, wherein when the dual-tree partitioning method is applied to the CU at the k-th depth node, all leaf node blocks of the CU after the k-th depth node are coded by the same prediction mode.
  11. A method for signaling syntax elements into a video bit stream associated with video data by a video encoder, the method comprising:
    receiving the video data associated with a coding tree unit (CTU) comprising a coding unit (CU) at a k-th depth node, wherein the CU comprises coding blocks (CBs) , and k is an integer greater than or equal to 0;
    determining a first syntax element for indicating if a CU at a (k+1) th depth node is to be further split by using a dual-tree partitioning method;
    determining a single-bit flag as a second syntax element for indicating a first CB of the CU at the (k+1) th depth node is to be further split and a second CB of the CU at the (k+1) th depth node is not to be further split; and
    signaling the first syntax element and the second syntax element into the video bit stream.
  12. The method of claim 11, further comprising:
    signaling a third syntax element and a fourth syntax element into the video bit stream, wherein the third syntax element indicates a splitting type of the first CB of the CU at the (k+1) th depth node, and the fourth syntax element indicates a splitting direction of the first CB of the CU at the (k+1) th depth node.
  13. A video decoder for decoding a video bit stream associated with video data, the video decoder comprising:
    a parsing stage module configured to analyze syntax elements in the video bit stream; and
    a reconstructive stage module linked to the parsing stage module;
    wherein the parsing stage module receives the video bit stream associated with a coding tree unit (CTU) comprising a coding unit (CU) at a k-th depth node, the CU comprises a plurality of coding blocks (CBs) , and k is an integer greater than or equal to 0, the parsing stage module identifies a first syntax element from the video bit stream, the first syntax element indicates if a CU at a (k+1) th depth node is to be further split by using a dual-tree partitioning method, the parsing stage module identifies a second syntax element from the video bit stream, the second syntax element is a single-bit flag for indicating that a first CB of the CU at the (k+1) th depth node is to be further split and a second CB of the CU at the (k+1) th depth node is not to be further split, and the parsing stage module configures the reconstructive stage module based on at least the first syntax element and the second syntax element, and the reconstructive stage module decodes the CTU to generate the video data.
  14. The video decoder of claim 13, wherein the first CB comprises a luma block and the second CB comprises a chroma block.
  15. The video decoder of claim 13, wherein the first CB comprises a chroma block and the second CB comprises a luma block.
  16. The video decoder of claim 13, wherein the parsing stage module identifies a third syntax element and a fourth syntax element from the video bit stream, the third syntax element indicates a splitting type of the first CB of the CU at the (k+1) th depth node, and the fourth syntax element indicates a splitting direction of the first CB of the CU at the (k+1) th depth node.
  17. The video decoder of claim 16, wherein the splitting type comprises a binary tree split and a ternary tree split, and the splitting direction comprises a horizontal direction and a vertical direction.
  18. The video decoder of claim 13, wherein the parsing stage module identifies a fifth syntax element from the video bit stream, wherein the fifth syntax element indicates if a first CB at a (k+2) th depth node is to be further split.
  19. The video decoder of claim 13, wherein the parsing stage module sets a size threshold of CU and compares the size threshold of CU with a size of the CU at the k-th depth node, and if the size of the CU at the k-th depth node satisfies the size threshold of CU, the parsing stage module switches from a single-tree partitioning method to the dual-tree partitioning method for splitting the CU at the (k+1) th depth node.
  20. The video decoder of claim 13, wherein when the dual-tree partitioning method is applied to the CU at the k-th depth node, all leaf node blocks of the CU after the k-th depth node are coded by the same prediction mode.
PCT/CN2025/089194 2024-04-16 2025-04-16 Video decoder and methods for decoding and signaling a video bit stream for performing a dual-tree partitioning technology Pending WO2025218682A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463634521P 2024-04-16 2024-04-16
US63/634,521 2024-04-16

Publications (1)

Publication Number Publication Date
WO2025218682A1 true WO2025218682A1 (en) 2025-10-23

Family

ID=97402945


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107079160A (en) * 2014-11-11 2017-08-18 联发科技(新加坡)私人有限公司 The method for video coding of separately encoded tree is used brightness and colourity
WO2020140876A1 (en) * 2018-12-30 2020-07-09 Mediatek Inc. Method and apparatus of luma-chroma separated coding tree coding with constraints
US20210185316A1 (en) * 2018-08-28 2021-06-17 Huawei Technologies Co., Ltd. Picture partitioning method and apparatus
WO2024058430A1 (en) * 2022-09-15 2024-03-21 현대자동차주식회사 Method and apparatus for video coding that adaptively uses single tree and dual tree in one block

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25789927

Country of ref document: EP

Kind code of ref document: A1