
WO2006043772A1 - Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure - Google Patents

Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure

Info

Publication number
WO2006043772A1
Authority
WO
WIPO (PCT)
Prior art keywords
gop
frame
bitstream
sized
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2005/003467
Other languages
French (fr)
Inventor
Se Yoon Jeong
Kyu Heon Kim
Jin Woo Hong
Gwang Hoon Park
Min Woo Park
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Kyung Hee University
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Kyung Hee University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020050031712A external-priority patent/KR20060045796A/en
Application filed by Electronics and Telecommunications Research Institute ETRI, Kyung Hee University filed Critical Electronics and Telecommunications Research Institute ETRI
Priority to US11/576,572 priority Critical patent/US20090080519A1/en
Publication of WO2006043772A1 publication Critical patent/WO2006043772A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/114Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/31Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the present invention relates to a video coding/decoding scheme, and more particularly, to a method for encoding a video sequence based on motion compensated temporal filtering (MCTF) using intelligently-divided group of pictures (GOP) and a method for decoding an encoded bitstream.
  • MCTF motion compensated temporal filtering
  • GOP intelligently-divided group of pictures
  • MCTF-based video coding scheme by performing a wavelet transform along temporal axis, with motion information, to improve coding efficiency.
  • the existing MCTF-based video coding is performed in units of a GOP, which has a fixed size of the n-th power of 2 (2^n).
  • FIG. 1 shows an encoding concept of a video sequence where a GOP size is 8.
  • MCTF is repeatedly performed "n" times in one GOP that has a size of 2^n.
  • a process of performing the MCTF once is called a decomposition process, and the number of times the MCTF is performed in one GOP is expressed as the decomposition level.
  • Motion information is obtained through motion prediction in the prediction process, and a wavelet transform is carried out in a motion direction using the motion information.
  • the wavelet transform used may be a Haar wavelet transform or a 5/3 spline wavelet transform as examples.
  • the MCTF can be carried out in units of blocks so that the encoding can be separately performed for the intra blocks and inter blocks.
  • when the block-based MCTF is performed, the Haar wavelet transform is used for motion prediction in one direction, and the 5/3 spline wavelet transform is used for motion prediction in both directions.
  • Intra coding is performed when motion prediction cannot be performed or the efficiency of the motion prediction is lower than that of the intra coding.
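The repeated decomposition described above can be illustrated with a minimal sketch. It assumes zero motion (no motion compensation), frames stored as flat lists of pixel values, and an unnormalized Haar lifting step; the function names `haar_mctf_level` and `haar_mctf` are hypothetical and are not taken from the patent.

```python
def haar_mctf_level(frames):
    """One decomposition level: turn frame pairs into low/high sub-band frames."""
    lows, highs = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        # Prediction step: residual of the odd frame against the even frame.
        high = [o - e for o, e in zip(odd, even)]
        # Update step: the low-frequency frame is the pairwise average.
        low = [e + h / 2 for e, h in zip(even, high)]
        highs.append(high)
        lows.append(low)
    return lows, highs


def haar_mctf(frames):
    """Repeat the decomposition until a single low-frequency frame remains."""
    levels = []
    lows = frames
    while len(lows) > 1:
        lows, highs = haar_mctf_level(lows)
        levels.append(highs)
    return lows[0], levels


# Eight tiny 4-pixel frames stand in for one GOP of size 8 (2**3).
gop = [[float(i)] * 4 for i in range(8)]
low, levels = haar_mctf(gop)
print(len(levels))  # 3 decomposition levels for a GOP of 8 frames
```

A real encoder would apply the prediction and update steps along motion trajectories and per block; the sketch only shows why a GOP of 2^n frames yields n decomposition levels.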
  • FIG. 2 shows a process of performing MCTF when a decomposition level is 4. As shown, the reference frame is quite far away along the time axis when the H3 and H4 pictures are predicted. (Here, only the first frame in the GOP, which is the last low-frequency frame in the previous GOP, is referred to.)
  • FIG. 3 shows a part of the "Foreman" QCIF (Quarter Common Intermediate Format) 15Hz video sequence. It can be seen from the figure that there is little motion variation in a GOP. Thus, it can be concluded that MCTF encoding produces a good prediction result in a video sequence with relatively little motion.
  • FIG. 4 shows a part of the "Football" QCIF 15Hz video sequence. It can be seen from the figure that image frames change dynamically in a GOP. Thus, it can be concluded that, in a dynamic video sequence, the higher the decomposition level is, the more intra blocks are generated in the MCTF-based encoding process. In other words, it can be expected that, in the case of a dynamic video sequence, the MCTF yields poor prediction efficiency.
  • FIG. 5 shows an example where too many intra blocks are included in a prediction frame due to the poor motion prediction when "Football" QCIF 15Hz video sequence is encoded with the GOP size of 8.
  • FIGS. 6 and 7 show graphs of coding efficiency results with 4 different GOP sizes (1, 2, 4 and 8) for "Football" sequence at QCIF 7.5Hz and 15Hz sequences, respectively. As shown, the smaller the GOP size, the higher the coding efficiency. It can be concluded from the graphs that the GOP size of 1 is higher in performance by about 0.3 dB to 0.4 dB than the GOP size of 8.
  • FIGS. 8 to 10 show the MCTF process with three different GOP sizes 8, 4 and
  • FIG. 12 shows the frame-based PSNR (Peak Signal-to-Noise Ratio) results of
  • FIG. 13 shows the frame-based PSNR results of MCTF-based coding for frames from the 137th to the 144th of the "Foreman" sequence at QCIF 15Hz at the same bit rate, based on three different GOP sizes: 8, 4 and 2.
  • the 8-sized GOP shows the best coding efficiency.
  • FIG. 14 shows the frame-based PSNR results of MCTF-based coding for frames from the 97th to the 104th of the "Foreman" sequence at QCIF 15Hz at the same bit rate, based on three different GOP sizes: 8, 4 and 2.
  • the 8-sized GOP has higher coding efficiency than the 4 or 2-sized GOP, which is the opposite of the overall result of the "Foreman" video sequence.
  • the 4 or 2-sized GOP may have slightly improved the overall coding efficiency. It can be expected that front four frames have the best coding efficiency when the GOP size is 2, while the rear four frames have the best coding efficiency when the GOP size is 4.
  • a method for performing motion compensated temporal filtering (MCTF)-based encoding on a video sequence comprises the steps of: for each predefined 2^N-frame-sized group of pictures (GOP) of the video sequence, (a) encoding the 2^N-frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N), and obtaining difference values between frames reconstructed after the encoding is performed and frames after the MCTF is performed, based on each of the different GOP sizes; (b) selecting at least one sub-GOP based on the difference values obtained by encoding the 2^N-frame-sized GOP of the video sequence based on each of the different GOP sizes; and (c) generating a bitstream by encoding the 2^N-frame-sized GOP based on the at least one selected sub-GOP.
  • the step (b) includes the sub-steps of: (b1) comparing the difference values obtained from the encoding based on each of the different GOP sizes from 2^N to 2^M and, if the difference value obtained from the encoding based on the 2^N-sized GOP unit is the smallest, selecting the 2^N-sized GOP as a sub-GOP; and (b2) if the difference value obtained from the encoding based on the 2^N-sized GOP is not the smallest, after decreasing N by 1 (i.e., N = N-1), i) selecting two 2^M-sized GOPs as the sub-GOPs if N has the same value as M, and ii) repeating steps (b1) and (b2) for each of the front 2^N frames and the rear 2^N frames, if N does not have the same value as M.
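The recursive selection in sub-steps (b1) and (b2) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `cost(start, size)` is a hypothetical callback that returns the difference value (for example the MSE) of coding the frames `[start, start + size)` as one GOP, frame indices are 0-based, and the difference value of a span coded with smaller GOPs is assumed to be the mean over those GOPs.

```python
def span_cost(start, span, gop_size, cost):
    """Mean difference value when `span` frames are coded in GOPs of `gop_size`."""
    starts = range(start, start + span, gop_size)
    return sum(cost(s, gop_size) for s in starts) / len(starts)


def select_sub_gops(start, n, m, cost):
    """Return (start, size) sub-GOPs covering frames [start, start + 2**n)."""
    span = 2 ** n
    if n == m:
        return [(start, span)]
    # (b1) Compare the difference values for GOP sizes 2**n down to 2**m.
    costs = {j: span_cost(start, span, 2 ** j, cost) for j in range(m, n + 1)}
    if costs[n] <= min(costs.values()):
        return [(start, span)]
    # (b2) Otherwise decrease n by 1 and examine the front and rear halves.
    half = span // 2
    if n - 1 == m:
        return [(start, half), (start + half, half)]
    return (select_sub_gops(start, n - 1, m, cost)
            + select_sub_gops(start + half, n - 1, m, cost))


# Hypothetical difference values: low for four chosen sub-GOPs, high otherwise.
cheap = {(0, 2), (2, 2), (4, 4), (8, 8)}


def toy_cost(start, size):
    return 1.0 if (start, size) in cheap else 5.0


structure = select_sub_gops(0, 4, 1, toy_cost)
print([size for _, size in structure])  # [2, 2, 4, 8]
```

With N = 4 and M = 1 the toy values split a 16-frame GOP into sub-GOPs of sizes 2, 2, 4 and 8. The toy numbers are invented for the example and carry no experimental meaning.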
  • the difference value may be selected from a group of MSE (Mean Square Error), SAD (Sum of Absolute Differences), SSE (Sum of Squared Errors), SAD + λ_SAD·R (R is the number of bits of the GOP unit), and SSE + λ_SSE·R.
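The three pixel-domain measures named above have standard definitions, shown here as a small sketch. The function names and the flat-list frame representation are assumptions made for the example.

```python
def sad(f, g):
    """Sum of Absolute Differences between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(f, g))


def sse(f, g):
    """Sum of Squared Errors between two equally sized frames."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def mse(f, g):
    """Mean Square Error: the SSE divided by the number of pixels."""
    return sse(f, g) / len(f)


# Two hypothetical 4-pixel frames.
f = [10, 12, 14, 16]
g = [11, 12, 12, 16]
print(sad(f, g), sse(f, g), mse(f, g))  # 3 5 1.25
```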
  • the step (b) includes the sub-steps of: (b1) comparing the difference values obtained from the encoding based on each of the different GOP sizes from 2^N to 2^M and, if the difference value obtained from the encoding based on the 2^N-sized GOP unit is the smallest, selecting the 2^N-sized GOP as a sub-GOP and setting a GOP divide bit inserted before the GOP bitstream as "0"; (b2) if the difference value obtained from the encoding based on the 2^N-sized GOP is not the smallest, setting the GOP divide bit inserted before the GOP bitstream as "0" and, after decreasing N by 1 (i.e., N = N-1), i) selecting two 2^M-sized GOPs as the sub-GOPs if N has the same value as M, and ii) repeating steps (b1) and (b2) for each of the front 2^N frames and the rear 2^N frames, if N does not have the same value as M.
  • the at least one selected sub-GOP information is set in the first frame header information of the GOP to transmit to a decoder.
  • the method comprises the steps of: (a) for each predetermined 2^N-frame-sized GOP of the video sequence, (a1) encoding the 2^N-frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N), and selecting at least one sub-GOP based on the encoding result, and (a2) generating a bitstream by encoding the 2^N-frame-sized GOP based on the at least one selected sub-GOP; and (b) inserting temporal scalability range information in the generated bitstream.
  • the range of temporal scalability is based on a minimum
  • a method for decoding an MCTF-based encoded bitstream comprises the steps of: for each predetermined 2^N-sized GOP bitstream, (a) determining whether the GOP is divided; (b) when the GOP is determined not to be divided, decoding the GOP bitstream; (c) when the GOP is determined to be divided, dividing the GOP bitstream; and (d) decoding the divided GOP bitstreams.
  • a method for decoding an MCTF-based encoded bitstream comprises the steps of: for each predetermined 2^N-sized GOP bitstream, reading adaptively-divided GOP structure information from the GOP bitstream; and decoding the GOP bitstream based on the adaptively-divided GOP structure information.
  • a method for decoding an MCTF-based encoded bitstream comprises the steps of: for a predetermined-sized GOP bitstream, reading variable GOP structure information from the GOP bitstream; and decoding the GOP bitstream based on the variable GOP structure information.
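On the decoder side, the division step can be pictured with the sketch below. It assumes, purely for illustration, that the GOP structure information has already been parsed into a list of sub-GOP sizes and that the coded frames of one GOP are available as a list; `split_gop` is a hypothetical helper, and the actual bitstream syntax is not reproduced here.

```python
def split_gop(coded_frames, sub_gop_sizes):
    """Divide the coded frames of one GOP into consecutive sub-GOPs."""
    if sum(sub_gop_sizes) != len(coded_frames):
        raise ValueError("sub-GOP sizes must cover the whole GOP")
    sub_gops, pos = [], 0
    for size in sub_gop_sizes:
        # Each sub-GOP is then decoded on its own by inverse MCTF.
        sub_gops.append(coded_frames[pos:pos + size])
        pos += size
    return sub_gops


# A 16-frame GOP divided into sub-GOPs of (2, 2, 4, 8); integers stand in for coded frames.
parts = split_gop(list(range(16)), [2, 2, 4, 8])
print([len(p) for p in parts])  # [2, 2, 4, 8]
```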
  • a method for providing 1/L temporal scalability upon decoding an MCTF-based encoded bitstream comprises the steps of: for each predetermined
  • step (a) initializing "k" to 0 (k is an integer); (b) initializing "FrameNum" to 2^N; (c) detecting whether there is a low-frequency frame in a bitstream from FrameNum frame to L frame in the reverse direction, and decreasing FrameNum by L; (d) based on the result of detecting in step (c), (d-1) increasing the value of k by 1, if there is no low-frequency frame, and (d-2) selecting the low-frequency frame detected first in the reverse direction if there is a low-frequency picture and, if the value of k is not 0, further selecting subsequent k number of high-frequency frames and then re-initializing k to 0; and (e) repeating steps
  • the MCTF-based video coding is performed by adaptively dividing the GOP size based on the performance and thereby obtaining high coding efficiency.
  • FIG. 1 shows an encoding concept of a video sequence where a GOP size is 8;
  • FIG. 2 shows a process of performing MCTF when a decomposition level is 4;
  • FIG. 3 shows a part of "Foreman" QCIF 15Hz video sequence;
  • FIG. 4 shows a part of "Football" QCIF 15Hz video sequence;
  • FIG. 5 shows an example where too many intra blocks are included in a prediction frame due to the poor motion prediction when "Football" QCIF 15Hz video sequence is encoded with the GOP size of 8;
  • FIG. 6 is a graph of the coding result for "Football" sequence at QCIF 7.5Hz while varying GOP sizes;
  • FIG. 7 is a graph of the coding result for "Football" sequence at QCIF 15Hz while varying GOP sizes;
  • FIG. 9 shows an MCTF process of the 17th to 24th frames for "Football" sequence at
  • FIG. 10 shows an MCTF process of the 17th to 24th frames for "Football" sequence at
  • FIG. 11 shows the graph of the coding results with different GOP sizes, for static "Foreman" sequence at QCIF 15Hz;
  • FIG. 12 shows the frame-based PSNR results of MCTF-based coding for frames from the 17th to the 24th of "Football" sequence at QCIF 15Hz;
  • FIG. 13 shows the frame-based PSNR results of MCTF-based coding for frames from the 137th to the 144th of "Foreman" sequence at QCIF 15Hz;
  • FIG. 14 shows the frame-based PSNR results of MCTF-based coding for frames from the 97th to the 104th of "Foreman" sequence at QCIF 15Hz;
  • FIG. 15 shows the frame-based PSNR results and the targeted performance values for the 97th to 112th frames of "Foreman" QCIF 15Hz sequence;
  • FIG. 16 shows a flowchart of an algorithm of adaptive GOP structure-based video coding according to one embodiment of the present invention;
  • FIG. 17 shows the detailed mode decision process shown in FIG. 16;
  • FIG. 18 shows the conceptual locations where the MSE value of each GOP is taken within the 16 frame-sized GOP;
  • FIG. 19 conceptually shows the process of adaptively dividing the 16 frame-sized GOP based on the MSE values in accordance with one embodiment of the present invention
  • FIG. 20 shows a graph of frame-based PSNR results when encoding is performed on the basis of the adaptively-divided GOP structure shown in FIG. 19
  • FIG. 21 shows a flowchart showing a mode decision process according to one embodiment of the present invention
  • FIG. 22 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding according to the present invention with respect to "Crew" video sequence (QCIF and CIF);
  • FIG. 23 is a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding according to the present invention for
  • FIG. 24 shows the GOP structure for "Crew" QCIF 15Hz video sequence, wherein the GOP structure is adaptively divided in accordance with the present invention
  • FIG. 25 shows frames obtained by the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention with respect to "Crew" video sequence (241st frame);
  • FIG. 26 shows frames obtained by the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention with respect to "Crew" video sequence (279th frame);
  • FIG. 27 shows frames obtained by the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention with respect to "Crew" video sequence (298th frame);
  • FIG. 28 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention for
  • FIG. 29 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention for
  • FIG. 30 shows the GOP structure for "Football" QCIF 15Hz video sequence wherein the GOP structure is adaptively divided in accordance with the present invention
  • FIG. 31 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention for
  • FIG. 32 shows the GOP structure for "Football" QCIF 15Hz video sequence wherein the GOP structure is adaptively divided in accordance with the present invention
  • FIG. 33 shows a configuration of a GOP bitstream to which "gop_divide_bit" is added in accordance with one embodiment of the present invention
  • FIG. 34 shows an example of a bitstream where the 16 frame-sized GOP is divided and encoded into sub-GOPs of (2, 2, 4, 8) according to the present invention
  • FIG. 35 shows a process of decoding the bitstream shown in FIG. 34 in accordance with one embodiment of the present invention
  • FIG. 36 shows an example of a GOP structure where the 16 frame-sized GOP is adaptively divided in accordance with the present invention
  • FIG. 37 shows a process where the 16 frame-sized GOP is divided and encoded into sub-GOPs of (8, 4, 2, 2) according to the present invention
  • FIG. 38 shows a flowchart of a 1/L resolution-supported algorithm performed at a bitstream extractor in a decoder to provide temporal scalability in accordance with one embodiment of the present invention.
  • FIG. 39 shows the modified syntax of scalability information, SEI message of Joint Scalable Video Model (JSVM) 2.0, according to one embodiment of the present invention.
  • JSVM Joint Scalable Video Model
  • FIG. 16 is a flowchart of a method of encoding a video sequence based on the adaptive GOP structure according to one embodiment of the present invention.
  • the method of encoding a video sequence based on the adaptive GOP structure is performed as follows.
  • (1) For one GOP of the video frame sequence, encoding is performed on the basis of each of the different GOP sizes, from the maximum 2^N-sized GOP to the minimum 2^M-sized GOP, and, for each of the different GOP sizes, the MSE (Mean Square Error) between each frame reconstructed per sub-band after the MCTF is performed and each frame reconstructed per sub-band after the encoding is performed is obtained (S1610, S1620 and S1630). MSE is just an example and the reference is not limited thereto; any one of SAD (Sum of Absolute Differences), SSE (Sum of Squared Errors), SAD + λ_SAD·R (R: the number of bits of one GOP), and SSE + λ_SSE·R may be used as the reference. This will be described below.
  • At least one sub-GOP is selected to divide the 2^N-sized GOP on the basis of the MSE of each GOP size obtained in process (1) (S1640).
  • a process of dividing the 2^N-sized GOP by selecting the sub-GOPs that produce the minimum MSE in the GOP is referred to as a "mode decision" procedure.
  • the video sequence having a 2^N-frame size is encoded on the basis of the selected sub-GOP structure to generate a bitstream thereof (S1650).
  • the MSE of each frame can be calculated by the following equation.
  • k indicates the number of pixels of one frame
  • F(i) is the pixel value of the per-subband frame generated after the MCTF procedure is performed
  • G(i) is the pixel value of the per-subband frame reconstructed after the encoding is performed.
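The equation itself did not survive in this text. With the symbols defined above, and writing the pixel index as i (an assumption), the standard mean square error would read as follows; this is a reconstruction from the definitions, not a quotation of the original equation.

```latex
\[
\mathrm{MSE} = \frac{1}{k} \sum_{i=1}^{k} \bigl( F(i) - G(i) \bigr)^{2}
\]
```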
  • FIG. 17 shows the detailed mode decision process shown in FIG. 16, where the GOP size is 16 (or N is 4) and the minimum selectable GOP size is 2 (or M is 1). As shown, when the GOP size is 16 frames, adaptive division of the GOP is as follows:
  • the 4-sized GOP is selected (S1726).
  • B. two 2-sized GOPs are selected (S1728).
  • the front 4 frames are encoded based on the different GOP sizes, and the MSE values of the encoding results are compared with each other:
  • A. When the MSE obtained from the encoding result of the 4-sized GOP is the smallest, the 4 frame-sized GOP is selected (S1730).
  • B. When the MSE obtained from the encoding result of the 4-sized GOP is not the smallest, two 2-sized GOPs are selected (S1732).
  • FIG. 18 shows the conceptual locations where the MSE of each sub-GOP is taken within the 16 frame-sized GOP.
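To make the set of measurement locations concrete, the sketch below lists every candidate sub-GOP for which a difference value would be taken when N = 4 and M = 1. The function name and the 0-based `(start, size)` representation are assumptions for illustration; the layout of FIG. 18 itself is not reproduced.

```python
def candidate_gops(n, m):
    """List (start, size) for each candidate GOP from size 2**n down to 2**m."""
    candidates = []
    for j in range(n, m - 1, -1):
        size = 2 ** j
        # GOPs of this size tile the 2**n frames without overlap.
        candidates.extend((start, size) for start in range(0, 2 ** n, size))
    return candidates


locations = candidate_gops(4, 1)
print(len(locations))  # 15: one 16-frame, two 8-frame, four 4-frame, eight 2-frame GOPs
```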
  • FIG. 19 shows the MSE values calculated from the encoding results based on the GOP sizes of 16, 8, 4, and 2, respectively, in the process of adaptively dividing a GOP including the 81st to 96th frames of the "Foreman" QCIF 15Hz sequence, as an example of a 16 frame-sized video sequence, in accordance with the present invention, in which the selected sub-GOP sizes are marked with shading.
  • the sub-GOP sizes are determined through the following processes:
  • FIG. 20 is a graph of frame-based PSNR results when encoding is performed on the basis of an adaptively divided GOP structure in accordance with FIG. 19.
  • the dotted line denotes the PSNR (picture quality) according to SVM
  • the MSE is used in the comparison of the mode decision process.
  • SAD Sum of Absolute Differences
  • SSE Sum of Squared Errors
  • the Lagrangian optimization scheme used in AVC (Advanced Video Coding), the basic international video standard, may be used. This scheme uses a value given by the SAD or SSE plus the product of the number of bits and a Lagrangian coefficient.
  • the Lagrangian coefficient is defined based on a quantization coefficient (Qp) value as follows:
  • In Equations 3 and 4, R indicates the number of bits in one GOP when the GOP is encoded.
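A rate-aware comparison of this kind can be sketched as below. The Lagrangian coefficient is passed in as a plain parameter `lam` because its definition from Qp is not reproduced in this text; the function names and all numbers are hypothetical.

```python
def rd_cost(distortion, bits, lam):
    """Distortion (SAD or SSE) plus the Lagrangian coefficient times the bit count R."""
    return distortion + lam * bits


def pick_gop_size(candidates, lam):
    """candidates maps a GOP size to (distortion, bits); return the cheapest size."""
    return min(candidates, key=lambda size: rd_cost(*candidates[size], lam))


# Hypothetical (distortion, bits) pairs for three GOP sizes.
candidates = {8: (1200.0, 900), 4: (1000.0, 1100), 2: (950.0, 1400)}
print(pick_gop_size(candidates, 0.5))  # 4
```

With `lam = 0` the choice falls back to pure distortion and picks size 2 for these numbers, while a larger coefficient such as `lam = 2` favors the cheaper-to-code size 8.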
  • the mode decision may be made by (1) considering the calculation complexity, (2) using an optimizing method that considers calculation complexity to a certain degree in predicting the bit amounts to be transmitted, or (3) predicting or obtaining the actual bit amounts.
  • FIG. 22 shows the comparison graph of bit rate-PSNR results for the "Crew" video sequence at QCIF and CIF, wherein one result is based on the SVM 3.0 codec proposed by HHI for SVC, and the other is based on the adaptive GOP structure proposed in the present invention.
  • the resulting graph shows that the encoding based on the adaptive GOP structure according to the present invention improves performance from about 0.02 dB to 0.45 dB in comparison with the results of the existing SVM 3.0 encoding method.
  • FIG. 23 shows the comparison graph of results of encoding "Crew" video sequence at 4-CIF, wherein one is based on the HHI's codec and the other is based on the adaptive GOP structure according to the present invention. It is shown that the coding scheme according to the present invention improves performance from about 0.18 dB to 0.43 dB.
  • FIG. 24 shows the GOP structure for "Crew" QCIF 15Hz video sequence, which is adaptively divided in accordance with the present invention. It can be concluded that the GOP size is divided according to the variation of motion, and coding efficiency is improved.
  • FIG. 26 shows the quality comparison between the frames obtained by the HHI codec and by the adaptive GOP structure-based encoding of the present invention, respectively, with respect to the 279th frame of the "Crew" video sequence. As shown in the left figure, the resultant frame of the HHI codec has serious blocking artifacts at the highlighted portion of the frame.
  • FIG. 27 shows the quality comparison between the frames obtained by the HHI codec and by the adaptive GOP structure-based encoding of the present invention, respectively, with respect to the 298th frame of the "Crew" video sequence. As shown in the left figure, the color spread phenomenon is prominent at the highlighted portion of the resultant picture of the HHI codec.
  • FIG. 28 shows the comparison graph of the results based on the HHI codec and the adaptive GOP structure-based encoding of the present invention with respect to "Football" sequence at QCIF and CIF.
  • the encoding results according to the present invention show that performance is improved from about 0.01 dB to 0.1⁇ dB.
  • FIG. 29 shows the comparison graph of the results based on the HHI codec and the adaptive GOP structure-based encoding of the present invention with respect to 4-CIF of "Football" video sequence.
  • the encoding results according to the present invention show that performance is further improved from about 0.06 dB to 0.14 dB.
  • FIG. 30 shows the GOP structure for "Football" QCIF 15Hz video sequence, which is adaptively divided in accordance with the present invention. It can be concluded that the GOP is divided according to a degree of motion, and coding efficiency is improved.
  • FIG. 31 shows the comparison graph of the results based on the HHI codec and the adaptive GOP structure-based encoding of the present invention with respect to "Football" video sequence at QCIF and CIF. As shown, the encoding result according to the present invention shows that performance is improved from about 0.15 dB to 0.65 dB.
  • FIG. 32 shows the GOP structure for "Football" QCIF 15Hz video sequence, which is adaptively divided in accordance with the present invention. It can be concluded that the GOP is divided according to a degree of motion, and coding efficiency is improved.
  • the corresponding GOP size and decomposition stage information may be encoded and transmitted to a decoder using "gop_size" and "decomposition_stage", which are specified as slice header information in SVM 3.0 codec of SVC.
  • the SVM 3.0 codec is currently undergoing international standardization. Then, the encoded bitstream is decoded using the corresponding GOP size in the existing SVC decoder without any additional bit.
  • the GOP size information may be encoded and transmitted into a GOP header, and thereby the adaptively-divided GOP may be decoded.
  • 1 bit for "variable-GOP-Size" is allocated in a header of a bitstream to indicate the use of the variable GOP size to the decoder, and then decoding can be performed suitably in the corresponding conditions.
  • an encoder, in order to allow the encoded bitstream to be decoded at a decoder, can add a "gop_divide_bit" as a one-bit flag to the front of each GOP-based bitstream and then transmit it to the decoder, wherein the "gop_divide_bit" indicates whether the GOP is divided. This may be performed by slightly modifying the mode decision process shown in FIG. 21. In other words, the difference values obtained from encoding based on the different GOP sizes, from the maximum 2^N-sized GOP to the minimum 2^M-sized GOP, with respect to a 2^N-frame video sequence are compared with each other (S2110).
  • if the difference value obtained by the 2^N-sized GOP-based encoding is the smallest (S2120),
  • the 2^N-sized GOP is selected and, at the same time, the flag "gop_divide_bit" is set to "0". Otherwise, the flag "gop_divide_bit" is set to "1" to indicate that the GOP is divided.
  • the other steps S2140 to S2160 may be performed in the same manner.
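The recursive mode decision with the "gop_divide_bit" flag described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: `cost` is a hypothetical callback standing in for the difference value (e.g., MSE) of one encoded GOP, the cost of a window at a given GOP size is assumed to be the sum over its GOPs, and it is assumed that no flag is emitted for minimum-sized sub-GOPs because they cannot be divided further. None of these details are specified in the text.

```python
def decide_mode(cost, start, n, m):
    """Split the 2**n frames beginning at `start` into sub-GOPs.

    cost(start, size) is a hypothetical callback returning the difference
    value (e.g. MSE) of encoding `size` frames from `start` as one GOP.
    Returns (sub_gop_sizes, divide_bits), where divide_bits is a string of
    gop_divide_bit flags in front-to-rear order (assumed layout).
    """
    window = 2 ** n
    totals = {}
    for k in range(m, n + 1):
        size = 2 ** k
        # Assumption: window cost at this GOP size is the sum over its GOPs.
        totals[k] = sum(cost(s, size) for s in range(start, start + window, size))
    if totals[n] == min(totals.values()):
        return [window], "0"          # not divided
    n -= 1
    half = 2 ** n
    if n == m:
        return [half, half], "1"      # divided into two minimum-sized GOPs
    front_sizes, front_bits = decide_mode(cost, start, n, m)
    rear_sizes, rear_bits = decide_mode(cost, start + half, n, m)
    return front_sizes + rear_sizes, "1" + front_bits + rear_bits


# Toy per-frame "motion" values (made up): busy start, calm end.
MOTION = [50] * 4 + [1] * 4 + [0] * 8


def toy_cost(start, size):
    # Made-up cost: fixed per-GOP overhead plus a motion penalty that grows
    # quickly with GOP size.
    return 10 + max(MOTION[start:start + size]) * size * size


sizes, bits = decide_mode(toy_cost, 0, 4, 1)
print(sizes, bits)
```

With the made-up cost above, a 16-frame GOP is divided into sub-GOPs of (2, 2, 4, 8), the same division used as the example in FIG. 34.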
  • FIG. 33 shows a configuration of a GOP bitstream to which "gop_divide_bit" is added in accordance with one embodiment of the present invention.
  • FIG. 35 shows an example of a process of decoding the encoded bitstream shown in FIG. 34 in accordance with one embodiment of the present invention.
  • an encoder performing the adaptive GOP structure-based video coding according to the present invention can encode information on the sub-GOPs, which are adaptively divided within the constant GOP size, and then transmit it to a decoder.
  • FIG. 36 shows an example of the selected GOP mode in a 16 frame-sized GOP.
  • the original GOP size information is encoded in a sequence header and the information on sub-GOPs, which are adaptively divided within the 16 frame-sized GOP, is encoded in the slice header of a first frame of each GOP.
  • the sub-GOP information is represented by the sub-GOP size divided by "N" of the original GOP size, 2^N, i.e., 4. Then, each of the divided information is encoded in two fixed bits. For example, if the 16-sized GOP is selected as the sub-GOP, the information is encoded in "00b". If the 8-sized GOP is selected as the sub-GOP, the information is encoded in "01b". If the 4-sized GOP is selected as the sub-GOP, the information is encoded in "10b". If the 2-sized GOP is selected as the sub-GOP, the information is encoded in "11b". When the sum of the GOP sizes selected as the sub-GOPs within the 16 frame size is 16, the encoding of the selected GOP information is terminated.
  • if the sub-GOPs are determined as the sizes of 8, 4, 4, a total of 6 bits, 01-10-10, are needed. If the sub-GOPs are determined as the size of 16, a total of 2 bits, 00, are needed. If the sub-GOPs are determined as the sizes of 4, 2, 2, 8, a total of 8 bits, 10-11-11-01, are needed. If the sub-GOPs are determined as the sizes of 2, 2, 2, 2, 2, 2, 2, 2, a total of 16 bits, 11-11-11-11-11-11-11-11, are needed. In this manner, the sub-GOP information can be expressed. In the case of transmitting the sub-GOP information on the encoder side together with the bitstream, the decoder decodes each GOP on the basis of the received sub-GOP information.
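The two-bit sub-GOP signalling for a 16-frame GOP can be illustrated with a short sketch. The function and table names are hypothetical; only the code table (16 → 00, 8 → 01, 4 → 10, 2 → 11) and the rule that coding stops once the sizes sum to 16 come from the description above.

```python
SIZE_TO_CODE = {16: "00", 8: "01", 4: "10", 2: "11"}
CODE_TO_SIZE = {code: size for size, code in SIZE_TO_CODE.items()}


def encode_sub_gops(sizes, gop_size=16):
    """Return the two-bit codes for `sizes`, joined with '-' for readability."""
    if sum(sizes) != gop_size:
        raise ValueError("sub-GOP sizes must add up to the GOP size")
    return "-".join(SIZE_TO_CODE[s] for s in sizes)


def decode_sub_gops(bits, gop_size=16):
    """Read two-bit codes from `bits` (no separators) until the GOP is full.

    Returns (sizes, number_of_bits_consumed).
    """
    sizes, pos = [], 0
    while sum(sizes) < gop_size:
        sizes.append(CODE_TO_SIZE[bits[pos:pos + 2]])
        pos += 2
    return sizes, pos


print(encode_sub_gops([8, 4, 4]))
print(decode_sub_gops("10111101"))
```

Because the decoder stops as soon as the decoded sizes add up to 16, no explicit length field is needed in this sketch.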
  • the decoder may provide temporal scalability upon decoding a bitstream encoded based on the adaptively divided GOP structure.
  • the bitstream encoded based on the adaptively divided GOP structure should be decoded according to the order of frames because it differs in structure from a bitstream encoded based on the fixed GOP size.
  • For example, as shown in FIG. 37, when the sub-GOPs are selected and encoded with the size of (8, 4, 2, 2) in the 16-frame GOP, the order of the bitstream is determined based on each temporal resolution as follows:
  • FIG. 38 shows a 1/L resolution-supported algorithm performed to provide temporal scalability at a bitstream extractor in a decoder in accordance with one embodiment of the present invention.
  • the following is a video extraction algorithm for supporting 1/L resolution of a bitstream encoded based on an adaptive GOP structure-based encoding method of the present invention. It is assumed that the GOP size is 16.
  • a value of k is initially set to 0 (here, k is an integer) (S3810).
  • the above-mentioned algorithm for supporting the 1/8 resolution of the encoded bitstream where the sub-GOP sizes are selected as (8, 4, 2, 2) in the 16-frame GOP is executed as follows. Provided that the entire bitstream is configured of "L3(0), H3(1), H2(2), H2(3), H1(4), H1(5), H1(6), H1(7), L2(8), H2(9), H1(10), H2(11), L1(12), H1(13), L1(14) and H1(15)":
  • a parameter k is initially set to 0.
  • An example of supporting a 1/4 resolution is as follows: (1) A parameter k is initially set to 0. (2) FrameNum is initially set to 16.
  • Based on the detecting result in step (9), the low-frequency frame L3(0) is selected. Then, because k is 1, the next frame H3(1) is selected, and k is set to 0 again.
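The 1/L extraction steps described above can be sketched as follows. This is a minimal sketch, assuming that each frame is represented by a label in bitstream order, that low-frequency frames are recognizable by a leading "L", and that the GOP length is a multiple of L. The function name and the label format are illustrative and not part of the described bitstream syntax.

```python
def extract_temporal_subset(gop, L):
    """Select the frames needed for 1/L temporal resolution from one GOP.

    gop: list of frame labels in bitstream order; labels starting with "L"
         are treated as low-frequency frames (assumed convention).
    """
    k = 0
    frame_num = len(gop)
    selected = []
    while frame_num > 0:
        # Search the last L frames before frame_num in the reverse direction.
        low_idx = None
        for i in range(frame_num - 1, frame_num - L - 1, -1):
            if gop[i].startswith("L"):
                low_idx = i
                break
        frame_num -= L
        if low_idx is None:
            k += 1                      # no low-frequency frame in this window
        else:
            selected.append(low_idx)
            # Also take the k frames that follow the low-frequency frame.
            selected.extend(range(low_idx + 1, low_idx + 1 + k))
            k = 0
    return [gop[i] for i in sorted(selected)]


GOP = ["L3(0)", "H3(1)", "H2(2)", "H2(3)", "H1(4)", "H1(5)", "H1(6)", "H1(7)",
       "L2(8)", "H2(9)", "H1(10)", "H2(11)", "L1(12)", "H1(13)", "L1(14)", "H1(15)"]

print(extract_temporal_subset(GOP, 4))
print(extract_temporal_subset(GOP, 8))
```

For the (8, 4, 2, 2) example, the 1/4 case selects four frames and the 1/8 case selects two frames, i.e., 16/L frames in each case.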
  • the encoder may provide desired temporal scalability upon encoding a video based on the adaptive GOP structure, by adjusting a level of a sub-GOP (i.e., a selectable minimum size of the sub-GOP) selected in the mode decision process. For example, when the GOP size is 2^N (generally, N ⁇ 4), mode decision performed by comparing the encoding results based on 2^N-sized sub-GOP, 2^(N-1)-sized sub-GOP, 2^(N-2)-sized sub-GOP, and 2^(N-3)-sized sub-GOP units to each other to provide 1/2 or more temporal scalability is defined as "Level_1"
  • mode decision performed by comparing the encoding results based on 2^N-sized sub-GOP, 2^(N-1)-sized sub-GOP, and 2^(N-2)-sized sub-GOP units to each other to provide 1/2 or more temporal scalability is defined as "Level_2"
  • mode decision performed by comparing encoding result values based on 2^N-sized sub-GOP and 2^(N-1)-sized sub-GOP units to each other to provide 1/2 or more temporal scalability is defined as "Level_3".
  • the encoder may encode level information of the mode decision and transmit it to a decoder (extractor), in order to notify a supportable range of the temporal scalability.
  • a decoder extract
  • Table 1 shows the range of the temporal scalability that can be provided depending on levels of the mode decision.
  • the level information is defined as follows.
  • Level_1: when the comparison of the encoding results based on 2^4 frame-sized GOP, 2^3 frame-sized GOP, and 2^2 frame-sized
  • the encoder encodes the corresponding level information and transmits it to the decoder (e.g., for SVC, in order to provide the specific temporal scalability among the three temporal scalability modes, the encoder encodes the corresponding level information and transmits it to the extractor).
  • the encoder may encode Level_1 into "0", Level_2 into "10", and Level_3 into "11".
  • the encoder may encode Level_1 into "1", Level_2 into "010", and Level_3 into "011". It will be appreciated by those skilled in the art that the level information may be encoded in any other manner and the present invention is not limited to the above-mentioned manners.
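As a small illustration of the first variable-length assignment mentioned above (Level_1 → "0", Level_2 → "10", Level_3 → "11"), the following sketch encodes a level and reads it back from the front of a bit string. The function names are hypothetical, and the bit string representation is only for demonstration.

```python
LEVEL_CODES = {1: "0", 2: "10", 3: "11"}


def encode_level(level):
    """Return the codeword for a mode decision level (1, 2 or 3)."""
    return LEVEL_CODES[level]


def decode_level(bits):
    """Read one level codeword from the front of `bits`.

    Returns (level, remaining_bits). The code is prefix-free, so the first
    bit decides whether a second bit has to be read.
    """
    if bits[0] == "0":
        return 1, bits[1:]
    return (2 if bits[1] == "0" else 3), bits[2:]


print(decode_level(encode_level(3) + "101"))
```

Because no codeword is a prefix of another, the extractor can read the level without knowing its length in advance.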
  • a flag may be added to the scalability Information, SEI message, of JSVM (Joint Scalable Video Model) 2.0, as shown in FIG. 39.
  • a flag "use_adaptive_gop_structure_flag" in a hatched area of FIG. 39 is a flag indicating whether the adaptive GOP structure is used upon encoding a video, in which a value of 1 indicates that the adaptive GOP structure has been used. Further, "sub_gop_level" indicates a sub-GOP level of the adaptive GOP structure to notify the extractor of the temporal scalability level that is supportable.
  • the present invention described above may be provided as one or more computer-readable mediums that are implemented on at least one manufactured object.
  • the manufactured object may be a floppy disc, a hard disc, a CD ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape.
  • computer-readable programs may be implemented by any programming language. The language includes C, C++, or JAVA.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Provided is a method for performing motion compensated temporal filtering (MCTF)-based coding on a video sequence using the structure of adaptively divided group of pictures (GOP). The method includes the steps of, for each predefined 2^N frame-sized group of pictures (GOP) of the video sequence, (a) encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N) and obtaining difference values between frames reconstructed after the encoding is performed and frames after the MCTF is performed, based on each of the different GOP sizes; (b) selecting at least one sub-GOP based on the difference values obtained by encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes; and (c) generating a bitstream by encoding the 2^N-frame-sized GOP based on the at least one selected sub-GOP. Thereby, the MCTF-based video coding is performed by adaptively dividing the GOP size based on performance and thereby obtains high coding efficiency.

Description

[DESCRIPTION]
[Invention Title]
METHOD FOR ENCODING/DECODING VIDEO SEQUENCE BASED ON MCTF USING ADAPTIVELY-ADJUSTED GOP STRUCTURE
[Technical Field]
<1> The present invention relates to a video coding/decoding scheme, and more particularly, to a method for encoding a video sequence based on motion compensated temporal filtering (MCTF) using intelligently-divided group of pictures (GOP) and a method for decoding an encoded bitstream.
[Background Art]
<2> An existing MCTF-based video coding scheme performs a wavelet transform along the temporal axis, using motion information, to improve coding efficiency. The existing MCTF-based video coding is performed in units of a GOP, which has a fixed size of the n-th power of 2.
<3> FIG. 1 shows an encoding concept of a video sequence where a GOP size is 8. MCTF is repeatedly performed "n" times in one GOP which has the size of n power of 2. A process performing the MCTF once is called a decomposition process, and the number of performing the MCTF in one GOP is expressed as a decomposition level. Motion information is obtained through motion prediction in the prediction process, and a wavelet transform is carried out in a motion direction using the motion information. The wavelet transform used may be a Haar wavelet transform or a 5/3 spline wavelet transform as examples.
<4> As in the motion prediction process of the existing video coding schemes, there inherently exists a domain where no motion can be predicted in the prediction process of the MCTF. In the case of the video coding scheme using a wavelet transform for a spatial domain transform, coding efficiency is lowered due to the effect of an unpredictable domain, i.e., an intra domain, because the spatial domain transform should be performed on the entire picture. Various methods of solving this problem have been proposed, but no effective solution has yet been found. When the spatial domain is transformed not by the wavelet transform but by a block-based video coding scheme, which is used in the existing international video standards such as MPEG-1, MPEG-2, MPEG-4 Part 2 Visual, MPEG-4 Part 10 AVC (Advanced Video Coding), or ITU-T H.264, the MCTF can be carried out in units of blocks so that the encoding can be separately performed for the intra blocks and inter blocks. When the block-based MCTF is performed, the Haar wavelet transform is used for motion prediction in one direction, and the 5/3 spline wavelet transform is used for motion prediction in both directions. Intra coding is performed when motion prediction cannot be performed or the efficiency of the motion prediction is lower than that of the intra coding.
<5> In the process of performing MCTF, the higher the decomposition level, the farther away in time the reference picture used for motion prediction in the prediction process is located. Therefore, there is a high possibility of low correlation with the currently predicted picture. FIG. 2 shows a process of performing MCTF when the decomposition level is 4. As shown, the reference frame is quite far along the time axis when the H3 and H4 pictures are predicted. (Here, only the first frame in the GOP, which is the last low-frequency frame in the previous GOP, is referred to.)
<6> Prediction efficiency is associated with motion variation in the video sequence. FIG. 3 shows a part of the "Foreman" QCIF (Quarter Common Intermediate Format) 15Hz video sequence. It can be seen from the figure that there is little motion variation in a GOP. Thus, it can be concluded that MCTF encoding produces a good prediction result in a video sequence with relatively little motion.
<7> Meanwhile, FIG. 4 shows a part of the "Football" QCIF 15Hz video sequence. It can be seen from the figure that image frames change dynamically in a GOP. Thus, it can be concluded that, in a dynamic video sequence, the higher the decomposition level is, the more intra blocks are generated in the MCTF-based encoding process. In other words, it can be expected that, in the case of a dynamic video sequence, the MCTF yields poor prediction efficiency. FIG. 5 shows an example where too many intra blocks are included in a prediction frame due to the poor motion prediction when the "Football" QCIF 15Hz video sequence is encoded with the GOP size of 8.
<8> On the basis of the fact that, in a dynamic video sequence, the larger the GOP size, the lower the prediction efficiency of the prediction picture, experiments have been performed while varying the GOP size. FIGS. 6 and 7 show graphs of coding efficiency results with 4 different GOP sizes (1, 2, 4 and 8) for "Football" sequence at QCIF 7.5Hz and 15Hz sequences, respectively. As shown, the smaller the GOP size, the higher the coding efficiency. It can be concluded from the graphs that the GOP size of 1 is higher in performance by about 0.3 dB to 0.4 dB than the GOP size of 8.
<9> FIGS. 8 to 10 show the MCTF process with three different GOP sizes, 8, 4 and 2, respectively, for the 17th to 24th frames of the "Football" QCIF 15Hz sequence. Here, only the first frame in the GOP, which is the last low-frequency frame in the previous GOP, is referred to. As a result, it can be seen that when the GOP size is decreased, the number of intra frames is increased, but the coding efficiency is further improved. Thus, it can be predicted that, in the dynamic video sequence, the smaller the GOP size, the higher the coding efficiency.
<10> In contrast, with regard to the static "Foreman" QCIF 15Hz video sequence, the graphs representing the coding results with different GOP sizes 8, 4, 2 and 1 are shown in FIG. 11. As shown, in the static video sequence, the larger the GOP size, the higher the coding efficiency. It can be concluded that the GOP size of 8 improves performance by 0.8 dB to 1.0 dB or more as compared with the GOP size of 1.
<11> FIG. 12 shows the frame-based PSNR (Peak Signal-to-Noise Ratio) results of MCTF-based coding for the 17th to 24th frames of the "Football" sequence at QCIF 15Hz at the same bit rate, based on three different GOP sizes, 8, 4 and 2. As shown in this figure, the 2-sized GOP has the best coding efficiency.
<12> FIG. 13 shows the frame-based PSNR results of MCTF-based coding for the 137th to 144th frames of the "Foreman" sequence at QCIF 15Hz at the same bit rate, based on three different GOP sizes, 8, 4 and 2. As shown in this figure, the 8-sized GOP shows the best coding efficiency.
<13> Although the foregoing description explains the relationship between the GOP size and coding efficiency by giving examples of a dynamic video sequence with a lot of motion variation and a static video sequence with little motion variation, a single video sequence generally includes various degrees of motion variation. For example, there are various degrees of motion variation in the "Foreman" video sequence, as can be seen in FIG. 14. FIG. 14 shows the frame-based PSNR results of MCTF-based coding for the 97th to 104th frames of the "Foreman" sequence at QCIF 15Hz at the same bit rate, based on three different GOP sizes, 8, 4 and 2. It can be seen from the figure that the 8-sized GOP has lower coding efficiency than the 4 or 2-sized GOP, which is the opposite of the overall result for the "Foreman" video sequence. The 4 or 2-sized GOP may slightly improve the overall coding efficiency. It can be expected that the front four frames have the best coding efficiency when the GOP size is 2, while the rear four frames have the best coding efficiency when the GOP size is 4.
<14> In view of the PSNR results for the 97th to 112th frames of the "Foreman" QCIF 15Hz sequence of FIG. 15, it is possible to obtain the optimal coding efficiency when, as shown in FIG. 14, the first four frames are encoded with 2-sized GOP, the next four frames are encoded with 4-sized GOP, and the remaining eight frames are encoded with 8-sized GOP.
<15> Accordingly, when performing the MCTF-based coding of a video sequence, it is possible to achieve a high coding efficiency by intelligently selecting the GOP size.
[Disclosure]
[Technical Problem]
<16> It is an object of the present invention to provide a method of performing MCTF-based encoding in units of a 2^N frame-sized GOP by adaptively dividing the GOP.
<17> It is another object of the present invention to provide a method for decoding an encoded video bitstream, which is encoded based on the adaptive GOP structure.
<18> It is yet another object of the present invention to provide a method of decoding an encoded video bitstream based on an adaptive GOP structure, which can support temporal scalability.
[Technical Solution]
<19> In order to accomplish these objectives, according to an aspect of the present invention, there is provided a method for performing motion compensated temporal filtering (MCTF)-based encoding on a video sequence. The method comprises the steps of: for each predefined 2^N frame-sized group of pictures (GOP) of the video sequence, (a) encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N) and obtaining difference values between frames reconstructed after the encoding is performed and frames after the MCTF is performed, based on each of the different GOP sizes; (b) selecting at least one sub-GOP based on the difference values obtained by encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes; and (c) generating a bitstream by encoding the 2^N-frame-sized GOP based on the at least one selected sub-GOP.
<20> In one embodiment, the step (b) includes the sub-steps of: (b1) comparing the difference values obtained from the encoding based on each of the different GOP sizes from 2^N to 2^M and, if the difference value obtained from the encoding based on the 2^N-sized GOP unit is the smallest, selecting the 2^N-sized GOP as a sub-GOP; and (b2) if the difference value obtained from the encoding based on the 2^N-sized GOP is not the smallest, after decreasing N by 1 (i.e., N=N-1), i) selecting two 2^M-sized GOPs as the sub-GOPs if N has the same value as M, and ii) repeating steps (b1) and (b2) for each of the front 2^N frames and the rear 2^N frames, if N does not have the same value as M.
<21> In one embodiment, the difference value may be selected from a group of MSE (Mean Square Error), SAD (Sum of Absolute Differences), SSE (Sum of Squared Errors), SAD + λ_SAD·R (R is the number of bits of the GOP unit), and SSE + λ_SSE·R.
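For illustration, the difference measures listed above can be written out as a short sketch. Frames are represented here as flat lists of sample values, and the Lagrangian coefficient is passed in as a plain number; both choices, as well as the function names and the sample values, are assumptions made for the example.

```python
def sad(a, b):
    """Sum of Absolute Differences between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sse(a, b):
    """Sum of Squared Errors between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mse(a, b):
    """Mean Square Error: SSE divided by the number of samples."""
    return sse(a, b) / len(a)


def rd_cost(distortion, bits, lam):
    """Rate-constrained measure: distortion + lambda * R (R = bits of the GOP)."""
    return distortion + lam * bits


reference = [10, 20, 30]       # made-up sample values
reconstructed = [12, 18, 30]
print(sad(reference, reconstructed), sse(reference, reconstructed))
print(rd_cost(sse(reference, reconstructed), bits=100, lam=0.5))
```

The last two measures add a rate term, so a sub-GOP choice that lowers distortion only by spending many more bits is penalized.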
<22> In another embodiment, the step (b) includes the sub-steps of: (b1) comparing the difference values obtained from the encoding based on each of the different GOP sizes from 2^N to 2^M and, if the difference value obtained from the encoding based on the 2^N-sized GOP unit is the smallest, selecting the 2^N-sized GOP as a sub-GOP and setting a GOP divide bit inserted before the GOP bitstream as "0"; (b2) if the difference value obtained from the encoding based on the 2^N-sized GOP is not the smallest, setting the GOP divide bit inserted before the GOP bitstream as "1" and, after decreasing N by 1 (i.e., N=N-1), i) selecting two 2^M-sized GOPs as the sub-GOPs if N has the same value as M, and ii) repeating steps (b1) and (b2) for each of the front 2^N frames and the rear 2^N frames, if N does not have the same value as M.
<23> In another embodiment, information on the at least one selected sub-GOP is set in the first frame header information of the GOP to be transmitted to a decoder.
<24> According to another aspect of the present invention, there is provided a method for performing MCTF-based coding on a video sequence. The method comprises the steps of: (a) for each predetermined 2^N-frame-sized GOP of the video sequence, (a1) encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N) and selecting at least one sub-GOP based on the encoding result, and (a2) generating a bitstream by encoding the 2^N-frame-sized GOP based on the at least one selected sub-GOP; and (b) inserting temporal scalability range information in the generated bitstream.
<25> In one embodiment, the range of temporal scalability is based on a minimum size (2^M) of the selected sub-GOP.
<26> According to yet another aspect of the present invention, there is provided a method for decoding an MCTF-based encoded bitstream. The method comprises the steps of: for each predetermined 2^N-sized GOP bitstream, (a) determining whether the GOP is divided; (b) when the GOP is determined not to be divided, decoding the GOP bitstream; (c) when the GOP is determined to be divided, dividing the GOP bitstream; and (d) decoding the divided GOP bitstreams.
<27> According to yet another aspect of the present invention, there is provided a method for decoding an MCTF-based encoded bitstream. The method comprises the steps of: for each predetermined 2^N-sized GOP bitstream, reading adaptively-divided GOP structure information from the GOP bitstream; and decoding the GOP bitstream based on the adaptively-divided GOP structure information.
<28> According to yet another aspect of the present invention, there is provided a method for decoding a MCTF-based encoded bitstream. The method comprises the steps of: for a predetermined-sized GOP bitstream, reading variable GOP structure information from the GOP bitstream; and decoding the GOP bitstream based on the variable GOP structure information.
<29> According to yet another aspect of the present invention, there is provided a method for providing 1/L temporal scalability upon decoding an MCTF-based encoded bitstream. The method comprises the steps of: for each predetermined 2^N-frame-sized GOP bitstream, (a) initializing "k" to 0 (k is an integer); (b) initializing "FrameNum" to 2^N; (c) detecting whether there is a low-frequency frame in the bitstream from the FrameNum frame to the L frame in the reverse direction, and decreasing FrameNum by L; (d) based on the result of detecting in step (c), (d-1) increasing the value of k by 1, if there is no low-frequency frame, and (d-2) selecting the low-frequency frame detected first in the reverse direction if there is a low-frequency frame and, if the value of k is not 0, further selecting the subsequent k number of high-frequency frames and then re-initializing k to 0; and (e) repeating steps (c) and (d) until FrameNum reaches 0, and finally selecting 2^N/L number of frames.
[Advantageous Effects]
<31> According to the present invention, the MCTF-based video coding is performed by adaptively dividing the GOP size based on the performance, thereby obtaining high coding efficiency.
[Description of Drawings]
<32> FIG. 1 shows an encoding concept of a video sequence where a GOP size is 8;
<33> FIG. 2 shows a process of performing MCTF when a decomposition level is 4;
<34> FIG. 3 shows a part of "Foreman" QCIF 15Hz video sequence;
<35> FIG. 4 shows a part of "Football" QCIF 15Hz video sequence;
<36> FIG. 5 shows an example where too many intra blocks are included in a prediction frame due to the poor motion prediction when "Football" QCIF 15Hz video sequence is encoded with the GOP size of 8;
<37> FIG. 6 is a graph of the coding result for "Football" sequence at QCIF 7.5Hz while varying GOP sizes;
<38> FIG. 7 is a graph of the coding result for "Football" sequence at QCIF 15Hz while varying GOP sizes;
<39> FIG. 8 shows a MCTF process of the 17th to 24th frames for "Football" sequence at QCIF 15Hz (GOP=8);
<40> FIG. 9 shows a MCTF process of the 17th to 24th frames for "Football" sequence at QCIF 15Hz (GOP=4);
<41> FIG. 10 shows a MCTF process of the 17th to 24th frames for "Football" sequence at QCIF 15Hz (GOP=2);
<42> FIG. 11 shows the graph of the coding results with different GOP sizes, for static "Foreman" sequence at QCIF 15Hz;
<43> FIG. 12 shows the frame-based PSNR results of MCTF-based coding for frames from 17th to 24th of "Football" sequence at QCIF 15Hz;
<44> FIG. 13 shows the frame-based PSNR results of MCTF-based coding for frames from 137th to 144th of "Foreman" sequence at QCIF 15Hz;
<45> FIG. 14 shows the frame-based PSNR results of MCTF-based coding for frames from 97th to 104th of "Foreman" sequence at QCIF 15Hz;
<46> FIG. 15 shows the frame-based PSNR results and the targeted performance values for the 97th to 112th frames of "Foreman" QCIF 15Hz sequence;
<47> FIG. 16 shows a flowchart of an algorithm of adaptive GOP structure-based video coding according to one embodiment of the present invention;
<48> FIG. 17 shows the detailed mode decision process shown in FIG. 16;
<49> FIG. 18 shows the conceptual locations where the MSE value of each GOP is taken within the 16 frame-sized GOP;
<50> FIG. 19 conceptually shows the process of adaptively dividing the 16 frame-sized GOP based on the MSE values in accordance with one embodiment of the present invention;
<51> FIG. 20 shows a graph of frame-based PSNR results when encoding is performed on the basis of the adaptively-divided GOP structure shown in FIG. 19;
<52> FIG. 21 shows a flowchart showing a mode decision process according to one embodiment of the present invention;
<53> FIG. 22 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding according to the present invention with respect to "Crew" video sequence (QCIF and CIF);
<54> FIG. 23 is a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding according to the present invention for "Crew" video sequence at 4CIF;
<55> FIG. 24 shows the GOP structure for "Crew" QCIF 15Hz video sequence, wherein the GOP structure is adaptively divided in accordance with the present invention;
<56> FIG. 25 shows frames obtained by the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention with respect to "Crew" video sequence (241st frame);
<57> FIG. 26 shows frames obtained by the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention with respect to "Crew" video sequence (279th frame);
<58> FIG. 27 shows frames obtained by the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention with respect to "Crew" video sequence (298th frame);
<59> FIG. 28 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention for "Football" sequence at QCIF and CIF;
<60> FIG. 29 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention for "Football" sequence at 4CIF;
<61> FIG. 30 shows the GOP structure for "Football" QCIF 15Hz video sequence wherein the GOP structure is adaptively divided in accordance with the present invention;
<62> FIG. 31 shows a graph comparing results of the HHI codec-based encoding and the adaptive GOP structure-based encoding of the present invention for "Football" sequence at QCIF and CIF;
<63> FIG. 32 shows the GOP structure for "Football" QCIF 15Hz video sequence wherein the GOP structure is adaptively divided in accordance with the present invention;
<64> FIG. 33 shows a configuration of a GOP bitstream to which "gop_divide_bit" is added in accordance with one embodiment of the present invention;
<65> FIG. 34 shows an example of a bitstream where the 16 frame-sized GOP is divided and encoded into sub-GOPs of (2, 2, 4, 8) according to the present invention;
<66> FIG. 35 shows a process of decoding the bitstream shown in FIG. 34 in accordance with one embodiment of the present invention;
<67> FIG. 36 shows an example of a GOP structure where the 16 frame-sized GOP is adaptively divided in accordance with the present invention;
<68> FIG. 37 shows a process where the 16 frame-sized GOP is divided and encoded into sub-GOPs of (8, 4, 2, 2) according to the present invention;
<69> FIG. 38 shows a flowchart of a 1/L resolution-supported algorithm performed at a bitstream extractor in a decoder to provide temporal scalability in accordance with one embodiment of the present invention; and
<70> FIG. 39 shows the modified syntax of scalability information, SEI message of Joint Scalable Video Model (JSVM) 2.0, according to one embodiment of the present invention.
[Mode for Invention]
<71> Hereinafter, the present invention will be described in detail with reference to FIGS. 16 to 39. However, the following description is provided for illustrative purposes only and should not be construed as limiting the scope of the present invention.
<72> FIG. 16 is a flowchart of a method of encoding a video sequence based on the adaptive GOP structure according to one embodiment of the present invention. Assuming that encoding is carried out based on a GOP having a 2^N frame size, and that the selectable minimum GOP size is 2^M (N and M are integers, 0 < M < N), the method of encoding a video sequence based on the adaptive GOP structure is performed as follows.
<73> (1) For one GOP of the video frame sequence, encoding is performed on the basis of each of the different GOP sizes, from the maximum 2^N-sized GOP to the minimum 2^M-sized GOP, and, for each of the different GOP sizes, the MSE (Mean Square Error) between each per-sub-band frame generated after the MCTF is performed and each per-sub-band frame reconstructed after the encoding is performed is obtained (S1610, S1620 and S1630). The MSE is only an example, and the criterion is not limited thereto; any one of SAD (Sum of Absolute Differences), SSE (Sum of Squared Errors), SAD + λ_SAD·R (R: the number of bits of one GOP), and SSE + λ_SSE·R may be used as the criterion. This will be described below.
<74> (2) At least one sub-GOP is selected to divide the 2^N-sized GOP on the basis of the MSE of each GOP size obtained in process (1) (S1640). In this specification, the process of dividing the 2^N-sized GOP by selecting the sub-GOPs that produce the minimum MSE in the GOP is referred to as a "mode decision" procedure.
<75> (3) The video sequence having a 2^N frame size is encoded on the basis of the selected sub-GOP structure to generate a bitstream thereof (S1650).
<76> (4) Processes (1) to (3) are repeated for the next GOP of the video frame sequence.
<77> In the above embodiment, in the mode decision procedure, the MSE of each frame can be calculated by the following equation.
<78> Equation 1
Figure imgf000015_0001
<80> In the above Equation, k indicates the number of pixels of one frame, F(i) is the pixel value of the per-subband frame generated after the MCTF procedure is performed, and G(i) is the pixel value of the per-subband frame reconstructed after the encoding is performed.
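The image of Equation 1 is not reproduced in this text. Going only by the symbol definitions given for it, the equation presumably has the standard mean-square-error form. The following is a sketch under that assumption; the pixel index i and its range from 1 to k are assumed here and may differ from the original figure.

```latex
\mathrm{MSE} = \frac{1}{k} \sum_{i=1}^{k} \bigl( F(i) - G(i) \bigr)^{2}
```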
<81> FIG. 17 shows the detailed mode decision process shown in FIG. 16, where the GOP size is 16 (or N is 4) and the minimum selectable GOP size is 2 (or M is 1). As shown, when the GOP size is 16 frames, adaptive division of the GOP is as follows:
<82> (1) With regard to 16 frames, encoding is performed based on the various sub-GOP sizes of 16 (N=4), 8 (N=3), 4 (N=2) and 2 (N=M=1), respectively, and the MSE values of the sub-GOPs are calculated based on the encoding results and then compared with each other (S1702).
<83> Based on the comparison results:
<84> A. When the MSE obtained from the encoding results of 16 frame-sized GOP is the smallest, the 16-frame GOP is selected, and the mode decision process is terminated (S1704).
<85> B. When the MSE obtained from the encoding results of the 16-frame GOP is not the smallest, the front 8 frames are subjected to the following process (2) (S1706), and the rear 8 frames are subjected to the following process (3) (S1708).
<86> (2) The front 8 frames are encoded based on the different GOP sizes and the MSE values of the encoding results are compared with each other:
<87> A. When the MSE obtained from the encoding result of the 8 frame-sized GOP is the smallest, the 8 frame-sized GOP is selected (S1710).
<88> B. When the MSE obtained from the encoding result of the 8-frame GOP is not the smallest, the front 4 frames are subjected to the following process (4) (S1714), and the rear 4 frames are subjected to the following process (5) (S1716).
<89> (3) The rear 8 frames are encoded based on the different GOP sizes and the MSE values of the encoding results are compared with each other:
<90> A. When the MSE obtained from the encoding result of the 8 frame-sized GOP is the smallest, the 8 frame-sized GOP is selected, and the mode decision process is terminated (S1712).
<91> B. When the MSE obtained from the encoding result of the 8 frame-sized GOP is not the smallest, the front 4 frames are subjected to the following process (6) (S1718), and the rear 4 frames are subjected to the following process (7) (S1720).
<92> (4) The front 4 frames are encoded based on the different GOP sizes and the MSE values of the encoding results are compared with each other:
<93> A. When the MSE obtained from the encoding result of the 4-sized GOP is the smallest, the 4 frame-sized GOP is selected (S1722).
<94> B. When the MSE obtained from the encoding result of the 4-sized GOP is not the smallest, two 2-sized GOPs are selected (S1724).
<95> (5) The rear 4 frames are encoded based on the different GOP sizes and the MSE values of the encoding results are compared with each other:
<96> A. When the MSE obtained from the encoding result of the 4-sized GOP is the smallest, the 4-sized GOP is selected (S1726).
<97> B. When the MSE obtained from the encoding result of the 4-sized GOP is not the smallest, two 2-sized GOPs are selected (S1728).
<98> (6) The front 4 frames are encoded based on the different GOP sizes and the MSE values of the encoding results are compared with each other:
<99> A. When the MSE obtained from the encoding result of the 4-sized GOP is the smallest, the 4 frame-sized GOP is selected (S1730).
<100> B. When the MSE obtained from the encoding result of the 4-sized GOP is not the smallest, two 2-sized GOPs are selected (S1732).
<101> (7) The rear 4 frames are encoded based on the different GOP sizes and the MSE values of the encoding results are compared with each other:
<102> A. When the MSE obtained from the encoding result of the 4-sized GOP is the smallest, the 4 frame-sized GOP is selected and the mode decision process is terminated (S1734).
<103> B. When the MSE obtained from the encoding result of the 4-sized GOP is not the smallest, two 2-frame GOPs are selected, and the process is terminated (S1736).
<104>
<105> The foregoing algorithm of FIG. 17 is represented in pseudo code as follows:
<106> <107>
Figure imgf000017_0001
Figure imgf000018_0001
Figure imgf000019_0001
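The pseudo code itself survives only as image placeholders in this text. As an illustration only, the mode decision described for FIG. 17 can be sketched in Python as follows. The names decide_sub_gops, CostFn, EXAMPLE_COSTS and example_cost, the cost callback interface, and the handling of ties are assumptions made for this sketch and are not taken from the original pseudo code. The distortion values are invented so that the walk-through ends in a known division.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: cost(start, length, gop_size) returns the distortion
# (e.g. MSE) of frames [start, start + length) encoded in GOPs of gop_size.
CostFn = Callable[[int, int, int], float]


def decide_sub_gops(cost: CostFn, start: int, n: int, m: int) -> List[int]:
    """Return the sub-GOP sizes, front to rear, for a 2**n-frame segment."""
    length = 1 << n
    if n <= m:
        return [length]  # already at the minimum selectable size
    # Encode the segment with every GOP size from 2**m up to 2**n.
    costs = {1 << e: cost(start, length, 1 << e) for e in range(m, n + 1)}
    # Assumption: a tie still counts as "smallest" for the larger GOP.
    if costs[length] == min(costs.values()):
        return [length]
    n -= 1
    half = 1 << n
    if n == m:
        return [half, half]  # two minimum-sized GOPs
    return (decide_sub_gops(cost, start, n, m)
            + decide_sub_gops(cost, start + half, n, m))


# Invented distortion values (not measured data, not the values of any
# figure), keyed by (start, length, gop_size).
EXAMPLE_COSTS: Dict[Tuple[int, int, int], float] = {
    (0, 16, 16): 10.0, (0, 16, 8): 9.0, (0, 16, 4): 9.5, (0, 16, 2): 11.0,
    (0, 8, 8): 4.0, (0, 8, 4): 5.0, (0, 8, 2): 6.0,
    (8, 8, 8): 6.0, (8, 8, 4): 5.0, (8, 8, 2): 5.5,
    (8, 4, 4): 2.0, (8, 4, 2): 3.0,
    (12, 4, 4): 3.0, (12, 4, 2): 2.0,
}


def example_cost(start: int, length: int, gop_size: int) -> float:
    return EXAMPLE_COSTS[(start, length, gop_size)]


print(decide_sub_gops(example_cost, 0, 4, 1))  # N = 4, M = 1 -> [8, 4, 2, 2]
```

With N=4 and M=1 the recursion visits the same seven comparison points as processes (1) to (7) above. A real encoder would replace example_cost with actual encoding and distortion measurement.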
<160> FIG. 18 shows the conceptual locations where the MSE of each sub-GOP is taken within the 16 frame-sized GOP.
<161> FIG. 19 shows the MSE values calculated from the encoding results based on the GOP sizes of 16, 8, 4, and 2, respectively, in the process of adaptively dividing a GOP consisting of the 81st to 96th frames of the "Foreman" QCIF 15Hz sequence, as an example of a 16 frame-sized video sequence, in accordance with the present invention. The selected sub-GOP sizes are marked with shading. The sub-GOP sizes are determined through the following processes:
<162> (1) First, in comparison of the MSE values obtained from the 16 frame encoding, since the MSE value of 16 frame-sized GOP is not the smallest, the comparison of the MSE values for the front 8 frames and the rear 8 frames, respectively, is performed.
<163> (2) In comparison of the MSE values obtained from the front 8 frame encoding, since the MSE value of 8 frame-sized GOP is the smallest, the 8 frame-sized GOP is selected.
<164> (3) In comparison of the MSE values obtained from the rear 8 frame encoding, since the MSE value of the 8 frame-sized GOP is not the smallest, the comparison of the MSE values for the front 4 frames and the rear 4 frames, respectively, is performed.
<165> (4) In comparison of the MSE values obtained from the front 4 frame encoding, since the MSE value of the 4 frame-sized GOP is the smallest, the 4 frame-sized GOP is selected.
<166> (5) In comparison of the MSE values obtained from the rear 4 frame encoding, since the MSE value of the 4 frame-sized GOP is not the smallest, two 2-frame GOPs are selected, and the process is terminated.
<167> FIG. 20 is a graph of frame-based PSNR results when encoding is performed on the basis of an adaptively divided GOP structure in accordance with FIG. 19. Here, the dotted line denotes the PSNR (picture quality) according to SVM (Scalable Video Model) 3.0 proposed by HHI for SVC (Scalable Video Coding), which is currently undergoing international standardization, and the solid line denotes the PSNR when the intelligent GOP selection proposed in the present invention is applied to SVM 3.0. The PSNR comparison shows that performance is improved.
<168> FIG. 21 is a flowchart showing a mode decision process according to one embodiment of the present invention. As shown, difference values obtained from encoding based on the different GOP sizes, from the maximum 2^N-sized GOP to the minimum 2^M-sized GOP, with respect to a 2^N-frame video sequence, are compared with each other (S2110). As a result of the comparison, it is determined whether the difference value obtained by the 2^N-sized GOP-based encoding is the smallest (S2120). If so, the 2^N-sized GOP is selected (S2130).
<169> Otherwise, N is decreased by 1 (S2140). The decreased N is compared with M (S2150). As a result of the comparison, if the two values are identical, two 2^M-sized GOPs are selected (S2160).
<170> If the decreased N is not identical to M, the front 2^N-sized sequence and the rear 2^N-sized sequence are each subjected to repetition of the foregoing processes S2110 to S2160.
<171> In the above embodiment, the MSE is used in the comparison of the mode decision process. Alternatively, the SAD (Sum of Absolute Differences) or SSE (Sum of Squared Errors) between the image frames of the input sequence and the image frames reconstructed after the encoding may be used. These may be used when the calculation complexity of the mode decision is an important consideration.
<172> In another embodiment, the Lagrangian optimization scheme used in AVC (Advanced Video Coding), the basic international video standard, may be used in the mode decision process. This scheme uses the sum of the SAD or SSE and the product of the number of bits and a Lagrangian coefficient. The Lagrangian coefficient is defined based on a quantization coefficient (Qp) value as follows:
<173> Equation 2
<174>
Figure imgf000021_0001
<175> The comparison value, J, can be obtained by the following equations:
<176> In the case of using the SAD,
<177> Equation 3
<178>
$$J = \mathrm{SAD} + \lambda_{SAD} \cdot R$$
<179> In the case of using the SSE,
<180> Equation 4
<181>
$$J = \mathrm{SSE} + \lambda_{SSE} \cdot R$$
<182> (In Equations 3 and 4, R indicates the number of bits in one GOP when the GOP is encoded.)
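As a minimal sketch of how such a comparison value could be used, the following Python fragment computes J as distortion plus the Lagrangian coefficient times the bit count and picks the candidate with the smallest J. The function names and the candidate dictionary layout are assumptions for illustration. The Lagrangian coefficient is passed in as a plain number because the formula of Equation 2 is not reproduced in this text.

```python
from typing import Dict, Tuple


def lagrangian_cost(distortion: float, bits: int, lam: float) -> float:
    """Comparison value J = D + lambda * R, with D being the SAD or SSE."""
    return distortion + lam * bits


def pick_gop_size(candidates: Dict[int, Tuple[float, int]], lam: float) -> int:
    """candidates maps a GOP size to (distortion, bits); return the size
    with the smallest J (the first one listed wins a tie)."""
    return min(candidates,
               key=lambda size: lagrangian_cost(*candidates[size], lam))


# Invented numbers: the 8-sized candidate has lower distortion but more bits.
candidates = {16: (100.0, 50), 8: (90.0, 80)}
print(pick_gop_size(candidates, lam=0.5))  # 16, since 125.0 < 130.0
print(pick_gop_size(candidates, lam=0.1))  # 8, since 98.0 < 105.0
```

A larger coefficient penalizes bit count more heavily, which in this toy example tips the choice toward the candidate that spends fewer bits.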
<183> That is, the mode decision may be made by (1) considering the calculation complexity, (2) using an optimizing method that considers the calculation complexity to a certain degree in predicting the bit amounts to be transmitted, or (3) predicting or obtaining the actual bit amounts.
<184> FIG. 22 shows a comparison graph of bit rate-PSNR results for the "Crew" video sequence at QCIF and CIF, wherein one result is based on the SVM 3.0 codec proposed by HHI for SVC, and the other is based on the adaptive GOP structure proposed in the present invention. In this example, the encoding based on the adaptive GOP structure according to the present invention is performed for the GOP size of 16, or N=4 and M=1 in the algorithm of FIG. 16, and uses the MSE values between each frame reconstructed per sub-band after the MCTF is performed and each frame reconstructed per sub-band after the encoding is performed, in determining the sub-GOP sizes. The resulting graph shows that the encoding based on the adaptive GOP structure according to the present invention improves performance by about 0.02 dB to 0.45 dB in comparison with the results of the existing SVM 3.0 encoding method.
<185> FIG. 23 shows a comparison graph of results of encoding the "Crew" video sequence at 4-CIF, wherein one is based on the HHI codec and the other is based on the adaptive GOP structure according to the present invention. It is shown that the coding scheme according to the present invention improves performance by about 0.18 dB to 0.43 dB.
<186> FIG. 24 shows the GOP structure for the "Crew" QCIF 15Hz video sequence, which is adaptively divided in accordance with the present invention. It can be seen that the GOP is divided according to the variation of motion, and coding efficiency is improved.
<187> FIG. 25 shows the quality comparison between the frames obtained by the HHI codec and by the adaptive GOP structure-based encoding of the present invention, respectively, with respect to the 241st frame of the "Crew" video sequence. As shown in the left figure, the result from the HHI codec exhibits a blurring phenomenon in the hand part at the left-highlighted portion of the frame and a color spread phenomenon at the right-highlighted portion of the frame. In contrast, as shown in the right figure, in the result of the adaptive GOP structure-based encoding of the present invention the hand part is not seriously blurred and there is no color spread.
<188> FIG. 26 shows the quality comparison between the frames obtained by the HHI codec and by the adaptive GOP structure-based encoding of the present invention, respectively, with respect to the 279th frame of the "Crew" video sequence. As shown in the left figure, the resultant frame of the HHI codec exhibits serious blocking artifacts at the highlighted portion of the frame.
<189> FIG. 27 shows the quality comparison between the frames obtained by the HHI codec and by the adaptive GOP structure-based encoding of the present invention, respectively, with respect to the 298th frame of the "Crew" video sequence. As shown in the left figure, in the resultant picture of the HHI codec the color spread phenomenon is prominent at the highlighted portion of the frame.
<190> FIG. 28 shows a comparison graph of the results based on the HHI codec and the adaptive GOP structure-based encoding of the present invention with respect to the "Football" sequence at QCIF and CIF. As shown, the encoding results according to the present invention show that performance is improved by about 0.01 dB to 0.1δ dB.
<191> FIG. 29 shows a comparison graph of the results based on the HHI codec and the adaptive GOP structure-based encoding of the present invention with respect to the "Football" video sequence at 4-CIF. As shown, the encoding results according to the present invention show that performance is improved by about 0.06 dB to 0.14 dB.
<192> FIG. 30 shows the GOP structure for the "Football" QCIF 15Hz video sequence, which is adaptively divided in accordance with the present invention. It can be seen that the GOP is divided according to the degree of motion, and coding efficiency is improved.
<193> FIG. 31 shows a comparison graph of the results based on the HHI codec and the adaptive GOP structure-based encoding of the present invention with respect to the "Football" video sequence at QCIF and CIF. As shown, the encoding result according to the present invention shows that performance is improved by about 0.15 dB to 0.65 dB.
<194> FIG. 32 shows the GOP structure for the "Football" QCIF 15Hz video sequence, which is adaptively divided in accordance with the present invention. It can be seen that the GOP is divided according to the degree of motion, and coding efficiency is improved.
<195> In one embodiment, in order to decode a bitstream encoded by the adaptive GOP structure-based encoding method of the present invention, the corresponding GOP size and decomposition stage information may be encoded and transmitted to a decoder using "gop_size" and "decomposition_stage", which are specified as slice header information in the SVM 3.0 codec of SVC. The SVM 3.0 codec is currently undergoing international standardization. The encoded bitstream is then decoded using the corresponding GOP size in the existing SVC decoder without any additional bit.
<196> In another example, the GOP size information may be encoded into a GOP header and transmitted, so that the adaptively-divided GOP can be decoded. As one example, 1 bit for "variable-GOP-Size" is allocated in a header of a bitstream to indicate the use of the variable GOP size to the decoder, and decoding can then be performed accordingly.
<197> According to another embodiment of the present invention, in order to decode the encoded bitstream at a decoder, an encoder can add a "gop_divide_bit" as a one-bit flag to the front of each GOP-based bitstream and then transmit it to the decoder, wherein the "gop_divide_bit" indicates whether a GOP is divided. This may be performed by slightly modifying the mode decision process shown in FIG. 21. In other words, difference values obtained from encoding based on the different GOP sizes, from the maximum 2^N-sized GOP to the minimum 2^M-sized GOP, with respect to a 2^N-frame video sequence, are compared with each other (S2110). As a result of the comparison, if the difference value obtained by the 2^N-sized GOP-based encoding is the smallest (S2120), the 2^N-sized GOP is selected and, at the same time, the flag "gop_divide_bit" is set to '0'. Otherwise, the flag "gop_divide_bit" is set to '1' to indicate that the GOP is divided. The other steps S2140 to S2160 may be performed in the same manner.
<198> FIG. 33 shows a configuration of a GOP bitstream to which "gop_divide_bit" is added in accordance with one embodiment of the present invention. FIG. 34 shows an example of a bitstream constructed from the result of performing the adaptive GOP structure-based coding of the present invention, when the predetermined GOP size is 16 (i.e., N=4 and M=0) and the GOP is divided into sub-GOPs of (2, 2, 4, 8).
<199> A decoding algorithm for an encoded bitstream to which the flag "gop_divide_bit" is added is explained below. In this case, the value of N, which is the exponent of the original (non-divided) GOP size 2^N, is transmitted together with the bitstream.
<200> (1) The flag "gop_divide_bit" in the bitstream corresponding to the 2^N-frame size is examined.
<201> A. If the flag is '0', the 2^N-frame GOP is decoded, and
<202> B. If the flag is '1', N is decreased by 1 (here, N=N-1).
<203> (2) With respect to the front 2^N-frame GOP and the rear 2^N-frame GOP, the process (1) is performed respectively.
<204> FIG. 35 shows an example of a process of decoding the encoded bitstream shown in FIG. 34 in accordance with one embodiment of the present invention.
<205> (1) The flag "gop_divide_bit" is decoded and read out in the bitstream. As a result, because the flag has a value of '1', it is determined that the 16-sized GOP has been divided. Accordingly, the front 8 frames are subjected to the following process (2), and the rear 8 frames are subjected to the following process (3).
<206> (2) The flag "gop_divide_bit" is decoded and read out in the bitstream of the front 8 frames. As a result, because the flag has a value of '1', it is determined that the 8-sized GOP has been divided. Thus, the front 4 frames are subjected to the following process (4), and the rear 4 frames are subjected to the following process (5).
<207> (3) The flag "gop_divide_bit" is decoded and read out in the bitstream of the rear 8 frames. As a result, because the flag has a value of '0', the 8-frame GOP bitstream is decoded, and the decoded image frames may be obtained.
<208> (4) The flag "gop_divide_bit" is decoded and read out in the bitstream of the front 4 frames. As a result, because the flag has a value of '1', it is determined that the 4-sized GOP has been divided. Thus, the front 2 frames are subjected to the following process (6), and the rear 2 frames are subjected to the following process (7).
<209> (5) The flag "gop_divide_bit" is decoded and read out in the bitstream of the rear 4 frames. As a result, because the flag has a value of '0', the 4-frame GOP bitstream is decoded, and the decoded image frames may be obtained.
<210> (6) The flag "gop_divide_bit" is decoded and read out in the bitstream of the front 2 frames. As a result, because the flag has a value of '0', the 2-frame GOP bitstream is decoded, and the decoded image frames may be obtained.
<211> (7) The flag "gop_divide_bit" is decoded and read out in the bitstream of the rear 2 frames. As a result, because the flag has a value of '0', the 2-frame GOP bitstream is decoded, and the decoded image frames may be obtained.
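The flag-driven walk above can be sketched as a small recursive reader. This is a hedged illustration only: the function name parse_gop_divide_bits is hypothetical, the flags are supplied as a separate sequence in front-before-rear order although in the bitstream each flag would precede the coded data of its own GOP, and it is assumed that no flag is read once the minimum size 2^M is reached.

```python
from typing import Iterator, List


def parse_gop_divide_bits(bits: Iterator[int], n: int, m: int = 0) -> List[int]:
    """Read gop_divide_bit flags and return sub-GOP sizes, front to rear."""
    if n == m:
        # Assumption: a GOP of the minimum size 2**m cannot be divided,
        # so no flag is read for it.
        return [1 << n]
    if next(bits) == 0:
        return [1 << n]  # flag '0': this 2**n-frame GOP is decoded as a whole
    # flag '1': the GOP is divided; handle the front half, then the rear half
    front = parse_gop_divide_bits(bits, n - 1, m)
    rear = parse_gop_divide_bits(bits, n - 1, m)
    return front + rear


# Flags in the order they are met in processes (1), (2), (4), (6), (7), (5), (3).
flags = iter([1, 1, 1, 0, 0, 0, 0])
print(parse_gop_divide_bits(flags, n=4))  # [2, 2, 4, 8]
```

For N=4 the seven flags of the walk-through yield the (2, 2, 4, 8) division, and a single '0' flag yields the undivided 16-frame GOP.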
<212> According to yet another embodiment of the present invention, an encoder performing the adaptive GOP structure-based video coding according to the present invention can encode information on the sub-GOPs, which are adaptively divided within the constant GOP size, and then transmit it to a decoder. FIG. 36 shows an example of the selected GOP mode in a 16 frame-sized GOP.
<213> In one example, when the encoding is performed based on a 16 frame-sized GOP (i.e., N=4 and M=1), the original GOP size information is encoded in a sequence header, and the information on the sub-GOPs, which are adaptively divided within the 16 frame-sized GOP, is encoded in the slice header of the first frame of each GOP. The sub-GOP information is represented relative to "N" of the original GOP size 2^N, i.e., 4 (in the examples below, a sub-GOP of 2^n frames receives the code N-n). Each item of sub-GOP information is encoded in two fixed bits. For example, if the 16-sized GOP is selected as the sub-GOP, the information is encoded as "00b". If the 8-sized GOP is selected as the sub-GOP, the information is encoded as "01b". If the 4-sized GOP is selected as the sub-GOP, the information is encoded as "10b". If the 2-sized GOP is selected as the sub-GOP, the information is encoded as "11b". When the sum of the GOP sizes selected as the sub-GOPs within the 16 frame size reaches 16, the encoding of the selected GOP information is terminated. For example, if the sub-GOPs are determined as the sizes of 8, 4, 4, a total of 6 bits, 01-10-10, are needed. If the sub-GOP is determined as the size of 16, a total of 2 bits, 00, are needed. If the sub-GOPs are determined as the sizes of 4, 2, 2, 8, a total of 8 bits, 10-11-11-01, are needed. If the sub-GOPs are determined as the sizes of 2, 2, 2, 2, 2, 2, 2, 2, a total of 16 bits, 11-11-11-11-11-11-11-11, are needed. In this manner, the sub-GOP information can be expressed.
When the encoder transmits the sub-GOP information together with the bitstream, the decoder decodes each GOP on the basis of the received sub-GOP information.
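The two-bit signalling described above can be illustrated with a short round-trip sketch. The function names are hypothetical, and the mapping (code = N minus log2 of the sub-GOP size) is inferred from the four listed code words rather than stated as a formula in the text. It only covers sub-GOP sizes whose code fits in two bits.

```python
from typing import List


def encode_sub_gops(sizes: List[int], n: int = 4) -> str:
    """Encode each sub-GOP size 2**e as the two-bit value n - e."""
    assert sum(sizes) == 1 << n, "sub-GOPs must fill the whole GOP"
    return "".join(format(n - (s.bit_length() - 1), "02b") for s in sizes)


def decode_sub_gops(bits: str, n: int = 4) -> List[int]:
    """Read two-bit codes until the decoded sizes add up to 2**n frames."""
    sizes: List[int] = []
    pos = 0
    while sum(sizes) < 1 << n:
        code = int(bits[pos:pos + 2], 2)
        sizes.append(1 << (n - code))
        pos += 2
    return sizes


print(encode_sub_gops([8, 4, 4]))     # 011010
print(encode_sub_gops([4, 2, 2, 8]))  # 10111101
print(decode_sub_gops("10111101"))    # [4, 2, 2, 8]
```

Because decoding stops as soon as the sizes add up to the GOP size, no separate count of sub-GOPs has to be transmitted.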
<214> In an embodiment of the present invention, the decoder may provide temporal scalability upon decoding a bitstream encoded based on the adaptively divided GOP structure. According to an embodiment of the present invention, the bitstream encoded based on the adaptively divided GOP structure should be decoded according to the order of frames, because its structure differs from that of a bitstream encoded based on the fixed GOP size.
<215> For example, as shown in FIG. 37, when the sub-GOPs are selected and encoded with the sizes of (8, 4, 2, 2) in the 16-frame GOP, the order of the bitstream is determined based on each temporal resolution as follows:
<216> Total resolution: L3(0), H3(1), H2(2), H2(3), H1(4), H1(5), H1(6), H1(7), L2(8), H2(9), H1(10), H2(11), L1(12), H1(13), L1(14) and H1(15).
<217> 1/2 resolution: L3(0), H3(1), H2(2), H2(3), L2(8), H2(9), L1(12) and L1(14).
<218> 1/4 resolution: L3(0), H3(1), L2(8) and L1(14).
<219> 1/8 resolution: L3(0) and L1(14).
<220> 1/16 resolution: L1(14).
<221> (The symbol "L" denotes a low-frequency image frame, and the symbol "H" denotes a high-frequency image frame.)
<222> FIG. 38 shows a 1/L resolution-supported algorithm performed to provide temporal scalability at a bitstream extractor in a decoder in accordance with one embodiment of the present invention. The following is a video extraction algorithm for supporting 1/L resolution of a bitstream encoded based on an adaptive GOP structure-based encoding method of the present invention. It is assumed that the GOP size is 16.
<223> (1) A value of k is initially set to 0 (here, k is an integer) (S3810).
<224> (2) A value of FrameNum is initially set to 2^N (S3820).
<225> (3) It is detected whether or not there is a low-frequency frame among the L frames of the bitstream scanned in a reverse direction starting from the FrameNum-th frame (S3830), and FrameNum is decreased by L (S3840).
<226> (4) Based on the detecting result in step (3) (S3850),
<227> (i) if there is no low-frequency frame, the value of k is increased by 1 (S3870), and
<228> (ii) if there is a low-frequency frame, the low-frequency frame detected first in the reverse direction is selected, and if the value of k is not 0, k subsequent high-frequency frames are also selected, and then the value of k is set to 0 again (S3860).
<229> (5) Steps (3) and (4) are repeated until FrameNum reaches 0 (S3880), and 2^N/L frames are finally selected.
<230> For example, the above-mentioned algorithm for supporting the 1/8 resolution of the encoded bitstream where the sub-GOP sizes are selected as (8, 4, 2, 2) in the 16-frame GOP is executed as follows, provided that the entire bitstream is configured of "L3(0), H3(1), H2(2), H2(3), H1(4), H1(5), H1(6), H1(7), L2(8), H2(9), H1(10), H2(11), L1(12), H1(13), L1(14) and H1(15)":
<231> (1) A parameter k is initially set to 0.
<232> (2) FrameNum is initially set to 16.
<233> (3) A low-frequency frame is detected from the FrameNum (=16) bitstream H1(15) in a reverse direction, and FrameNum is decreased by 8 (i.e., FrameNum=FrameNum-8).
<234> (4) Based on the detecting result in step (3), the low-frequency frame, L1(14), is selected.
<235> (5) The low-frequency frame is detected from the FrameNum (=8) bitstream H1(7) in a reverse direction, and FrameNum is decreased by 8 (FrameNum=FrameNum-8).
<236> (6) Based on the detecting result in step (5), the low-frequency frame, L3(0), is selected.
<237> (7) Since FrameNum is 0, the algorithm is terminated. As a result of executing the algorithm, it can be seen that two frames, L3(0) and L1(14), are selected in order to support the 1/8 resolution.
<238> An example of supporting a 1/4 resolution is as follows:
<239> (1) A parameter k is initially set to 0.
<240> (2) FrameNum is initially set to 16.
<241> (3) A low-frequency frame is detected from the FrameNum (=16) bitstream H1(15) in a reverse direction, and FrameNum is decreased by 4 (FrameNum=FrameNum-4).
<242> (4) Based on the detecting result in step (3), the low-frequency frame L1(14) is selected.
<243> (5) Since FrameNum (=12) is not '0', the low-frequency frame is detected from the 12th bitstream H2(11) in a reverse direction, and FrameNum is decreased by 4 (FrameNum=FrameNum-4).
<244> (6) Based on the detecting result in step (5), L2(8) is selected.
<245> (7) Since FrameNum (=8) is not '0', the low-frequency frame is detected from the 8th bitstream H1(7) in a reverse direction, and FrameNum is decreased by 4 (FrameNum=FrameNum-4).
<246> (8) Based on the detecting result in step (7), k is increased by 1 because there is no low-frequency frame.
<247> (9) Since FrameNum (=4) is not '0', the low-frequency frame is detected from the 4th bitstream H2(3) in a reverse direction, and FrameNum is decreased by 4 (FrameNum=FrameNum-4).
<248> (10) Based on the detecting result in step (9), the low-frequency frame L3(0) is selected. Then, because k is 1, the next frame H3(1) is selected, and k is set to 0 again.
<249> (11) Since FrameNum is 0, the algorithm is terminated. As a result of executing the algorithm, it can be seen that four frames, L3(0), H3(1), L2(8), and L1(14), are selected in order to support the 1/4 resolution.
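The two walk-throughs above can be reproduced with the following sketch of the extraction loop. It is an illustration under stated assumptions, not the extractor of the invention: the names FRAMES and extract_positions are hypothetical, frames are represented only by their labels in bitstream order, L is assumed to divide the GOP size, and "k subsequent high-frequency frames" is read as the next k frames labelled "H" after the selected low-frequency frame.

```python
from typing import List

# Bitstream of the (8, 4, 2, 2) example, in bitstream order (positions 0..15).
FRAMES = ["L3", "H3", "H2", "H2", "H1", "H1", "H1", "H1",
          "L2", "H2", "H1", "H2", "L1", "H1", "L1", "H1"]


def extract_positions(frames: List[str], L: int) -> List[int]:
    """Return the bitstream positions kept for 1/L temporal resolution."""
    selected: List[int] = []
    k = 0
    frame_num = len(frames)  # FrameNum starts at the GOP size
    while frame_num > 0:
        # Scan L frames backwards, starting at the FrameNum-th frame.
        window = range(frame_num - 1, frame_num - L - 1, -1)
        low = next((i for i in window if frames[i].startswith("L")), None)
        frame_num -= L
        if low is None:
            k += 1  # no low-frequency frame in this window
            continue
        selected.append(low)
        # Also keep the k high-frequency frames that follow the low one.
        j = low + 1
        while k > 0 and j < len(frames):
            if frames[j].startswith("H"):
                selected.append(j)
                k -= 1
            j += 1
        k = 0
    return sorted(selected)


print(extract_positions(FRAMES, 8))  # [0, 14]
print(extract_positions(FRAMES, 4))  # [0, 1, 8, 14]
```

Under the same reading, L=2 gives positions 0, 1, 2, 3, 8, 9, 12 and 14, and L=16 gives position 14 alone, which agrees with the 1/2 and 1/16 resolution lists given for FIG. 37.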
<250> According to another embodiment of the present invention, the encoder may provide desired temporal scalability upon encoding a video based on the adaptive GOP structure, by adjusting a level of a sub-GOP (i.e., a selectable minimum size of the sub-GOP) selected in the mode decision process. For example, when the GOP size is 2^N (generally, N≥4), mode decision performed by comparing the encoding results based on 2^N-sized, 2^(N-1)-sized, 2^(N-2)-sized, and 2^(N-3)-sized sub-GOP units to each other to provide 1/2^(N-3) or more temporal scalability is defined as "Level_1", mode decision performed by comparing the encoding results based on 2^N-sized, 2^(N-1)-sized, and 2^(N-2)-sized sub-GOP units to each other to provide 1/2^(N-2) or more temporal scalability is defined as "Level_2", and mode decision performed by comparing the encoding results based on 2^N-sized and 2^(N-1)-sized sub-GOP units to each other to provide 1/2^(N-1) or more temporal scalability is defined as "Level_3". The encoder may encode the level information of the mode decision and transmit it to a decoder (extractor), in order to notify it of the supportable range of the temporal scalability. The following Table 1 shows the range of the temporal scalability that can be provided depending on the level of the mode decision.
<251> [Table 1] <252>
Figure imgf000031_0001
An example of the above-described method will be described. In the case that the encoding of a video sequence at 15Hz is performed based on a 2^4 frame-sized GOP, the level information is encoded and transmitted. The level information is defined as follows.
<253> When the comparison of the encoding results based on the 2^4 frame-sized GOP, 2^3 frame-sized GOP, 2^2 frame-sized GOP, and 2^1 frame-sized GOP to each other is performed to provide 7.5 Hz (1/2^(4-3) = 1/2) or more temporal scalability, the level information is defined as Level_1. When the comparison of the encoding results based on the 2^4 frame-sized GOP, 2^3 frame-sized GOP, and 2^2 frame-sized GOP units to each other is performed to provide 3.75 Hz (1/2^(4-2) = 1/4) or more temporal scalability, it is defined as Level_2, and when the comparison of the encoding results based on the 2^4 frame-sized GOP and 2^3 frame-sized GOP units to each other is performed to provide 1.875 Hz (1/2^(4-1) = 1/8) or more temporal scalability, it is defined as Level_3.
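The arithmetic of this example can be checked with a few lines of Python. This is a sketch only: the function name lowest_frame_rate is hypothetical, and the relation "minimum sub-GOP exponent = N - 4 + level" is inferred from the three levels described here rather than quoted from Table 1, which is not reproduced in this text.

```python
def lowest_frame_rate(full_rate_hz: float, n: int, level: int) -> float:
    """Lowest frame rate quoted for Level_1..Level_3 of a 2**n-frame GOP."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Smallest sub-GOP compared at this level:
    # Level_1 -> 2**(n-3), Level_2 -> 2**(n-2), Level_3 -> 2**(n-1).
    min_exp = n - 4 + level
    if min_exp < 0:
        raise ValueError("GOP too small for this level")
    return full_rate_hz / (1 << min_exp)


for level in (1, 2, 3):
    print(level, lowest_frame_rate(15.0, n=4, level=level))
# 1 7.5
# 2 3.75
# 3 1.875
```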
<254> That is, in order to provide a certain temporal scalability among the three temporal scalability modes, the encoder encodes the corresponding level information and transmits it to the decoder (e.g., for SVC, the encoder encodes the corresponding level information and transmits it to the extractor). In one instance, the encoder may encode Level_1 into "0", Level_2 into "10", and Level_3 into "11". In another instance, the encoder may encode Level_1 into "1", Level_2 into "010", and Level_3 into "011". It will be appreciated by those skilled in the art that the level information may be encoded in any other manner and that the present invention is not limited to the above-mentioned manners.
<255> To transmit the supportable temporal scalability level information to the extractor of the decoder as described above, a flag may be added to the scalability information SEI message of JSVM (Joint Scalable Video Model) 2.0, as shown in FIG. 39.
<256> The flag "use_adaptive_gop_structure_flag" in the hatched area of FIG. 39 indicates whether the adaptive GOP structure is used upon encoding a video, in which a value of 1 indicates that the adaptive GOP structure has been used. Further, "sub_gop_level" indicates the sub-GOP level of the adaptive GOP structure, to notify the extractor of the supportable temporal scalability level.
<257> The present invention described above may be provided as one or more computer-readable media that are implemented on at least one manufactured object. The manufactured object may be a floppy disc, a hard disc, a CD ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape. Generally, computer-readable programs may be implemented in any programming language. Examples of the language include C, C++, and JAVA.
<258> Although exemplary embodiments of the present invention have been described with reference to the attached drawings, the present invention is not limited to these embodiments, and it should be appreciated by those skilled in the art that a variety of modifications and changes can be made without departing from the spirit and scope of the present invention.

Claims

[CLAIMS] [Claim 1]
<260> A method for performing motion compensated temporal filtering (MCTF)-based encoding on a video sequence, the method comprising the steps of: for each predefined 2^N frame-sized group of pictures (GOP) of the video sequence,
<261> (a) encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N), and obtaining difference values between frames reconstructed after the encoding is performed and frames after the MCTF is performed, based on each of the different GOP sizes;
<262> (b) selecting at least one sub-GOP based on the difference values obtained by encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes; and
<263> (c) generating a bitstream by encoding the 2^N frame-sized GOP based on the at least one selected sub-GOP.
[Claim 2]
<264> The method of claim 1, wherein step (b) includes the sub-steps of:
<265> (b1) comparing the difference values obtained from the encoding based on each of the different GOP sizes from 2^N to 2^M and, if the difference value obtained from the encoding based on the 2^N-sized GOP unit is the smallest, selecting the 2^N-sized GOP as a sub-GOP; and
<266> (b2) if the difference value obtained from the encoding based on the 2^N-sized GOP is not the smallest, after decreasing N by 1 (i.e., N=N-1),
<267> i) selecting two 2^M-sized GOPs as the sub-GOPs if N has the same value as M, and
<268> ii) repeating steps (b1) and (b2) for each of the front 2^N frames and the rear 2^N frames, if N does not have the same value as M.
[Claim 3]
<269> The method of claim 1, wherein the difference value is selected from a group of MSE (Mean Square Error), SAD (Sum of Absolute Differences), SSE (Sum of Squared Errors), SAD + λ_SAD·R (R is the number of bits of the GOP unit), and SSE + λ_SSE·R.
[Claim 4]
<270> The method of claim 3, wherein the MSE has the difference value calculated by the following Equation 5:
<271> [Equation 5]
[Equation image imgf000035_0001 in the original publication]
<273> where k is the number of pixels in one frame, F(i,j) is the pixel value of the frame after the MCTF is performed, and G(i,j) is the pixel value of the frame reconstructed after the encoding is performed.
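Equation 5 survives here only as an image reference. As a reading aid, and not as a transcription of the original equation, the conventional mean square error that matches the symbol definitions given for Equation 5 (k pixels per frame, F the frame after MCTF, G the reconstructed frame) would be:

```latex
\mathrm{MSE} = \frac{1}{k} \sum_{(i,j)} \bigl( F(i,j) - G(i,j) \bigr)^{2}
```

Whether the original equation also sums or averages over the frames of the GOP cannot be determined from this section.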
[Claim 5]
<274> The method of claim 1, wherein the step (b) includes the sub-steps of:
<275> (b1) comparing the difference values obtained from the encoding based on each of the different GOP sizes from 2^N to 2^M and, if the difference value obtained from the encoding based on the 2^N-sized GOP unit is the smallest, selecting the 2^N-sized GOP as a sub-GOP and setting a GOP divide bit inserted before the GOP bitstream as "0";
<276> (b2) if the difference value obtained from the encoding based on the 2^N-sized GOP is not the smallest, setting the GOP divide bit inserted before the GOP bitstream as "1" and, after decreasing N by 1 (i.e., N=N-1),
<277> i) selecting two 2^M-sized GOPs as the sub-GOPs if N has the same value as M, and
<278> ii) repeating steps (b1) and (b2) for each of the front 2^N frames and the rear 2^N frames, if N does not have the same value as M.
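The recursive selection with divide bits can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed encoder: `gop_cost(start, size)` is a hypothetical callback returning the difference value (for example MSE) when the frames `[start, start + size)` are coded as one GOP, the cost of coding a segment "based on" a smaller GOP size is taken to be the sum over its equal-sized pieces, ties count as "smallest", and the divide bits are collected in a string in pre-order rather than interleaved with GOP data.

```python
def select_sub_gops(start, n, m, gop_cost):
    """Choose sub-GOPs for the 2**n frames beginning at `start`.

    Returns (sub_gops, divide_bits): a list of (start, size) pairs and a
    string of '0'/'1' divide bits. `gop_cost` is a hypothetical callback.
    """
    size = 1 << n
    if n == m:
        # Minimum size reached: selected as-is, no divide bit is written.
        return [(start, size)], ""

    def uniform_cost(j):
        # Total difference value when the segment is coded with 2**j-sized GOPs.
        step = 1 << j
        return sum(gop_cost(s, step) for s in range(start, start + size, step))

    whole = uniform_cost(n)
    if all(whole <= uniform_cost(j) for j in range(m, n)):
        # (b1) the full-size GOP is the smallest: keep it, divide bit "0".
        return [(start, size)], "0"

    # (b2) divide bit "1", then handle the front and rear halves with N-1.
    half = size // 2
    front, front_bits = select_sub_gops(start, n - 1, m, gop_cost)
    rear, rear_bits = select_sub_gops(start + half, n - 1, m, gop_cost)
    return front + rear, "1" + front_bits + rear_bits
```

For example, with 8 frames (N=3, M=1) and a toy cost that penalizes any GOP spanning a scene cut between frames 5 and 6, the sketch keeps the first four frames as one sub-GOP and splits the rear four into two 2-frame sub-GOPs.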
[Claim 6]
<279> The method of claim 1, further comprising a step of setting the at least one selected sub-GOP information in the first frame header information of the GOP to transmit to a decoder.
[Claim 7]
<280> A method for performing MCTF-based coding on a video sequence, the method comprising the steps of:
<281> (a) for each predetermined 2^N-frame-sized GOP of the video sequence,
<282> (a1) encoding the 2^N frame-sized GOP of the video sequence based on each of the different GOP sizes from the maximum size, 2^N, to the minimum size, 2^M (M is an integer between 1 and N), and selecting at least one sub-GOP based on the encoding result, and
<283> (a2) generating a bitstream by encoding the 2^N-frame-sized GOP based on the at least one selected sub-GOP; and
<284> (b) inserting temporal scalability range information in the generated bitstream.
[Claim 8]
<285> The method of claim 7, wherein the range of temporal scalability is based on the minimum size, 2^M, of the selected sub-GOP.
[Claim 9]
<286> The method of claim 7, wherein the temporal scalability range that can be supported is one of 1/2^(N-1) or more, 1/2^(N-2) or more, and 1/2^(N-3) or more.
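A quick arithmetic check of these fractions, assuming N = 4 and a 15 Hz full frame rate. The 15 Hz figure is an assumption inferred from the 3.75 Hz and 1.875 Hz values quoted in the description; the function name is hypothetical.

```python
from fractions import Fraction


def scalability_fractions(n, levels=3):
    """Return [1/2**(n-1), 1/2**(n-2), ...] for the given number of levels."""
    return [Fraction(1, 2 ** (n - d)) for d in range(1, levels + 1)]


# Assumed full frame rate in Hz (not stated in this section).
FULL_RATE_HZ = 15

fractions_n4 = scalability_fractions(4)
rates_hz = [float(FULL_RATE_HZ * f) for f in fractions_n4]
```

With these assumptions, the three fractions are 1/8, 1/4 and 1/2, corresponding to 1.875 Hz, 3.75 Hz and 7.5 Hz.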
[Claim 10]
<287> A method for decoding an MCTF-based encoded bitstream, the method comprising the steps of: for each predetermined 2^N-sized GOP bitstream,
<288> (a) reading and checking a value of a bit indicating whether the GOP is divided;
<289> (b) when the bit value is "0", decoding the GOP bitstream;
<290> (c) when the bit value is "1", dividing the GOP bitstream into a front half-sized GOP bitstream and a rear half-sized GOP bitstream; and
<291> (d) repeating the steps (a) through (c) for each of the front GOP bitstream and the rear GOP bitstream, respectively.
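The divide-bit parsing on the decoder side can be sketched as a short recursion. This is an illustration only: the divide bits are modeled as a string separated from the GOP payload, and the decoder is assumed to know a minimum exponent `m` at which no further divide bit is read (the claim itself does not state a stopping size). The function name is hypothetical.

```python
def parse_gop_structure(bits, n, m, start=0):
    """Recover the sub-GOP layout of a 2**n-frame GOP from its divide bits.

    Returns (sub_gops, remaining_bits), where sub_gops is a list of
    (start, size) pairs in frame order.
    """
    size = 1 << n
    if n == m:
        # Assumed minimum size: no divide bit is present for this GOP.
        return [(start, size)], bits
    if not bits:
        raise ValueError("divide bit expected but bit string is empty")

    flag, rest = bits[0], bits[1:]
    if flag == "0":
        # (b) not divided: this GOP is decoded as a whole.
        return [(start, size)], rest

    # (c), (d) divided: repeat for the front half, then the rear half.
    front, rest = parse_gop_structure(rest, n - 1, m, start)
    rear, rest = parse_gop_structure(rest, n - 1, m, start + size // 2)
    return front + rear, rest
```

For instance, the bits "101" for an 8-frame GOP with a 2-frame minimum describe one 4-frame sub-GOP followed by two 2-frame sub-GOPs.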
[Claim 11]
<292> A method for decoding an MCTF-based encoded bitstream, the method comprising the steps of: for a predetermined-sized GOP bitstream,
<293> (a) determining whether the GOP is divided;
<294> (b) when the GOP is determined not to be divided, decoding the GOP bitstream;
<295> (c) when the GOP is determined to be divided, dividing the GOP bitstream; and
<296> (d) decoding the divided GOP bitstreams.
[Claim 12]
<297> The method of claim 11, further comprising repeating steps (a) through (c) for each of the divided GOP bitstreams prior to performing the step (d).
[Claim 13]
<298> A method for decoding an MCTF-based encoded bitstream, the method comprising the steps of: for each predetermined 2^N-sized GOP bitstream,
<299> reading adaptively-divided GOP structure information from the GOP bitstream; and
<300> decoding the GOP bitstream based on the adaptively-divided GOP structure information.
[Claim 14]
<301> A method for decoding an MCTF-based encoded bitstream, the method comprising the steps of: for a predetermined-sized GOP bitstream,
<302> reading variable GOP structure information from the GOP bitstream; and
<303> decoding the GOP bitstream based on the variable GOP structure information.
[Claim 15]
<304> A data structure of an MCTF-based encoded bitstream, the data structure comprising information about a variable GOP size in a header of the bitstream.
[Claim 16]
<305> A data structure of an MCTF-based encoded bitstream, the data structure comprising information indicating whether each GOP is divided.
[Claim 17]
<306> A method for providing 1/L temporal scalability upon decoding an MCTF-based encoded bitstream, the method comprising the steps of: for each predetermined 2^N-frame-sized GOP bitstream,
<307> (a) initializing "k" to "0" (k is an integer);
<308> (b) initializing "FrameNum" to 2^N;
<309> (c) detecting whether there is a low-frequency frame in a bitstream from the FrameNum-th frame to the L-th frame in the reverse direction, and decreasing FrameNum by L;
<310> (d) based on the result of detecting in the step (c),
<311> (d1) increasing the value of k by 1, if there is no low-frequency frame, and
<312> (d2) selecting the low-frequency frame detected first in the reverse direction if there is a low-frequency frame and, if the value of k is not 0, further selecting subsequent k number of high-frequency frames and then re-initializing k to 0; and
<313> (e) repeating the steps (c) and (d) until FrameNum reaches 0, and finally selecting 2^N/L number of frames.
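One possible reading of steps (a) through (e) can be sketched as below. Several points are assumptions rather than statements of the claim: frames are given as a list of "L"/"H" labels in bitstream order, each step scans a window of L frames backward from FrameNum, and the "subsequent" high-frequency frames are taken to be the next k frames labeled "H" that follow the selected low-frequency frame in that order. The function name and the parameter `step` (standing for L) are hypothetical.

```python
def select_frames(labels, step):
    """Pick 2**N / step frame indices from one GOP for 1/step scalability.

    labels: list of "L" (low-frequency) / "H" (high-frequency) per frame,
            in bitstream order (assumed); its length plays the role of 2**N.
    step:   the value L of the 1/L temporal scalability.
    """
    if len(labels) % step != 0:
        raise ValueError("GOP length must be a multiple of step")

    selected = []
    k = 0                      # (a) windows seen without a low-frequency frame
    frame_num = len(labels)    # (b) start from the last frame
    while frame_num > 0:       # (e) repeat until FrameNum reaches 0
        # (c) scan the window of `step` frames in the reverse direction.
        window = range(frame_num - 1, frame_num - step - 1, -1)
        low = next((i for i in window if labels[i] == "L"), None)
        frame_num -= step
        if low is None:
            k += 1             # (d1) no low-frequency frame in this window
            continue
        selected.append(low)   # (d2) first low-frequency frame found
        if k:
            # Assumed meaning of "subsequent": next k "H" frames after `low`.
            highs = [i for i in range(low + 1, len(labels)) if labels[i] == "H"]
            selected.extend(highs[:k])
            k = 0
    return sorted(selected)
```

With the labels "LHHHLHLH" (an 8-frame GOP split into sub-GOPs of 4, 2 and 2 frames, each starting with its low-frequency frame), a step of 2 selects four frames and a step of 4 selects two.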
[Claim 18]
<314> A computer-readable recording medium having a computer program stored therein for performing the MCTF-based coding method according to any one of claims 1 to 9.
[Claim 19]
<315> A computer-readable recording medium having a computer program stored therein for performing the method for decoding an MCTF-based encoded bitstream according to any one of claims 10 to 14.
[Claim 20]
<316> A computer-readable recording medium having a computer program stored therein for performing the method for providing 1/L temporal scalability according to claim 17.
PCT/KR2005/003467 2004-10-18 2005-10-18 Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure Ceased WO2006043772A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/576,572 US20090080519A1 (en) 2004-10-18 2005-10-18 Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
KR10-2004-0083333 2004-10-18
KR20040083333 2004-10-18
KR20050002247 2005-01-10
KR10-2005-0002247 2005-01-10
KR10-2005-0031712 2005-04-16
KR1020050031712A KR20060045796A (en) 2004-10-18 2005-04-16 Method and apparatus for adaptive subdivision of GOP during video encoding using MCTF
KR20050068494 2005-07-27
KR10-2005-0068494 2005-07-27

Publications (1)

Publication Number Publication Date
WO2006043772A1 true WO2006043772A1 (en) 2006-04-27

Family

ID=36203171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/003467 Ceased WO2006043772A1 (en) 2004-10-18 2005-10-18 Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure

Country Status (2)

Country Link
KR (1) KR100714071B1 (en)
WO (1) WO2006043772A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020026254A (en) * 2000-06-14 2002-04-06 요트.게.아. 롤페즈 Color video encoding and decoding method
KR20040069209A (en) * 2001-12-28 2004-08-04 코닌클리케 필립스 일렉트로닉스 엔.브이. Video encoding method
EP1455534A1 (en) * 2003-03-03 2004-09-08 Thomson Licensing S.A. Scalable encoding and decoding of interlaced digital video data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002232893A (en) * 2001-02-05 2002-08-16 Matsushita Electric Ind Co Ltd Image coding device
JP3888533B2 (en) * 2002-05-20 2007-03-07 Kddi株式会社 Image coding apparatus according to image characteristics
EP1540964A1 (en) * 2002-09-11 2005-06-15 Koninklijke Philips Electronics N.V. Video coding method and device
KR100654431B1 (en) * 2004-03-08 2006-12-06 삼성전자주식회사 Method for scalable video coding with variable GOP size, and scalable video coding encoder for the same

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008033602A3 (en) * 2006-09-15 2008-05-08 Freescale Semiconductor Inc Localized content adaptive filter for low power scalable image processing
US7760960B2 (en) 2006-09-15 2010-07-20 Freescale Semiconductor, Inc. Localized content adaptive filter for low power scalable image processing
US7907789B2 (en) 2007-01-05 2011-03-15 Freescale Semiconductor, Inc. Reduction of block effects in spatially re-sampled image information for block-based image coding
WO2024037858A1 (en) * 2022-08-17 2024-02-22 Interdigital Ce Patent Holdings, Sas Rate distortion optimization for time varying textured mesh compression

Also Published As

Publication number Publication date
KR100714071B1 (en) 2007-05-02
KR20060054080A (en) 2006-05-22

Similar Documents

Publication Publication Date Title
US5218435A (en) Digital advanced television systems
CA2842551C (en) Signal processing and inheritance in a tiered signal quality hierarchy
JP5559139B2 (en) Video encoding and decoding method and apparatus
US9900599B2 (en) High frequency emphasis in decoding of encoded signals
US8054882B2 (en) Method and system for providing bi-directionally predicted video coding
KR20030020382A (en) Method of and system for activity-based frequency weighting for FGS enhancement layers
Symes 7 Video Compression
US20100254448A1 (en) Selective Local Adaptive Wiener Filter for Video Coding and Decoding
WO2006049412A1 (en) Method for encoding/decoding a video sequence based on hierarchical b-picture using adaptively-adjusted gop structure
EP2678944A1 (en) Methods and devices for data compression using offset-based adaptive reconstruction levels
Nguyen et al. Adaptive downsampling/upsampling for better video compression at low bit rate
US20090080519A1 (en) Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure
WO2006043772A1 (en) Method for encoding/decoding video sequence based on mctf using adaptively-adjusted gop structure
Esmaili et al. Wyner–Ziv video coding with classified correlation noise estimation and key frame coding mode selection
US20230059035A1 (en) Efficient encoding of film grain noise
Netravali et al. A high quality digital HDTV codec
US20250358419A1 (en) Techniques for scaling a rate-distortion multiplier when performing trellis coded quantization
JPH09149420A (en) Method and device for compressing dynamic image
US20150373354A1 (en) Method and device for encoding/decoding image so that image is compatible with multiple codecs
WO2025240358A1 (en) Techniques for selecting between scalar quantization and trellis coded quantization when encoding video data
WO2025240350A1 (en) Techniques for performing both scalar quantization and trellis coded quantization when encoding video data
Horn et al. Pyramid coding using lattice vector quantization for scalable video applications
KR20060045797A (en) Method and Apparatus for Adaptive Subdivision of GP in Video Coding Using Hierarchical Video
KR20040095399A (en) Weighting factor determining method and apparatus in explicit weighted prediction
KR20060045796A (en) Method and apparatus for adaptive subdivision of GOP during video encoding using MCTF

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05808644

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 11576572

Country of ref document: US