EP4576772A1 - Image decoding apparatus and image encoding apparatus using AI, and method performed by said apparatuses - Google Patents
- Publication number
- EP4576772A1 EP23855133.7A
- Authority
- EP
- European Patent Office
- Prior art keywords
- block
- image
- enlarged
- prediction block
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/174—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/527—Global motion vector estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/563—Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/65—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience
- H04N19/66—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience involving data partitioning, i.e. separation of data into packets or partitions according to importance
Definitions
- the disclosure relates to a method and apparatus for processing an image, and more particularly, to a method and apparatus for encoding/decoding an image using artificial intelligence (AI).
- AI artificial intelligence
- an image may be divided into blocks and the blocks may be encoded and decoded through inter prediction or intra prediction.
- AVC advanced video coding
- HEVC high efficiency video coding
- Intra prediction may refer to a method of compressing an image by removing spatial redundancy in an image
- inter prediction may be a method of compressing an image by removing temporal redundancy between images.
- a prediction block of a current block may be generated through intra prediction or inter prediction, a residual block may be generated by subtracting the prediction block from the current block, and residual samples of the residual block may be transformed and quantized.
- residual samples of a residual block may be generated by inverse-quantizing and inverse-transforming quantized transform coefficients of the residual block, and a current block may be reconstructed by adding a prediction block generated through intra prediction or inter prediction to the residual block.
- the reconstructed current block may be processed according to one or more filtering algorithms and then may be output.
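The residual path described above can be sketched for a single block as follows. This is a minimal illustration, not the codec's actual arithmetic: the transform stage is omitted, the quantizer is assumed to be a uniform scalar quantizer, and the step size and sample values are hypothetical.

```python
def subtract_blocks(current, prediction):
    """Residual block: sample-wise difference between current and prediction."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]

def quantize(block, step):
    """Uniform scalar quantization (assumed; the transform stage is omitted)."""
    return [[round(v / step) for v in row] for row in block]

def dequantize(levels, step):
    """Inverse quantization: scale the integer levels back by the step size."""
    return [[q * step for q in row] for row in levels]

def add_blocks(a, b):
    """Sample-wise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

current = [[53, 55], [61, 59]]      # hypothetical current block
prediction = [[50, 50], [60, 60]]   # hypothetical prediction block
step = 4                            # hypothetical quantization step size

residual = subtract_blocks(current, prediction)   # encoder side
levels = quantize(residual, step)                 # values that would be entropy-coded
reconstructed = add_blocks(prediction, dequantize(levels, step))  # decoder side
```

With this quantizer, each reconstructed sample differs from the original by at most half the step size, which is the error that a coarser quantization parameter makes larger.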
- a rule-based prediction mode may be used for inter prediction of a current block.
- the rule-based prediction mode may include, for example, a skip mode, a merge mode, or an advanced motion vector prediction (AMVP) mode.
- AMVP advanced motion vector prediction
- a method of decoding an image according to an embodiment of the disclosure may include obtaining a motion vector of a current block.
- the method may include obtaining a preliminary prediction block by using a reference block indicated by the motion vector in a reference image.
- the method may include obtaining a final prediction block of the current block by applying, to a neural network, at least one of a picture order count (POC) map including a POC difference between the reference image and a current image including the current block, the preliminary prediction block, or a quantization error map.
- POC picture order count
- the method may include reconstructing the current block by using the final prediction block and a residual block obtained from a bitstream.
- sample values of the quantization error map may be calculated from a quantization parameter for the reference block.
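The decoding steps listed above can be strung together as in the sketch below. All names are hypothetical. The motion vector is assumed to have integer precision, the referenced area is assumed to lie inside the reference image, and `stand_in_network` is only a placeholder that returns the preliminary prediction block unchanged; it does not represent the trained neural network of the disclosure.

```python
def fetch_block(image, top, left, height, width):
    """Copy a height x width block whose top-left sample is at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def constant_map(value, height, width):
    """Map with the same sample value at every position."""
    return [[value] * width for _ in range(height)]

def stand_in_network(preliminary, poc_map, qerr_map):
    """Placeholder for the trained network: returns the preliminary block as is."""
    return [row[:] for row in preliminary]

def decode_block(reference, block_pos, mv, size, poc_cur, poc_ref, qerr_value, residual):
    top, left = block_pos
    dy, dx = mv                      # integer-precision motion vector (assumption)
    h, w = size
    # 1. Preliminary prediction block from the reference block the vector points to.
    preliminary = fetch_block(reference, top + dy, left + dx, h, w)
    # 2. Side inputs: POC difference map and quantization error map.
    poc_map = constant_map(poc_cur - poc_ref, h, w)
    qerr_map = constant_map(qerr_value, h, w)
    # 3. Final prediction block from the (stand-in) network.
    final = stand_in_network(preliminary, poc_map, qerr_map)
    # 4. Reconstruction: final prediction plus residual from the bitstream.
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(final, residual)]

reference = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 reference image
decoded = decode_block(reference, (1, 1), (0, 1), (2, 2),
                       poc_cur=8, poc_ref=4, qerr_value=2.0,
                       residual=[[1, 0], [0, -1]])
```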
- a method of encoding an image according to an embodiment of the disclosure includes obtaining a motion vector indicating a reference block in a reference image corresponding to a current block.
- the method may include obtaining a final prediction block of the current block by applying, to a neural network, at least one of a picture order count (POC) map including a POC difference between the reference image and a current image including the current block, a preliminary prediction block obtained based on the reference block, or a quantization error map.
- the method may include obtaining a residual block by using the current block and the final prediction block.
- the method may include generating a bitstream including information about the residual block.
- sample values of the quantization error map may be calculated from a quantization parameter for the reference block.
- An image decoding apparatus may include at least one memory storing at least one instruction, and at least one processor configured to operate according to the at least one instruction.
- the at least one processor may be configured to obtain a motion vector of a current block.
- the at least one processor may be configured to obtain a preliminary prediction block by using a reference block indicated by the motion vector in a reference image.
- the at least one processor may be configured to obtain a final prediction block of the current block by applying, to a neural network, at least one of a picture order count (POC) map including a POC difference between the reference image and a current image including the current block, the preliminary prediction block, or a quantization error map.
- the at least one processor may be configured to reconstruct the current block by using the final prediction block and a residual block obtained from a bitstream.
- sample values of the quantization error map may be calculated from a quantization parameter for the reference block.
- An image encoding apparatus may include at least one memory storing at least one instruction and at least one processor configured to operate according to the at least one instruction.
- the at least one processor may be configured to obtain a motion vector indicating a reference block in a reference image corresponding to a current block.
- the at least one processor may be configured to obtain a final prediction block of the current block by applying, to a neural network, at least one of a picture order count (POC) map including a POC difference between the reference image and a current image including the current block, a preliminary prediction block obtained based on the reference block, or a quantization error map.
- the at least one processor may be configured to obtain a residual block by using the current block and the final prediction block.
- the at least one processor may be configured to generate a bitstream including information about the residual block.
- sample values of the quantization error map may be calculated from a quantization parameter for the reference block.
- the expression "at least one of a, b or c" indicates only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or variations thereof.
- for an element represented as a "... unit" or a "module", two or more elements may be combined into one element, or one element may be divided into two or more elements according to subdivided functions.
- Each element described hereinafter may additionally perform some or all of the functions performed by another element, in addition to its own main functions, and some of the main functions of each element may be performed entirely by another element.
- an "image” or a “picture” may refer to a still image, a moving image including a plurality of consecutive still images (or frames), or a video.
- a "neural network” may refer to a representative example of an artificial neural network model simulating brain nerves, and is not limited to an artificial neural network model using a specific algorithm.
- the neural network may be referred to as a deep neural network.
- a weight may refer to a value used in an operation process of each of the layers constituting a neural network. For example, a weight may be used when an input value is applied to a certain operation expression. The weight is a value set as a result of training, and may be updated using separate training data.
- a "current block” may refer to a block currently being processed.
- the current block may be a slice, a tile, a largest coding unit, a coding unit, a prediction unit, or a transform unit divided from a current image.
- a "sample" may refer to data assigned to a sampling position in data to be processed, such as an image, a block, a filter kernel, or a map.
- the sample may include a pixel in a two-dimensional (2D) image.
- FIG. 24 is a diagram illustrating a general image encoding and decoding process, according to an embodiment of the disclosure.
- An encoding apparatus 2410 may transmit a bitstream generated through encoding on an image to a decoding apparatus 2450, and the decoding apparatus 2450 may reconstruct the image by receiving and decoding the bitstream.
- a predictive encoder 2415 of the encoding apparatus 2410 may output a prediction block through inter prediction or intra prediction on a current block, and a transform and quantization unit 2420 (illustrated as "T+Q") may output a quantized transform coefficient by transforming and quantizing residual samples of a residual block between the prediction block and the current block.
- An entropy encoder 2425 may encode the quantized transform coefficient and may output a bitstream.
- the quantized transform coefficient may be reconstructed into the residual block including the residual samples in a spatial domain through an inverse-quantization and inverse-transform unit 2430 (illustrated as "Q⁻¹ + T⁻¹").
- a reconstructed block obtained by adding the prediction block to the residual block may be output as a filtered block using a deblocking filtering unit 2435 and a loop filtering unit 2440.
- a reconstructed image including the filtered block may be used as a reference image for a next image in the predictive encoder 2415.
- the bitstream received by the decoding apparatus 2450 may be reconstructed into the residual block including the residual samples of the spatial domain through an entropy decoder 2455 and an inverse-quantization and inverse-transform unit 2460.
- a reconstructed block may be generated by combining the residual block and a prediction block output from a predictive decoder 2475, and the reconstructed block may be output as a filtered block through a deblocking filtering unit 2465 and a loop filtering unit 2470.
- a reconstructed image including the filtered block may be used as a reference image for a next image in the predictive decoder 2475.
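The reason both apparatuses keep a reconstructed image as the reference for the next image can be illustrated with the toy loop below. It is a sketch under strong simplifications that are not part of the disclosure: each frame is a flat list of samples, prediction is the co-located sample of the previous reconstruction (zero motion), the transform and the deblocking/loop filters are omitted, and `STEP` is a hypothetical quantization step.

```python
STEP = 8  # hypothetical quantization step size

def encode_frame(frame, reference):
    """Return the quantized levels and the encoder's own reconstruction."""
    levels = [round((s - p) / STEP) for s, p in zip(frame, reference)]
    recon = [p + q * STEP for p, q in zip(reference, levels)]
    return levels, recon

def decode_frame(levels, reference):
    """Rebuild the frame from the levels and the decoder's reference."""
    return [p + q * STEP for p, q in zip(reference, levels)]

frames = [[101, 104, 90, 77], [103, 110, 95, 70]]  # hypothetical input frames
enc_ref = [0, 0, 0, 0]   # both sides start from the same reference
dec_ref = [0, 0, 0, 0]
in_sync = []
for frame in frames:
    levels, enc_ref = encode_frame(frame, enc_ref)  # encoder keeps its reconstruction
    dec_ref = decode_frame(levels, dec_ref)         # decoder rebuilds the same picture
    in_sync.append(enc_ref == dec_ref)
```

Because the encoder predicts from its own reconstruction rather than from the original image, the two references stay identical and quantization error does not accumulate from frame to frame.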
- the predictive encoder 2415 and the predictive decoder 2475 may perform predictive encoding and predictive decoding on the current block according to a rule-based prediction mode and/or a neural network-based prediction mode.
- the rule-based prediction mode may include a merge mode, a skip mode, an advanced motion vector prediction (AMVP) mode, a bi-directional optical flow (BDOF) mode, or a bi-prediction with coding unit (CU)-level weights (BCW) mode.
- BDOF bi-directional optical flow
- BCW bi-prediction with coding unit-level weights
- the predictive encoder 2415 and the predictive decoder 2475 may apply the rule-based prediction mode and the neural network-based prediction mode to the current block.
- An example of a neural network-based prediction mode according to an embodiment of the disclosure is described in detail with reference to FIGS. 1 to 23.
- the image decoding apparatus 100 may include a bitstream parser 110 and a decoder 130.
- the decoder 130 may include an artificial intelligence (AI)-based predictive decoder 132 and a reconstruction unit 134.
- the bitstream parser 110 may correspond to the entropy decoder 2455 of FIG. 24 .
- the decoder 130 may correspond to the inverse-quantization and inverse-transform unit 2460, the predictive decoder 2475, the deblocking filtering unit 2465, and the loop filtering unit 2470.
- the bitstream parser 110 and the decoder 130 may be implemented as, by, or using at least one processor.
- the bitstream parser 110 and the decoder 130 may operate according to at least one instruction stored in at least one memory.
- although the bitstream parser 110 and the decoder 130 are individually illustrated in FIG. 1, embodiments are not limited thereto.
- the bitstream parser 110 and the decoder 130 may be implemented through one processor.
- the bitstream parser 110 and the decoder 130 may be implemented as, by, or using a dedicated processor, or may be implemented through a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU).
- AP application processor
- CPU central processing unit
- GPU graphics processing unit
- the dedicated processor may include at least one of a memory for implementing an embodiment of the disclosure or a memory processor for using an external memory.
- the bitstream parser 110 and the decoder 130 may include a plurality of processors.
- the bitstream parser 110 and the decoder 130 may be implemented through a combination of dedicated processors, or a combination of software and a plurality of general-purpose processors such as an AP, a CPU, and a GPU.
- the bitstream parser 110 may obtain a bitstream including a result of encoding an image.
- the bitstream parser 110 may receive a bitstream through a network from the image encoding apparatus 1900.
- the bitstream parser 110 may obtain a bitstream from any of data storage media including a magnetic medium such as a hard disk, a floppy disk, or a magnetic tape, an optical recording medium such as a compact disk read-only memory (CD-ROM) or a digital versatile disk (DVD), and a magneto-optical medium such as a floptical disk.
- the bitstream parser 110 may obtain pieces of information required to reconstruct the image by parsing the bitstream.
- the bitstream parser 110 may obtain syntax elements for reconstructing the image from the bitstream.
- Binary values corresponding to the syntax elements may be included in the bitstream according to a hierarchical structure of the image.
- the bitstream parser 110 may obtain the syntax elements by entropy-decoding the binary values included in the bitstream.
- the bitstream parser 110 may transmit information about a motion vector and information about a residual block obtained from the bitstream to the decoder 130.
- the AI-based predictive decoder 132 may obtain a final prediction block of the current block using the information about the motion vector.
- the AI-based predictive decoder 132 may obtain the final prediction block of the current block using AI, for example, a neural network.
- a mode in which the AI-based predictive decoder 132 obtains the final prediction block of the current block using the neural network may be defined as a neural network-based prediction mode.
- the reconstruction unit 134 may obtain a residual block of the current block using the information about the residual block provided from the bitstream parser 110.
- the information about the residual block may include information about a quantized transform coefficient.
- the reconstruction unit 134 may obtain a residual block in a spatial domain by inverse-quantizing and inverse-transforming the quantized transform coefficient.
- the reconstruction unit 134 may obtain a current reconstructed block corresponding to the current block using the final prediction block and the residual block. In an embodiment of the disclosure, the reconstruction unit 134 may obtain the current reconstructed block by adding sample values of the final prediction block to sample values of the residual block.
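The work of the reconstruction unit 134 (inverse quantization, inverse transform, then addition to the final prediction block) can be sketched as below. The disclosure does not fix the transform or the quantizer here, so the orthonormal DCT-II, the uniform step size, and the clipping to an 8-bit sample range are all assumptions made for illustration; the function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def inverse_transform(coeffs):
    """2D inverse DCT of a square block: X = C^T * Y * C for orthonormal C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def reconstruct_block(levels, step, prediction, bit_depth=8):
    # Inverse quantization: scale the quantized transform coefficients.
    coeffs = [[q * step for q in row] for row in levels]
    # Inverse transform: residual block in the spatial domain.
    residual = inverse_transform(coeffs)
    # Add prediction and residual; clipping to the sample range is an assumption.
    max_val = (1 << bit_depth) - 1
    return [[min(max(round(p + r), 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]

# A single DC level of 2 with step 4 yields a flat residual of 4 on a 2x2 block.
block = reconstruct_block([[2, 0], [0, 0]], 4, [[100, 101], [102, 254]])
```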
- An example of an AI-based predictive decoder 132 is described in more detail with reference to FIG. 2.
- FIG. 2 is a diagram illustrating a configuration of the AI-based predictive decoder 132, according to an embodiment of the disclosure.
- the AI-based predictive decoder 132 may include a motion information obtainer 210, a prediction block obtainer 220, a neural network setter 230, and a neural network 240.
- the neural network 240 may be stored in a memory.
- the neural network 240 may be implemented as, by, or using an AI processor.
- the motion information obtainer 210 may obtain a motion vector of a current block using information about a motion vector. As described below, when a precision of the motion vector of the current block is changed from fractional precision to integer precision by a motion information obtainer 2010 of the image encoding apparatus 1900, the motion information obtainer 210 may obtain the motion vector having the integer precision of the current block.
- the information about the motion vector may include information indicating at least one motion vector candidate from among motion vector candidates included in a motion vector candidate list, for example, a flag or an index.
- the information about the motion vector may further include information about a residual motion vector corresponding to a difference between a prediction motion vector of the current block and the motion vector of the current block.
- the prediction block obtainer 220 may obtain a preliminary prediction block using a reference image and the motion vector of the current block obtained by the motion information obtainer 210.
- the rule-based prediction mode may include a merge mode, a skip mode, or an AMVP mode.
- the motion information obtainer 210 may construct a motion vector candidate list including motion vectors of neighboring blocks of the current block as motion vector candidates, and may determine a motion vector candidate indicated by information included in a bitstream from among the motion vector candidates included in the motion vector candidate list as the motion vector of the current block.
- the motion information obtainer 210 may construct a motion vector candidate list including motion vectors of neighboring blocks of the current block as motion vector candidates, and may determine a motion vector candidate indicated by information included in a bitstream from among the motion vector candidates included in the motion vector candidate list as the prediction motion vector of the current block.
- the motion information obtainer 210 may determine the motion vector of the current block using the prediction motion vector of the current block and the residual motion vector.
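The two ways of using the motion vector candidate list described above can be contrasted in a short sketch. The function and type names are hypothetical: when no residual motion vector is signalled (as in the merge or skip mode), the indicated candidate is the motion vector itself; when one is signalled (as in the AMVP mode), the candidate serves as the prediction motion vector and the residual motion vector is added to it.

```python
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) components

def derive_motion_vector(candidates: List[MotionVector],
                         index: int,
                         residual_mv: Optional[MotionVector] = None) -> MotionVector:
    """Select the candidate indicated by the bitstream and refine it if needed."""
    selected = candidates[index]
    if residual_mv is None:
        # Merge/skip: the selected candidate is the motion vector of the current block.
        return selected
    # AMVP: the selected candidate is the prediction motion vector.
    return (selected[0] + residual_mv[0], selected[1] + residual_mv[1])

candidates = [(4, -2), (0, 8)]                           # hypothetical candidate list
merge_mv = derive_motion_vector(candidates, 1)           # candidate used directly
amvp_mv = derive_motion_vector(candidates, 0, (1, 3))    # predictor plus residual
```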
- the merge mode, the skip mode, or the AMVP mode is an example of the rule-based prediction mode, and in an embodiment of the disclosure, the rule-based prediction mode may further include a decoder-side motion vector refinement (DMVR) mode.
- DMVR decoder-side motion vector refinement
- a process of constructing a motion vector candidate list may be commonly performed. Examples of neighboring blocks that may be included in the motion vector candidate list are described with reference to FIG. 3 .
- neighboring blocks of a current block 300 may include spatial neighboring blocks (e.g., block A0, block A1, block B0, block B1, and block B2) which are spatially adjacent to the current block 300 and temporal blocks (e.g., block Col and block Br) which are temporally adjacent to the current block 300.
- the spatial neighboring blocks may include at least one of a lower left corner block A0, a lower left block A1, an upper right corner block B0, an upper right block B1, or an upper left corner block B2.
- the block Br may be located at the lower right of the block Col located at the same point as the current block 300.
- the block Col located at the same point as the current block 300 may be a block including a pixel corresponding to a central pixel in the current block 300, from among pixels included in the collocated image.
- the motion information obtainer 210 may determine availability of neighboring blocks according to a certain order, and may sequentially include motion vectors of the neighboring blocks as motion vector candidates in a motion vector candidate list according to a determination result.
- the motion information obtainer 210 may determine that the block is not available.
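A candidate list built by checking neighboring blocks in order might look like the sketch below. The checking order, the list size limit, and the use of `None` to mark an unavailable block are illustrative assumptions; the text above only says that availability is checked "according to a certain order".

```python
def build_candidate_list(neighbors, order, max_candidates):
    """Collect motion vectors of available neighbors in the given order."""
    candidates = []
    for name in order:
        mv = neighbors.get(name)
        if mv is None:
            # Unavailable neighbor (assumed to be marked by None): skip it.
            continue
        candidates.append(mv)
        if len(candidates) == max_candidates:
            break
    return candidates

# Hypothetical neighbor motion vectors, using the block names of FIG. 3.
neighbors = {"A1": (1, 0), "B1": None, "B0": (2, 2), "A0": None,
             "B2": (0, -1), "Br": (3, 3), "Col": None}
order = ["A1", "B1", "B0", "A0", "B2", "Br", "Col"]  # illustrative order only
candidate_list = build_candidate_list(neighbors, order, max_candidates=3)
```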
- the motion vector for list 0 may be a motion vector indicating a reference block in a reference image included in list 0 (for example, reference image list 0).
- the motion vector for list 1 may be a motion vector indicating a reference block in a reference image included in list 1 (for example, reference image list 1).
- the prediction block obtainer 220 may obtain a preliminary prediction block using a reference block indicated by a motion vector in a reference image.
- the preliminary prediction block may be obtained by applying interpolation to the reference block indicated by the motion vector in the reference image. Accordingly, the preliminary prediction block may include sub-pixels obtained by applying filtering to integer pixels.
- the reference block indicated by the motion vector in the reference image may be determined as a preliminary prediction block.
- a reference block indicated by the motion vector may be determined as a preliminary prediction block.
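The difference between an integer-precision and a fractional-precision motion vector can be sketched as follows. Bilinear interpolation is used purely as a stand-in: it is an assumption, and practical codecs typically apply longer interpolation filters. The motion vector is expressed in units of 1/`precision` sample, the referenced area is assumed to lie inside the reference image, and all names are hypothetical.

```python
import math

def preliminary_block(reference, top, left, height, width, mv_x, mv_y, precision=4):
    """Fetch a block displaced by (mv_x, mv_y) / precision samples."""
    last_row, last_col = len(reference) - 1, len(reference[0]) - 1
    block = []
    for r in range(height):
        row = []
        for c in range(width):
            y = top + r + mv_y / precision
            x = left + c + mv_x / precision
            y0, x0 = math.floor(y), math.floor(x)
            fy, fx = y - y0, x - x0          # fractional parts of the position
            y1, x1 = min(y0 + 1, last_row), min(x0 + 1, last_col)
            upper = (1 - fx) * reference[y0][x0] + fx * reference[y0][x1]
            lower = (1 - fx) * reference[y1][x0] + fx * reference[y1][x1]
            row.append((1 - fy) * upper + fy * lower)
        block.append(row)
    return block

reference = [[0, 4, 8, 12], [16, 20, 24, 28], [32, 36, 40, 44], [48, 52, 56, 60]]
# Integer precision (4 quarter-samples = 1 sample): the reference block is copied.
integer_case = preliminary_block(reference, 0, 0, 2, 2, mv_x=4, mv_y=0)
# Fractional precision (half a sample): sub-pixel values are filtered from integer pixels.
half_case = preliminary_block(reference, 0, 0, 2, 2, mv_x=2, mv_y=0)
```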
- the prediction block obtainer 220 may obtain a reference block indicated by the motion vector for list 0 in a reference image included in list 0, and may obtain a preliminary prediction block for list 0 using the reference block.
- the prediction block obtainer 220 may obtain a reference block indicated by the motion vector for list 1 in a reference image included in list 1, and may obtain a preliminary prediction block for list 1 using the reference block.
- FIG. 4 is a diagram illustrating reference blocks indicated by a motion vector for list 0 and a motion vector for list 1, according to an embodiment of the disclosure.
- the prediction block obtainer 220 may obtain a first reference block 415 indicated by the motion vector mv1 for list 0 in a first reference image 410 included in list 0, and may obtain a second reference block 435 indicated by the motion vector mv2 for list 1 in a second reference image 430 included in list 1.
- a first preliminary prediction block for list 0 may be obtained from the first reference block 415.
- a second preliminary prediction block for list 1 may be obtained from the second reference block 435.
- the neural network setter 230 may obtain data to be input to the neural network 240.
- the neural network setter 230 may obtain data to be input to the neural network 240 based on a reference image, a preliminary prediction block, and a quantization parameter for a reference block.
- At least one of the preliminary prediction block, a POC map, or a quantization error map may be input to the neural network 240 by the neural network setter 230.
- the preliminary prediction block, which may be, for example, a block determined to be similar to a current block in a rule-based prediction mode, may be used to obtain a final prediction block that is more similar to the current block.
- the quantization parameter for the reference block may be used to quantize/inverse-quantize residual data of the reference block in a process of encoding/decoding the reference block.
- the amount of error according to quantization/inverse-quantization may vary according to the quantization parameter.
- the quantization parameter may refer to the amount of error or distortion included in the reference block reconstructed through encoding/decoding.
- the POC map may include a difference between a POC of a current image and a POC of a reference image as sample values. In embodiments, this difference may be referred to as a POC difference.
- a POC may indicate an image output order. Accordingly, the POC difference between the current image and the reference image may refer to an output order difference between the current image and the reference image or a temporal difference between the current image and the reference image. Because a position or a size of an object may be changed in successive images due to the object's movement, the neural network 240 may output the final prediction block more similar to the current block by being trained on a temporal difference between the current image and the reference image.
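As a small illustration of the POC map described above, the sketch below fills a block-sized map with the POC difference. The helper name `make_poc_map` and the use of a signed difference (current minus reference) are assumptions; the text only states that the map holds the difference as sample values.

```python
def make_poc_map(poc_current, poc_reference, width, height):
    """Build a width x height map whose samples all equal the POC difference.

    Assumption: the difference is taken as current minus reference and
    keeps its sign.
    """
    poc_diff = poc_current - poc_reference
    return [[poc_diff] * width for _ in range(height)]


# Current image at output order 8, reference image at output order 4.
poc_map = make_poc_map(poc_current=8, poc_reference=4, width=3, height=2)
```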
- the neural network 240 may include one or more convolution layers.
- the neural network 240 may output the final prediction block by processing at least one of the preliminary prediction block, the POC map, or the quantization error map input from the neural network setter 230.
- the neural network 240 may individually determine sample values of the final prediction block by applying a certain operation to input data.
- a motion vector may be calculated for each block of an image in a rule-based prediction mode, whereas samples of a final prediction block may be individually determined in a neural network-based prediction mode according to an embodiment. Therefore, the neural network 240 may consider a motion vector of the current block for each sample. Accordingly, the final prediction block obtained according to the neural network-based prediction mode according to an embodiment of the disclosure may be more similar to the current block than a prediction block obtained according to the rule-based prediction mode.
- An example of a method of obtaining a quantization error map input to the neural network 240 is described with reference to FIGS. 5 to 8.
- FIGS. 5 to 8 are diagrams for describing a method of obtaining a quantization error map based on a quantization parameter for a reference block, according to an embodiment of the disclosure.
- sample values of a quantization error map may be calculated from or based on a quantization parameter for a reference block.
- the quantization parameter for the reference block may be obtained from a bitstream including information for decoding the reference block.
- the quantization error map may include quantization error values calculated based on the quantization parameter as sample values.
- the quantization error values may indicate the amount of error that may be caused by quantization and inverse-quantization applied to residual samples in a process of encoding and decoding the reference block.
- When a quantization error value is large, this may mean that a difference between a transform coefficient before quantization and a transform coefficient after inverse-quantization may be large. As a difference between a transform coefficient before quantization and a transform coefficient after inverse-quantization increases, sameness between an original block and a reference block obtained through decoding on encoded data may decrease.
- neural network-based inter prediction may be performed by considering quantization error values.
- the quantization error value may be proportional to a square of a quantization step size.
- the quantization step size may be used to quantize a transform coefficient, and the transform coefficient may be quantized by dividing the transform coefficient by the quantization step size. Further, a quantized transform coefficient may be inverse-quantized by multiplying the quantized transform coefficient by the quantization step size.
- $\text{quantization step size} = 2^{\,\text{quantization parameter} / n} \,/\, \text{quantization scale}[\text{quantization parameter} \,\%\, n]$
- quantization scale[quantization parameter %n] may indicate a scale value indicated by a quantization parameter from among pre-determined n scale values.
- n may be 6 according to the HEVC codec.
- as the quantization parameter increases, the quantization step size may increase and the quantization error value may increase.
- the quantization error map may include a quantization step size calculated based on the quantization parameter as a sample value.
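The relationship between the quantization parameter, the quantization step size, and the quantization error value can be sketched as below. The step size follows the reading of the equation above as 2^(quantization parameter / n) divided by quantization scale[quantization parameter % n]. Several points are assumptions made only for illustration: the division in the exponent is treated as integer division, the scale table `QUANT_SCALE` is a hypothetical placeholder (the text does not list the n scale values), and the proportionality constant `k` of the error value is arbitrary.

```python
N = 6  # n may be 6 according to the HEVC codec

# Hypothetical scale table: chosen so that the step size grows smoothly
# with the quantization parameter. The real values are not given in the text.
QUANT_SCALE = [2 ** (-i / N) for i in range(N)]


def quantization_step_size(qp, n=N, scale=QUANT_SCALE):
    """Step size = 2^(qp / n) / scale[qp % n], with integer division assumed."""
    return 2 ** (qp // n) / scale[qp % n]


def quantization_error_value(qp, k=1.0):
    """Error value proportional to the square of the step size (k is assumed)."""
    return k * quantization_step_size(qp) ** 2


step_sizes = [quantization_step_size(qp) for qp in (0, 6, 12)]
```

With this placeholder table the step size doubles every n quantization parameter values, so a larger quantization parameter yields a larger step size and a larger error value.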
- sample values of a quantization error map 530 may have a2 calculated based on a1.
- a quantization parameter for the reference block 510 of FIG. 5 may be set for the reference block 510, or may be set for an upper block of the reference block 510, for example, a slice including the reference block 510.
- the neural network setter 230 may obtain sample values of the quantization error map 530 from a quantization parameter set for the reference block 510, or a quantization parameter set for a slice including the reference block 510.
- When a quantization parameter for the reference block 510 is set for an upper block of the reference block, for example, a slice including the reference block 510, the same quantization parameter may be applied to blocks included in the slice. For example, because quantization error maps of the blocks included in the slice may be obtained based on the quantization parameter set for the slice, the amount of information, or the number of pieces of information, included in a bitstream may be less than that when a quantization parameter is set for each block or each sample.
- a quantization parameter for a reference block 510-1 may be set for each sample of the reference block 510-1.
- the neural network setter 230 may obtain sample values of a quantization error map 530-1 from the quantization parameter for each sample of the reference block 510-1.
- the neural network setter 230 may calculate a value of an upper left sample 631 of the quantization error map 530-1 to be a2 from a quantization parameter a1 of an upper left sample 611 of the reference block 510-1, and may calculate a value of a sample 632 located at the right of the upper left sample 631 of the quantization error map 530-1 to be b2 from a quantization parameter b1 of a sample 612 located at the right of the upper left sample 611 of the reference block 510-1.
- the amount of information to be obtained from a bitstream to check the quantization parameter for the reference block 510-1 may increase.
- because a final prediction block of a current block may be obtained using the reference block 510-1 having a small error, the number of bits for expressing a residual block between the current block and the final prediction block may be reduced.
- the neural network setter 230 may divide a quantization error map 530-2 into sub-areas, for example first sub-area 750, second sub-area 760, third sub-area 770, and fourth sub-area 780 corresponding to lower blocks, which may include for example first lower block 710, second lower block 720, third lower block 730, and fourth lower block 740 of a reference block 510-2, and may calculate sample values respectively included in the sub-areas 750 to 780 of the quantization error map 530-2, from a quantization parameter for a sample at a certain position in the lower blocks 710, 720, 730, and 740 of the reference block 510-2.
- the lower blocks 710, 720, 730, and 740 may correspond to prediction units.
- the certain position may include an upper left position in a lower block.
- sample values of the first sub-area 750 of the quantization error map 530-2 may have a2 calculated based on a quantization parameter a1 for a sample 711 at an upper left position from among samples of the first lower block 710 in the reference block 510-2.
- sample values of the second sub-area 760 of the quantization error map 530-2 may have e2 calculated based on a quantization parameter e1 for a sample 721 at an upper left position from among samples of the second lower block 720 in the reference block 510-2.
- sample values of the third sub-area 770 of the quantization error map 530-2 may have c2 calculated based on a quantization parameter c1 for a sample 731 at an upper left position from among samples of the third lower block 730 in the reference block 510-2.
- sample values of the fourth sub-area 780 of the quantization error map 530-2 may have b2 calculated based on a quantization parameter b1 for a sample 741 at an upper left position from among samples of the fourth lower block 740 of the reference block 510-2.
- a sample at a certain position may include a sample at a central position in a lower block.
- a sample at a central position may refer to a sample located at a lower left position, a sample located at an upper left position, a sample located at a lower right position, or a sample located at an upper right position.
- FIG. 8 illustrates an example in which a sample at a central position is a sample located at a lower left position when a width and a height of a certain area are divided in half.
- sample values of a first sub-area 850 of a quantization error map 530-3 may have a2 calculated based on a quantization parameter a1 for a sample 811 at a central position from among samples of the first lower block 710 in the reference block 510-2.
- sample values of a second sub-area 860 of the quantization error map 530-3 may have e2 calculated based on a quantization parameter e1 for a sample 821 at a central position from among samples of the second lower block 720 in the reference block 510-2.
- sample values of a third sub-area 870 of the quantization error map 530-3 may have a2 calculated based on a quantization parameter a1 for a sample 831 at a central position from among samples of the third lower block 730 in the reference block 510-2.
- sample values of a fourth sub-area 880 of the quantization error map 530-3 may have c2 calculated based on a quantization parameter c1 for a sample 841 at a central position from among samples of the fourth lower block 740 in the reference block 510-2.
- a sample at an upper left position and a sample at a central position described with reference to FIGS. 7 and 8 are merely examples, and in an embodiment of the disclosure, a specific position in the lower blocks 710, 720, 730, and 740 for obtaining sample values of sub-areas of the quantization error maps 530-2 and 530-3 may be changed in various ways.
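The sub-area approach of FIGS. 7 and 8 can be sketched as follows. The function name `sub_area_error_map`, the rectangle format `(x, y, w, h)` for lower blocks, and the `error_from_qp` callback are hypothetical. The "center" option follows the FIG. 8 example, in which the central sample is the lower left one of the four samples around the midpoint; that indexing is an interpretation, not a definition from the text.

```python
def sub_area_error_map(qp_map, lower_blocks, error_from_qp, position="upper_left"):
    """Fill each sub-area with one value derived from a single representative sample.

    qp_map: per-sample quantization parameters of the reference block.
    lower_blocks: list of (x, y, w, h) rectangles covering the block.
    error_from_qp: function mapping a quantization parameter to a map value.
    """
    height, width = len(qp_map), len(qp_map[0])
    error_map = [[None] * width for _ in range(height)]
    for x, y, w, h in lower_blocks:
        if position == "upper_left":
            col, row = x, y
        elif position == "center":
            # Lower left sample of the four samples around the midpoint.
            col, row = x + w // 2 - 1, y + h // 2
        else:
            raise ValueError("unknown position: " + position)
        value = error_from_qp(qp_map[row][col])
        for r in range(y, y + h):
            for c in range(x, x + w):
                error_map[r][c] = value
    return error_map


qp_map = [
    [22, 22, 30, 30],
    [24, 22, 30, 32],
    [26, 26, 28, 28],
    [26, 27, 29, 28],
]
blocks = [(0, 0, 2, 2), (2, 0, 2, 2), (0, 2, 2, 2), (2, 2, 2, 2)]

# The identity callback keeps the example easy to follow.
map_upper_left = sub_area_error_map(qp_map, blocks, lambda qp: qp, "upper_left")
map_center = sub_area_error_map(qp_map, blocks, lambda qp: qp, "center")
```

Reading one quantization parameter per lower block rather than one per sample is what allows the map to be filled with fewer lookups.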
- the quantization error maps 530-2 and 530-3 may be obtained more rapidly.
- the neural network setter 230 may select any one of different methods of obtaining a quantization error map (e.g., methods of obtaining a quantization error map of FIGS. 5 to 8 ), based on at least one of a size of a current block, a prediction direction of the current block, a layer to which a current image belongs in a hierarchical structure of an image, or information obtained from a bitstream (e.g., a flag or an index), and may obtain a quantization error map according to the selected method.
- a final prediction block may be obtained based on at least one of a preliminary prediction block, a quantization error map, or a POC map being input to the neural network 240.
- An example of a structure of the neural network 240 is described with reference to FIG. 9 .
- FIG. 9 is a diagram illustrating a structure of the neural network 240, according to an embodiment of the disclosure.
- a preliminary prediction block 902 may be input to a first convolution layer 910.
- a quantization error map 904 may be input to a first convolution layer 910.
- sizes of the preliminary prediction block 902, the quantization error map 904, and the POC map 906 may be the same as a size of a current block.
- because the POC map 906 may include a POC difference between a current image and a reference image as sample values, sample values in the POC map 906 may all be the same.
- the notation 6X5X5X32 shown in the first convolution layer 910 of FIG. 9 may indicate that a convolution operation is performed using 32 filter kernels with a size of 5x5 on input data having six channels. As a result of the convolution operation, 32 feature maps may be generated by the 32 filter kernels.
- the processing may be performed by considering that the current block is bi-directionally predicted.
- a first motion vector for list 0 and a second motion vector for list 1 may be derived for the current block, and a first reference image included in list 0 and a second reference image included in list 1 may be obtained as the reference image of the current block.
- a first preliminary prediction block may be obtained from a first reference block in the first reference image indicated by the first motion vector for list 0, and a second preliminary prediction block may be obtained from a second reference block in the second reference image indicated by the second motion vector for list 1.
- the first preliminary prediction block, the second preliminary prediction block, a first quantization error map including sample values calculated based on a quantization parameter for the first reference block, a second quantization error map including sample values calculated based on a quantization parameter for the second reference block, a first POC map including a POC difference between the current image and the first reference image, and a second POC map including a POC difference between the current image and the second reference image may be obtained.
- the first convolution layer 910 may perform a convolution operation on the first preliminary prediction block, the second preliminary prediction block, the first quantization error map, the second quantization error map, the first POC map, and the second POC map using 32 filter kernels with a size of 5X5.
- when unidirectional prediction (for example, list 0 prediction or list 1 prediction) is applied to the current block, for example when only the first motion vector for list 0 is obtained or only the second motion vector for list 1 is obtained for the current block, only input data having three channels may be obtained. Because the number of channels that may be processed by the first convolution layer 910 of FIG. 9 is 6, the input data having three channels may be increased to input data having six channels.
- the neural network setter 230 may copy the first preliminary prediction block (or for example the second preliminary prediction block), the first quantization error map (or for example the second quantization error map), and the first POC map (or for example the second POC map) to obtain two first preliminary prediction blocks (or two second preliminary prediction blocks), two first quantization error maps (or two second quantization error maps), and two first POC maps (or two second POC maps), and may input the input data having six channels to the neural network 240.
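A minimal sketch of assembling the six input channels, including the duplication used for unidirectional prediction, is given below. The function name `build_input_channels` and the tuple layout `(prediction_block, error_map, poc_map)` are assumptions; the channel order simply follows the order in which the inputs are listed in the text.

```python
def build_input_channels(list0=None, list1=None):
    """Return six channels for the first convolution layer.

    Each argument is a tuple (prediction_block, error_map, poc_map) or None.
    When only one list is available, its three channels are duplicated.
    """
    available = [data for data in (list0, list1) if data is not None]
    if not available:
        raise ValueError("at least one prediction direction is required")
    if len(available) == 1:
        # Unidirectional prediction: reuse the same three channels twice.
        available = available * 2
    first, second = available
    return [first[0], second[0], first[1], second[1], first[2], second[2]]


pred0, err0, poc0 = [[100]], [[4.0]], [[1]]
pred1, err1, poc1 = [[110]], [[9.0]], [[-2]]

bi_channels = build_input_channels((pred0, err0, poc0), (pred1, err1, poc1))
uni_channels = build_input_channels(list0=(pred0, err0, poc0))
```

The duplicated channels here share the same underlying lists; an implementation that later modifies channels in place would need real copies.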
- Feature maps generated by the first convolution layer 910 may represent unique features of input data.
- the feature maps may represent features in a vertical direction, features in a horizontal direction, or edge features of the input data.
- One feature map 1030 may be generated based on multiplication and addition between weights of a filter kernel 1010 with a size of 5X5 used in the first convolution layer 910 and sample values in input data 1005 (e.g., the preliminary prediction block 902) corresponding to the weights.
- because 32 filter kernels may be used in the first convolution layer 910, 32 feature maps may be generated through convolution operations using the 32 filter kernels.
- Samples I1 to I49 shown in the input data 1005 in FIG. 10 may represent samples of the input data 1005, and samples F1 to F25 shown in the filter kernel 1010 may represent samples of the filter kernel 1010. Also, samples M1 to M9 shown in the feature map 1030 may represent samples of the feature map 1030.
- multiplication may be performed between values of the samples I1 to I5, I8 to I12, I15 to I19, I22 to I26, and I29 to I33 of the input data 1005 and the samples F1 to F25 of the filter kernel 1010, and a value obtained by combining (e.g., adding) results of the multiplication may be assigned as a value of M1 of the feature map 1030.
- multiplication may be performed between values of the samples I2 to I6, I9 to I13, I16 to I20, I23 to I27, and I30 to I34 of the input data 1005 and the samples F1 to F25 of the filter kernel 1010, and a value obtained by combining results of the multiplication may be assigned as a value of M2 of the feature map 1030.
- the feature map 1030 having a certain size may be obtained.
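The multiply-and-add procedure described for FIG. 10 can be sketched as a plain sliding-window operation with a stride of 1 and no padding. The function name `conv2d_valid` and the sample values are illustrative assumptions: the input samples I1 to I49 are given the values 1 to 49, and every kernel weight F1 to F25 is set to 1, so that each output is simply the sum of the covered input samples.

```python
def conv2d_valid(input_data, kernel):
    """Slide the kernel over the input (stride 1, no padding) and sum the products."""
    in_h, in_w = len(input_data), len(input_data[0])
    k_h, k_w = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(in_h - k_h + 1):
        row = []
        for c in range(in_w - k_w + 1):
            acc = 0
            for i in range(k_h):
                for j in range(k_w):
                    acc += input_data[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# I1..I49 numbered row by row in a 7x7 input; sample In holds the value n.
input_data = [[7 * r + c + 1 for c in range(7)] for r in range(7)]
# F1..F25 all set to 1 for readability.
kernel = [[1] * 5 for _ in range(5)]

feature_map = conv2d_valid(input_data, kernel)  # 3x3 result, M1..M9
```

Here M1 covers I1 to I5, I8 to I12, I15 to I19, I22 to I26, and I29 to I33, and M2 covers the same window shifted one sample to the right.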
- although FIG. 10 illustrates convolution layers included in the neural network 240 as performing operations according to a convolution operation, the convolution operation described with reference to FIG. 10 is merely an example and embodiments of the disclosure are not limited thereto.
- the feature maps of the first convolution layer 910 may be input to a first activation layer 920.
- the first activation layer 920 may apply non-linear features to each feature map.
- the first activation layer 920 may include, but is not limited to, at least one of a sigmoid function, a Tanh function, or a rectified linear unit (ReLU) function.
- when the first activation layer 920 applies non-linear features, this may mean that some sample values of feature maps are changed and output. In this case, the changing may be performed by applying the non-linear features.
- the first activation layer 920 may determine whether to transmit sample values of the feature maps to a second convolution layer 930. For example, some of the sample values of the feature maps may be activated by the first activation layer 920 and may be transmitted to the second convolution layer 930, and other sample values may be deactivated by the first activation layer 920 and may not be transmitted to the second convolution layer 930. Unique features of input data indicated by the feature maps may be emphasized by the first activation layer 920.
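The three activation functions named above can be illustrated with the following sketch, which applies a chosen function to every sample of a feature map. The helper name `apply_activation` is hypothetical. With ReLU, negative samples become zero (they are not passed on) while positive samples pass through unchanged.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def apply_activation(feature_map, fn):
    """Apply the activation function fn to every sample of a feature map."""
    return [[fn(value) for value in row] for row in feature_map]


activated = apply_activation([[-2.0, 3.0], [0.5, -0.1]], relu)
```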
- Feature maps 925 output from the first activation layer 920 may be input to the second convolution layer 930. Any one of the feature maps 925 of FIG. 9 may be a result of processing the feature map 1030 described with reference to FIG. 10 at the first activation layer 920.
- the notation 32X5X5X32 shown in the second convolution layer 930 may indicate that a convolution operation is performed on the feature maps 925 having 32 channels using 32 filter kernels with a size of 5x5.
- An output of the second convolution layer 930 may be input to a second activation layer 940.
- the second activation layer 940 may apply non-linear features to input feature maps.
- Feature maps 945 output from the second activation layer 940 may be input to a third convolution layer 950.
- the notation 32X5X5X1 shown in the third convolution layer 950 may indicate that a convolution operation is performed on the 32 feature maps 945 to generate one final prediction block 955 using one filter kernel with a size of 5x5.
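The layer notation used in FIG. 9 (input channels X kernel height X kernel width X number of filter kernels) can be unpacked with a short helper. The function names are hypothetical, and the weight count below ignores any bias terms, which the text does not mention.

```python
def parse_layer_notation(notation):
    """Split e.g. '6X5X5X32' into (in_channels, kernel_h, kernel_w, out_channels)."""
    in_ch, k_h, k_w, out_ch = (int(part) for part in notation.upper().split("X"))
    return in_ch, k_h, k_w, out_ch


def weight_count(notation):
    """Number of kernel weights in the layer (bias terms not counted)."""
    in_ch, k_h, k_w, out_ch = parse_layer_notation(notation)
    return in_ch * k_h * k_w * out_ch


layers = ["6X5X5X32", "32X5X5X32", "32X5X5X1"]
counts = [weight_count(layer) for layer in layers]

# The output channel count of each layer matches the input channel count
# of the next one, so the three layers can be chained.
chained = all(
    parse_layer_notation(a)[3] == parse_layer_notation(b)[0]
    for a, b in zip(layers, layers[1:])
)
```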
- although FIG. 9 illustrates the neural network 240 as including three convolution layers (the first convolution layer 910, the second convolution layer 930, and the third convolution layer 950) and two activation layers (the first activation layer 920 and the second activation layer 940), this is merely an example, and embodiments are not limited thereto. For example, in embodiments the number of convolution layers and activation layers included in the neural network 240 may be changed in various ways.
- the neural network 240 may be implemented through a recurrent neural network (RNN).
- a CNN structure of the neural network 240 of FIG. 9 may be changed into an RNN structure.
- the image decoding apparatus 100 and the image encoding apparatus 1900 may include at least one arithmetic logic unit (ALU) for a convolution operation and an arithmetic operation at each activation operation.
- the ALU may be implemented as, by, or using a processor.
- the ALU may include a multiplier that performs multiplication between sample values of input data or a feature map output from a previous layer and sample values of a filter kernel, and an adder that adds results of the multiplication.
- the ALU may include a multiplier that multiplies an input sample value by a pre-determined weight used in at least one of a sigmoid function, a Tanh function, or an ReLU function, and a comparator that compares a result of the multiplication with a certain value to determine whether to transmit the input sample value to a next layer.
- the feature map 1030 with a size of 3X3 may be obtained when the filter kernel 1010 with a size of 5X5 is applied to the input data 1005 with a size of 7X7.
- padding may be performed by a distance of 2 in a left direction, a right direction, an upper direction, and a lower direction of the input data 1005.
- a size of data before the convolution operation and a size of data after the convolution operation may remain the same. Accordingly, even when at least one of the enlarged preliminary prediction block, the enlarged quantization error map, or the enlarged POC map is input to the neural network 240, a size of the final prediction block output from the neural network 240 may be the same as a size of at least one of the enlarged preliminary prediction block, the enlarged quantization error map, or the enlarged POC map. In this case, in an embodiment of the disclosure, the final prediction block may be cropped so that a size of the final prediction block output from the neural network 240 is the same as a size of the current block.
- the neural network setter 230 may calculate an enlarged distance for padding based on the number of convolution layers included in the neural network 240, a size of a filter kernel used in each convolution layer, and a stride.
- in Equation 3, h may denote an enlarged distance in a horizontal direction, v may denote an enlarged distance in a vertical direction, M may denote a size of input data in a horizontal direction, and N may denote a size of input data in a vertical direction.
- the neural network setter 230 may determine an enlarged distance in a horizontal direction and an enlarged distance in a vertical direction for padding of a preliminary prediction block, a quantization error map, and a POC map based on Equation 3 or Equation 4, and may obtain an enlarged preliminary prediction block, an enlarged quantization error map, and an enlarged POC map further including samples corresponding to the enlarged distances.
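One way an enlarged distance could follow from the layer configuration is sketched below. This is not Equation 3 or Equation 4, which are not reproduced in this text; it is a simplified assumption for square kernels with odd sizes, a stride of 1, and no internal padding, in which each convolution layer trims (k - 1) / 2 samples from every side. The function names are hypothetical.

```python
def enlarged_distance(kernel_sizes):
    """Samples needed on each side so that unpadded, stride-1 convolutions
    return a block of the original size (assumes odd kernel sizes)."""
    return sum((k - 1) // 2 for k in kernel_sizes)


def enlarged_size(block_size, kernel_sizes):
    """Size of the enlarged input along one direction."""
    return block_size + 2 * enlarged_distance(kernel_sizes)


# A hypothetical network with three 3x3 convolution layers.
distance_3x3 = enlarged_distance([3, 3, 3])
# Three 5x5 convolution layers, as in the FIG. 9 notation.
distance_5x5 = enlarged_distance([5, 5, 5])
```

Under these assumptions the same distance applies in the horizontal and vertical directions; a non-unit stride or non-square kernels would need a different calculation.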
- the neural network setter 230 may obtain an enlarged preliminary prediction block by adding samples which are enlarged by the enlarged distance of 1 in a left direction and a right direction from a preliminary prediction block and adding samples which are enlarged by the enlarged distance of 1 in an upper direction and a lower direction of the preliminary prediction block.
- the neural network setter 230 may consider neighboring samples of the reference block when padding the preliminary prediction block, a quantization error map, and a POC map. For example, using samples adjacent to the reference block instead of padding the preliminary prediction block according to a pre-determined sample value, spatial characteristics of the reference image may also be considered in inter-predicting a current block.
- FIG. 11 is a diagram for describing a method of obtaining an enlarged preliminary prediction block, according to an embodiment of the disclosure.
- the neural network setter 230 may obtain an enlarged preliminary prediction block including samples of a preliminary prediction block and samples 1120 corresponding to the enlarged distance h from among samples adjacent to a reference block 1110 in a reference image 1100. Accordingly, a horizontal distance of the enlarged preliminary prediction block may be greater by 2h than the preliminary prediction block, and a vertical distance of the enlarged preliminary prediction block may be greater by 2v than the preliminary prediction block.
- FIG. 12 is a diagram for describing a method of obtaining an enlarged preliminary prediction block, when a boundary of the reference block 1110 corresponds to a boundary of the reference image 1100, according to an embodiment of the disclosure.
- the neural network setter 230 may obtain an enlarged preliminary prediction block including samples in a preliminary prediction block and neighboring samples located within the enlarged distance of 3 from among neighboring samples located outside a boundary of the reference block 1110.
- the neural network setter 230 may select neighboring samples located within the enlarged distance of 3 from a left boundary of the reference block 1110 while being located in a left block 1210 of the reference block 1110, neighboring samples located within the enlarged distance of 3 from a right boundary of the reference block 1110 while being located in a right block 1250 of the reference block 1110, and neighboring samples located within the enlarged distance of 3 from a lower boundary of the reference block 1110 while being located in a lower block 1230 of the reference block 1110.
- neighboring samples located in a lower left block 1220 of the reference block 1110 and neighboring samples located in a lower right block 1240 of the reference block 1110 may also be selected.
- the neural network setter 230 may determine neighboring samples 1260 located outside the upper boundary of the reference block 1110 using samples in the reference image 1100 closest to each of the neighboring samples 1260 located outside the upper boundary of the reference block 1110.
- the neural network setter 230 may apply an enlarged preliminary prediction block with a size of 11x11, which may be larger than the reference block 1110 (and preliminary prediction block) with a size of 5x5, to the neural network 240, and may obtain a final prediction block with a size of 5x5 that is the same as a size of the current block.
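The enlargement of a block using neighboring samples, with the closest available sample substituted where the enlarged area leaves the reference image, can be sketched by clamping coordinates. The function name `enlarged_block` and the clamping strategy are assumptions chosen to mirror the "closest sample" behavior described for FIG. 12.

```python
def enlarged_block(ref_image, left, top, width, height, distance):
    """Return the block enlarged by `distance` on every side.

    Positions outside the reference image reuse the closest sample inside
    it, which is done here by clamping row and column indices.
    """
    img_h, img_w = len(ref_image), len(ref_image[0])

    def clamp(value, low, high):
        return max(low, min(value, high))

    block = []
    for r in range(top - distance, top + height + distance):
        rr = clamp(r, 0, img_h - 1)
        block.append([
            ref_image[rr][clamp(c, 0, img_w - 1)]
            for c in range(left - distance, left + width + distance)
        ])
    return block


# 12x12 reference image; sample value = 12 * row + column.
ref_image = [[12 * r + c for c in range(12)] for r in range(12)]

# A 5x5 reference block touching the upper image boundary, enlarged by 3.
enlarged = enlarged_block(ref_image, left=3, top=0, width=5, height=5, distance=3)
```

The three rows above the image boundary repeat the first image row, while the left, right, and lower neighbors are taken directly from the reference image.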
- a size of a quantization error map input to the neural network 240 may be the same as a size of the enlarged preliminary prediction block.
- the neural network setter 230 may obtain an enlarged quantization error map with the same size as a size of the enlarged preliminary prediction block, an example of which is described with reference to FIG. 13 .
- FIG. 13 is a diagram for describing a method of obtaining an enlarged quantization error map 1300, according to an embodiment of the disclosure.
- a first sample 1301 to a fourth sample 1304 in the quantization error map 530-4 may have sample values of a2, b2, c2, and a2.
- values of neighboring samples outside the quantization error map 530-4 may be determined according to a quantization parameter for neighboring samples located outside a boundary of the reference block.
- the neighboring samples 1260 located outside the boundary of the reference block 1110 may be determined from closest samples available to the neighboring samples 1260.
- values of neighboring samples located outside a boundary of the quantization error map 530-4 may be determined from closest samples available to the neighboring samples.
- samples 1305, 1360, 1355, and 1350 located outside an upper boundary of the quantization error map 530-4 may have values of e2, a2, b2, and e2.
- the sample 1305 located on an upper left side of the first sample 1301 may be determined from the left sample 1310 of the first sample 1301 which is the closest, and the sample 1360 located above the first sample 1301 may be determined from the first sample 1301 that is the closest.
- because sample values in a POC map correspond to a POC difference between a current image and a reference image, sample values in an enlarged POC map may all have the POC difference between the current image and the reference image.
- when the neural network setter 230 inputs at least one of an enlarged preliminary prediction block, an enlarged quantization error map, or an enlarged POC map to the neural network 240, the neural network setter 230 may also input, to the neural network 240, an enlarged current reconstructed block with the same size as a size of at least one of the enlarged preliminary prediction block, the enlarged quantization error map, or the enlarged POC map.
- the enlarged current reconstructed block may include samples corresponding to an enlarged distance from among samples reconstructed before the current block 1410.
- the neural network setter 230 may obtain an enlarged current reconstructed block using samples 1420 located at the enlarged distance 1 from a boundary of the current block 1410 from among samples reconstructed before the current block 1410.
- the neural network setter 230 may obtain the samples that are not reconstructed based on the enlarged preliminary prediction block, an example of which is described with reference to FIG. 15 .
- FIG. 15 is a diagram for describing a method of obtaining an enlarged current reconstructed block 1500, according to an embodiment of the disclosure.
- the enlarged current reconstructed block 1500 including samples 1420 that have been reconstructed before the current block 1410 and samples 1125 other than samples corresponding to the samples 1420 that have been reconstructed before the current block from among samples of an enlarged preliminary prediction block 1150 may be obtained.
- because the enlarged current reconstructed block 1500 may be processed by the neural network 240 together with an enlarged preliminary prediction block, an enlarged quantization error map, and an enlarged POC map, spatial characteristics in a current image may also be considered.
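The construction of the enlarged current reconstructed block of FIG. 15 can be sketched as a per-sample merge: samples already reconstructed are kept, and the remaining positions are filled from the enlarged preliminary prediction block. Marking unavailable samples with `None` and the function name are assumptions for illustration, as is the choice of which neighbors count as already reconstructed.

```python
def enlarged_current_reconstructed_block(reconstructed, enlarged_prediction):
    """Merge two equally sized grids.

    reconstructed holds samples reconstructed before the current block and
    None where no reconstructed sample exists yet; those gaps are filled
    with the co-located sample of the enlarged preliminary prediction block.
    """
    return [
        [rec if rec is not None else pred for rec, pred in zip(rec_row, pred_row)]
        for rec_row, pred_row in zip(reconstructed, enlarged_prediction)
    ]


# 2x2 current block with an enlarged distance of 1: only the row above and
# the column to the left are assumed to be reconstructed already.
reconstructed = [
    [50, 51, 52, 53],
    [54, None, None, None],
    [55, None, None, None],
    [56, None, None, None],
]
enlarged_prediction = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

merged = enlarged_current_reconstructed_block(reconstructed, enlarged_prediction)
```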
- Each of the plurality of weight sets may include a weight used in an operation process of a layer included in the neural network 240.
- the neural network setter 230 may select a weight set to be used to obtain a final prediction block from among the plurality of weight sets, based on at least one of a size of a current block, a prediction direction of the current block, a quantization parameter for a reference block, a layer to which a current image belongs in a hierarchical structure of an image, or information obtained from a bitstream.
- the neural network setter 230 may obtain a final prediction block which is more similar to a current block by setting a weight set indicated by an index obtained from a bitstream from among a weight set A, a weight set B, and a weight set C, to the neural network 240.
- FIG. 16 illustrates an example in which a weight set is selected based on information obtained
- a weight set to be used to obtain a final prediction block may be selected from among the plurality of weight sets according to a result of comparing a size of the current block with a pre-determined threshold value.
- when a size of the current block is equal to or greater than 64X64, the weight set C may be selected; when a size of the current block is equal to or greater than 16X16 and less than 64X64, the weight set B may be selected; and when a size of the current block is less than 16X16, the weight set A may be selected.
- when a current image corresponds to layer 1 in a hierarchical structure of an image, the weight set A may be selected; when a current image corresponds to layer 2, the weight set B may be selected; and when a current image corresponds to layer 3, the weight set C may be selected.
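Two of the selection criteria described above, the hierarchical layer of the current image and an index obtained from a bitstream, can be sketched as simple lookups. The function names and the index-to-set mapping (0 for A, 1 for B, 2 for C) are assumptions; the text does not specify how an index value maps to a weight set.

```python
WEIGHT_SET_NAMES = ("A", "B", "C")


def select_weight_set_by_layer(layer):
    """Layer 1 -> weight set A, layer 2 -> B, layer 3 -> C."""
    mapping = {1: "A", 2: "B", 3: "C"}
    if layer not in mapping:
        raise ValueError("unsupported layer: %r" % (layer,))
    return mapping[layer]


def select_weight_set_by_index(index):
    """Assumed mapping of a bitstream index to a weight set (0 -> A, 1 -> B, 2 -> C)."""
    if not 0 <= index < len(WEIGHT_SET_NAMES):
        raise ValueError("unsupported index: %r" % (index,))
    return WEIGHT_SET_NAMES[index]


chosen_by_layer = select_weight_set_by_layer(2)
chosen_by_index = select_weight_set_by_index(0)
```

A selection based on block size would follow the same pattern, with threshold comparisons in place of the dictionary lookup.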
- Each of the plurality of weight sets may be generated as a result of training the neural network 240.
- the weight set A, the weight set B, and the weight set C of FIG. 16 may be obtained by training the neural network 240 according to different training purposes or goals, for example by using different types of training images to train the neural network 240, or calculating loss information in different manners.
- loss information 2306 corresponding to a difference between a final prediction block for training 2305 and a current block for training 2301 may be used.
- the weight set A may be generated by training the neural network 240 based on loss information calculated according to a first method and the weight set B may be generated by training the neural network 240 based on loss information calculated according to a second method.
- the weight set C may be generated by training the neural network 240 based on loss information calculated according to a third method.
- the neural network setter 230 may select a neural network to be used to obtain a final prediction block from among a plurality of neural networks, and may obtain a final prediction block of a current block by applying input data (e.g., a preliminary prediction block) to the selected neural network.
- the plurality of neural networks may be included in the AI-based predictive decoder 132.
- the plurality of neural networks may be different from each other in at least one of a type of a layer, the number of layers, a size of a filter kernel, or a stride.
- the neural network setter 230 may select a neural network to be used to obtain a final prediction block from among the plurality of neural networks, based on at least one of a size of a current block, a prediction direction of the current block, a quantization parameter for a reference block, a layer to which a current image belongs in a hierarchical structure of an image, or information obtained from a bitstream.
- the neural network setter 230 may determine whether to apply a neural network-based prediction mode, based on at least one of information obtained from a bitstream, a prediction direction of a current block, or whether an enlarged preliminary prediction block is outside a boundary of a reference image.
- a preliminary prediction block obtained by the prediction block obtainer 220 may be transmitted to the reconstruction unit 134.
- when an enlarged preliminary prediction block is outside a boundary of a reference image, the neural network setter 230 may determine that a neural network-based prediction mode is not applied to a current block, and when an enlarged preliminary prediction block is located within a reference image, for example, when a boundary of a reference block does not correspond to a boundary of a reference image, the neural network setter 230 may determine that a neural network-based prediction mode is applied to a current block.
- when a prediction direction of a current block is bidirectional, the neural network setter 230 may determine that a neural network-based prediction mode is applied to the current block, and when a prediction direction of a current block is unidirectional, the neural network setter 230 may determine that a neural network-based prediction mode is not applied to the current block.
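The boundary and prediction-direction criteria above can be sketched as two small predicates. The function names, the coordinate convention (top-left block position in integer samples), and the decision to require both criteria at once are assumptions made for illustration; the text lists the criteria as alternatives ("at least one of").

```python
def enlarged_block_is_inside(x, y, width, height, distance, ref_width, ref_height):
    """Return True if the block, enlarged by `distance` samples on every side,
    lies entirely within a reference image of size ref_width x ref_height."""
    left = x - distance
    top = y - distance
    right = x + width + distance      # exclusive bound
    bottom = y + height + distance    # exclusive bound
    return left >= 0 and top >= 0 and right <= ref_width and bottom <= ref_height


def use_nn_prediction_mode(is_inside, is_unidirectional):
    """Assumption: the mode is applied only when the enlarged block is inside
    the reference image and the prediction direction is not unidirectional."""
    return is_inside and not is_unidirectional
```

When the mode is not applied, the preliminary prediction block would be passed on for reconstruction unchanged, as described in the text.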
- FIG. 17 is a diagram illustrating an image decoding method performed by the image decoding apparatus 100, according to an embodiment of the disclosure.
- the image decoding apparatus 100 may obtain a motion vector of a current block.
- the image decoding apparatus 100 may obtain the motion vector of the current block according to a rule-based prediction mode.
- the rule-based prediction mode may include a merge mode, a skip mode, an AMVP mode, a BDOF mode, a BCW mode, or a DMVR mode.
- Prediction mode information included in a bitstream may be used to determine which of several rule-based prediction modes should be used to obtain the motion vector of the current block.
- the image decoding apparatus 100 may obtain a preliminary prediction block using the motion vector of the current block and a reference image of the current block.
- the image decoding apparatus 100 may obtain a preliminary prediction block using a reference block indicated by the motion vector of the current block in the reference image.
- a process of obtaining a preliminary prediction block similar to a current block from a reference image using a motion vector may be referred to as a motion compensation process.
- sample values of the quantization error map may be calculated based on a quantization parameter for the reference block.
- the image decoding apparatus 100 may reconstruct the current block using a residual block obtained from a bitstream and the final prediction block.
- the image decoding apparatus 100 may obtain the current block by adding sample values of the final prediction block to sample values of the residual block.
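The reconstruction step amounts to a sample-wise addition. The sketch below uses nested Python lists as a stand-in for sample arrays; the function name is hypothetical, and clipping to a valid sample range is omitted because the text does not describe it.

```python
def reconstruct_block(final_prediction, residual):
    """Add residual samples to final prediction samples, position by position."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(final_prediction, residual)
    ]


# Example: a 2x2 final prediction block and its residual block.
reconstructed = reconstruct_block([[100, 102], [98, 97]], [[1, -2], [0, 3]])
```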
- FIG. 18 is a diagram illustrating a syntax, according to an embodiment of the disclosure.
- a neural network-based prediction mode may be used together with a skip mode, a merge mode, or an AMVP mode.
- NNinter() may be called to apply a neural network-based prediction mode to the current block.
- FIG. 19 is a diagram illustrating a configuration of the image encoding apparatus 1900, according to an embodiment of the disclosure.
- the image encoding apparatus 1900 may include an encoder 1910 and a bitstream generator 1930.
- the encoder 1910 may include an AI-based predictive encoder 1912 and a residual data obtainer 1914.
- the encoder 1910 may correspond to the predictive encoder 2415, the transform and quantization unit 2420, the inverse-quantization and inverse-transform unit 2430, the deblocking filtering unit 2435, and the loop filtering unit 2440 of FIG. 24.
- the bitstream generator 1930 may correspond to the entropy encoder 2425 of FIG. 24.
- although the encoder 1910 and the bitstream generator 1930 are individually illustrated in FIG. 19, embodiments are not limited thereto.
- the encoder 1910 and the bitstream generator 1930 may be implemented through one processor.
- the encoder 1910 and the bitstream generator 1930 may be implemented as, by, or using a dedicated processor, or may be implemented through a combination of software and a general-purpose processor such as an AP, a CPU, or a GPU.
- the dedicated processor may include a memory for implementing an embodiment of the disclosure or a memory processor for using an external memory.
- the encoder 1910 may encode a current block using a reference image of the current block.
- Information about a residual block and information about a motion vector may be output as a result of encoding the current block.
- the information about the residual block may not be output by the encoder 1910 according to a rule-based coding mode (e.g., a skip mode) for the current block.
- the AI-based predictive encoder 1912 may obtain a final prediction block of the current block using the current block and the reference image.
- the final prediction block may be transmitted to the residual data obtainer 1914.
- the residual data obtainer 1914 may obtain the residual block by subtracting sample values of the final prediction block from sample values of the current block.
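On the encoder side, the residual data obtainer performs the subtraction described above. A minimal sketch with nested lists and a hypothetical function name:

```python
def compute_residual(current_block, final_prediction):
    """Subtract final prediction samples from current block samples."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current_block, final_prediction)
    ]


# The closer the final prediction is to the current block, the smaller the
# residual magnitudes that remain to be coded.
residual = compute_residual([[101, 100], [98, 100]], [[100, 102], [98, 97]])
```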
- the information about the motion vector may include information indicating one or more motion vector candidates included in a motion vector candidate list, for example, a flag or an index.
- the neural network 2040 may be stored in a memory.
- the neural network 2040 may be implemented as, by, or using an AI processor.
- the motion information obtainer 2010 may construct a motion vector candidate list including a motion vector of at least one of a spatial neighboring block or a temporal neighboring block of the current block as a motion vector candidate.
- the motion vector obtained by the motion information obtainer 2010 may include a motion vector for list 0, a motion vector for list 1, or a motion vector for list 0 and a motion vector for list 1.
- the motion vector for list 0 may be a motion vector for indicating a reference block in a reference image included in list 0 (or a reference image list 0).
- the motion vector for list 1 may be a motion vector for indicating a reference block in a reference image included in list 1 (or reference image list 1).
- the prediction block obtainer 2020 may obtain a preliminary prediction block using a reference block indicated by the motion vector in the reference image.
- the preliminary prediction block may be obtained by applying interpolation to the reference block indicated by the motion vector in the reference image.
- the preliminary prediction block may include sub-pixels obtained through filtering on integer pixels.
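To illustrate how a sub-pixel value can be derived from integer pixels, the sketch below uses bilinear interpolation at quarter-pel positions. This is a deliberately simplified stand-in: video codecs typically use longer separable interpolation filters, and the text does not specify the filter. The function name and the quarter-pel coordinate convention are assumptions, and coordinates are assumed to be non-negative and inside the image.

```python
def sample_at_quarter_pel(reference, x_qpel, y_qpel):
    """Bilinearly interpolate a sample at a position given in quarter-pel units."""
    height = len(reference)
    width = len(reference[0])
    x0, fx = divmod(x_qpel, 4)   # integer position and fractional phase (0..3)
    y0, fy = divmod(y_qpel, 4)
    x1 = min(x0 + 1, width - 1)  # clamp the right/bottom neighbor at the border
    y1 = min(y0 + 1, height - 1)
    top = reference[y0][x0] * (4 - fx) + reference[y0][x1] * fx
    bottom = reference[y1][x0] * (4 - fx) + reference[y1][x1] * fx
    return (top * (4 - fy) + bottom * fy) / 16.0
```

At an integer position (both phases zero) the function returns the integer pixel itself, so no filtering takes place.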
- the motion information obtainer 2010 and the prediction block obtainer 2020 may obtain the preliminary prediction block according to a rule-based prediction mode.
- the prediction block obtainer 2020 may obtain the preliminary prediction block using the reference block indicated by the motion vector for list 0 in the reference image included in list 0.
- the prediction block obtainer 2020 may obtain the preliminary prediction block using the reference block indicated by the motion vector for list 1 in the reference image included in list 1.
- the prediction block obtainer 2020 may obtain the preliminary prediction block for list 0 using the reference block indicated by the motion vector for list 0 in the reference image included in list 0, and may obtain the preliminary prediction block for list 1 using the reference block indicated by the motion vector for list 1 in the reference image included in list 1.
- the neural network setter 2030 may obtain data to be input to the neural network 2040.
- the neural network setter 2030 may obtain the data to be input to the neural network 2040 based on the reference image, the preliminary prediction block and a quantization parameter for the reference block.
- a final prediction block of the current block may be obtained when at least one of the preliminary prediction block, a POC map, or a quantization error map is applied by the neural network setter 2030 to the neural network 2040.
- the neural network setter 2030 and the neural network 2040 may be the same as, or similar to, the neural network setter 230 and the neural network 240 included in the AI-based predictive decoder 132 of FIG. 2, and thus, a redundant or duplicative description thereof may be omitted.
- the neural network setter 2030 may select any one of different methods of obtaining a quantization error map (e.g., methods of obtaining a quantization error map of FIGS. 5 to 8), and may obtain a quantization error map according to the selected method.
- the number of bits for expressing the motion vector of the current block may be reduced, and thus, a bit rate of a bitstream may be reduced.
- Another reason why precision of the motion vector of the current block may be changed from fractional precision to integer precision may be that when information about a motion vector with low precision is intentionally provided to the neural network 2040, the neural network 2040 may derive an accurate motion vector from the inaccurate motion vector.
- Precision change of the motion vector may be related to an AMVR mode included in the Versatile Video Coding (VVC) standard.
- the AMVR mode may be a mode in which resolutions of a residual motion vector and a motion vector are adaptively selected and used.
- a motion vector and a residual motion vector may be generally encoded/decoded using any one of resolutions of 1/4 pel, 1/2 pel, 1 pel, and 4 pel.
- through a neural network-based prediction mode, a final prediction block comparable to one generated based on a motion vector of a higher resolution may be expected.
- FIG. 21 is a diagram for describing a method of changing a motion vector of fractional precision to a motion vector of integer precision, according to an embodiment of the disclosure.
- the motion vector A may indicate coordinates (19/4, 27/4) 2110 based on coordinates (0,0). Because the coordinates (19/4, 27/4) 2110 are not an integer pixel, the motion information obtainer 2010 may adjust the motion vector A to indicate an integer pixel.
- Coordinates of neighboring integer pixels around the coordinates (19/4, 27/4) 2110 may be (16/4, 28/4) 2130, (16/4, 24/4) 2120, (20/4, 28/4) 2140, and (20/4, 24/4) 2150.
- the motion information obtainer 2010 may change the motion vector A to indicate coordinates (20/4, 28/4) 2140 located at an upper right end, instead of the coordinates (19/4, 27/4) 2110.
- the motion information obtainer 2010 may change the motion vector A to indicate the coordinates 2120 located at a lower left end, the coordinates 2130 located at an upper left end, or the coordinates 2150 located at a lower right end.
- a method of changing fractional precision of the motion vector A to integer precision may be referred to as motion vector rounding.
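The FIG. 21 example can be reproduced with integer arithmetic on quarter-pel units, where the coordinates (19/4, 27/4) are stored as (19, 27). The sketch below rounds to the nearest integer pixel; that rounding rule is one possible choice (the text allows any of the four neighboring integer pixels), and the function names are hypothetical.

```python
def integer_pixel_neighbors(x_qpel, y_qpel):
    """Return the four integer-pixel positions (in quarter-pel units)
    surrounding a fractional position."""
    x0 = (x_qpel // 4) * 4
    y0 = (y_qpel // 4) * 4
    return [(x0, y0), (x0, y0 + 4), (x0 + 4, y0), (x0 + 4, y0 + 4)]


def round_mv_to_integer(x_qpel, y_qpel):
    """Round each component to the nearest integer pixel.
    Ties (exact half-pel positions) round toward positive infinity."""
    def round_component(value):
        return ((value + 2) // 4) * 4
    return round_component(x_qpel), round_component(y_qpel)


neighbors = integer_pixel_neighbors(19, 27)
rounded = round_mv_to_integer(19, 27)
```

For the motion vector A, the neighbors are (16/4, 24/4), (16/4, 28/4), (20/4, 24/4), and (20/4, 28/4), and nearest rounding selects (20/4, 28/4), which is the position 2140 named in the text.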
- FIG. 22 is a diagram illustrating an image encoding method performed by the image encoding apparatus 1900, according to an embodiment of the disclosure.
- the image encoding apparatus 1900 may obtain a motion vector of a current block using a reference image.
- the image encoding apparatus 1900 may obtain a motion vector indicating a block similar to the current block in the reference image.
- a process of obtaining the motion vector of the current block using the reference image may be referred to as a motion prediction process.
- the image encoding apparatus 1900 may obtain a preliminary prediction block using the motion vector of the current block and the reference image of the current block.
- the image encoding apparatus 1900 may obtain the preliminary prediction block using a reference block indicated in the reference image by the motion vector of the current block.
- the preliminary prediction block may correspond to a result of applying interpolation to the reference block indicated by the motion vector of the current block in the reference image.
- sample values of the quantization error map may be calculated based on a quantization parameter for a reference block.
- the image encoding apparatus 1900 may obtain a residual block using the current block and the final prediction block.
- the information about the motion vector may include a differential motion vector between the motion vector of the current block and a prediction motion vector.
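The differential motion vector is a component-wise difference, and a decoder can recover the motion vector by adding the same prediction motion vector back. A minimal sketch with hypothetical function names and motion vectors stored as (x, y) tuples:

```python
def differential_motion_vector(motion_vector, prediction_motion_vector):
    """Encoder side: difference between the motion vector and its prediction."""
    return (
        motion_vector[0] - prediction_motion_vector[0],
        motion_vector[1] - prediction_motion_vector[1],
    )


def recover_motion_vector(differential, prediction_motion_vector):
    """Decoder side: add the differential back to the same prediction."""
    return (
        differential[0] + prediction_motion_vector[0],
        differential[1] + prediction_motion_vector[1],
    )


mvd = differential_motion_vector((19, 27), (16, 24))
```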
- FIG. 23 is a diagram for describing a method of training the neural network 240, according to an embodiment of the disclosure.
- the loss information 2306 corresponding to a difference between the final prediction block for training 2305 and the current block for training 2301 may be calculated, and the weight set in the neural network 240 may be updated according to the loss information 2306.
- the neural network 240 may update the weight to reduce or minimize the loss information 2306.
- the loss information 2306 may include at least one of an L1-norm value, an L2-norm value, a structural similarity index metric (SSIM) value, a peak signal-to-noise ratio-human vision system (PSNR-HVS) value, a multiscale SSIM (MS-SSIM) value, a variance inflation factor (VIF) value, or a video multimethod assessment fusion (VMAF) value, indicating a difference between the current block for training 2301 and the final prediction block for training 2305.
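Of the listed metrics, the L1-norm and L2-norm values are the simplest to write out. The sketch below computes both for a pair of blocks stored as nested lists; the function names are hypothetical, and a real training loop would compute the loss on tensors inside a deep learning framework so that gradients can be propagated to the weights.

```python
import math


def l1_norm_loss(current_block, final_prediction):
    """Sum of absolute sample differences."""
    return sum(
        abs(c - p)
        for cur_row, pred_row in zip(current_block, final_prediction)
        for c, p in zip(cur_row, pred_row)
    )


def l2_norm_loss(current_block, final_prediction):
    """Square root of the sum of squared sample differences."""
    return math.sqrt(sum(
        (c - p) ** 2
        for cur_row, pred_row in zip(current_block, final_prediction)
        for c, p in zip(cur_row, pred_row)
    ))
```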
- objectives of the image decoding apparatus 100 and the image encoding apparatus 1900 using AI, and methods thereby, according to an embodiment of the disclosure may be to reduce a bit rate of a bitstream including information about a residual block.
- the method of decoding an image may include obtaining a preliminary prediction block (902) using a reference block (415;435;510;510-1;510-2;1110) indicated by the motion vector in a reference image (410;430;1100) (S1720).
- the method of decoding an image may include obtaining a final prediction block (955) for the current block (300;1410) by applying, to a neural network (240), at least one of a picture order count (POC) map (906) including a POC difference between the reference image (410;430;1100) and a current image (400;1400) including the current block (300;1410), the preliminary prediction block (902), or a quantization error map (530;530-1;530-2;530-3;530-4;904) (S1730).
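A POC map can be pictured as a block-sized array in which every sample holds the same POC difference. The sketch below assumes the difference is taken as current POC minus reference POC; the sign convention and the function name are assumptions, since the text only states that the map includes the POC difference.

```python
def build_poc_map(current_poc, reference_poc, height, width):
    """Fill a height x width map with the POC difference between the
    current image and the reference image."""
    difference = current_poc - reference_poc
    return [[difference] * width for _ in range(height)]


poc_map = build_poc_map(current_poc=8, reference_poc=4, height=2, width=3)
```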
- the method of decoding an image may include reconstructing the current block (300;1410) based on the final prediction block (955) and a residual block obtained from a bitstream (S1740).
- sample values of the quantization error map may be calculated based on a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).
- the sample values of the quantization error map may correspond to a quantization step size or a quantization error value calculated based on the quantization parameter.
- the quantization error map (530;530-1;530-2;530-3;530-4;904) may be divided into sub-areas corresponding to lower blocks of the reference block (415;435;510;510-1;510-2;1110), and sample values respectively included in the sub-areas of the quantization error map (530;530-1;530-2;530-3;530-4;904) are calculated based on a quantization parameter for a sample at a position in the lower blocks of the reference block (415;435;510;510-1;510-2;1110).
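The sketch below shows one way to fill a quantization error map per sub-area. Two relations are assumptions that do not come from this section: the step size formula 2^((QP - 4) / 6), which is the commonly cited relation for HEVC/VVC-style codecs, and the error value step^2 / 12, which is the noise variance of a uniform quantizer. The sub-block tuple layout and function names are likewise hypothetical.

```python
def quantization_step_size(qp):
    """Assumed relation: the step size doubles every 6 QP and equals 1 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantization_error_value(qp):
    """Assumed relation: uniform-quantizer noise variance, step^2 / 12."""
    step = quantization_step_size(qp)
    return step * step / 12.0


def build_quantization_error_map(height, width, sub_blocks, use_error_value=False):
    """Fill each sub-area with a value derived from that lower block's QP.

    sub_blocks: list of (x, y, w, h, qp) rectangles (hypothetical layout)
    describing the lower blocks of the reference block.
    """
    derive = quantization_error_value if use_error_value else quantization_step_size
    error_map = [[0.0] * width for _ in range(height)]
    for x, y, w, h, qp in sub_blocks:
        value = derive(qp)
        for row in range(y, y + h):
            for col in range(x, x + w):
                error_map[row][col] = value
    return error_map
```

With a single sub-block covering the whole area, the map degenerates to the case where every sample carries one value derived from the quantization parameter for the reference block.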
- the obtaining of the final prediction block (955) of the current block (300;1410) may include obtaining the final prediction block (955) of the current block (300;1410) by applying, to the neural network (240), at least one of an enlarged POC map, an enlarged preliminary prediction block (1150), or an enlarged quantization error map (1300).
- At least one of the enlarged POC map, the enlarged preliminary prediction block (1150), or the enlarged quantization error map (1300) may be obtained by padding the at least one of the POC map (906), the preliminary prediction block (902), or the quantization error map (530;530-1;530-2;530-3;530-4;904) according to an enlarged distance.
- neighboring samples corresponding to the enlarged distance may be determined from a closest sample available in the reference image (410;430;1100).
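Enlarging a block by an enlarged distance, with samples outside the reference image taken from the closest available sample, can be sketched with coordinate clamping. The function name and the integer-sample block position are assumptions for illustration.

```python
def enlarged_block(reference, x, y, width, height, distance):
    """Extract the block at (x, y) enlarged by `distance` samples on each side.

    Positions outside the reference image are filled from the closest
    available sample by clamping row and column indices to the image.
    """
    ref_height = len(reference)
    ref_width = len(reference[0])
    result = []
    for row in range(y - distance, y + height + distance):
        clamped_row = min(max(row, 0), ref_height - 1)
        result.append([
            reference[clamped_row][min(max(col, 0), ref_width - 1)]
            for col in range(x - distance, x + width + distance)
        ])
    return result


reference_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
padded = enlarged_block(reference_image, x=0, y=0, width=2, height=2, distance=1)
```

In this example the block touches the top-left corner of the image, so the first row and first column of the enlarged block repeat the nearest border samples, while the right and bottom neighbors are real reference samples.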
- a method of encoding an image may include obtaining a motion vector indicating a reference block (415;435;510;510-1;510-2;1110) in a reference image (410;430;1100) corresponding to a current block (300;1410) (S2210).
- the method of encoding an image may include generating a bitstream including information about the residual block (S2250).
- sample values of the quantization error map may be calculated based on a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).
- the reference block (415;435;510;510-1;510-2;1110) indicated by the motion vector having the integer precision may be determined as the preliminary prediction block (902).
- An image decoding apparatus may include at least one memory configured to store at least one instruction, and at least one processor configured to execute the at least one instruction.
- the at least one processor of the image decoding apparatus may be configured to obtain a motion vector of a current block (300;1410).
- the at least one processor of the image decoding apparatus may be configured to obtain a final prediction block (955) for the current block (300;1410) by applying, to a neural network (240), at least one of a picture order count (POC) map (906) including a POC difference between the reference image (410;430;1100) and a current image (400;1400) including the current block (300;1410), the preliminary prediction block (902), or a quantization error map (530;530-1;530-2;530-3;530-4;904).
- the at least one processor of the image decoding apparatus may be configured to reconstruct the current block (300;1410) based on the final prediction block (955) and a residual block obtained from a bitstream.
- sample values of the quantization error map may be calculated based on a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).
- An image encoding apparatus may include at least one memory configured to store at least one instruction, and at least one processor configured to execute the at least one instruction.
- the at least one processor of the image encoding apparatus may be configured to obtain a motion vector indicating a reference block (415;435;510;510-1;510-2;1110) in a reference image (410;430;1100) corresponding to a current block (300;1410).
- the at least one processor of the image encoding apparatus may be configured to obtain a final prediction block (955) for the current block (300;1410) by applying, to a neural network (2040), at least one of a picture order count (POC) map (906) including a POC difference between the reference image (410;430;1100) and a current image (400;1400) including the current block (300;1410), a preliminary prediction block (902) obtained based on the reference block (415;435;510;510-1;510-2;1110), or a quantization error map (530;530-1;530-2;530-3;530-4;904).
- the at least one processor of the image encoding apparatus may be configured to obtain a residual block based on the current block (300;1410) and the final prediction block (955).
- the at least one processor of the image encoding apparatus may be configured to generate a bitstream including information about the residual block.
- sample values of the quantization error map may be calculated based on a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).
- the image decoding apparatus 100 and the image encoding apparatus 1900 using AI and methods thereby according to an embodiment of the disclosure may obtain the final prediction block 955 that is more similar to the current block 300 or 1410 compared to an existing rule-based prediction mode.
- the image decoding apparatus 100 and the image encoding apparatus 1900 using AI, and methods thereby according to an embodiment of the disclosure may reduce a bit rate of a bitstream including information about a residual block.
- Embodiments of the disclosure may be provided as a computer-executable program, and the program may be stored in a machine-readable storage medium.
- the machine-readable storage medium may be provided as a non-transitory storage medium.
- 'non-transitory' means that the storage medium does not include a signal (e.g., an electromagnetic wave) and is tangible, but does not distinguish whether data is stored semi-permanently or temporarily in the storage medium.
- the 'non-transitory storage medium' may include a buffer in which data is temporarily stored.
- the computer program product is a product that can be traded between a seller and a purchaser.
- the computer program product may be distributed in the form of a machine-readable storage medium (e.g., a CD-ROM), or distributed (e.g., downloaded or uploaded) online, either through an application store or directly between two user devices (e.g., smartphones).
- at least part of the computer program product e.g., a downloadable application
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR20220103412 | 2022-08-18 | ||
| KR1020220112984A KR20240025420A (ko) | 2022-08-18 | 2022-09-06 | Image decoding apparatus and image encoding apparatus using AI, and method performed by the apparatuses |
| PCT/KR2023/012059 WO2024039166A1 (fr) | 2022-08-18 | 2023-08-14 | Image decoding apparatus and image encoding apparatus using AI, and method performed by the apparatuses |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4576772A1 (fr) | 2025-06-25 |
Family
ID=89906350
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23855133.7A, Pending, EP4576772A1 (fr) | 2022-08-18 | 2023-08-14 | Image decoding apparatus and image encoding apparatus using AI, and method performed by the apparatuses |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12425655B2 (fr) |
| EP (1) | EP4576772A1 (fr) |
| CN (1) | CN119654864A (fr) |
| MX (1) | MX2025000974A (fr) |
-
2023
- 2023-08-14 EP EP23855133.7A patent/EP4576772A1/fr active Pending
- 2023-08-14 CN CN202380058752.0A patent/CN119654864A/zh active Pending
- 2023-08-23 US US18/237,109 patent/US12425655B2/en active Active
-
2025
- 2025-01-24 MX MX2025000974A patent/MX2025000974A/es unknown
Also Published As
| Publication number | Publication date |
|---|---|
| US20240064336A1 (en) | 2024-02-22 |
| US12425655B2 (en) | 2025-09-23 |
| MX2025000974A (es) | 2025-03-07 |
| CN119654864A (zh) | 2025-03-18 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20250120 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the European patent (deleted) | |
| | DAX | Request for extension of the European patent (deleted) | |