
WO2025018639A1 - Method and apparatus for video encoding and decoding using an improved intra prediction structure


Info

Publication number
WO2025018639A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction
template
block
vector
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/KR2024/009075
Other languages
English (en)
Korean (ko)
Inventor
류창우
이진영
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kaon Group Co Ltd
Industry Academy Cooperation Foundation of Sejong University
Original Assignee
Kaon Group Co Ltd
Industry Academy Cooperation Foundation of Sejong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020230142997A (published as KR20250015701A)
Application filed by Kaon Group Co Ltd, Industry Academy Cooperation Foundation of Sejong University filed Critical Kaon Group Co Ltd
Publication of WO2025018639A1

Classifications

    All classifications fall under H04N 19/00 (H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION): Methods or arrangements for coding, decoding, compressing or decompressing digital video signals.
    • H04N 19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/132 — Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N 19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/186 — Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N 19/593 — Predictive coding involving spatial prediction techniques
    • H04N 19/70 — Characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to the field of encoding and decoding of digital video, and more particularly, to a method for encoding and decoding digital video, a method for recording such data, and components, devices, and systems for realizing such a method.
  • the present invention may be in the same technical field as at least one of digital video compression technology standards known by the standard names such as MPEG-2, MPEG-4 Video, H.263, H.264/AVC, H.265/HEVC, H.266/VVC, VC-1, AV1, QuickTime, VP-9, VP-10, Motion JPEG, or in the technical field for improving the inherent efficiency of the standard, or in the technical field for improving or replacing the standard.
  • Digital video encoding and decoding are widely used in various digital video applications. For example, digital television broadcasting, transmission of video through communication networks, video calls/video conversations/video chats, recording and providing video content using optical media including VCD (video compact disc)/DVD (digital versatile disc)/Blu-Ray, all procedures for producing, editing, collecting, and distributing video content, and devices such as video recording devices and camcorders for shooting and recording video for various reasons including personal, commercial, industrial, and security purposes, all depend on video encoding and decoding technologies.
  • implementations that may be referred to as digital video encoders and decoders may form part of a wide range of devices, including digital televisions, digital broadcasting systems, wireless broadcasting systems, computers in the form of notebooks/desktops/tablets, e-book readers, digital cameras, digital recording devices, digital multimedia playback devices, video game devices/terminals/console, mobile phones (including smartphones) with multimedia playback capabilities, devices for video conferencing, and other devices related to the generation, recording, and provision of digital video.
  • the above digital video encoders and decoders can be implemented by a digital video compression standard that is widely used and understood by those skilled in the art.
  • the digital video compression standard can include at least one of compression standards known by standard names such as MPEG-2, MPEG-4 Video, H.263, H.264/AVC, H.265/HEVC, H.266/VVC, VC-1, AV1, QuickTime, VP-9, VP-10, and Motion JPEG.
  • Video encoders and decoders can be implemented to encode or decode digital video information more efficiently while complying with the above standards, or by improving or modifying the above standards. Attempts to modify the above standards can also lead to the derivation of new standards.
  • the encoding process for compressing digital video requires various operations, such as spatial segmentation of digital video, segmentation and/or processing in color channels, removal of spatial redundancy, removal of temporal redundancy, tracking of motion vectors in the video, encoding of differential images, quantization, coefficient scan, run-length coding, entropy coding, and loop filtering. These encoding operations generally consume computing resources and take a certain amount of time to complete. Similarly, the decoding operation for the encoding operation also requires certain computing resources and a certain amount of time. The main goal of video encoding and decoding technology is to ensure that the consumption of resources and time does not interfere with the production, recording, distribution, and viewing of digital videos.
  • the technical problem of the present invention is to provide a new technology that can contribute to at least one of: improvement in encoding efficiency, improvement in decoding efficiency, improvement in video quality, reduction in computational complexity, reduction in software size, reduction in hardware size, and other performance improvements related to encoding and decoding.
  • a decoding method based on template matching performed by a decoder includes the steps of: obtaining information about a template, which is a set of samples adjacent to a current block, from a reconstruction area of a current picture; performing template matching for searching a prediction template corresponding to the template within a search area included in the reconstruction area; selecting a prediction block adjacent to the prediction template; and performing intra-prediction decoding for the current block based on the prediction block, wherein the template matching may be performed using only some of the samples within the template or only part of the search area.
  • the step of performing the above template matching may include a step of downsampling the template, and a step of searching for the prediction template position by comparing samples of the downsampled template with samples of the search area.
  • the above downsampling step may include a step of adjusting at least one of whether to perform the downsampling and the downsampling ratio for the upper samples of the template, based on the horizontal size of the current block.
  • the above downsampling step may include a step of adjusting at least one of whether to perform the downsampling and the downsampling ratio for the left samples of the template, based on the vertical size of the current block.
  • the above downsampling step may be characterized in that it is executed based on downsampling symbol information derived from a bitstream.
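  • As a concrete illustration of the downsampling steps above, the following is a minimal Python sketch. It is not taken from the patent: the array layout, the size thresholds (min_w, min_h), and the fixed 2:1 ratio are assumptions chosen for illustration.

```python
import numpy as np

def downsample_template(top, left, block_w, block_h,
                        min_w=16, min_h=16, ratio=2):
    """Decimate template samples to reduce matching cost.

    top:  1-D array of samples above the current block
    left: 1-D array of samples left of the current block
    Whether to downsample, and by what ratio, is adjusted per the
    block's horizontal/vertical size (thresholds are illustrative;
    the claims also allow signaling this decision in the bitstream).
    """
    top = np.asarray(top)
    left = np.asarray(left)
    if block_w >= min_w:
        top = top[::ratio]    # keep every `ratio`-th upper sample
    if block_h >= min_h:
        left = left[::ratio]  # keep every `ratio`-th left sample
    return top, left
```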
  • the above search area may be characterized in that it is limited based on a prediction vector from the current block.
  • the above prediction vector may be characterized in that it is generated based on at least one reference prediction vector information of a surrounding block of the current block.
  • the above search area may be characterized by having a first sample width horizontally and a second sample width vertically, with the point indicated by the prediction vector as the center point.
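  • One way to realize such a restricted search area is sketched below; this is a hypothetical helper in which w1 and w2 are half-widths standing in for the first and second sample widths, and clamping to the picture bounds is an added assumption.

```python
def search_window(pred_vec, w1, w2, pic_w, pic_h):
    """Inclusive (x0, y0, x1, y1) bounds of a search area centred on
    the point indicated by the prediction vector `pred_vec`."""
    cx, cy = pred_vec
    x0, x1 = max(0, cx - w1), min(pic_w - 1, cx + w1)
    y0, y1 = max(0, cy - w2), min(pic_h - 1, cy + w2)
    return x0, y0, x1, y1
```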
  • the above prediction vector may be characterized in that it is selected by an index from at least one block vector candidate of a surrounding block of the current block.
  • the above index may be characterized as being derived from a bit string.
  • the above prediction vector may be characterized in that it is generated based on at least one reference prediction vector information of a surrounding block of the current block and differential vector information indicating a difference from the reference prediction vector information.
  • the method may further include a step of searching for an optimal prediction block position, whose samples are closest to the samples of the current block, by comparing the current block samples with samples of a search area included in a reconstruction area of the picture, and the prediction vector may be generated based on differential vector information indicating a difference between at least one reference prediction vector information of a surrounding block of the current block and optimal prediction vector information pointing to the optimal prediction block position.
  • the above differential vector information may be characterized as being derived from a bit string.
  • the above differential vector information may be characterized by being signaled as an index designating one of at least two predetermined values.
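  • The candidate-plus-differential derivation described in the preceding items can be sketched as follows (illustrative only; the candidate list, index, and differential vector are assumed to have been parsed from the bitstream already):

```python
def derive_prediction_vector(candidates, index, diff=(0, 0)):
    """Select a reference prediction vector from neighbouring-block
    candidates by a parsed index, then add the differential vector
    (which may itself be an index into predetermined values)."""
    ref_x, ref_y = candidates[index]
    return ref_x + diff[0], ref_y + diff[1]

# e.g., candidates gathered from left/upper neighbours (made-up values):
bv = derive_prediction_vector([(-8, 0), (0, -8), (-4, -4)],
                              index=1, diff=(2, -1))
```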
  • the above method may further include a step of the decoder storing the prediction vector for use in a subsequent decoding process.
  • the method may further include a step of generating the prediction vector for at least one lower decoding unit, a step of designating one of the at least one prediction vector as a representative prediction vector for an upper decoding unit including the lower decoding unit, and a step of storing the representative prediction vector by the decoder.
  • the method may further include, when executing the template matching in the chroma signal domain, a step of generating a prediction vector based on at least one reference prediction vector of the collocated block in the luma signal domain that corresponds to the chroma block.
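  • For the chroma case just described, a sketch under the added assumption of 4:2:0 subsampling (so a luma-domain vector is halved in each direction for chroma) might look like:

```python
def chroma_prediction_vector(luma_bv, sub_x=2, sub_y=2):
    """Derive a chroma-domain prediction vector from the block vector
    of the collocated block in the luma domain (4:2:0 assumed)."""
    return luma_bv[0] // sub_x, luma_bv[1] // sub_y
```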
  • a decoder device may include a template matching unit which obtains information about a template, which is a set of samples adjacent to a current block, from a reconstruction area of a current picture, and performs template matching to search for a prediction template corresponding to the template within a search area included in the reconstruction area, and a prediction decoding unit which selects a prediction block adjacent to the prediction template and performs intra-prediction decoding for the current block based on the prediction block, wherein the template matching may be performed using only some of the samples within the template or only part of the search area.
  • an encoding method based on template matching performed by an encoder comprises the steps of: obtaining information about a template, which is a set of samples adjacent to a current block, from an already encoded area of a current picture; performing template matching for searching a prediction template corresponding to the template within a search area included in the already encoded area; selecting a prediction block adjacent to the prediction template; and performing intra-prediction encoding for the current block based on the prediction block, wherein the template matching may be performed using only some of the samples within the template or only part of the search area.
  • an encoding device may include: a template matching unit which obtains information about a template, which is a set of samples adjacent to a current block, from an already encoded area of a current picture, and performs template matching to search for a prediction template corresponding to the template within a search area included in the already encoded area; and a prediction encoding unit which selects a prediction block adjacent to the prediction template and performs intra-prediction encoding for the current block based on the prediction block, wherein the template matching may be performed using only some of the samples within the template or only part of the search area.
  • the complexity of template matching for performing intra prediction in a video encoder and decoder is reduced, thereby enabling efficient and high-speed video encoding and decoding.
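  • To tie the claimed steps together, the following is a minimal, self-contained sketch of template-matching intra prediction: a full-pel SAD search with a 1-sample-thick L-shaped template over a NumPy picture array. It deliberately omits the complexity reductions claimed above (template downsampling, partial search area) and the bounds/overlap checks against the reconstructed area.

```python
import numpy as np

def template_match_predict(recon, x, y, bw, bh, x0, y0, x1, y1):
    """Predict the bw x bh current block at (x, y): match its template
    (upper row + upper-left corner + left column) against candidate
    positions in the search area, then copy the block adjacent to the
    best-matching prediction template."""
    def tpl(px, py):
        return np.concatenate([recon[py - 1, px - 1:px + bw],  # upper samples
                               recon[py:py + bh, px - 1]])     # left samples
    cur = tpl(x, y).astype(int)
    best_sad, best_pos = None, None
    for cy in range(max(y0, 1), y1 + 1):
        for cx in range(max(x0, 1), x1 + 1):
            sad = np.abs(cur - tpl(cx, cy).astype(int)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_pos = sad, (cx, cy)
    px, py = best_pos
    return recon[py:py + bh, px:px + bw].copy()  # prediction block
```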
  • Figure 1 is a conceptual diagram of a video communication system according to one embodiment of the present invention.
  • FIG. 2 is a conceptual diagram of the arrangement of an encoder and decoder in a real-time video streaming environment according to one embodiment of the present invention.
  • Figure 3 is a conceptual diagram of a functional unit of a video decoder according to one embodiment of the present invention.
  • Figure 4 is a conceptual diagram of a functional unit of a video encoder according to one embodiment of the present invention.
  • Figure 5 is a conceptual diagram of a frame type according to one embodiment of the present invention.
  • FIG. 6 is a conceptual diagram showing the structure of a video encoder according to another embodiment of the present invention.
  • Figure 7 is a conceptual diagram illustrating a template matching prediction method according to one embodiment.
  • Figure 8 is a conceptual diagram for the storage and use of block vectors in a template matching prediction method.
  • FIG. 9 is an example of downsampling of a template according to one embodiment of the present invention.
  • FIG. 10 is an example of downsampling of a template by block size according to one embodiment of the present invention.
  • FIG. 11 is an example diagram for setting a search range of a block vector according to one embodiment of the present invention.
  • FIG. 12 is a conceptual diagram of a block vector determination method based on a prediction vector according to one embodiment of the present invention.
  • first, second, etc. may be used to describe various components, the components should not be limited by the terms. The terms are only used to distinguish one component from another.
  • first component could be referred to as the second component, and similarly, the second component could also be referred to as the first component, without departing from the scope of the present invention.
  • the term "and/or" includes any combination of a plurality of related listed items or any of a plurality of related listed items, and is non-exclusive unless otherwise indicated.
  • the listing of items in this application is merely an exemplary description to easily explain the spirit and possible implementation methods of the present invention, and therefore, is not intended to limit the scope of embodiments of the present invention.
  • a or B can mean “only A”, “only B”, or “both A and B”. In other words, as used herein, “A or B” can be interpreted as “A and/or B”. For example, as used herein, “A, B or C” can mean “only A”, “only B”, “only C”, or “any combination of A, B and C”.
  • a slash (/) or a comma can mean “and/or”.
  • A/B can mean “A and/or B”.
  • A/B can mean "only A”, “only B”, or “both A and B”.
  • A, B, C can mean "A, B, or C”.
  • At least one of A and B can mean “only A”, “only B” or “both A and B”. Additionally, as used herein, the expressions “at least one of A or B” or “at least one of A and/or B” can be interpreted identically to “at least one of A and B”.
  • At least one of A, B and C can mean “only A”, “only B”, “only C”, or “any combination of A, B and C”. Additionally, “at least one of A, B or C” or “at least one of A, B and/or C” can mean “at least one of A, B and C”.
  • the embodiments may be described or illustrated in terms of unit blocks that perform the described function or functions.
  • the blocks may be expressed as one or more devices, units, modules, parts, etc. in this application.
  • the blocks may be implemented in hardware by one or more logic gates, integrated circuits, processors, controllers, memories, electronic components, or information processing hardware implementation methods that are not limited thereto.
  • the blocks may be implemented in software by application software, operating system software, firmware, or information processing software implementation methods that are not limited thereto.
  • One block may be implemented by being separated into multiple blocks that perform the same function, or conversely, one block may be implemented to perform the functions of multiple blocks simultaneously.
  • the blocks may also be implemented by being physically separated or combined by any criterion.
  • the blocks may be implemented to operate in an environment where their physical locations are not specified and are separated from each other by a communication network, the Internet, a cloud service, or a communication method that is not limited thereto. All of the above implementation methods are within the scope of various embodiments that can be taken by a person skilled in the field of information and communication technology to implement the same technical idea, and therefore, any detailed implementation method should be interpreted as being included within the scope of the technical idea of the invention of the present application.
  • FIG. 1 is a conceptual diagram of a video communication system according to one embodiment of the present invention.
  • the video communication system (100) may be configured to include at least two terminals (110, 120) that are connected to each other through a network (105).
  • FIG. 1 may mean a block diagram for configuring a one-way video communication network.
  • a first terminal (110) among the terminals may encode the video data in order to transmit (111) the video data through a network (105).
  • a second terminal (120) among the terminals may be configured to receive (121) the encoded video data through a network and decode and display it.
  • each terminal (110, 120) may be configured to encode video data acquired by itself for video transmission (112, 122) to the other terminal via the network.
  • Each terminal may also be configured to receive (113, 123) and decode video data transmitted by another terminal via the network, and display the decoded video data.
  • the terminals (110, 120) shown in Fig. 1 may be exemplified as devices such as server computers, personal computers, portable computers, and smart phones, depending on the embodiment, but are not limited thereto.
  • the present invention is applicable to all environments for forming a one-way or two-way video communication network, and it should be considered that the network (105) can be formed by any means for transporting encoded video data between the terminals (110, 120).
  • the network (105) may mean a wired or wireless communication network.
  • the network may be configured to communicate information using any communication standard, and the communication standard may include packet-based communication.
  • the packet communication may be understood to mean including packets known as TCP or UDP, for example.
  • the network (105) may be understood to include a process of information transmission using a recording medium.
  • the configuration of the network is not limited to a communication medium, and should be understood to include a process of temporarily storing and physically transporting information on a hard disk, a solid state disk (SSD), a flash memory, a compact disc (CD), a digital versatile disc (DVD), a Blu-ray disc, and other mechanical, electronic, or optical recording media.
  • Any other means of information communication or transportation applied can be considered to fall within the scope of embodiments of the present invention as long as it has a structure that supports transmitting video data in an encoded state and decoding it. Accordingly, in addition to some of the examples listed above, all means of information communication or transportation known in the past or newly provided can fall within the scope of application of the present invention.
  • FIG. 2 is a conceptual diagram of the arrangement of an encoder and decoder in a real-time video streaming environment according to one embodiment of the present invention.
  • the streaming system (200) illustrated in FIG. 2 can be considered to be applied to, for example, a video data communication network including digital broadcasting, video telephony, and video conferencing.
  • the technical structure identical to or similar to the streaming system can be equally applied to a case where information is transmitted via a recording medium as described above.
  • the streaming system may include a video source (210) that generates a video stream.
  • the video source may include a digital video acquisition means (212), which may be, for example, a digital camera or other device, for acquiring uncompressed raw video.
  • the raw video stream (215) may have a huge capacity and may therefore be compressed by a video encoder (217) coupled or connected to the video source.
  • the above encoder (217) may be configured as a means including hardware, software, or a combination of the two, configured to implement an image encoding method and/or an implementation method thereof according to one embodiment of the present invention.
  • an encoded bit stream (219) with a reduced capacity compared to the original video stream can be output.
  • the bit stream (219) can be provided in real time for communication by a relay device, which may be referred to as a streaming server (220), for example, and/or can be stored in a recording medium (225) of the streaming server (220) for subsequent use.
  • the streaming system (200) may include at least one streaming client (230, 240) that connects to the streaming server (220) to receive the encoded bit stream (229) in real time or obtain it later.
  • the streaming client may include a video decoder (232) that obtains the encoded bit stream (229) (which may be considered a copy of the bit stream (219) received by the streaming server), decodes it, and outputs the resulting video data in a form that can be presented by a display (235) or other visual, auditory, or other sensory display means.
  • the functions for encoding and decoding video data are collectively called a coder-decoder, or video codec.
  • FIG. 3 is a conceptual diagram of a functional unit of a video decoder according to an embodiment of the present invention.
  • a receiving unit (310) can receive at least one encoded video data to be decoded by a decoder (305).
  • the encoded video data can be independent for each reception, and a decoding procedure of each independent video data can be independent from a decoding procedure of other video data.
  • the encoded video data can be received by the receiving unit (310) through a hardware or software connection (315) to a device storing the same, and as described above, the storing device may be a kind of streaming server located at the other end of a communication network, or may mean a physical recording medium, but is not limited thereto.
  • the above receiving unit (310) can receive the encoded video data together with other data accompanying it, such as encoded audio data or other auxiliary data, and each of the data can be separated from the video data and provided to an appropriate processing function unit (312) other than the video decoder.
  • a buffer memory (320) may be coupled between the receiving unit (310) and the decoder (305) in order to minimize delay and disconnection according to the network environment.
  • the buffer memory (320) may mean a computer-readable recording medium that temporarily stores the received video data and stably supplies it to a parser (330) corresponding to the input terminal of the decoder (305).
  • the buffer memory may be unnecessary.
  • the video decoder (305) may include the parser (330) as its input terminal to interpret the encoded video data.
  • the parser may perform a function of separating (parsing) a plurality of pieces of information stored in the form of a bit string in the encoded video data according to a predetermined rule and, if necessary, performing entropy decoding (335) of entropy-coded video data, thereby reconstructing symbols (338), which are units of video encoding information.
  • the symbols (338) may include all information for controlling the operation of the decoder (305), and/or may further include information for controlling a device coupled to the decoder (305), such as a display device. Control information for such a display device may include information in formats called supplementary enhancement information (SEI) or video usability information (VUI).
  • the parser (330) may be configured to perform entropy decoding (335) of the encoded video data.
  • the entropy encoding method of the encoded video data may vary depending on the encoding standard, and decoding may be performed correspondingly.
  • Representative examples of entropy coding methods may include variable length coding, Huffman coding, and arithmetic coding; each of these methods may be context-adaptive or context-sensitive depending on the standard, and may be based on principles widely known to those skilled in the art.
  • the parser (330) may be configured to extract at least one picture from the encoded video data.
  • the definition of the picture may vary depending on the encoding standard, and depending on the standard, one or more of the examples listed below may correspond simultaneously and overlappingly.
  • the picture may be grouped, defined, and/or divided into encoding/decoding units such as, for example, groups of pictures (GOPs), pictures/frames, tiles, slices, macroblocks, blocks, subblocks, transform units (TUs), and prediction units (PUs).
  • the parser (330) may be configured to extract encoding information, such as transform coefficients, quantization parameters (QPs), and/or motion vectors, from the encoded video data.
  • the parser (330) may be configured to perform entropy decoding (335) and parsing operations on the video data received from the buffer memory, and to selectively decode symbols (338) representing the encoding information.
  • the parser (330) may be configured to selectively supply a specific symbol (338) to a specific decoding function unit within the decoder (305), such as an inverse quantization and inverse transform unit (340), an intra prediction unit (350), an inter prediction unit (355), or a loop filter unit (360). Control of such information supply can be determined by the information sequence contained in the encoded video, and may vary depending on the encoding standard, and is not limited within the scope of the embodiments of the present invention, and is not described in detail in this conceptual diagram.
  • the above decoder (305) may be composed of a plurality of conceptual functional units that receive and process the encoding information from the parser (330). It is obvious that these conceptual functional units may be combined with each other or further subdivided according to implementation needs. For example, they may be further separated for ease of implementation and integrated into one for efficiency of operation. In any case, each functional unit may be configured to perform close interaction with each other. However, despite the possibility of such integration or separation, the following will be described as a combination of conceptual functional units in order to illustrate a decoding procedure of video data applied as an embodiment of the present invention.
  • the above decoder may include an inverse quantization and inverse transformation unit (340).
  • the inverse quantization and inverse transformation unit (340) may be configured to receive, from the parser (330), encoding information including the method to be used for numerical transformation, the size of a block, quantization coefficients for recovering the quantized information, and identification information of a quantization matrix that compactly represents the quantized coefficients, and may be configured to output block values (341) that may be input to a merging unit (370) as a result of processing the encoding information.
  • the output values of the inverse quantization and inverse transformation unit (340) may include intra-prediction encoded block values.
  • the intra-predicted block values may mean values that can be decoded without using prediction information from a previously decoded picture, for example, a previous frame, but using prediction information within a picture currently being decoded, for example, a current frame.
  • the prediction information within the current picture may be provided by the intra prediction unit (350).
  • the intra prediction unit (350) generates, as the prediction information, a block value having the same shape as the block being decoded, using picture information of a spatially adjacent area derived from the picture currently being decoded, whose decoding has been partially completed.
  • the picture information may be provided (381) from a buffer for the current picture, a so-called line buffer (380).
  • the merging unit (370) may be configured to merge the prediction information (351) generated by the intra prediction unit (350) with the block values (341) provided by the inverse quantization and inverse transformation unit (340), according to an embodiment.
  • the output values of the inverse quantization and inverse transformation unit (340) may include inter-prediction encoded block values, in some cases block values to which motion compensation is applied.
  • the inter-prediction unit (355) may extract and use sample information (386) used for motion-based prediction from a reference picture buffer (385).
  • the prediction information (356), derived by performing motion compensation on the sample information based on the symbols (338), may be configured to be merged by the merging unit (370) with the block values (341) provided by the inverse quantization and inverse transformation unit (340).
  • the block values (341) may be referred to as so-called differential or residual values.
  • the position information in the memory used by the inter prediction unit (355) to extract the sample information from the reference picture may be determined by a motion vector provided to the inter prediction unit (355), which is composed of a combination of symbols (338) indicating, for example, the X and Y coordinates of a specific point of the reference picture.
  • the inter prediction unit (355) may also include a function capable of interpolating the sample values when a motion vector with so-called 'sub-sample' (fractional) precision is provided, and may further include a function for predicting and refining the value of the motion vector.
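  • A hedged sketch of the interpolation just mentioned, assuming half-sample motion vector precision and simple bilinear filtering (actual standards define longer interpolation filters):

```python
import numpy as np

def fetch_prediction(ref, x, y, bw, bh, mv_x2, mv_y2):
    """Fetch a bw x bh prediction block from reference picture `ref`
    for a motion vector (mv_x2, mv_y2) given in half-sample units."""
    ix, fx = divmod(x * 2 + mv_x2, 2)   # integer position + half-pel flag
    iy, fy = divmod(y * 2 + mv_y2, 2)
    a = ref[iy:iy + bh,         ix:ix + bw].astype(int)
    b = ref[iy:iy + bh,         ix + 1:ix + bw + 1].astype(int)
    c = ref[iy + 1:iy + bh + 1, ix:ix + bw].astype(int)
    d = ref[iy + 1:iy + bh + 1, ix + 1:ix + bw + 1].astype(int)
    top = a * (2 - fx) + b * fx          # horizontal bilinear blend
    bot = c * (2 - fx) + d * fx
    return (top * (2 - fy) + bot * fy + 2) // 4  # vertical blend + rounding
```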
  • the output values (371) of the above merging unit (370) may be provided to the loop filter unit (360) and processed by various loop filtering methods.
  • the loop filter unit (360) may be configured to receive not only the block unit output (371) of the merging unit (370) but also the symbol (338) provided from the parser (330) and control its operation.
  • the output of the loop filter unit (360) may be output to an external display means such as the display device through an output connection (390); it may also be stored (361) in a line buffer (380) for use in prediction when interpreting subsequent intra- or inter-encoded block values, and may be transferred from there to a reference picture buffer (385).
  • Specific pictures such as frames, after their decoding is completed, can be utilized as reference pictures for performing predictive decoding in a subsequent decoding process.
  • One picture (or frame) can be gradually accumulated in a line buffer (380) and decoded, and when one frame is decoded, the contents of the line buffer (380) can be transferred (383) to the reference picture buffer (385), and a new line buffer (380) can be allocated for decoding a new frame.
  • the above video decoder (305) may be configured to perform a decoding operation according to a predetermined video compression technique that may be documented by various international standard specifications or commercial standards.
  • the standards may include, for example, international standard recommendations such as H.264, H.265, and H.266 defined by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), as well as standards defined by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
  • the encoded video data may comply with a specific bitstream syntax defined by the corresponding standard document and, specifically, by the profile and level specified within that document.
  • the complexity of the encoded video data may be limited to a certain level in order to comply with the profile and level.
  • a profile or level may be configured to limit a maximum picture size, a maximum decoding speed, and a maximum reference picture size.
  • the limitations may also, in some embodiments, be further limited via metadata signals for a hypothetical reference decoder (HRD) and HRD buffer management included in the encoded video data.
  • the receiver (310) may receive additional redundant data along with the encoded video.
  • the additional data may be considered as a part of the encoded video data.
  • the additional data may include information that may be used by the decoder (305) to properly decode the data, or to more accurately reconstruct an image that approximates the original image before encoding.
  • the additional data may be provided in the form of, for example, layers for temporal, spatial, or signal-to-noise ratio (SNR) enhancement, redundant slices, redundant pictures, and forward error correction codes.
  • Figure 4 is a functional unit conceptual diagram of a video encoder according to one embodiment of the present invention.
  • the encoder (405) may be configured to receive original video information (402) from a video source (401) and perform encoding.
  • the above original video information (402) may have any suitable bit depth, for example, 8 bits, 10 bits, 12 bits, etc.
  • the original video information (402) may have any suitable color space, for example, R/G/B, Y/U/V, Y/Cb/Cr, etc.
  • the original video information (402) may have any suitable sampling structure corresponding to the color space, for example, may have a format such as Y/Cb/Cr 4:2:0, Y/Cb/Cr 4:4:4.
  • the original video information (402) having such a predetermined format may be provided to the encoder in the form of a digital video stream.
  • the original video information (402) can be obtained from a recording medium storing a previously prepared video original.
  • the original video information (402) can be obtained from an image acquisition device, such as a camera, that generates at least one video transmission stream included in the two-way video communication.
  • the video data including the above original video information (402) may be configured as a plurality of pictures configured to simulate motion by being played in time order.
  • the picture may also be expressed as a concept such as a frame in addition to a picture.
  • the picture may include one or more samples depending on the type of sampling structure, color space, etc. being used. A person skilled in the art will understand that the sample and the pixel in a digital image are closely related terms. Hereinafter, the operation of the encoder will be described based on such samples.
  • the encoder (405) may be configured to encode and compress pictures (and/or information grouped or divided thereof) constituting the original video information (402) in real time (or according to other temporal requirements required according to the implementation method) into the form of encoded video information.
  • control unit (450) may be a functional unit configured to control an appropriate encoding speed.
  • the control unit (450) may be configured to control other functional units and be functionally coupled to the following functional units as described below.
  • the parameters set by the control unit (450) may include parameters related to bitrate control, such as skip of a picture, a quantizer, variable values for applying a picture quality optimization technique, and may also include values such as the size of a picture, the structure of a group of pictures (GOP), and the maximum search range of a motion vector.
  • a person skilled in the art will be able to understand various other functions that the control unit (450) may have, and such other functions may be added or removed according to the design of a video encoder optimized for an individual system design.
  • the encoder (405) may be configured to operate in a structure such as a "coding loop" well known to those skilled in the art.
  • the coding loop may be configured with an internal encoder (so-called “source coder”) (410) which is responsible for receiving a picture to be encoded and generating symbols based on at least one reference picture that has been encoded in the past, and a local decoder (420) which is configured to be connected to the internal encoder.
  • the local decoder (420) may be configured to receive the output of the internal encoder (410) and reproduce the sample data that would be generated by a decoder (490) located at an actual remote location that receives the encoded video information from the encoder (405).
  • Video data composed of sample data reconstructed by the local decoder (420) may be configured to be input to the reference picture buffer of the encoder (405).
  • the local decoder (420) is implemented to reproduce the result output by the encoder (405) as it will be decoded by the remote decoder, so the video data recorded in the reference picture buffer may be bit-identical to the contents of the reference picture buffer of the remote decoder. That is, the prediction function unit that may be included in the encoder (405) may read, from the reference picture buffer of the encoder (405), values identical to the sample values of the previous frame that the decoder will later refer to in the decoding process.
  • the principle of achieving matching of reference picture buffers between the encoder (405) and the decoder (490) by means of the internal decoder (420) on the encoder (405) side is well known to those skilled in the art, and a method of responding to an environment in which such an environment is not guaranteed (e.g., information loss due to communication failure, etc.) can also follow what is well known to those skilled in the art.
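  • The buffer-matching principle can be summarized in a short sketch; encode_block and decode_block are hypothetical stand-ins for the lossy encoding and the normative reconstruction, respectively.

```python
def encode_picture(encoder, picture):
    """Coding loop: each block is locally reconstructed exactly as the
    remote decoder will reconstruct it, so both reference picture
    buffers stay bit-identical (hypothetical helper methods)."""
    for block in picture.blocks():
        symbols = encoder.encode_block(block, encoder.ref_buffer)  # lossy
        recon = encoder.decode_block(symbols, encoder.ref_buffer)  # local decoder
        # store the same values the remote decoder will store after
        # decoding the same symbols
        encoder.ref_buffer.store(block.position, recon)
        yield symbols
```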
  • the decoder of FIG. 3 may be considered as the "remote" decoder (490) described above.
  • the local decoder (420) may be implemented excluding lossless encoding and decoding sections such as the parser (330) or the entropy decoding (335). This is because the local decoder (420) simply reproduces the operation of the decoder located at a remote location, and thus the symbols may be decoded directly without requiring a process of compressing and then decompressing them. Accordingly, the functional sections preceding the parser and the entropy decoder as shown in FIG. 3 may be omitted or only partially implemented.
  • any decoder function (excluding a parser and an entropy decoder) present in the decoder can naturally exist as a substantially identical function in the corresponding encoder (405).
  • the operation of the encoding function unit that may be included in the above encoder (405) may be considered as the reverse operation of the decoder function unit. Therefore, the embodiment may be explained by generally performing the operation of the decoder function unit in reverse. For example, a quantization and transform function unit corresponding to the inverse quantization and inverse transform unit may be provided, and an inter prediction encoding unit corresponding to the inter prediction unit may be provided. In addition, some additional explanations will be added.
  • the above internal encoder (410) may be configured to perform encoding on input picture information, for example, an input frame, by a prediction encoding method executed by a prediction encoding unit (440) that operates by referencing at least one reference picture information, for example, at least one temporally previous encoded picture (or frame) from a reference picture buffer (430) designated as a reference frame.
  • the encoder (405) may be configured to encode a differential between blocks of samples constituting the input picture and blocks of samples constituting the reference picture.
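  • In its simplest form, the differential mentioned here is a sample-wise subtraction between the input block and its prediction, as in this sketch (the residual is what is subsequently transformed, quantized, and entropy-coded):

```python
import numpy as np

def residual(input_block, prediction_block):
    """Encoder side: differential to be transformed and quantized."""
    return input_block.astype(int) - prediction_block.astype(int)

def reconstruct(decoded_residual, prediction_block):
    """Decoder side: prediction + decoded residual restores the block."""
    return prediction_block.astype(int) + decoded_residual
```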
  • the local decoder (420) can decode, from the symbols generated by the internal encoder (410), video data that can be designated as the reference picture. As described above, since this reproduces the decoding operation performed by the remote decoder, the video data used as the reference picture can be provided to the encoder (405) in a form in which some degradation has occurred due to lossy compression, and this operation can be intended for operational consistency with the decoder.
  • the prediction encoding unit (440) may be configured to perform a prediction search operation within the encoder (405).
  • the prediction search operation may mean an operation corresponding to the inter prediction or intra prediction described in the description of the decoder.
  • the prediction unit may access the reference picture buffer (430) to obtain information such as a motion vector, a block shape, and metadata that may include the motion vector, which indicate a point of a reference picture that can serve as prediction reference information suitable for the new picture information, together with the sample block to be actually referenced.
  • the above prediction encoding unit (440) may operate on a so-called "sample block by pixel block" basis in order to obtain appropriate prediction reference information.
  • at least one prediction reference information designating at least one reference picture information stored in the reference picture buffer (430) may be designated for the input picture, as determined based on the search results obtained by the prediction encoding unit (440).
  • control unit (450) may be configured to manage the overall encoding operation of the internal encoder (410), including setting parameters used to encode video data.
  • the entropy encoding (460) may include various entropy coding techniques, such as variable length coding, Huffman coding, and arithmetic coding, for the symbols generated by the various functional units as described above, and each encoding method may be a context-adaptive or context-sensitive method according to the standard, and may also be based on principles widely known to those skilled in the art.
  • the entropy encoding (460) can typically achieve lossless compression, and thus may be configured to convert at least one symbol generated by the functional units into encoded video data.
  • the above control unit (450), when controlling the operation of the encoder (405), can assign an encoding type to each picture (or frame) during an encoding period, which may affect how the picture is encoded.
  • the type may be classified into the following "frame types".
  • Fig. 5 is a conceptual diagram of a frame type according to one embodiment of the present invention. The following description will be made with reference to Fig. 5.
  • An intra (“I") picture may mean a picture that can be encoded and decoded only with its own information without referring to other picture information in video data by predictive encoding.
  • the "I” picture may be designated by names such as a key frame, an independent/instantaneous decoder referh (IDR) frame, and a clean random-access (CRA) frame according to a video encoding standard, and the "I” pictures designated by various names as described above may have various modifications and application methods as permitted by each standard and may be partially different from each other.
  • various application methods for implementing the "I” picture may be by various methods that are already known to those skilled in the art or may be newly provided.
  • a prediction (“P") picture may mean a picture that can be encoded and decoded through intra or inter prediction based on at least one prediction information and/or a motion vector that designates at least one reference picture to predict sample values of a block constituting the picture.
  • the "P" picture may be configured to refer to only one reference frame, or may be configured to refer to one or more reference frames, according to a video encoding standard. When referring to one or more reference frames, sample information and/or associated metadata derived from a plurality of reference pictures may be used for reconstructing a single block. However, in common cases, a picture designated as a "P" picture may be understood as a picture that performs reference only to a temporally preceding picture.
  • a bidirectional prediction (“B”) picture may refer to a picture that can be encoded and decoded through intra or inter prediction based on at least one piece of prediction information and/or a motion vector that designates at least two reference pictures in order to predict sample values of blocks constituting the picture.
  • a picture designated as the “B” picture is distinguished from a picture designated as the “P” picture, and may be understood as a picture that performs a reference without being limited to a temporally preceding picture.
  • Video data may be spatially divided into a plurality of sample blocks in the process of encoding and decoding, and encoding may be performed in block units.
  • the block units may include, but are not limited to, sizes such as 4x4, 8x8, 4x8, or 16x16 in units of horizontal/vertical pixels, as is widely known.
  • the block may be encoded by a predictive encoding method with reference to any other (already encoded) blocks, depending on what the type specified for each picture including the block allows and/or restricts. For example, blocks of the "I" picture (510) may be encoded without using a predictive encoding method, or with reference to blocks already encoded within the same picture. That is, only the so-called intra prediction method may be used.
  • in blocks of a "P" picture, reference pictures encoded in at least one previous time unit may be further referenced, and thus inter prediction may also be used for encoding along with intra prediction.
  • in blocks of a "B" picture, reference can be made not only to a reference picture that was encoded earlier in the encoding order but also to a reference picture that follows in time.
  • there may be blocks encoded without relying on predictive encoding within a "P" picture or a "B" picture.
  • the above video encoder (405) may be configured to perform an encoding operation according to a predetermined video compression technique that may be documented by various international standard specifications or commercial standards. Examples of the above standards may include all of those described in the above decoder.
  • the transmitter (470) may buffer the encoded video data generated by the entropy encoding in order to provide/transmit the video data (ultimately to a remote decoder (490)) to a device storing the encoded video data via a hardware or software connection (495).
  • the transmitter (470) may receive and merge other data accompanying the encoded video data, for example, encoded audio data or other auxiliary data, from a separate source (480).
  • the transmitter (470) may be configured to further transmit additional data together with the encoded video.
  • the additional data may be considered as part of the encoded video data.
  • the additional data may include information that may be used by the decoder to properly decode the data, or to more accurately reconstruct an image that approximates the original image before encoding. Examples of the additional data may include all of the examples presented above with respect to the receiver (310) of the decoder.
  • the present invention can be implemented by a digital video compression standard widely used and understood by those skilled in the art as described above.
  • the digital video compression standard may include at least one of compression standards known by standard names such as MPEG-2, MPEG-4 Video, H.263, H.264/AVC, H.265/HEVC, H.266/VVC, VC-1, AV1, QuickTime, VP-9, VP-10, and Motion JPEG.
  • FIG. 6 is a conceptual diagram showing the structure of a video encoder according to another embodiment of the present invention. What is shown in FIG. 6 may be a rough structure of a video encoder widely known as a standard code such as ITU-T H.266 and ISO/IEC 23090-3, and also known as MPEG-I Part 3 or a general term called versatile video coding (VVC).
  • the video encoder (605) may be configured to receive uncompressed and unencoded original video data (601) as input and output an encoded bit stream (602).
  • the video data (601) may be supplied directly to a luma mapping unit (610a) in the case of intra encoding, or may be supplied to a luma mapping unit (610b) via an inter prediction unit (620) including motion vector extraction.
  • the mapped luma signal may be supplied to an output merger (606) alone, or by selecting (608) at least one of an intra prediction encoded signal via the intra prediction unit (625) or an inter prediction encoded signal output from the luma mapping unit (610b) via the inter prediction unit (620).
  • the result of the above output merger can be applied to a chroma scaling unit (615).
  • the operation of the luma mapping units (610a, 610b) and the operation of the chroma scaling unit (615) are collectively referred to as the luma mapping with chroma scaling (LMCS) process.
  • the scaled chroma signal can be provided to a transform unit (630), and the transform unit (630) can perform an adaptive color transform, particularly on the chroma signal.
  • the coefficient derived as a result of the transform is applied to a quantization unit (640) and quantized.
  • lossy compression is achieved, and the result of the lossy compression can be output as a bit string (602) through a multi-hypothesis CABAC (650), which is a lossless compression method.
  • to form the coding loop, the result of the lossy compression may also enter a decoding procedure through inverse quantization (645), inverse transform (635), and luma signal expansion (617).
  • the result of the luma signal expansion may be supplied to an internal merger (607) together with a result of selecting (608) at least one of the previously generated intra prediction encoding signal or inter prediction encoding signal.
  • the result of the internal merger may go through inverse luma mapping (617), and then through processing such as a deblocking filter (660), a sample adaptive offset (SAO) (670), and an adaptive loop filter (ALF), reproducing the image-quality-improvement process performed in a decoder.
  • the result of reproducing the operation in the decoder as described above is applied to the reference picture buffer (690) and can be reused for prediction encoding by the inter prediction unit (620).
  • the present invention can also be utilized by or in combination with an enhanced compression model (ECM), which is an implementation of a next-generation video codec currently being developed by the joint video experts team (JVET), an international standardization expert body.
  • the enhanced compression model can include an enhanced intra prediction coding method, an improved inter prediction coding method, an improved transform and transform coefficient coding method, an improved adaptive loop filtering method, a bilateral filtering method, a new sample adaptive offset (SAO) method for improving picture quality, an extended entropy coding method, and an improved gradual decoding refresh (GDR) technique.
  • Composition of the present invention
  • the present invention relates to a video encoding and decoding method and a device using an improved intra prediction method that can be used in the video encoding and decoding field including the embodiments described above.
  • FIG. 7 is a conceptual diagram illustrating a template matching prediction method according to one embodiment.
  • the template matching prediction method according to the present embodiment can be performed by a video decoder and/or a video encoder disclosed in the present specification.
  • a current picture may include a restoration area (790), which is an area within it that has already been decoded (or encoded), and a current block (710), which is the encoding/decoding unit currently being decoded (or encoded).
  • the current picture may be understood to mean at least one concept among a slice, a frame, a subpicture, and a tile, depending on the embodiment, and the current block may be understood to mean a concept such as a block, a subblock, a prediction unit, or a coding unit, depending on the embodiment.
  • a set of samples neighboring the current block (710) can be defined as a template (715).
  • the above template (715) may include samples that belong to the left block, the upper block, and the upper-left block of the current block (710) and are adjacent to the current block (710); among these, it may be understood as the set of L-shaped samples of a predetermined sample thickness (717) that are directly adjacent to the current block (710).
  • the sample thickness (717) may be designated as any number of pixels greater than or equal to 1. In this specification it is described as 1 pixel for ease of understanding, but it will be readily appreciated that the present invention can be applied, with essentially the same modifications, to a template whose sample thickness (717) is 2 pixels or more.
  • the sample thickness (717) may not be constant, may be variable, or may vary depending on the direction or section of the template.
  • the template (715) may further include samples included in the lower left block and/or the upper right block of the current block (710).
  • the template matching prediction method may search for a block (720) corresponding to a current block (710) within the restoration area (790) and use the corresponding block (720) as a prediction block.
  • the template matching prediction method according to the present embodiment may include a step of performing template matching (735) by searching the restoration area (790) for locations where samples similar to those constituting the template (715) appear, a step of deriving a prediction template (725) based on the template matching (735), and a step of using, as a prediction block (720), the block (or an arbitrary set of corresponding samples) that adjoins the prediction template (725) in the same way the current block (710) adjoins the template (715).
  • information indicating the direction and distance of the prediction block (720) from the current block (710) can be defined as a block vector.
  • the pixel information of the prediction block (720) can be combined with various conventional encoding methods and used in the subsequent encoding of the current block (710); for example, it can be used for differential coding. A sketch of the search described above follows.
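  • As an illustrative aid, the following Python sketch shows the search just described under simplifying assumptions: a single reconstructed picture held in a NumPy array, a template thickness of 1, interior block positions, and an SAD cost. The function names and the `search_positions` argument are ours, not part of the claimed method.

```python
import numpy as np

def l_shaped_template(picture, x, y, w, h, t=1):
    """Gather the L-shaped template of thickness t for the block whose
    top-left corner is (x, y): the row(s) above it, including the top-left
    corner samples, plus the column(s) to its left."""
    top = picture[y - t:y, x - t:x + w]    # upper + upper-left samples
    left = picture[y:y + h, x - t:x]       # left-column samples
    return np.concatenate([top.ravel(), left.ravel()]).astype(np.int64)

def template_matching_search(picture, x, y, w, h, search_positions, t=1):
    """Return the block vector (dx, dy) whose candidate template best matches
    the current block's template, plus the corresponding prediction block."""
    cur = l_shaped_template(picture, x, y, w, h, t)
    best_cost, best_bv = None, (0, 0)
    for dx, dy in search_positions:        # candidate points in the restoration area
        cand = l_shaped_template(picture, x + dx, y + dy, w, h, t)
        cost = int(np.abs(cur - cand).sum())     # SAD matching cost
        if best_cost is None or cost < best_cost:
            best_cost, best_bv = cost, (dx, dy)
    dx, dy = best_bv
    return best_bv, picture[y + dy:y + dy + h, x + dx:x + dx + w]
```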
  • FIG. 8 is a conceptual diagram for the storage and use of a block vector in a template matching prediction method according to one embodiment.
  • the template matching prediction method according to the present embodiment can be performed by a video decoder and/or a video encoder disclosed in the present specification.
  • the direction and distance from the current block (810) to the prediction block (820) are defined as a block vector (830), which can be used in subsequent encoding and decoding.
  • the block vector (830) is not only used in the template matching prediction method, but can also be stored for use in methods such as intra block copy encoding and decoding.
  • decoding (or encoding) of the next block (e.g., 840) of the current block (810) may be configured to refer to (850) the block vector.
  • the block vector may be a value determined by an operation in the same manner in the encoder and the decoder, respectively. In this case, the block vector may not be included in the encoded bit string.
  • the complexity of the encoder/decoder may increase due to excessive computation in the restoration area, for example when the search range for template matching is too large or when the template contains too many pixels. In addition, an efficient method for storing the block vectors used in template matching prediction is required.
  • a template matching prediction method can search for a prediction template by referring to (or using) some samples (i.e., pixels) within the template when searching for a prediction template within a search range of a restoration area.
  • This method of searching for a prediction template by selectively using some pixels within the originally referenced template can be called downsampling.
  • the template matching prediction method can search for prediction templates using the selected samples as they are.
  • the template matching prediction method can compensate the selected samples by referring to the values of the unselected samples, and search for a prediction template using the compensated samples.
  • FIG. 9 is an example of downsampling of a template according to one embodiment of the present invention.
  • a template matching prediction method may be configured to use surrounding samples adjacent to the left and top of a current block (910) as templates (920).
  • only some samples (930) may be selectively used for calculations for template matching by downsampling.
  • only the samples selected through downsampling, for example a downsampling that keeps about half of the samples of the initial template as shown in FIG. 9 (a), may be used in the calculations for template matching.
  • FIG. 9 (a) shows an example in which, with a template (920) of 1-pixel sample thickness, about half of the pixels are selected at a constant sampling interval (an interval of 1 pixel in FIG. 9 (a)) from the samples belonging to the template (920).
  • the selection pattern used for downsampling is not limited to this and may be determined in various ways depending on the implementation. For example, as in (b) of FIG. 9, about a quarter of the pixels may be selected, at intervals of 3 pixels, from the samples belonging to the template (920), and obviously various downsampling patterns not shown can also be used. In this way, complexity is reduced by not performing calculations on every pixel of the template; see the sketch after this item.
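  • A minimal sketch of such sample selection, assuming the template has already been flattened into a 1-D array as in the earlier sketch; step values of 2 and 4 correspond roughly to the 1/2 and 1/4 patterns of FIG. 9 (a) and (b).

```python
def downsample_template(template_samples, step=2, offset=0):
    """Keep one sample out of every `step` samples (step=2 keeps about half,
    step=4 about a quarter); `offset` shifts the selection phase."""
    return template_samples[offset::step]

# Usage: apply the same selection to the current and candidate templates and
# compute the matching cost only on the kept samples, e.g.:
# cost = np.abs(downsample_template(cur, 4) - downsample_template(cand, 4)).sum()
```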
  • the downsampling may be configured to be performed differently depending on the size of the template and/or the size of the block. That is, the number of samples required for calculation may be adjusted depending on the size of the template or the size of the block.
  • FIG. 10 is an example of downsampling of a template by block size according to one embodiment of the present invention.
  • a template matching prediction method can adaptively determine whether to perform downsampling depending on the size of a block.
  • the template matching prediction method can be configured to include a step of not performing downsampling when the block is small ((a) of FIG. 10) and performing downsampling when the block is large ((b) of FIG. 10).
  • a template matching prediction method may be configured to perform relatively dense downsampling for blocks of relatively small size, for example selecting about one of every two samples as in (b) of FIG. 10, and relatively sparse downsampling for blocks of relatively large size, for example selecting about one of every four samples as in (c) of FIG. 10.
  • a template matching prediction method may vary the horizontal and/or vertical downsampling depending on the aspect ratio of the block. For example, for a block whose horizontal side is longer, as in (d) of FIG. 10, the downsampling density on the shorter left side of the template may be made relatively high, or that side may not be downsampled at all, while the downsampling density on the longer top side may be made relatively low.
  • the degree and method of downsampling shown in the examples described above are exemplary and are not limited by the present invention.
  • for example, a large block may use about one sample per four samples, or one per even more samples, while a small block may use about one sample per two samples or all samples; various other downsampling ratios may likewise be adapted as needed to implement the present invention. One possible selection rule is sketched below.
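  • One possible size- and shape-adaptive selection rule, as a sketch; the pixel-count thresholds (64 and 256) are illustrative assumptions, not values fixed by this description.

```python
def sampling_steps(width, height):
    """Choose per-side sampling steps from block size and aspect ratio: small
    blocks keep every sample, medium blocks about one of every two, large
    blocks about one of every four; the shorter side is kept denser."""
    num_pixels = width * height
    if num_pixels <= 64:            # small block: no downsampling
        top_step = left_step = 1
    elif num_pixels <= 256:         # medium block
        top_step = left_step = 2
    else:                           # large block
        top_step = left_step = 4
    if width > height:              # wide block, cf. FIG. 10 (d)
        left_step = max(1, left_step // 2)
    elif height > width:            # tall block: mirrored rule
        top_step = max(1, top_step // 2)
    return top_step, left_step
```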
  • when the calculation for template matching is performed within a search range of the restoration area, the calculation method may be set for each compression unit, such as a video sequence, a frame, a slice, a subpicture, a tile, a macroblock, or a coding tree unit (CTU), and is preferably transmitted from the encoder to the decoder as a symbol included in the syntax information of the bitstream. If only one calculation method is used, transmission is unnecessary; however, the possibility of applying different calculation methods is not excluded. For example, in one embodiment, a calculation that sums the absolute values of the differences between pixels (a sum of absolute differences, SAD) may be used. In another embodiment, a calculation using a method such as a Hadamard transform may be used.
  • a plurality of modes or methods for downsampling during the calculation for template matching may be introduced.
  • the encoder can record, as a symbol included in the syntax information of the bit string, which of the calculation methods was applied to a specific sequence, frame, slice, subpicture, or tile, and the decoder can be configured to read the symbol and derive the calculation method to use. Both cost measures are sketched below.
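  • The two cost measures named above can be sketched as follows; `satd4` applies a 4x4 Hadamard transform to the difference of two 4x4 sample arrays, so the compared region would be processed in 4x4 pieces (the transform size is our assumption, as the text does not fix it).

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two sample arrays."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def satd4(a, b):
    """Hadamard-transformed SAD (SATD) over one 4x4 difference block."""
    H = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]], dtype=np.int64)
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.abs(H @ d @ H.T).sum())

def template_cost(a, b, method="sad"):
    # `method` would be derived from a symbol read from the bitstream syntax.
    return sad(a, b) if method == "sad" else satd4(a, b)
```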
  • when setting a search range for template matching prediction, the search range can be set so that the search is performed centered on a search-range block vector generated from the block vectors of surrounding blocks or from block vectors in a block vector candidate list.
  • FIG. 11 is an example diagram for setting a search range of a block vector according to one embodiment of the present invention.
  • the surrounding blocks adjacent to the current block (1130) may include at least one left block (1132) and an upper block (1134).
  • each of the adjacent blocks (1132, 1134) may be a block having a block vector and/or a candidate list for operation of a block vector.
  • a method for searching a block vector may include a step of generating a search range block vector (1110) based on the current block (1130) by referring to the block vector (1115) of the left block (for example, by replicating it, reflecting the distance between the blocks, so that the direction and distance are the same). A search range (1140) of a fixed size in the up/down/left/right directions may then be calculated based on (or centered on) the point (1117) indicated by the search range block vector (1110).
  • a method for searching a block vector may include a step of generating a search range block vector (1120) by referring to a block vector (1125) of an upper block, and a step of calculating a search range (1145) of a predetermined size based on a point (1127) pointed to by the block vector (1120).
  • a method for searching a block vector may be configured such that, when setting a search range for template matching prediction, information about the search range may be set for each compression unit, such as a sequence, a frame, a slice, a subpicture, a tile, a macroblock, or a CTU, and transmitted from an encoder to a decoder, preferably through a symbol included in syntax information of a bit string.
  • the search range may be defined as information including a horizontal size and a vertical size, and the size of the search range may be changed and applied for each compression unit.
  • information about which candidate in the candidate list (that is, its order in the list) was used to derive the search range block vector, and/or which of the left and upper blocks was referenced, may be set for each compression unit and transmitted from the encoder to the decoder. A sketch of this derivation follows.
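  • A sketch of deriving the search range from a neighboring block's vector; the half-window sizes are parameters that, per the description above, could be signaled per compression unit.

```python
def derive_search_range(neighbor_bv, cur_x, cur_y, half_w, half_h):
    """Replicate the neighbor's block vector at the current block to obtain
    the search-range block vector, then open a window of +/-half_w, +/-half_h
    around the point it indicates (points 1117/1127 in FIG. 11)."""
    px = cur_x + neighbor_bv[0]
    py = cur_y + neighbor_bv[1]
    return (px - half_w, py - half_h, px + half_w, py + half_h)  # x0, y0, x1, y1
```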
  • a method for searching a block vector may be configured to use a padding method that extends values belonging to the search range when a sample outside the search range needs to be referenced.
  • alternatively, instead of padding, the edge of the search range may be treated as a boundary line and the search reset there. A minimal clamping-based padding sketch is given below.
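  • A minimal sketch of the padding option, implemented here as coordinate clamping so that a reference outside the search range reuses the nearest in-range sample; the function name is illustrative.

```python
def sample_with_padding(picture, x, y, search_range):
    """search_range = (x0, y0, x1, y1); clamping the coordinates into this
    rectangle extends the boundary values to out-of-range positions."""
    x0, y0, x1, y1 = search_range
    return picture[min(max(y, y0), y1), min(max(x, x0), x1)]
```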
  • the setting of the search range may be determined within the search range of other prediction techniques that may be included in the encoder/decoder, such as intra block copying.
  • the setting of the search range may be configured so that, in a given direction, its area is larger or smaller than the search range for intra block copying.
  • the prediction vector for the template matching may be configured to use a block vector from a surrounding block, or from a block vector candidate list, together with a separate correction.
  • FIG. 12 is a conceptual diagram of a method for determining a prediction vector based on a block vector according to one embodiment of the present invention.
  • the optimal position (1250), where the prediction block (or such a set of samples) most closely matching the current block (1230) exists, is identified by searching the previously restored area of the picture.
  • An optimal prediction vector (1260) pointing to the optimal position (1250) can also be calculated.
  • a comparison between samples in the current block and pixel samples in the restored area may be additionally performed, thereby reinforcing the optimality of the template.
  • in the example of FIG. 12, the block vector (1215) of a surrounding block, here the left block (1232), may be referenced; for example, the block vector (1215) may be copied, reflecting the distance between the blocks, so that the direction and distance are the same, thereby generating a reference prediction vector (1210) based on the current block (1230).
  • information indicating the difference (1270) between the reference prediction vector and the optimal prediction vector may be expressed by at least one of, or a combination of, a vector magnitude, flag information, and index information, and the information may be configured to be transmitted from the encoder to the decoder, preferably as a symbol included in the syntax of the bit string.
  • information describing a differential vector (1270) may be transmitted as a symbol.
  • the flag information may be transmitted as a symbol.
  • index information for a table indicating multiple possibilities for the differential vector may be transmitted as a symbol.
  • the correction of the block vector may be configured to be performed only within a specified range.
  • the search for the optimal block vector (1260) used for this correction may be performed only within a limited search range (e.g., the range denoted 1140 or 1145 in FIG. 11), as sketched below.
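  • A sketch of correcting the reference prediction vector within a limited window; `cost_fn` stands for whichever matching cost is in effect, and `radius` bounds the correction range. Both names are illustrative assumptions.

```python
def refine_block_vector(cost_fn, ref_bv, radius):
    """Full search of the (2*radius+1)^2 offsets around the reference
    prediction vector (1210); returns the optimal vector (1260) and the
    differential vector (1270) that would be signaled."""
    best_cost, best_bv = None, ref_bv
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = (ref_bv[0] + dx, ref_bv[1] + dy)
            cost = cost_fn(cand)
            if best_cost is None or cost < best_cost:
                best_cost, best_bv = cost, cand
    diff = (best_bv[0] - ref_bv[0], best_bv[1] - ref_bv[1])
    return best_bv, diff
```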
  • when storing a block vector used in template matching prediction, the current block may be divided into an arbitrary block size, such as 4X4, 8X8, 16X16, 32X32, or 64X64, and the vector may be stored per such unit.
  • in a video encoding process, a video may be divided and compressed into slices, frames, subpictures, tiles, macroblocks, CTUs, or any other compression units, and the block vectors may be configured to be divided again into an arbitrary block size and stored after the compression is performed.
  • for example, the block vector may be stored for every 4X4 block while the current frame is being compressed, and for every 16X16 block after compression.
  • in that case, a representative block vector may be calculated and stored.
  • the resolution or storage unit of the block vector may be set differently depending on whether the frame is intra or inter. In one embodiment, compression may be performed on the current slice, frame, subpicture, tile, or any other unit, and the vectors may then be stored again at an arbitrary block vector resolution. In one embodiment, if an optimal block vector or a corrected block vector exists for the current block, that block vector may be stored.
  • when the above stored block vector is used in a subsequent encoding and decoding process, or when a previously stored block vector is used for template matching prediction of the current block, it may be used after scaling that takes the resolution of the stored block vector into account. For example, when the resolution of the stored block vector is relatively high, the resolution can be lowered using a shift operator; conversely, when the resolution is relatively low, it can be raised using a shift operator. Both the storage grid and the rescaling are sketched below.
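  • The two-level storage and the shift-based rescaling, as a sketch; keying the map by 4X4 cell coordinates and taking the top-left cell's vector as the representative are our assumptions.

```python
def compact_bv_storage(bv_map_4x4, out_size=16, in_size=4):
    """Keep one representative vector per out_size x out_size area (here: the
    vector stored for the area's top-left in_size x in_size cell)."""
    step = out_size // in_size
    return {(bx, by): bv
            for (bx, by), bv in bv_map_4x4.items()
            if bx % step == 0 and by % step == 0}

def rescale_bv(bv, shift, lower=True):
    """Lower (>>) or raise (<<) the fractional resolution of a stored block
    vector before reuse, depending on the resolutions involved."""
    if lower:
        return (bv[0] >> shift, bv[1] >> shift)
    return (bv[0] << shift, bv[1] << shift)
```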
  • the use of template matching may be restricted if the width of the block is greater than a first threshold and the height is greater than a second threshold. For another example, if the width of the block is less than a third threshold and the height is less than a fourth threshold, the use of template matching may be restricted. For another example, if the number of pixels in the block is greater than or less than a fifth threshold, combining another technique with the template matching prediction technique may be restricted. For another example, if the area of the block is greater than or less than a sixth threshold, combining the template matching prediction technique with another technique may be restricted. As an example of such a restricted combination, the combination of geometric partition prediction and template matching prediction may be restricted in consideration of the block's conditions. A simple gating rule is sketched below.
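  • The size-based restrictions can be expressed as a simple gate; the parameter names mirror the first through fourth thresholds above, and the default values are placeholders, not values taken from this description.

```python
def template_matching_allowed(width, height, t1=64, t2=64, t3=8, t4=8):
    """Disallow template matching for blocks that are too large (both sides
    above t1/t2) or too small (both sides below t3/t4)."""
    if width > t1 and height > t2:
        return False
    if width < t3 and height < t4:
        return False
    return True
```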
  • when template matching prediction is used in the chroma (which may mean Cb/Cr) signal domain, at least one block vector from the block at the corresponding position in the luma (which may mean Y) signal may be copied as-is and used as the block vector of the chroma block.
  • template matching for a chrominance block may be re-performed based on the block vector of the corresponding luminance block.
  • template matching may be re-performed by setting a search range of template matching based on the block vector of the corresponding luminance block.
  • it may be configured to refer to samples in a chrominance block corresponding to an area pointed to by the block vector of the corresponding luminance block.
  • samples pointed to by at least one block vector of the co-located block in the luma signal may be used as candidate samples.
  • it may be configured to generate another candidate sample by weighting and averaging one or more candidate samples; see the sketch below.
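  • A sketch of reusing a luma block vector for chroma; whether the vector is copied as-is or scaled by the chroma subsampling shifts (1 in each direction for 4:2:0) is an implementation choice this description leaves open, and the weighted blend mirrors the item above.

```python
def chroma_bv_from_luma(luma_bv, shift_x=0, shift_y=0):
    """Copy the co-located luma block's vector; shift_x = shift_y = 0 uses it
    as-is, while 1/1 roughly halves it to match 4:2:0 chroma sampling."""
    return (luma_bv[0] >> shift_x, luma_bv[1] >> shift_y)

def blend_candidate_samples(samples, weights):
    """Generate a further candidate by weighted averaging of candidate samples."""
    total = sum(weights)
    return sum(s * w for s, w in zip(samples, weights)) // total
```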

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to a video encoding and decoding method and apparatus using an improved intra prediction structure. The template-matching-based decoding method performed by a decoder according to one embodiment of the present invention comprises the steps of: acquiring, from a reconstruction area of the current picture, information about a template, which is a set of samples neighboring the current block; performing template matching to search, within a search area included in the reconstruction area, for a prediction template corresponding to the template; selecting a prediction block neighboring the prediction template; and performing intra prediction decoding on the current block based on the prediction block, wherein the template matching is performed using some samples within the template or using part of the search area.
PCT/KR2024/009075 2023-07-19 2024-06-28 Method and apparatus for video encoding and decoding using an improved intra prediction structure Pending WO2025018639A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2023-0094036 2023-07-19
KR20230094036 2023-07-19
KR1020230142997A KR20250015701A (ko) 2023-07-19 2023-10-24 Method and apparatus for video encoding and decoding using an improved intra prediction structure
KR10-2023-0142997 2023-10-24

Publications (1)

Publication Number Publication Date
WO2025018639A1 true WO2025018639A1 (fr) 2025-01-23

Family

ID=94281986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2024/009075 2023-07-19 2024-06-28 Method and apparatus for video encoding and decoding using an improved intra prediction structure Pending WO2025018639A1 (fr)

Country Status (1)

Country Link
WO (1) WO2025018639A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102366528B1 (ko) * 2014-03-31 2022-02-25 돌비 레버러토리즈 라이쎈싱 코오포레이션 템플릿 매칭 기반의 화면 내 픽쳐 부호화 및 복호화 방법 및 장치
KR20200100656A (ko) * 2018-01-16 2020-08-26 삼성전자주식회사 비디오 복호화 방법 및 장치, 비디오 부호화 방법 및 장치
KR20210025107A (ko) * 2018-09-05 2021-03-08 후아웨이 테크놀러지 컴퍼니 리미티드 크로마 블록 예측 방법 및 디바이스
KR20230075499A (ko) * 2021-09-01 2023-05-31 텐센트 아메리카 엘엘씨 Ibc 병합 후보들에 대한 템플릿 매칭
KR20230073970A (ko) * 2021-11-19 2023-05-26 현대자동차주식회사 템플릿 매칭 기반의 인트라 예측을 사용하는 비디오 코딩 방법 및 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24843359

Country of ref document: EP

Kind code of ref document: A1