WO2025010558A1 - Video encoding method and apparatus, video decoding method and apparatus, device, system, and storage medium
- Publication number: WO2025010558A1 (application PCT/CN2023/106426)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- view
- image block
- fitting
- image
- pixels
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Definitions
- the present application relates to the field of video coding and decoding technology, and in particular to video encoding and decoding methods, apparatuses, devices, systems, and storage media.
- multi-viewpoint videos are used to provide users with an immersive experience in applications such as virtual reality (VR), augmented reality (AR), and mixed reality (MR).
- the redundant information between the multi-view videos is pruned by pixel pruning to reduce the amount of encoded data.
- the current pixel pruning method suffers from incomplete pruning, which keeps the encoding cost high.
- the embodiments of the present application provide a video encoding and decoding method, apparatus, device, system, and storage medium, which can achieve more complete pruning of pixels, thereby reducing the encoding cost and improving the encoding and decoding efficiency of multi-viewpoint videos.
- the present application provides a video decoding method, applied to a decoder, comprising:
- For an i-th view among N views, decode a bitstream, determine patch information of the i-th view, and generate a patch image of the i-th view based on the patch information, wherein the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
- the patch image is divided into M image blocks, where M is a positive integer
- pixel fitting is performed on the pruned pixels in the j-th image block to obtain a reconstructed block of the j-th image block.
- an embodiment of the present application provides a video encoding method, applied to an encoder, comprising:
- For an i-th view among N views, determine a first pruning mask map of the i-th view, where the first pruning mask map is a pruning mask map obtained by performing pixel pruning on the i-th view, where the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
- the i-th view is divided into M image blocks, where M is a positive integer
- for a j-th image block among the M image blocks, determine unpruned pixels in the j-th image block based on the first pruning mask map, and perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer;
- the patch information and the color deviation fitting functions are encoded to obtain a bitstream.
- the present application provides a video decoding device, which is used to execute the method in the first aspect or its respective implementations.
- the device includes a functional unit for executing the method in the first aspect or its respective implementations.
- the present application provides a video encoding device, which is used to execute the method in the second aspect or its respective implementations.
- the device includes a functional unit for executing the method in the second aspect or its respective implementations.
- the present application provides a video decoder, comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the first aspect or its implementations.
- the present application provides a video encoder, comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the second aspect or its implementations.
- the present application provides a video coding and decoding system, including a video encoder and a video decoder.
- the video decoder is used to execute the method in the first aspect or its respective implementations
- the video encoder is used to execute the method in the second aspect or its respective implementations.
- the present application provides a chip for implementing the method in any one of the first to second aspects or their respective implementations.
- the chip includes: a processor for calling and running a computer program from a memory, so that a device equipped with the chip executes the method in any one of the first to second aspects or their respective implementations.
- the present application provides a computer-readable storage medium for storing a computer program, wherein the computer program enables a computer to execute the method of any one of the first to second aspects above or any of their implementations.
- the present application provides a computer program product, comprising computer program instructions, which enable a computer to execute the method in any one of the first to second aspects above or in each of their implementations.
- the present application provides a computer program, which, when executed on a computer, enables the computer to execute the method in any one of the first to second aspects or in each of their implementations.
- a first pruning mask map of the i-th view among N views is determined. Then, based on a preset block division method, the i-th view is divided into M image blocks. For the j-th image block among the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and the unpruned pixels in the j-th image block are chromatically fitted to obtain a color deviation fitting function of the j-th image block.
- the unpruned pixels in the j-th image block are pruned based on the color deviation fitting function to obtain patch information of the i-th view.
- the above patch information and the color deviation fitting function are encoded to obtain a bitstream. That is, in the embodiment of the present application, the first pruning mask map obtained by pixel pruning is pruned again based on the color deviation fitting function, which further reduces the number of unpruned pixels and the amount of data that needs to be encoded, thereby reducing the encoding cost at the encoding end and improving the encoding and decoding efficiency of the multi-view video.
- FIG1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application.
- FIG2A is a schematic block diagram of a video encoder according to an embodiment of the present application.
- FIG2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
- FIG3A is a schematic diagram of three degrees of freedom
- FIG3B is a schematic diagram of three degrees of freedom+ (3DoF+)
- FIG3C is a schematic diagram of six degrees of freedom
- FIGS. 4A and 4B are schematic diagrams showing the principle of the MIV technology
- FIG5A is a schematic diagram of a multi-view video encoding process
- FIG5B is a schematic diagram of a multi-view video decoding process
- FIG6 is a schematic diagram of the encoding process of TMIV14
- FIG7A is a schematic diagram of a view list
- FIG7B is a schematic diagram of an aggregate of trimmed views
- FIG7C is a schematic diagram of patch packaging
- FIG8 is a schematic diagram of a video decoding method flow chart provided by an embodiment of the present application.
- FIG9 is a schematic diagram of a patch image
- FIG10 is a schematic diagram of dividing a patch image into blocks
- FIG11 is a schematic diagram of a directed acyclic graph
- FIG12 is a schematic diagram of decoding
- FIG13 is a schematic diagram of a video encoding method flow chart provided by an embodiment of the present application.
- FIG14A is a schematic diagram of a view from different viewpoints
- FIG14B is a schematic diagram of pixel projection
- FIG14C is a schematic diagram of pixel pruning
- FIGS. 15 and 16 are schematic diagrams of an encoding process involved in an embodiment of the present application.
- FIG17 is a schematic block diagram of a video decoding device provided by an embodiment of the present application.
- FIG18 is a schematic block diagram of a video encoding device provided by an embodiment of the present application.
- FIG19 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
- FIG. 20 is a schematic block diagram of a video encoding and decoding system provided in an embodiment of the present application.
- the present application can be applied to the field of image coding and decoding, the field of video coding and decoding, the field of hardware video coding and decoding, the field of dedicated circuit video coding and decoding, the field of real-time video coding and decoding, etc.
- AVC: H.264/advanced video coding
- HEVC: H.265/high efficiency video coding
- VVC: H.266/versatile video coding
- the solution of the present application may operate in combination with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
- the high-degree-of-freedom immersive coding system can be roughly divided into the following stages along its pipeline: data collection, data organization and expression, data encoding and compression, data decoding and reconstruction, and data synthesis and rendering, finally presenting the target data to the user.
- the encoding involved in the embodiment of the present application is mainly video encoding and decoding.
- the video encoding and decoding system involved in the embodiment of the present application is first introduced in conjunction with Figure 1.
- FIG1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG1 is only an example, and the video encoding and decoding system of the embodiment of the present application includes but is not limited to that shown in FIG1.
- the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120.
- the encoding device is used to encode (which can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
- the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
- the encoding device 110 of the embodiment of the present application can be understood as a device with a video encoding function
- the decoding device 120 can be understood as a device with a video decoding function, that is, the embodiment of the present application includes a wider range of devices for the encoding device 110 and the decoding device 120, such as smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, etc.
- the encoding device 110 may transmit the encoded video data (e.g., a code stream) to the decoding device 120 via the channel 130.
- the channel 130 may include one or more media and/or devices capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
- the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time.
- the encoding device 110 can modulate the encoded video data according to the communication standard and transmit the modulated video data to the decoding device 120.
- the communication medium includes a wireless communication medium, such as a radio frequency spectrum, and optionally, the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
- the channel 130 includes a storage medium, which can store the video data encoded by the encoding device 110.
- the storage medium includes a variety of locally accessible data storage media, such as optical disks, DVDs, flash memories, etc.
- the decoding device 120 can obtain the encoded video data from the storage medium.
- the channel 130 may include a storage server that can store the video data encoded by the encoding device 110.
- the decoding device 120 can download the stored encoded video data from the storage server.
- the storage server can store the encoded video data and transmit the encoded video data to the decoding device 120, such as a web server (e.g., for a website), a file transfer protocol (FTP) server, etc.
- the encoding device 110 includes a video encoder 112 and an output interface 113.
- the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
- the encoding device 110 may further include a video source 111 in addition to the video encoder 112 and the output interface 113.
- the video source 111 may include at least one of a video acquisition device (e.g., a video camera), a video archive, a video input interface, and a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
- the video encoder 112 encodes the video data from the video source 111 to generate a bitstream.
- the video data may include one or more pictures or a sequence of pictures.
- the bitstream contains the encoding information of the picture or the sequence of pictures in the form of a bitstream.
- the encoding information may include the encoded picture data and associated data.
- the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
- the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the bitstream.
- the video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113.
- the encoded video data may also be stored in a storage medium or a storage server for subsequent reading by the decoding device 120.
- the decoding device 120 includes an input interface 121 and a video decoder 122 .
- the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
- the input interface 121 includes a receiver and/or a modem.
- the input interface 121 can receive the encoded video data through the channel 130 .
- the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
- the display device 123 displays the decoded video data.
- the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120.
- the display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- FIG1 is only an example, and the technical solution of the embodiment of the present application is not limited to FIG1 .
- the technology of the present application can also be applied to unilateral video encoding or unilateral video decoding.
- FIG2A is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of an image, or lossless compression of an image.
- the lossless compression can be visually lossless compression or mathematically lossless compression.
- the video encoder 200 can be applied to image data in luminance and chrominance (YCbCr, YUV) format.
- the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, Y represents brightness (Luma), Cb (U) represents blue chrominance, Cr (V) represents red chrominance, and U and V represent chrominance (Chroma) for describing color and saturation.
- 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr)
- 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr)
- 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
- the video encoder 200 reads video data, and for each frame of the video data, divides the frame into a number of coding tree units (CTUs).
- a CTU may also be referred to as a "tree block", "largest coding unit" (LCU) or "coding tree block" (CTB).
- Each CTU may be associated with a pixel block of equal size within the image.
- Each pixel may correspond to a luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU may be associated with a luminance sample block and two chrominance sample blocks.
- the size of a CTU is, for example, 128×128, 64×64, 32×32, etc.
- a CTU may be further divided into a number of coding units (CUs) for encoding, and a CU may be a rectangular block or a square block.
- a CU can be further divided into prediction units (PUs) and transform units (TUs), which decouples encoding, prediction, and transform for more flexible processing.
- CTU is divided into CU in quadtree mode
- CU is divided into TU and PU in quadtree mode.
- the video encoder and video decoder may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, the video encoder and video decoder may support PU sizes of 2N×2N or N×N for intra-frame prediction, and support symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter-frame prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter-frame prediction.
- the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy coding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
- the current block may be referred to as a current coding unit (CU) or a current prediction unit (PU), etc.
- a prediction block may also be referred to as a prediction image block or an image prediction block, and a reconstructed image block may also be referred to as a reconstructed block or an image reconstructed image block.
- the prediction unit 210 includes an inter-frame prediction unit 211 and an intra-frame estimation unit 212. Since there is a strong correlation between adjacent pixels in a frame of a video, an intra-frame prediction method is used in the video coding and decoding technology to eliminate spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in a video, an inter-frame prediction method is used in the video coding and decoding technology to eliminate temporal redundancy between adjacent frames, thereby improving coding efficiency.
- the inter-frame prediction unit 211 can be used for inter-frame prediction.
- Inter-frame prediction may include motion estimation and motion compensation. It may refer to image information of different frames.
- Inter-frame prediction uses motion information to find reference blocks from reference frames, and generates prediction blocks based on the reference blocks to eliminate temporal redundancy.
- the frames used for inter-frame prediction may be P frames and/or B frames. P frames refer to forward prediction frames, and B frames refer to bidirectional prediction frames.
- Inter-frame prediction uses motion information to find reference blocks from reference frames, and generates prediction blocks based on the reference blocks.
- Motion information includes a reference frame list where the reference frame is located, a reference frame index, and a motion vector.
- the motion vector may be an integer pixel or a sub-pixel.
- if the motion vector is a sub-pixel, an interpolation filter is applied in the reference frame to generate the required sub-pixel block.
- the integer pixel or sub-pixel block in the reference frame found according to the motion vector is called a reference block.
- Some technologies will directly use the reference block as the prediction block, while some technologies will generate a prediction block based on the reference block. Generating a prediction block based on the reference block can also be understood as using the reference block as the prediction block and then generating a new prediction block based on the prediction block.
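- As an illustrative sketch of how a reference block is fetched at an integer- or sub-pixel motion vector, the following uses bilinear interpolation as a simplified stand-in for the longer interpolation filters real codecs use (the function name and the border assumption are illustrative, not part of the application):

```python
import numpy as np

def fetch_reference_block(ref: np.ndarray, x: float, y: float, w: int, h: int) -> np.ndarray:
    """Fetch a w x h block at sub-pixel position (x, y) from a reference frame.
    Bilinear interpolation stands in for a codec's interpolation filters.
    Assumes the block plus one pixel of margin lies inside the frame."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    if fx == 0 and fy == 0:            # integer-pixel motion vector
        return ref[y0:y0 + h, x0:x0 + w].copy()
    patch = ref[y0:y0 + h + 1, x0:x0 + w + 1].astype(np.float64)
    top = (1 - fx) * patch[:h, :w] + fx * patch[:h, 1:w + 1]
    bot = (1 - fx) * patch[1:h + 1, :w] + fx * patch[1:h + 1, 1:w + 1]
    return (1 - fy) * top + fy * bot

ref = np.arange(64, dtype=np.float64).reshape(8, 8)
print(fetch_reference_block(ref, 1.5, 2.0, 4, 4))  # half-pel horizontal motion
```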
- the intra-frame estimation unit 212 only refers to the information of the same frame image to predict the pixel information in the current code image block to eliminate spatial redundancy.
- the frame used for intra-frame prediction can be an I frame.
- the intra-frame prediction modes used by HEVC are Planar, DC, and 33 angle modes, for a total of 35 prediction modes.
- the intra-frame modes used by VVC are Planar, DC, and 65 angle modes, for a total of 67 prediction modes.
- the residual unit 220 may generate a residual block of the CU based on the pixel blocks of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate a residual block of the CU so that each sample in the residual block has a value equal to the difference between the following two: a sample in the pixel blocks of the CU and a corresponding sample in the prediction blocks of the PUs of the CU.
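- A minimal numeric illustration of the residual computation described above (sample values are arbitrary):

```python
import numpy as np

# Residual: per-sample difference between the original block and its prediction.
original = np.array([[52, 55], [61, 59]], dtype=np.int16)
prediction = np.array([[50, 54], [60, 60]], dtype=np.int16)
residual = original - prediction          # what is transformed/quantized and encoded
reconstructed = prediction + residual     # what the reconstruction unit recovers
assert np.array_equal(reconstructed, original)
```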
- the transform/quantization unit 230 may quantize the transform coefficients.
- the transform/quantization unit 230 may quantize the transform coefficients associated with the TUs of the CU based on a quantization parameter (QP) value associated with the CU.
- the video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
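- The following sketch illustrates QP-driven scalar quantization; the step size doubling roughly every 6 QP follows H.265/H.266 practice, but rounding offsets and scaling lists are omitted, so this is a simplification rather than the normative process:

```python
import numpy as np

def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
    """Uniform scalar quantization; the step size roughly doubles every 6 QP."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    """Inverse quantization: scale the levels back by the same step size."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return levels * step

coeffs = np.array([100.0, -35.0, 8.0, 1.0])
levels = quantize(coeffs, qp=22)
print(levels, dequantize(levels, qp=22))  # larger QP -> coarser levels, more loss
```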
- the inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
- the reconstruction unit 250 may add the samples of the reconstructed residual block to the corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this manner, the video encoder 200 may reconstruct the pixel blocks of the CU.
- the loop filter unit 260 is used to process the inverse transformed and inverse quantized pixels to compensate for distortion information and provide a better reference for subsequent coded pixels. For example, a deblocking filter operation may be performed to reduce the blocking effect of the pixel blocks associated with the CU.
- the loop filter unit 260 includes a deblocking filter unit and a sample adaptive offset/adaptive loop filter (SAO/ALF) unit, wherein the deblocking filter unit is used to remove the block effect, and the SAO/ALF unit is used to remove the ringing effect.
- the decoded image buffer 270 may store the reconstructed pixel blocks.
- the inter prediction unit 211 may use the reference image containing the reconstructed pixel blocks to perform inter prediction on PUs of other images.
- the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded image buffer 270 to perform intra prediction on other PUs in the same image as the CU.
- the entropy encoding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230.
- the entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy-encoded data.
- FIG. 2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
- the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transformation unit 330, a reconstruction unit 340, a loop filter unit 350, and a decoded image buffer 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.
- the video decoder 300 may receive a bitstream.
- the entropy decoding unit 310 may parse the bitstream to extract syntax elements from the bitstream. As part of parsing the bitstream, the entropy decoding unit 310 may parse the syntax elements in the bitstream that have been entropy encoded.
- the prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340, and the loop filter unit 350 may decode the video data according to the syntax elements extracted from the bitstream, that is, generate decoded video data.
- the prediction unit 320 includes an inter-frame prediction unit 321 and an intra-frame estimation unit 322 .
- the intra estimation unit 322 may perform intra prediction to generate a prediction block for the PU.
- the intra estimation unit 322 may use an intra prediction mode to generate a prediction block for the PU based on pixel blocks of spatially neighboring PUs.
- the intra estimation unit 322 may also determine the intra prediction mode for the PU according to one or more syntax elements parsed from the code stream.
- the inter prediction unit 321 may construct a first reference image list (list 0) and a second reference image list (list 1) according to the syntax elements parsed from the code stream.
- the entropy decoding unit 310 may parse the motion information of the PU.
- the inter prediction unit 321 may determine one or more reference blocks of the PU according to the motion information of the PU.
- the inter prediction unit 321 may generate a prediction block of the PU according to one or more reference blocks of the PU.
- the inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) the transform coefficients associated with the TU.
- the inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.
- the inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
- the reconstruction unit 340 uses the residual block associated with the TU of the CU and the prediction block of the PU of the CU to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
- the loop filtering unit 350 may perform a deblocking filtering operation to reduce blocking effects of pixel blocks associated with a CU.
- the video decoder 300 may store the reconstructed image of the CU in the decoded image buffer 360.
- the video decoder 300 may use the reconstructed image in the decoded image buffer 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.
- the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
- the residual unit 220 can calculate the residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block, and the residual block can also be called residual information.
- the residual block can remove information that is not sensitive to the human eye through the transformation and quantization process of the transformation/quantization unit 230 to eliminate visual redundancy.
- the residual block before transformation and quantization by the transformation/quantization unit 230 can be called a time domain residual block, and the time domain residual block after transformation and quantization by the transformation/quantization unit 230 can be called a frequency residual block or a frequency domain residual block.
- the entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and can entropy encode the quantized transform coefficients and output a bitstream. For example, the entropy coding unit 280 can eliminate statistical redundancy according to the target context model and the probability information of the binary bitstream.
- the entropy decoding unit 310 can parse the bitstream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
- the prediction unit 320 generates a prediction block of the current block using intra-frame prediction or inter-frame prediction based on the prediction information.
- the inverse quantization/transformation unit 330 inversely quantizes and inversely transforms the quantization coefficient matrix obtained from the bitstream to obtain a residual block.
- the reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block.
- the reconstructed blocks form a reconstructed image, and the loop filter unit 350 performs loop filtering on the reconstructed image based on the image or on the block to obtain a decoded image.
- the encoding end also requires similar operations as the decoding end to obtain a decoded image.
- the decoded image can also be called a reconstructed image, and the reconstructed image can be a reference frame for inter-frame prediction for subsequent frames.
- the block division information determined by the encoder as well as the mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., are carried in the bitstream when necessary.
- the decoder parses the bitstream and, by analyzing the existing information, determines the same block division information and the same mode or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, etc. as the encoder, thereby ensuring that the decoded image obtained by the encoder is the same as the decoded image obtained by the decoder.
- the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. The present application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to the framework and process.
- the embodiments of the present application involve encoding and decoding of multi-view videos.
- the following introduces relevant concepts involved in the embodiments of the present application.
- multi-view video, also known as free-viewpoint video, is an immersive media video captured by multiple cameras, containing different perspectives and supporting user 3DoF+ or 6DoF interaction.
- Degree of Freedom in a mechanical system refers to the number of independent coordinates, including the degrees of freedom of translation, rotation and vibration. In the embodiment of the present application, it refers to the degrees of freedom that support movement and generate content interaction when the user is watching immersive media.
- three degrees of freedom (3DoF) refers to the three degrees of freedom of the user's head rotating around the X, Y, and Z axes.
- Figure 3A schematically shows three degrees of freedom. As shown in Figure 3A, the user is fixed at a point and can rotate about three axes: turning the head left and right, tilting it up and down, and rolling it from side to side. Through the three-degree-of-freedom experience, users can immerse themselves in a scene in 360 degrees. If the content is static, it can be understood as a panoramic picture; if the panoramic picture is dynamic, it is a panoramic video, that is, a VR video. However, VR videos have certain limitations: users cannot move and cannot choose an arbitrary place to watch.
- 3DoF+: on the basis of three degrees of freedom, the user also has the freedom to make limited movements along the X, Y, and Z axes; this can also be called limited six degrees of freedom, and the corresponding media stream can be called a limited six-degree-of-freedom media stream.
- Figure 3B schematically shows a schematic diagram of three degrees of freedom+.
- 6DoF: on the basis of three degrees of freedom, the user also has the freedom to move freely along the X, Y, and Z axes, and the corresponding media code stream can be called a six-degree-of-freedom media code stream.
- Figure 3C schematically shows a schematic diagram of six degrees of freedom.
- 6DoF media refers to six-degree-of-freedom video, which means that the video can provide users with a high-degree-of-freedom viewing experience of freely moving the viewpoint along the X, Y, and Z axes of three-dimensional space and freely rotating the viewpoint around the X, Y, and Z axes.
- 6DoF media is a combination of videos from different perspectives of space acquired by a camera array.
- 6DoF media data is expressed as a combination of the following information: texture maps acquired by multiple cameras, depth maps corresponding to the texture maps of multiple cameras, and corresponding 6DoF media content description metadata, which contains the parameters of multiple cameras, as well as description information such as the splicing layout and edge protection of 6DoF media.
- the texture map information of multiple cameras and the corresponding depth map information are spliced, and the description data of the splicing method is written into the metadata according to the defined syntax and semantics.
- the stitched multi-camera depth map and texture map information is encoded using a planar video compression method and transmitted to the terminal for decoding.
- the 6DoF virtual viewpoint requested by the user is then synthesized, thereby providing the user with a 6DoF media viewing experience.
- depth map: as a way to express three-dimensional scene information, the grayscale value of each pixel in a depth map represents the distance between a certain point in the scene and the camera.
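- A common convention in immersive-video test material is to store normalized disparity (the reciprocal of depth) as the grayscale value; the sketch below illustrates such a mapping (a sketch of the convention, not the normative MIV mapping):

```python
import numpy as np

def depth_to_gray(z: np.ndarray, z_near: float, z_far: float, bits: int = 10) -> np.ndarray:
    """Map metric depth to grayscale via normalized disparity (1/z):
    the nearest point maps to the maximum code, the farthest to 0."""
    d = (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return np.round(d * (2 ** bits - 1)).astype(np.uint16)

z = np.array([[0.5, 1.0], [2.0, 25.0]])      # depths in meters
print(depth_to_gray(z, z_near=0.5, z_far=25.0))  # 1023 at 0.5 m, 0 at 25 m
```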
- MPEG: Moving Picture Experts Group
- MIV: MPEG immersive video
- in order to reduce the transmission pixel rate while retaining as much scene information as possible, so as to ensure that there is enough information for rendering the target view, the solution adopted by MPEG-I is shown in FIG. 4A.
- a limited number of viewpoints are selected as basic viewpoints and the visible range of the scene is expressed as much as possible.
- the basic viewpoint is transmitted as a complete image, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoint are removed, that is, only the effective information that is not expressed repeatedly is retained, and then the sub-block image extracted from the effective information is reorganized with the basic viewpoint image to form a larger rectangular image, which is called a spliced image.
- Figures 4A and 4B show the schematic process of generating the spliced image.
- the spliced image is sent to the codec for compression and reconstruction, and the auxiliary data related to the splicing information of the sub-block image is also sent to the encoder to form a bit stream.
- multi-viewpoint video is encoded and decoded using the frame packing technology in Visual Volumetric Video-based Coding (V3C).
- the encoding end includes the following steps:
- Step 1: when encoding the acquired multi-view video, after some pre-processing, patches of the multi-view video are generated, and then the patches of the multi-view video are organized to generate a multi-view video mosaic.
- a multi-view video is input into TMIV for packaging, and a multi-view video splicing image is output.
- TMIV is the reference software for MIV.
- the packaging in the embodiment of the present application can be understood as splicing.
- the multi-view video mosaic map includes a multi-view video texture mosaic map and a multi-view video geometry mosaic map, that is, it only includes multi-view video sub-blocks.
- Step 2: input the multi-view video mosaic image into the frame packer, and output the multi-view video mixed mosaic image.
- the multi-view video mixed mosaic image includes a multi-view video texture mixed mosaic image, a multi-view video geometry mixed mosaic image, and a multi-view video texture and geometry mixed mosaic image.
- the multi-view video mosaic is frame packed to generate a multi-view video mixed mosaic, and each multi-view video mosaic occupies a region of the multi-view video mixed mosaic. Accordingly, a flag pin_region_type_id_minus2 is transmitted for each region in the bitstream. This flag records the information of whether the current region belongs to a multi-view video texture mosaic or a multi-view video geometric mosaic, and the information needs to be used at the decoding end.
- Step 3: use a video encoder to encode the multi-view video mixed splicing image to obtain a bit stream.
- the decoding end includes the following steps:
- Step 1: when decoding a multi-view video, the acquired code stream is input into a video decoder for decoding to obtain a reconstructed multi-view video mixed splicing image.
- Step 2: input the reconstructed multi-view video mixed mosaic image into the frame depacketizer, and output the reconstructed multi-view video mosaic image.
- parse the flag pin_region_type_id_minus2 from the bitstream; if pin_region_type_id_minus2 is V3C_AVD, the current region is a multi-view video texture mosaic, and the current region is split off and output as a reconstructed multi-view video texture mosaic.
- if pin_region_type_id_minus2 is V3C_GVD, the current region is a multi-view video geometric mosaic, and the current region is split off and output as a reconstructed multi-view video geometric mosaic.
- Step 3: decode the reconstructed multi-view video mosaic to obtain the reconstructed multi-view video.
- the multi-view video texture mosaic map and the multi-view video geometric mosaic map are decoded to obtain the reconstructed multi-view video.
- the encoding process of TMIV14 is introduced below.
- the TMIV encoder has a "group-based" encoder that calls the "single group" encoder shown in Figure 6 for each group.
- the multi-view video encoder shown in FIG6 is introduced below:
- the encoder operates on a selected source view of a given group and has the following stages:
- the input to the TMIV encoder consists of a list of source views (Figure 7A).
- a source view represents a projection of a 3D real or virtual scene.
- Source views can be equirectangular, perspective, or orthographic projections.
- Each source view should have at least view parameters (camera intrinsics, camera extrinsics, geometry quantization, etc.).
- a source view may have a geometry section with range/invalid sampling values.
- a source view may have a texture attributes section.
- Other optional attributes for each source view are entity mapping and transparency attributes sections. The set of components must be the same for all source views.
- Atlas parameters i.e. the number of atlases and the bounding box size of each atlas.
- some parameters of the TMIV encoder are automatically calculated based on the camera configuration or the first frame of the source views.
- TMIV has the ability to operate in entity coding mode. In this mode, the patches extracted and packed in the atlas have active pixels belonging to a single entity for each patch, so each patch can be labeled with its associated entity ID. This enables selective encoding and/or decoding of entities individually when needed, saving utilized bandwidth and improving quality.
- when entity coding mode is selected, the source views including the base view (texture attributes and geometry information) are sliced into multiple layers such that each layer includes content belonging to one entity. The following encoding stages are then called independently for each entity in order to prune, aggregate and cluster together the layers from all views belonging to the same entity.
- the patches of all entities are packaged and combined in a set of atlases.
- the pruner selects which areas of the view can be safely pruned.
- the pruner operates on a frame-by-frame basis, receiving multiple views with texture attributes and geometry information as well as camera parameters, and outputs a mask image of each source view at the same size. For non-base views, masked pixels are "pruned" and unmasked pixels are "retained". For base views, all pixels are "retained".
- Pruning mask aggregation
- the pruning mask for each entity is aggregated frame by frame by activating the active samples of the pruning mask in the aggregate pruning mask.
- the mask is reset at the beginning of each intra-frame period.
- the process is completed by outputting the final aggregated result at the end of the intra-frame period.
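- A minimal sketch of this aggregation, assuming each per-frame pruning mask is a boolean array in which active ("retained") samples are true:

```python
import numpy as np

def aggregate_pruning_masks(masks: list[np.ndarray]) -> np.ndarray:
    """OR the per-frame pruning masks over one intra period: a sample is
    active in the aggregate if it was active in any frame of the period."""
    agg = np.zeros_like(masks[0], dtype=bool)
    for m in masks:          # masks for frames i .. i+k of the intra period
        agg |= m.astype(bool)
    return agg               # a fresh (all-zero) aggregate starts the next period
```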
- Figure 7B illustrates the pruned view at frame i, and the aggregation of the active samples (drawn in white) between frame i and frame i+k within the intra-frame period; it can be seen that the contours of the changing parts of the geometric components grow thicker to account for the motion in the scene.
- a cluster is a connected set of pixels that are active in the aggregate mask of an entity.
- the connectivity criterion for a pixel is that it has at least one other pixel among its eight neighbors.
- irregularly shaped clusters are split.
- Each cluster is split into two smaller clusters if the total area of the bounding boxes of the two newly obtained clusters is smaller than the area of the bounding box of the initial cluster by a threshold.
- the total area of the bounding boxes of the two subclusters is minimized.
- the split is performed along lines parallel to the short sides of the cluster bounding boxes. This approach allows the partitioning of L-shaped clusters.
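- The clustering step described above amounts to 8-connected component labeling of the aggregated mask; the sketch below uses scipy's labeling as an illustration (the resulting bounding boxes would then feed the splitting test):

```python
import numpy as np
from scipy import ndimage

def clusters_from_aggregate_mask(agg: np.ndarray):
    """Label 8-connected sets of active pixels in the aggregated mask and
    return each cluster's bounding box."""
    eight = np.ones((3, 3), dtype=int)            # 8-neighbour connectivity
    labels, count = ndimage.label(agg, structure=eight)
    boxes = ndimage.find_objects(labels)          # one (row, col) slice pair per cluster
    return labels, boxes

agg = np.zeros((8, 8), dtype=bool)
agg[0:3, 0:3] = True       # one L-shaped cluster ...
agg[3:8, 0:2] = True       # ... 8-connected to the block above
agg[5:7, 5:8] = True       # a second, separate cluster
labels, boxes = clusters_from_aggregate_mask(agg)
print(len(boxes))          # 2 clusters
```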
- the packing process packs each cluster into the atlas in sequence.
- all attribute patch values are modified to reduce the number and size of edges between adjacent patches and between occupied and unoccupied regions in the attribute atlas.
- the mean of each part of the patch is set to a neutral color, 2^(m−1) for m-bit video.
- patch attribute offsets are added and sent in the atlas data.
- TMIV has the ability to align the different color characteristics of the source views. If the optional color correction is enabled, the color characteristics of each source view will be aligned with the color characteristics of a reference view that corresponds to the view captured by the camera closest to the center of the camera rig.
- the last operation within the single-group encoder is to write the patch to the buffer (geometry and attribute information) of the atlas assigned to it.
- the content of a given patch is extracted from the associated entity view generated by the entity delimiter based on the entity ID of the patch. This ensures that a patch with the correct entity content (texture attributes and geometry) is written to the formed atlas.
- atlas values are either "invalid/unoccupied" or a geometry value expressed in meters, with a maximum geometry value set to 1 km.
- the MIV specification [2] specifies how occupancy information is encoded in the geometry atlas if occupancy is not explicitly present. Decoding is based on normalized disparity range, geometry thresholds, etc. These values are signaled per view or even per patch.
- the entire range of min-max normalized depth values is divided into a predefined number of equal intervals (set in the encoder configuration), and each interval is adaptively scaled based on the importance of the geometry samples in the corresponding interval. For example, the original geometry values in each interval are mapped to the corresponding scaled geometry range.
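- A sketch of such piecewise-linear interval scaling, with hypothetical per-interval importance weights (the weight derivation is not detailed here):

```python
import numpy as np

def scale_geometry(d: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Piecewise-linear remap of normalized depth d in [0, 1]: the input range
    is cut into len(weights) equal intervals, and each interval's share of the
    output range is proportional to its weight (importance of its samples)."""
    n = len(weights)
    edges_in = np.linspace(0.0, 1.0, n + 1)
    edges_out = np.concatenate([[0.0], np.cumsum(weights) / np.sum(weights)])
    return np.interp(d, edges_in, edges_out)

d = np.array([0.1, 0.5, 0.9])
print(scale_geometry(d, weights=np.array([4.0, 1.0, 1.0])))  # first interval stretched
```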
- the full-resolution occupancy map is scaled down by a configurable scaling factor.
- the default factor is the native resolution of the occupancy map. For entity-based encoding, higher resolutions are recommended.
- the decoder reconstructs the full-resolution occupancy map by performing upscaling using nearest-neighbor interpolation. Note that for complete atlases (e.g., atlases including only the base view), an occupancy map may not be output because all pixels are occupied.
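- A minimal sketch of the nearest-neighbour upscaling: each transmitted occupancy sample simply repeats over a factor×factor area.

```python
import numpy as np

def upscale_occupancy(occ: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upscaling of a sub-sampled occupancy map."""
    return np.repeat(np.repeat(occ, factor, axis=0), factor, axis=1)

occ = np.array([[1, 0], [0, 1]], dtype=np.uint8)
print(upscale_occupancy(occ, 2))   # each sample becomes a 2x2 block
```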
- Atlas data, geometric video data, attribute video data, occupancy video data, etc. are generated.
- V3C unit header semantics as shown in Table 1:
- the redundant information between the multi-view videos is pruned by pixel pruning to reduce the amount of encoded data.
- the current pixel pruning method suffers from incomplete pruning, which keeps the encoding cost high.
- a first pruning mask map of the i-th view among N views is determined. Then, based on a preset block division method, the i-th view is divided into M image blocks, and for the j-th image block among the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and the unpruned pixels in the j-th image block are chromatically fitted to obtain a color deviation fitting function of the j-th image block.
- the unpruned pixels in the j-th image block are pruned based on the color deviation fitting function to obtain patch information of the i-th view.
- the patch information and the color deviation fitting function are encoded to obtain a bitstream. That is, in an embodiment of the present application, based on the color deviation fitting function, the first pruning mask map obtained by pixel pruning is pruned again to further reduce the number of unpruned pixels, reduce the amount of data that needs to be encoded at the encoding end, thereby reducing the encoding cost of the encoding end and improving the encoding and decoding efficiency of the multi-view video.
- the video decoding method provided in the embodiment of the present application is introduced by taking the decoding end as an example.
- FIG8 is a schematic flow chart of a video decoding method provided by an embodiment of the present application, and the embodiment of the present application is applied to the video decoder shown in FIG1 and FIG2B. As shown in FIG8, the method of the embodiment of the present application includes:
- the N views are views from N different viewpoints, that is, the viewpoints corresponding to the N views are all different.
- N is a positive integer greater than 1. That is, the N views in the embodiment of the present application may be 2 views, 3 views, or any number of views greater than 2.
- the N views are views corresponding to N different viewpoints at the same time.
- the N views are views obtained by N viewpoints at time t, for example, images obtained by cameras at N viewpoints capturing different angles of the same environment at time t.
- the N views are the first image of the k-th GOP of each viewpoint among the N viewpoints, where k is a positive integer.
- the three viewpoints are recorded as the first viewpoint, the second viewpoint and the third viewpoint, and the three views are recorded as the first view, the second view and the third view.
- the video data of the first viewpoint, the video data of the second viewpoint and the video data of the third viewpoint each include 100 images.
- the video data at each viewpoint is first divided into multiple groups of pictures (GOPs). Assume that the 100 images included in the video data of the first viewpoint are divided into 5 groups, recorded as GOP11, GOP12, GOP13, GOP14, GOP15, and each group includes 20 images.
- the 100 images included in the video data of the second viewpoint are divided into 5 groups, recorded as GOP21, GOP22, GOP23, GOP24, GOP25, and each group includes 20 images.
- the 100 images included in the video data of the third viewpoint are divided into 5 groups, which are recorded as GOP31, GOP32, GOP33, GOP34, and GOP35, and each group includes 20 images.
- the above three views may include the first view in GOP11, the first view in GOP21, and the first view in GOP31.
- the above three views may include the first view in GOP12, the first view in GOP22, and the first view in GOP32.
- the above three views may include the first view in GOP13, the first view in GOP23, and the first view in GOP33.
- the above three views may include the first view in GOP14, the first view in GOP24, and the first view in GOP34.
- the above three views may include the first view in GOP15, the first view in GOP25, and the first view in GOP35.
- the decoding end only determines the color deviation fitting function of the first image in each GOP of the N different viewpoint videos, and the other views in the GOP reuse the color deviation fitting function of the first image in the GOP, which can reduce the number of color deviation fitting functions to be carried by the bitstream, thereby saving codewords.
- the decoding end decodes the N views.
- the decoding process of any non-basic view in the N views is basically the same.
- the decoding process of the i-th view is taken as an example for explanation, where i is a positive integer less than or equal to N.
- the decoding end first decodes the bitstream to obtain the patch information of the i-th view.
- the patch information is also called patch data.
- the decoding end generates a patch image of the i-th view based on the patch information. For example, the decoding end maps each patch of the i-th view to a blank image using the position information of each patch included in the patch information of the i-th view to obtain a patch image of the i-th view.
- the size of the blank image is consistent with the size of the i-th view, that is, the patch image of the i-th view is consistent with the size of the view.
- the i-th view includes 3 patches, and the 3 patches are mapped to the blank image to obtain a patch image as shown in FIG. 9 .
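- A minimal sketch of this patch-to-canvas mapping, assuming a hypothetical patch-information layout in which each patch carries its samples and its (x, y) offset in the view:

```python
import numpy as np

def build_patch_image(view_size, patches):
    """Place each decoded patch at its signalled position in a blank canvas
    the size of the i-th view; unoccupied samples stay zero."""
    canvas = np.zeros(view_size, dtype=np.uint8)
    for samples, (x, y) in patches:
        h, w = samples.shape
        canvas[y:y + h, x:x + w] = samples
    return canvas

patches = [(np.full((2, 3), 200, np.uint8), (0, 0)),   # patch 1 at the origin
           (np.full((2, 2), 90, np.uint8), (4, 3))]    # patch 2 at (x=4, y=3)
patch_image = build_patch_image((6, 8), patches)       # canvas matches the view size
```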
- the multi-viewpoint video is grouped and each group is encoded separately.
- the above-mentioned N views can be understood as a group of views to be encoded. From the above, it can be seen that when N views are encoded, a basic view is selected from the N views, and the other views in the N views are compared with the basic view to eliminate redundant information, and the patch information after the redundant information is eliminated is encoded. The basic view is fully encoded. In this way, the decoding end can directly obtain the basic view by decoding the code stream, and then restore the other views based on the view and the patch information of other views. It can be seen that in an embodiment of the present application, the above-mentioned i-th view is a non-basic view among the N views.
- M is a positive integer.
- the embodiment of the present application does not limit the specific method of pre-setting block division.
- the preset block division method may be to evenly divide the patch image into M image blocks.
- the patch image is divided into M image blocks according to a preset block size.
- the patch image shown in FIG. 9 is evenly divided into 4 image blocks, and each image block is processed separately.
- M is equal to 4.
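- A sketch of the even block division, assuming the image dimensions are multiples of the block size:

```python
import numpy as np

def split_into_blocks(img: np.ndarray, block: int):
    """Divide an image into equal block x block tiles, in row-major order."""
    h, w = img.shape[:2]
    return [img[r:r + block, c:c + block]
            for r in range(0, h, block)
            for c in range(0, w, block)]

blocks = split_into_blocks(np.zeros((64, 64), dtype=np.uint8), 32)
print(len(blocks))   # M = 4, matching the example above
```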
- the encoding end indicates whether to turn on the color deviation fitting tool through a first flag. Therefore, before the decoding end divides the patch image into M image blocks according to a preset block division method, that is, before executing the above S102, it is necessary to first decode the code stream to obtain the first flag, and then determine whether to execute the above S102 based on the first flag. Specifically, if the first flag indicates that the decoding end turns on the color deviation fitting tool, it means that the code stream may include a color deviation fitting function. At this time, the decoding end divides the patch image into M image blocks according to the preset block division method, and executes the following step S103.
- otherwise, if the first flag indicates that the color deviation fitting tool is not enabled, the decoding end skips the above step S102 and does not execute the following steps S103 and S104.
- the embodiment of the present application does not limit the specific form of the first flag, as long as it is any information that can indicate whether to enable the color deviation fitting tool.
- the field asme_cdpu_enabled_flag is used to represent the first flag. For example, if asme_cdpu_enabled_flag is a first value (such as 1), it indicates that the decoder enables the color deviation fitting tool; if asme_cdpu_enabled_flag is a second value (such as 0), it indicates that the decoder does not enable the color deviation fitting tool.
- the embodiment of the present application does not limit the specific carrying position of the first flag in the code stream.
- the first flag asme_cdpu_enabled_flag is carried in the atlas sequence parameter set MIV extended syntax.
- Atlas sequence parameter set MIV extended syntax is shown in Table 2:
- the first flag asme_cdpu_enabled_flag can also be used to indicate whether the code stream includes a color deviation fitting function.
- when the first flag asme_cdpu_enabled_flag is 1, it indicates that the code stream may include the color deviation fitting function pdu_cdpu_params.
- when the first flag asme_cdpu_enabled_flag is 0, it indicates that the code stream does not include the color deviation fitting function pdu_cdpu_params.
- if asme_cdpu_enabled_flag does not exist, its value defaults to 0.
- the color deviation fitting function is obtained by performing chromaticity fitting on the image block corresponding to the j-th image block, where that image block is obtained by dividing the first pruning mask map into blocks according to the preset block division method, the first pruning mask map is obtained by pixel pruning of the i-th view, and j is a positive integer less than or equal to M.
- when encoding the i-th view among N views, the encoding end first determines the first pruning mask map of the i-th view, that is, it adopts a pixel pruning method to determine the pixels that can be pruned in the i-th view and sets their mask value to a first value (e.g., 0), and determines the pixels that cannot be pruned in the i-th view and sets their mask value to a second value (e.g., 1), thereby obtaining the first pruning mask map of the i-th view.
- next, the encoding end divides the i-th view into M image blocks according to a preset block division method. Then, for each image block in the M image blocks, for example the j-th image block, the encoder determines the unpruned pixels in the j-th image block based on the first pruning mask map, performs chromaticity fitting on those unpruned pixels to obtain P color deviation fitting functions of the j-th image block, and writes the P color deviation fitting functions of the j-th image block into the bitstream.
- the way in which the decoding end divides the patch image of the i-th view into blocks to obtain M image blocks is the same as the way in which the encoding end divides the i-th view into blocks to obtain M image blocks.
- the decoding end can obtain the color deviation fitting function of the j-th image block by decoding the bit stream, and then determine the color deviation fitting function as the color deviation fitting function of the j-th image block.
- the embodiment of the present application does not limit the specific position at which the color deviation fitting function is carried in the bit stream.
- the color deviation fitting function is carried in a patch data unit.
- the decoding end can obtain the color deviation fitting function of the j-th image block by decoding the patch data unit.
- for example, the patch data unit includes a patch data unit MIV extended syntax, and the color deviation fitting function is carried in the patch data unit MIV extended syntax.
- pdu_cdpu_params[tileID][p] is the parameter of the color deviation fitting function of the j-th image block.
- when pdu_cdpu_params is not present, its value defaults to an identity transformation; this default value is not stored in the bitstream.
- the method for the decoding end to reconstruct the j-th image block includes the following steps 1 and 2:
- Step 1 Determine synthetic views of P parent nodes based on the P reconstructed images, and determine P synthetic view blocks of the j-th image block in the synthetic views of the P parent nodes;
- Step 2 Based on the P synthetic view blocks, obtain the reconstruction value of the j-th image block.
- the decoding end projects the reconstructed image of the parent node into the i-th viewport to obtain a synthetic view of the parent node.
- the corresponding image block of the j-th image block is determined in the synthetic view and recorded as a synthetic view block.
- in this way, P synthetic view blocks can be obtained, and based on the P synthetic view blocks, a reconstruction value of the j-th image block is obtained.
- for example, the P synthetic view blocks are weighted to obtain the reconstruction value of the j-th image block.
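- as an illustration, a minimal sketch of this weighted combination is given below (Python with NumPy; the uniform weights and the function name are assumptions, not the normative weighting of the standard):

```python
import numpy as np

# Hypothetical sketch: fuse the P synthetic view blocks of the j-th image
# block into one reconstruction by a weighted sum (step 2 above).
def fuse_synthetic_view_blocks(synthetic_blocks, weights=None):
    blocks = np.stack([np.asarray(b, dtype=np.float64)
                       for b in synthetic_blocks])        # shape (P, ...)
    p = blocks.shape[0]
    if weights is None:
        weights = np.full(p, 1.0 / p)                     # uniform (assumed)
    w = np.asarray(weights, dtype=np.float64)
    w = w.reshape(-1, *([1] * (blocks.ndim - 1)))         # broadcast per pixel
    return (blocks * w).sum(axis=0)
```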
- when the decoding end divides the patch image of the i-th view into blocks, the obtained j-th image block includes the following cases:
- Case 1 all pixels in the j-th image block are cropped;
- Case 2 some pixels in the j-th image block are cropped, and some pixels are not cropped.
- the decoding end first determines whether the j-th image block includes pruned pixels. If the decoding end determines that the j-th image block includes pruned pixels, the above S103 step is executed to decode the bitstream and obtain the color deviation fitting function corresponding to the j-th image block. If the j-th image block does not include pruned pixels, the decoding end skips the above S103 step and uses a related method, such as the above steps 1 and 2, to decode the j-th image block.
- for the j-th image block in Case 2, if the j-th image block includes few uncropped pixels, the encoding end, out of consideration of encoding cost, does not determine a color deviation fitting function for the j-th image block. Based on this, before executing the above S103, the decoding end first determines whether the number of pruned pixels included in the j-th image block is greater than or equal to a preset value. If the number of pruned pixels included in the j-th image block is greater than or equal to the preset value, the above step S103 is executed to decode the bitstream and obtain the color deviation fitting function of the j-th image block. If the number of pruned pixels included in the j-th image block is less than the preset value, the decoding end skips the above step S103 and uses a related method, such as the above steps 1 and 2, to decode the j-th image block.
- after the decoding end determines the color deviation fitting function of the j-th image block based on the above steps, it executes the following step S104.
- the decoding end uses the color deviation fitting function to perform pixel fitting on the cropped pixels in the j-th image block to obtain a reconstructed block of the j-th image block.
- the embodiment of the present application does not limit the specific method by which the decoding end uses the color deviation fitting function to perform pixel fitting on the cropped pixels in the j-th image block to obtain the reconstructed block of the j-th image block.
- the decoder can use the color deviation fitting function to perform pixel fitting on the image block corresponding to the j-th image block in the basic view among N views to obtain a reconstructed block of the j-th image block.
- the above S104 includes the following steps S104-A to S104-C:
- S104-A Determine a directed acyclic graph of the pruning hierarchy of the N views;
- S104-B Based on the directed acyclic graph, determine P reconstructed images corresponding to the P parent nodes of the i-th view, where P is a positive integer;
- S104-C Based on the P reconstructed images, use the color deviation fitting function to perform pixel fitting on the cropped pixels in the j-th image block to obtain a reconstructed block of the j-th image block.
- specifically, the decoding end first determines a directed acyclic graph of the pruning hierarchy of the N views. Each node in the directed acyclic graph corresponds to one of the N views, and the view corresponding to a child node in the directed acyclic graph refers to the views of its parent nodes during pruning.
- when the decoding end decodes the i-th view among the N views, it can determine the parent nodes of the i-th view from the directed acyclic graph determined above.
- the parent node of the i-th view may include one or more, which are recorded as P parent nodes for ease of description, where P is a positive integer.
- since the decoding end decodes from the root node to the child nodes based on the directed acyclic graph when decoding the N views, the views on the P parent nodes of the i-th view have already been decoded and reconstructed when the i-th view is decoded. In this way, when decoding the i-th view, the decoding end can directly obtain the reconstructed images of the views corresponding to the P parent nodes of the i-th view, that is, obtain P reconstructed images.
- in this way, based on the P reconstructed images, the decoding end can use the color deviation fitting function to fit the cropped pixels in the j-th image block and obtain the reconstructed block of the j-th image block, thereby realizing accurate reconstruction of the j-th image block.
- the embodiment of the present application does not limit the specific manner in which the decoding end determines the directed acyclic graph of the pruning levels of N views.
- the decoding end and the encoding end determine the default directed acyclic graph as a directed acyclic graph of the pruning levels of N views.
- the encoder determines a directed acyclic graph of the pruning levels of N views, and indicates the directed acyclic graph of the pruning levels of the N views to the decoder. For example, the encoder writes the relevant information of the determined directed acyclic graph of the pruning levels of the N views into the bitstream. In this way, the decoder can obtain the directed acyclic graph of the pruning levels of the N views by decoding the bitstream.
- the directed acyclic graph of the pruning hierarchy of the N views determined by the encoder is shown in FIG. 11.
- the encoder can write information that view V4 is a basic view into the bitstream, and information that view V1 is encoded based on view V4, information that view V3 is encoded based on view V4 and view V1, and information that view V0 is encoded based on view V4, view V1, and view V3 into the bitstream.
- the decoder can determine the reference relationship between views V0, V1, V3, and V4 by decoding the bitstream, and then restore the directed acyclic graph shown in FIG. 11 based on the reference relationship.
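- purely as an illustration (the dictionary encoding of the reference relationships and the function name are assumptions), restoring the hierarchy of FIG. 11 and deriving a valid decoding order can be sketched as follows:

```python
# Hypothetical sketch: the pruning-hierarchy DAG of FIG. 11, restored from
# the signalled reference relationships (view -> list of parent views).
pruning_parents = {
    "V4": [],                  # basic view: root of the DAG
    "V1": ["V4"],              # V1 is pruned/decoded against V4
    "V3": ["V4", "V1"],        # V3 references V4 and V1
    "V0": ["V4", "V1", "V3"],  # V0 references V4, V1 and V3
}

def decoding_order(parents):
    # Topological order: a view is decodable once all its parents are
    # decoded, matching root-to-child decoding along the DAG.
    order, done = [], set()
    while len(order) < len(parents):
        for view, ps in parents.items():
            if view not in done and all(p in done for p in ps):
                order.append(view)
                done.add(view)
    return order

print(decoding_order(pruning_parents))  # ['V4', 'V1', 'V3', 'V0']
```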
- after the decoder determines the directed acyclic graph of the pruning hierarchy of the N views based on the above steps, it executes the above step S104-B to determine the P reconstructed images corresponding to the P parent nodes of the i-th view based on the directed acyclic graph.
- the parent node of view V1 is view V4, that is, the base view, that is, view V1 refers to base view V4 during decoding.
- the parent nodes of view V3 include two, view V4 and view V1, that is, view V3 refers to base view V4 and view V1 during decoding.
- the parent nodes of view V0 include three, view V4, view V1, and view V3, that is, view V0 refers to view V4, view V1, and view V3 during decoding.
- after the decoder determines the P parent nodes of the i-th view, it can obtain the reconstructed image corresponding to each of the P parent nodes. Since one parent node corresponds to one view, P reconstructed images can be obtained for the P parent nodes.
- the color deviation fitting function obtained by the above decoding is used to perform pixel fitting on the cropped pixels in the j-th image block to obtain a reconstructed block of the j-th image block.
- the embodiment of the present application does not limit the specific method by which the decoding end, based on the P reconstructed images, performs pixel fitting on the cropped pixels in the j-th image block using the color deviation fitting function obtained by the above decoding to obtain the reconstructed block of the j-th image block.
- the decoding end uses a color deviation fitting function to perform pixel fitting on the P reconstructed images to obtain a fitting image, and determines the image block corresponding to the j-th image block in the fitting image as the reconstructed block of the j-th image block.
- the above S104-C includes the following steps S104-C1 to S104-C3:
- S104-C1 Determine synthetic views of the P parent nodes based on the P reconstructed images, and determine P synthetic view blocks of the j-th image block in the synthetic views of the P parent nodes;
- S104-C2 Based on the P synthetic view blocks, use the P color deviation fitting functions to fit the pruned pixels in the j-th image block to obtain P color deviation reconstruction blocks of the j-th image block;
- S104-C3 Perform weighted fusion on the P color deviation reconstruction blocks to obtain the reconstruction block of the j-th image block.
- the decoding end projects the reconstructed image of the parent node onto the i-th viewport to obtain a synthetic view of the parent node, and determines the corresponding image block of the j-th image block on the synthetic view, which is recorded as a synthetic view block.
- the P parent nodes can obtain P synthetic view blocks. For example, in the synthetic view of the first parent node among the P parent nodes, a synthetic view block of the j-th image block is determined, and in the synthetic view of the second parent node among the P parent nodes, a synthetic view block of the j-th image block is determined, and so on, P synthetic view blocks can be obtained.
- the decoding end determines the P synthetic view blocks of the j-th image block in the P synthetic views, and then executes the above step S104-C2 to fit the pruned pixels in the j-th image block based on the P synthetic view blocks using the P color deviation fitting functions, so as to obtain the reconstructed block of the j-th image block.
- the embodiment of the present application does not limit the specific manner in which the decoding end fits the pruned pixels in the j-th image block based on P synthetic view blocks using P color deviation fitting functions to obtain the reconstructed block of the j-th image block.
- P color deviation fitting functions are used to fit P synthetic view blocks respectively, to obtain P color deviation reconstruction blocks of the j-th image block.
- the color deviation fitting function corresponding to the synthetic view block 1 is used to fit synthetic view block 1, to obtain color deviation reconstruction block 1 of the j-th image block.
- the color deviation fitting function corresponding to the synthetic view block 2 is used to fit synthetic view block 2, to obtain color deviation reconstruction block 2 of the j-th image block.
- in this way, P color deviation reconstruction blocks of the j-th image block can be obtained. Then, weighted fusion is performed on these P color deviation reconstruction blocks to obtain a fused reconstruction block.
- based on the fused reconstruction block, the pixel values of the cropped pixels in the j-th image block are determined. For example, for the cropped pixel 1 in the j-th image block, the reconstruction value of the pixel corresponding to pixel 1 in the fused reconstruction block is determined as the reconstruction value of pixel 1, so that the reconstruction value of each cropped pixel in the j-th image block can be determined. Then, the reconstruction values of the cropped pixels in the j-th image block and the patch values of the non-cropped pixels in the j-th image block are combined to obtain the reconstructed block of the j-th image block.
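- a minimal sketch of this final assembly step is given below (the array names and the boolean mask representation are assumptions made for illustration):

```python
import numpy as np

# Hypothetical sketch: combine (a) the fused color deviation reconstruction
# for the cropped pixels with (b) the decoded patch values for the uncropped
# pixels; `cropped_mask` is True where a pixel was pruned at the encoder.
def assemble_reconstructed_block(fused_block, patch_block, cropped_mask):
    out = np.asarray(patch_block, dtype=np.float64).copy()
    fused = np.asarray(fused_block, dtype=np.float64)
    out[cropped_mask] = fused[cropped_mask]   # fill pruned positions only
    return out
```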
- the decoding end restores the cropped pixels in the j-th image block pixel by pixel.
- the above S104-C2 includes the following steps S104-C21 to S104-C23:
- S104-C21 For the k-th cropped pixel in the j-th image block, determine P synthetic pixels corresponding to the k-th cropped pixel in the P synthetic view blocks, where k is a positive integer;
- S104-C22 Based on the P synthetic pixels and the surrounding pixels of the P synthetic pixels, use the color deviation fitting functions to perform pixel fitting on the k-th cropped pixel to obtain a color deviation reconstruction value of the k-th cropped pixel;
- S104-C23 Based on the color deviation reconstruction values of the cropped pixels in the j-th image block, obtain the color deviation reconstruction block of the j-th image block.
- the process of determining the reconstruction value of each cropped pixel in the j-th image block by the decoding end is consistent.
- for ease of description, the k-th cropped pixel in the j-th image block is taken as an example for explanation.
- the decoding end first determines, in each of the P synthetic view blocks corresponding to the j-th image block, the pixel corresponding to the k-th pruned pixel, recorded as a synthetic pixel.
- for example, in the first synthetic view block among the P synthetic view blocks, a pixel corresponding to the k-th pruned pixel is determined; in the second synthetic view block among the P synthetic view blocks, a pixel corresponding to the k-th pruned pixel is determined; and so on, P pixels corresponding to the k-th pruned pixel can be determined in the P synthetic view blocks, recorded as P synthetic pixels.
- the P synthetic pixels have been reconstructed, and the decoding end can use P color deviation fitting functions to perform pixel fitting on the kth cropped pixel based on the reconstructed values of the P synthetic pixels to obtain the reconstructed value of the kth pruned pixel.
- the color deviation fitting function corresponding to the synthetic pixel is used to fit the reconstructed values of the synthetic pixel and the surrounding pixels of the synthetic pixel (for example, the pixels in a 3×3 area) to obtain a color deviation fitting value.
- in this way, P color deviation fitting values can be obtained for the P synthetic pixels, and the P color deviation fitting values are weighted and fused to determine the color deviation reconstruction value of the k-th pruned pixel.
- the embodiment of the present application does not limit the specific expression form of the color deviation fitting function.
- in some embodiments, the color deviation fitting function is a weighted least squares fitting function, so that the decoding end can perform weighted least squares fitting on the reconstructed values of the synthetic pixel and the surrounding pixels of the synthetic pixel to obtain the reconstructed value of the k-th pruned pixel.
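- a minimal sketch of applying such a function and fusing the P results is shown below; modelling each fitting function as an affine map over the 9 samples of a 3×3 neighbourhood (10 parameters) is an assumption made only for illustration, since the bitstream defines the actual form of pdu_cdpu_params:

```python
import numpy as np

# Hypothetical sketch: apply one signalled color deviation fitting function
# to the 3x3 neighbourhood of one synthetic pixel, then fuse the P results.
def color_deviation_fit_value(params, neighbourhood_3x3):
    x = np.asarray(neighbourhood_3x3, dtype=np.float64).reshape(-1)  # 9 samples
    p = np.asarray(params, dtype=np.float64)                         # 10 params
    return float(p[:9] @ x + p[9])              # assumed affine fit value

def fuse_fit_values(fit_values, weights=None):
    v = np.asarray(fit_values, dtype=np.float64)
    if weights is None:
        weights = np.full(v.shape, 1.0 / v.size)    # uniform fusion (assumed)
    return float(np.dot(np.asarray(weights, dtype=np.float64), v))
```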
- the decoding end can determine the reconstruction value of each cropped pixel in the j-th image block, and then obtain the reconstruction value of the j-th image block.
- the above embodiment introduces the process by which the decoding end determines the reconstruction value of the j-th image block in the patch image of the i-th view. The reconstruction values of the other image blocks in the patch image can be determined by referring to the determination process of the j-th image block, after which the reconstructed image of the i-th view is obtained.
- the above describes the process by which the decoding end decodes the i-th view among the N views.
- the decoding end can decode other non-basic views among the N views with reference to the decoding process of the i-th view, thereby completing the decoding of the N views.
- for example, when encoding, the encoder merges the patch information of two groups of views, wherein view V4 and view V6 are basic views, views V4, V1, V3, and V0 are processed as one group of views, and views V6, V8, V7, and V5 are processed as another group of views.
- when decoding the first group of views V4, V1, V3, and V0, the decoder first decodes the bitstream to obtain the patch information shown in FIG. 12 and obtains the patch information of the first group of views V4, V1, V3, and V0 therefrom. Since view V4 is the basic view and its entire information is encoded, the basic view V4 can be reconstructed directly from the patch information.
- next, the decoder determines a directed acyclic graph of the pruning hierarchy of the first group of views V4, V1, V3, and V0, as shown in the upper left corner of FIG. 12. Based on the directed acyclic graph and the basic view V4, view V1 is decoded. Specifically, based on the patch information of view V1, a patch image of view V1 is generated. Next, the patch image of view V1 is divided into M image blocks. For each of the M image blocks, for example the j-th image block, the code stream is decoded to obtain the color deviation fitting function of the image block. Next, the cropped pixels in the j-th image block are pixel-fitted using the color deviation fitting function to obtain a reconstructed block of the j-th image block.
- repeating this process achieves decoding of each of the M image blocks included in the patch image of view V1, thereby obtaining a decoded image of view V1.
- view V3 is decoded to obtain a reconstructed image of view V3.
- view V0 is decoded to obtain a reconstructed image of view V0.
- for views V6, V8, V7, and V5 in the second group of views, the same method can be used to decode and obtain the reconstructed images corresponding to V6, V8, V7, and V5.
- in the video decoding method provided in the embodiment of the present application, when the decoding end decodes the multi-view video, for the i-th view among the N views, the decoding end decodes the bitstream, determines the patch information of the i-th view, and generates the patch image of the i-th view based on the patch information, where the N views are views of N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N; the patch image is divided into M image blocks according to a preset block division method, where M is a positive integer; for the j-th image block among the M image blocks, the bitstream is decoded to obtain P color deviation fitting functions of the j-th image block; and using the P color deviation fitting functions, the cropped pixels in the j-th image block are pixel-fitted to obtain a reconstructed block of the j-th image block.
- the embodiment of the present application performs fitting and reconstruction on the cropped pixels in the i-th view based on the color deviation fitting function, which can reduce the number of pixels that need to be encoded by the encoding end, thereby reducing the encoding cost of the encoding end and improving the decoding efficiency of the multi-view video.
- the above describes the multi-view video decoding method of the present application by taking the decoding end as an example, and the following describes it by taking the encoding end as an example.
- FIG. 13 is a schematic flowchart of a video encoding method provided by an embodiment of the present application, and the embodiment of the present application is applied to the video encoders shown in FIG. 1 and FIG. 2. As shown in FIG. 13, the method of the embodiment of the present application includes:
- the current pixel pruning method, when determining the pixels to be pruned, does not prune image blocks with an excessively large color difference but retains them.
- FIG. 14 shows four views taken from different perspectives. Due to camera parameters and other reasons, the color difference of the image blocks in the box taken by cameras from different viewpoints is large. The current pixel pruning method retains the area with large color difference, which will increase the encoding burden and encoding cost of the encoding end, thereby reducing the encoding efficiency.
- the embodiment of the present application performs chromaticity fitting on the area with a large color difference, so that the encoding end does not need to retain the area with a large color difference, but instead encodes the color deviation fitting function of the area into the bitstream.
- the decoding end can obtain the color deviation fitting function of the area by decoding the bitstream, and then obtain the area based on the color deviation fitting function.
- in this way, the encoding cost of the encoding end can be reduced on the premise of ensuring decoding accuracy, and the encoding efficiency of multi-viewpoint video can be improved.
- the N views are views from N different viewpoints, that is, the viewpoints corresponding to the N views are all different.
- N is a positive integer greater than 1. That is, the N views in the embodiment of the present application may be 2 views, 3 views, or any larger number of views.
- the N views are views corresponding to N different viewpoints at the same time.
- the N views are views obtained by N viewpoints at time t, for example, images obtained by cameras at N viewpoints capturing different angles of the same environment at time t.
- the N views are the first image of the k-th GOP of each viewpoint among the N viewpoints, where k is a positive integer.
- for example, N is 3, that is, there are 3 viewpoints, respectively recorded as the first viewpoint, the second viewpoint, and the third viewpoint, and 3 views, respectively recorded as the first view, the second view, and the third view.
- the video data of the first viewpoint, the video data of the second viewpoint and the video data of the third viewpoint each include 100 images.
- the 100 images included in the video data of the first viewpoint are divided into 5 groups, recorded as GOP11, GOP12, GOP13, GOP14, GOP15, and each group includes 20 images.
- the 100 images included in the video data of the second viewpoint are divided into 5 groups, recorded as GOP21, GOP22, GOP23, GOP24, GOP25, and each group includes 20 images.
- the 100 images included in the video data of the third viewpoint are divided into 5 groups, which are recorded as GOP31, GOP32, GOP33, GOP34, and GOP35, and each group includes 20 images.
- the above three views may include the first view in GOP11, the first view in GOP21, and the first view in GOP31.
- the above three views may include the first view in GOP12, the first view in GOP22, and the first view in GOP32.
- the above three views may include the first view in GOP13, the first view in GOP23, and the first view in GOP33.
- the above three views may include the first view in GOP14, the first view in GOP24, and the first view in GOP34.
- the above three views may include the first view in GOP15, the first view in GOP25, and the first view in GOP35.
- the encoding end only determines the color deviation fitting function of the first image in each GOP of N different viewpoint videos, and other views in the GOP reuse the color deviation fitting function of the first image in the GOP. This can reduce the number of color deviation fitting functions to be carried by the code stream, thereby saving codewords.
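- purely as an illustration of this reuse (the cache structure and function names are assumptions), the per-GOP sharing of fitting functions can be sketched as follows:

```python
# Hypothetical sketch: color deviation fitting functions are derived only
# for the first image of each GOP of a viewpoint and reused by the remaining
# images of that GOP, so fewer functions need to be carried in the bitstream.
fitting_cache = {}

def functions_for_frame(viewpoint, gop_index, derive_fitting_functions):
    key = (viewpoint, gop_index)
    if key not in fitting_cache:
        # Assumed to run the chromaticity fitting on the GOP's first image.
        fitting_cache[key] = derive_fitting_functions(viewpoint, gop_index)
    return fitting_cache[key]   # later images of the GOP reuse the result
```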
- the multi-viewpoint video is grouped and each group is encoded separately.
- the above N views can be understood as a group of views to be encoded. As described above, when the N views are encoded, a basic view is selected from the N views, the other views among the N views are compared with the basic view to eliminate redundant information, and the patch information remaining after redundancy elimination is encoded. The basic view is fully encoded. In this way, the decoding end can directly obtain the basic view by decoding the code stream and then restore the other views based on the basic view and the patch information of those views. The embodiment of the present application mainly introduces the encoding process of the views other than the basic view among the N views; that is, the above i-th view is a non-basic view among the N views.
- the encoding process of any non-basic view among the N views at the encoding end is basically the same.
- the encoding process of the i-th view is taken as an example for explanation, where i is a positive integer less than or equal to N.
- the encoding end first determines the first pruning mask map of the i-th view.
- the first pruning mask map is a pruning mask map obtained by performing pixel pruning on the i-th view. That is to say, the encoding end adopts a pixel pruning method to determine which pixels in the i-th view can be pruned. If a pixel can be pruned, its mask value is set to a first value (e.g., 0); if a pixel cannot be pruned, its mask value is set to a second value (e.g., 1).
- the embodiment of the present application does not limit the specific method for the encoder to determine the first pruning mask map of the i-th view.
- in some embodiments, the encoding end may project the i-th view into the base view, determine the pixels that can be pruned in the i-th view based on the difference (or similarity) between the i-th view and the base view, and then obtain the first pruning mask map of the i-th view. For example, for a certain pixel 1 in the i-th view, the encoding end projects pixel 1 into the base view through the internal and external parameters of the camera and the depth information; assume that the projection point of pixel 1 in the base view is pixel 2. By determining the similarity between pixel 1 and pixel 2, it is determined whether to crop pixel 1.
- if the similarity between pixel 1 and pixel 2 is greater than or equal to a preset value, it is determined that pixel 1 can be cropped, and the mask value corresponding to pixel 1 is set to 0. If the similarity between pixel 1 and pixel 2 is less than the preset value, it is determined that pixel 1 cannot be cropped, and the mask value corresponding to pixel 1 is set to 1. Based on this step, the first pruning mask map of the i-th view can be determined.
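- the following minimal sketch illustrates this projection-and-similarity rule; the projection function, the similarity measure, and the assumption that both views share one resolution are stand-ins for the codec's actual reprojection and matching rules:

```python
import numpy as np

# Hypothetical sketch of building the first pruning mask of the i-th view:
# mask 0 = prunable (similar to the base view), mask 1 = kept.
def first_pruning_mask(view_i, base_view, project_to_base,
                       similarity, threshold):
    h, w = view_i.shape[:2]
    mask = np.ones((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            bx, by = project_to_base(x, y)      # pixel 1 -> pixel 2
            if 0 <= bx < w and 0 <= by < h and \
               similarity(view_i[y, x], base_view[by, bx]) >= threshold:
                mask[y, x] = 0                  # prunable
    return mask
```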
- in some embodiments, the encoder may determine the first pruning mask map of the i-th view through the following steps S201-A to S201-C:
- S201-A Generate a directed acyclic graph of the pruning hierarchy of the N views;
- S201-B Based on the directed acyclic graph, determine P synthetic views of the P parent nodes of the i-th view, where P is a positive integer;
- S201-C Based on the P synthetic views, perform pixel pruning on the i-th view to obtain the first pruning mask map.
- specifically, the encoding end first generates a directed acyclic graph of the pruning hierarchy of the N views. Each node in the directed acyclic graph corresponds to one of the N views, and the view corresponding to a child node in the directed acyclic graph refers to the views of its parent nodes during pruning.
- when the encoding end encodes the i-th view among the N views, it can determine the parent nodes of the i-th view from the directed acyclic graph determined above.
- the parent node of the i-th view may include one or more, and for the convenience of description, it is recorded as P parent nodes, where P is a positive integer.
- since the encoding end encodes from the root node to the child nodes based on the directed acyclic graph when encoding the N views, when encoding the i-th view, the views on the P parent nodes of the i-th view are respectively projected into the viewport of the i-th view to obtain P synthetic views. In this way, the encoding end can perform pixel pruning on the i-th view based on the P synthetic views to obtain the first pruning mask map.
- for example, one of the N views is selected as a basic view, such as N0, and the views other than the basic view among the N views are recorded as additional views, such as N1, N2, and N3.
- Step 1 Take the basic view N0 as the root node of the directed acyclic graph.
- Step 2 Project all the pixels of the basic view N0 into each additional view and determine the pruning mask map of each additional view, that is, determine the similarity between the pixels in the additional view and the projected pixels. If the similarity is high, the mask value of the pixel is set to 0; if the similarity is low, the mask value of the pixel is set to 1. Repeat this step to obtain the pruning mask map of each additional view.
- Step 3 Based on the pruning mask images of each additional view, select the additional view with the smallest mask area, that is, the additional view with the largest number of retained pixels, from these additional views.
- Step 4 Take the selected additional view as a child node of all existing nodes in the directed acyclic graph. If all views have been assigned to nodes in the directed acyclic graph, stop; otherwise, execute step 5.
- Step 5 Project all the retained pixels (i.e., the pixels that are not cropped) of the selected view to the remaining additional views.
- Step 7 Update the pruning mask map of each remaining additional view.
- Step 8 skip to step 4.
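- a compact sketch of this greedy construction is given below (see steps 1 to 8 above); the helper prune_against, which is assumed to return the pruning mask of a view after projecting the retained pixels of the given reference views into it, is a stand-in for the projection and masking logic described above:

```python
# Hypothetical sketch of the greedy construction of the pruning hierarchy.
# Masks use 1 = retained pixel, 0 = pruned pixel (NumPy arrays assumed).
def build_pruning_hierarchy(base_view, additional_views, prune_against):
    hierarchy = [base_view]                  # step 1: basic view is the root
    remaining = list(additional_views)
    masks = {v: prune_against(v, [base_view]) for v in remaining}  # step 2
    while remaining:
        # step 3: pick the view with the largest number of retained pixels
        chosen = max(remaining, key=lambda v: masks[v].sum())
        hierarchy.append(chosen)             # step 4: child of all prior nodes
        remaining.remove(chosen)
        for v in remaining:                  # steps 5 and 7: reproject, update
            masks[v] = prune_against(v, hierarchy)
    return hierarchy                         # views in pruning order
```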
- the encoder indicates the directed acyclic graph of the pruning hierarchy of N views to the decoder.
- the directed acyclic graph of the pruning hierarchy of the N views determined by the encoder is shown in FIG. 11.
- the encoder can write information that view V4 is a basic view into the bitstream, and information that view V1 is encoded based on view V4, information that view V3 is encoded based on view V4 and view V1, and information that view V0 is encoded based on view V4, view V1, and view V3 into the bitstream.
- the decoder can determine the reference relationship between views V0, V1, V3, and V4 by decoding the bitstream, and then restore the directed acyclic graph shown in FIG. 11 based on the reference relationship.
- after the encoder generates a directed acyclic graph of the pruning hierarchy of the N views based on the above steps, it executes the above step S201-B to determine the P synthetic views of the P parent nodes of the i-th view based on the directed acyclic graph.
- the parent node of view V1 is view V4, that is, the basic view, that is, view V1 refers to basic view V4 during encoding.
- the parent nodes of view V3 include two, view V4 and view V1, that is, view V3 refers to basic view V4 and view V1 during encoding.
- the parent nodes of view V0 include three, view V4, view V1, and view V3, that is, view V0 refers to view V4, view V1, and view V3 during encoding.
- after the encoder determines the P parent nodes of the i-th view, it can determine the view of each of the P parent nodes and project it into the viewport of the i-th view to obtain the synthetic view of that parent node. Since one parent node corresponds to one synthetic view, P synthetic views can be obtained for the P parent nodes.
- after the encoder determines the P synthetic views, it executes the above step S201-C to perform pixel pruning on the i-th view based on the P synthetic views to obtain the first pruning mask map.
- the embodiment of the present application does not limit the specific manner in which the encoding end performs pixel pruning on the i-th view based on P synthetic views to obtain the first pruning mask map.
- the encoder determines the similarity between each pixel in the i-th view and the corresponding pixel in the P synthetic views, thereby obtaining a first pruning mask image.
- in some embodiments, the above S201-C includes the following steps S201-C1 to S201-C2:
- S201-C1 Based on the differences between the depth values and luminance components of the i-th view and those of the P synthetic views, perform pixel pruning on the i-th view to obtain a second pruning mask map;
- S201-C2 Based on the difference between the chroma components of the i-th view and the chroma components of the P synthetic views, query the non-pruned pixels among the pruned pixels included in the second pruning mask map to obtain the first pruning mask map.
- the pixel pruning module detects repeated parts between views so that the repeated pixels can be pruned in subsequent processing.
- specifically, an arbitrary pixel 1 of the i-th view is mapped, through the internal and external parameters of the camera and the depth information, to a pixel 2 of one of the P synthetic views and its 3×3 neighborhood.
- pixel pruning mainly includes two stages.
- the depth value difference between pixel 1 and pixel 2 should be less than a first threshold t1.
- the depth value comparison is performed in a pixel-to-pixel manner.
- the minimum value of the brightness value difference between pixel 1 and all pixels in the 3×3 pixel block should be smaller than the second threshold t2, and the brightness value comparison is performed in a pixel-to-block manner.
- the suspected similar pixel pairs enter the second-pass process:
- the second stage mainly updates the pruning mask created during the initial pruning stage and re-identifies the pixels not to be pruned among the pixels initially determined to be pruned.
- the main purpose of this process is to take into account the global color component differences that may exist between different views.
- the process is as follows and applies to each pruning pair. Specifically, for the pixels determined to be pruned, the pixel-by-pixel color difference between the P synthetic views and the i-th view is calculated. Using the least squares method, a fitting function that best models these color differences is computed. Pixels that conform to the fitting function within a specific range defined by a threshold are judged to be inliers, and these pixels remain marked as pixels to be pruned; at the same time, outliers are updated to not pruned within the pruning mask. After the second-pass process is completed, the similar pixel pairs to be finally cropped are obtained, and the first pruning mask map of the i-th view is thereby obtained.
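- the following minimal sketch illustrates this second pass under a simplifying assumption: the global color difference is modelled by a single least squares constant (its mean), whereas the described scheme permits any least squares fitting function:

```python
import numpy as np

# Hypothetical sketch of the second pruning stage: fit the color differences
# of the initially pruned pixels (mask == 0), keep inliers pruned, and turn
# outliers back to "not pruned" (mask 1).
def second_pass(mask, view_chroma, synth_chroma, inlier_threshold):
    flat_mask = np.asarray(mask).reshape(-1).copy()
    vc = np.asarray(view_chroma, dtype=np.float64).reshape(-1)
    sc = np.asarray(synth_chroma, dtype=np.float64).reshape(-1)
    pruned_idx = np.flatnonzero(flat_mask == 0)
    if pruned_idx.size == 0:
        return np.asarray(mask).copy()
    diff = vc[pruned_idx] - sc[pruned_idx]
    offset = diff.mean()                  # least squares constant model
    outliers = np.abs(diff - offset) > inlier_threshold
    flat_mask[pruned_idx[outliers]] = 1   # outliers are no longer pruned
    return flat_mask.reshape(np.asarray(mask).shape)
```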
- after the encoder determines the first pruning mask map of the i-th view based on the above steps, it performs the following step S202.
- S202 Divide the i-th view into M image blocks based on a preset block division method.
- M is a positive integer.
- the embodiment of the present application does not limit the specific preset block division method.
- the preset block division method may be to divide the i-th view into M image blocks on average.
- the i-th view is divided into M image blocks according to a preset block size.
- the encoder before executing the above S202, the encoder first determines a first flag, which is used to indicate whether the encoder turns on the color deviation fitting tool. If the first flag indicates that the encoder turns on the color deviation fitting tool, the encoder executes the above S202, and based on a preset block division method, the i-th view is divided into M image blocks.
- the encoding end writes the first flag into the bitstream.
- if the first flag indicates that the color deviation fitting tool is turned on, the first flag is set to a first value (e.g., 1); if the first flag indicates that the color deviation fitting tool is not enabled, the first flag is set to a second value (e.g., 0).
- the embodiment of the present application does not limit the specific form of the first flag, as long as it is any information that can indicate whether to enable the color deviation fitting tool.
- in some embodiments, the field asme_cdpu_enabled_flag is used to represent the first flag. For example, if asme_cdpu_enabled_flag is a first value (such as 1), it indicates that the color deviation fitting tool is enabled. For another example, if asme_cdpu_enabled_flag is a second value (such as 0), it indicates that the color deviation fitting tool is not enabled.
- the embodiment of the present application does not limit the specific carrying position of the first flag in the code stream.
- the first flag asme_cdpu_enabled_flag is carried in the atlas sequence parameter set MIV extended syntax.
- Atlas sequence parameter set MIV extended syntax is shown in Table 2.
- S203 For the j-th image block among the M image blocks, determine the unpruned pixels in the j-th image block based on the first pruning mask map, and perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M.
- the encoding end obtains the first pruning mask image based on the pixel pruning method, so that the unpruned pixels in each of the M image blocks of the i-th view can be determined based on the first pruning mask image, and then the unpruned pixels in each image block are pruned again to reduce the amount of coded data and reduce the coding cost.
- the encoding end determines the color deviation fitting function of each of the M image blocks, and based on the color deviation fitting function, the unpruned pixels in the image block are pruned again to further reduce the coding cost.
- the method for the encoding end to determine the chromaticity fitting function of each of the M image blocks is basically the same.
- the j-th image block is again taken as an example for explanation.
- the embodiment of the present application does not limit the specific manner in which the encoding end determines the color deviation fitting function of the j-th image block.
- the encoding end determines the corresponding pixel points of the unpruned pixel points in the j-th image block in the basic view, determines P color deviation fitting functions based on these corresponding pixel points and the unpruned pixel points in the j-th image block, and determines the P color deviation fitting functions as the color deviation fitting functions of the j-th image block.
- the above S203 includes the following steps S203-A to S203-B:
- S203-A Determine, in each of the P synthetic views, a synthetic view block corresponding to the j-th image block, so as to obtain P synthetic view blocks;
- S203-B Based on the chromaticity values of the pixels included in the P synthetic view blocks and the chromaticity values of the pixels included in the j-th image block, perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions.
- the encoding end determines P synthetic views of the i-th view based on the above steps, so that P color deviation fitting functions of the j-th image block can be determined based on the P synthetic views.
- the encoder first determines an image block corresponding to the j-th image block in each of the P synthetic views determined above, and records it as a synthetic view block, and then obtains P synthetic view blocks. For example, in the first synthetic view of the P synthetic views, a synthetic view block of the j-th image block is determined, and in the second synthetic view of the P synthetic views, a synthetic view block of the j-th image block is determined, and so on, and P synthetic view blocks can be obtained.
- the encoding end determines the P synthetic view blocks of the j-th image block in the P synthetic views, and then executes the above step S203-B to perform chromaticity fitting on the unpruned pixels in the j-th image block based on the chromaticity values of the pixels included in the P synthetic view blocks and the chromaticity values of the pixels included in the j-th image block, so as to obtain P color deviation fitting functions.
- the embodiment of the present application does not limit the specific method of obtaining P color deviation fitting functions by performing chromaticity fitting on the unpruned pixels in the j-th image block based on the chromaticity values of the pixels included in the P synthetic view blocks and the chromaticity values of the pixels included in the j-th image block at the encoding end.
- in some embodiments, the encoder performs least squares fitting (e.g., weighted least squares fitting) based on the chrominance values of the synthetic view block and the chrominance values of the j-th image block to obtain a color deviation fitting function.
- the encoder determines, in the synthetic view block, a synthetic pixel corresponding to an uncropped pixel in the j-th image block; and determines a color deviation fitting function based on the chromaticity value of the synthetic pixel and the chromaticity value of the uncropped pixel in the j-th image block.
- for example, for an uncropped pixel 1 in the j-th image block, a synthetic pixel 2 corresponding to pixel 1 is determined in the synthetic view block, and chromaticity fitting is performed on the chromaticity value of pixel 1 based on the chromaticity values of synthetic pixel 2 and the pixels around synthetic pixel 2 (such as the pixels in a 3×3 area). Based on this method, a color deviation fitting function can be obtained by fitting the uncropped pixels in the j-th image block.
- the embodiment of the present application does not limit the specific fitting method used in the color deviation fitting process.
- the encoding end performs weighted least squares fitting based on the chromaticity values of the synthesized pixel points and the chromaticity values of the uncropped pixel points in the j-th image block to obtain a color deviation fitting function.
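- a minimal sketch of such a weighted least squares fit over one synthetic view block is shown below; the affine per-channel model and the unit weights are assumptions made only for illustration:

```python
import numpy as np

# Hypothetical sketch of S203-B for one synthetic view block: fit
# chroma_block ~= a * chroma_synth + b over the unpruned pixels by
# weighted least squares.
def fit_color_deviation(synth_chroma, block_chroma, unpruned_mask,
                        weights=None):
    x = np.asarray(synth_chroma, dtype=np.float64)[unpruned_mask]
    y = np.asarray(block_chroma, dtype=np.float64)[unpruned_mask]
    if weights is None:
        weights = np.ones_like(x)               # unit weights (assumed)
    X = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [x, 1]
    W = np.diag(np.asarray(weights, dtype=np.float64))
    a, b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return a, b
```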
- in some embodiments, before the encoding end performs chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function of the j-th image block, the method further includes: determining whether the j-th image block includes unpruned pixels; if the j-th image block includes unpruned pixels, performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
- in some embodiments, chromaticity fitting is performed on the unpruned pixels in the j-th image block to obtain a color deviation fitting function if the number of unpruned pixels included in the j-th image block is greater than or equal to a first preset value.
- after the encoder determines the P color deviation fitting functions of the j-th image block based on the above steps, it executes the following step S204.
- the encoding end determines P color deviation fitting functions of the j-th image block based on the above steps, and uses the P color deviation fitting functions to prune the unpruned pixels in the j-th image block to further reduce the encoding cost of the encoding end.
- the embodiment of the present application does not limit the specific method of pruning the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the patch information of the i-th view.
- for example, for an unpruned pixel, the P color deviation fitting functions are used to fit the pixel value of that pixel, and the fitting error is then determined by comparing the fitted pixel value with the original pixel value of the pixel. If the fitting error is less than a preset value, the unpruned pixel is determined to be a pixel that can be pruned.
- in some embodiments, before the encoder uses the P color deviation fitting functions to prune the unpruned pixels in the j-th image block, it first determines whether the color deviation fitting functions are valid, that is, whether the fitting errors of the color deviation fitting functions are less than or equal to a second preset value. If the fitting errors of the P color deviation fitting functions are less than or equal to the second preset value, the P color deviation fitting functions are valid, and the P color deviation fitting functions are used to prune the unpruned pixels in the j-th image block. If the P color deviation fitting functions are invalid, the encoder determines the patch information of the i-th view based on the first pruning mask map.
- the encoding end can determine the fitting errors of P color deviation fitting functions through the following steps 3 and 4.
- Step 3 For the k-th unpruned pixel in the j-th image block, perform pixel fitting on the k-th unpruned pixel based on the P color deviation fitting functions to determine the fitting error of the k-th unpruned pixel, where k is a positive integer;
- Step 4 Determine the fitting errors of P color deviation fitting functions based on the fitting errors of the unpruned pixels in the j-th image block.
- the encoder first determines the fitting error of each unpruned pixel in the jth image block based on P color deviation fitting functions.
- the process of determining the fitting error of each unpruned pixel by the encoder is basically the same.
- the kth unpruned pixel is taken as an example for description.
- the embodiment of the present application does not limit the specific method of performing pixel fitting on the kth unpruned pixel based on P color deviation fitting functions and determining the fitting error of the kth unpruned pixel.
- for example, pixel fitting is performed on the k-th unpruned pixel based on the P color deviation fitting functions to obtain a fitted pixel value of the k-th unpruned pixel, and the fitting error of the k-th unpruned pixel is determined based on the fitted pixel value of the k-th unpruned pixel and the original pixel value of the k-th unpruned pixel.
- the pixel value includes a YUV value.
- performing pixel fitting on the kth unpruned pixel based on the P color deviation fitting functions and determining the fitting error of the kth unpruned pixel comprises the following steps:
- Step 31 Use P color deviation fitting functions to perform chromaticity fitting on the kth unpruned pixel to obtain a fitting chromaticity value of the kth unpruned pixel;
- Step 32 Determine the fitting error of the kth unpruned pixel based on the fitted chromaticity value of the kth unpruned pixel and the chromaticity value of the kth unpruned pixel.
- the color deviation error is determined by using the P color deviation fitting functions to perform chromaticity fitting on the k-th unpruned pixel to obtain the fitted chromaticity value of the k-th unpruned pixel, and the fitting error of the k-th unpruned pixel is determined based on the fitted chromaticity value of the k-th unpruned pixel and the chromaticity value of the k-th unpruned pixel.
- the encoder can determine the fitting error of each unpruned pixel in the j-th image block, and then execute the above step 4.
- the implementation methods of the above step 4 include but are not limited to the following:
- Method 1 The sum of the fitting errors of the unpruned pixels in the j-th image block is determined as the fitting error of the color deviation fitting function.
- Method 2 The average value of the fitting errors of the unpruned pixels in the j-th image block is determined as the fitting error of the color deviation fitting function.
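- illustratively (the function name is an assumption), the two aggregation methods can be expressed as:

```python
import numpy as np

# Hypothetical sketch of step 4: aggregate per-pixel fitting errors into a
# block-level fitting error by sum (method 1) or average (method 2).
def block_fitting_error(pixel_errors, method="sum"):
    errors = np.asarray(pixel_errors, dtype=np.float64)
    return errors.sum() if method == "sum" else errors.mean()
```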
- the encoder determines the fitting errors of the P color deviation fitting functions of the j-th image block. If the fitting errors of the P color deviation fitting functions are less than or equal to the second preset value, the P color deviation fitting functions are used to prune the unpruned pixels in the j-th image block to obtain the patch information of the i-th view. If the P color deviation fitting functions are invalid, the encoder determines the patch information of the i-th view based on the first pruning mask map.
- the specific method by which the encoder prunes the unpruned pixels in the j-th image block based on the color deviation fitting functions to obtain the patch information of the i-th view is not limited.
- the encoder processes each of the M image blocks in the first pruning mask map, and the following situations are obtained:
- for situation 3, the encoding end adopts the relevant technology for encoding; the embodiment of the present application mainly addresses the encoding of situation 1 and situation 2 above.
- the encoding end prunes all unpruned pixels in an image block having a valid color deviation fitting function, and determines patch information of the i-th view based on the image block after all unpruned pixels in the image block having a valid color deviation fitting function are pruned.
- in some embodiments, S204 includes the following steps S204-A and S204-B:
- S204-A Prune the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain a third pruning mask map of the j-th image block;
- S204-B Update the first pruning mask map based on the third pruning mask map, and determine the patch information of the i-th view based on the updated first pruning mask map.
- specifically, the encoding end performs pixel-by-pixel pruning on the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third pruning mask map of the j-th image block.
- in this way, the third pruning mask map of each image block with a valid color deviation fitting function in the i-th view can be determined.
- then, the first pruning mask map is updated based on the third pruning mask maps, and the patch information of the i-th view is determined based on the updated first pruning mask map.
- the following describes the specific method, in S204-A, of pruning the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third pruning mask map of the j-th image block.
- the specific manner in which the encoding end prunes the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third pruning mask map of the j-th image block is not limited.
- in some embodiments, the encoding end prunes all unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third pruning mask map of the j-th image block.
- in some embodiments, if the fitting error of the k-th unpruned pixel is less than or equal to a third preset value, the encoder prunes the k-th unpruned pixel of the j-th image block; if the fitting error of the k-th unpruned pixel is greater than the third preset value, the encoder does not prune the k-th unpruned pixel of the j-th image block. The third pruning mask map of the j-th image block is thereby obtained.
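- a minimal sketch of this per-pixel decision is given below (the boolean-array representation and the function name are assumptions; mask 0 = pruned, 1 = kept, matching the convention above):

```python
import numpy as np

# Hypothetical sketch of S204-A: prune the remaining (mask == 1) pixels of
# the j-th image block pixel by pixel, based on their fitting errors, to
# produce the third pruning mask map.
def third_pruning_mask(block_mask, fitting_errors, third_preset_value):
    out = np.asarray(block_mask).copy()
    unpruned = out == 1
    prune_now = unpruned & (np.asarray(fitting_errors) <= third_preset_value)
    out[prune_now] = 0          # fitting is good enough: prune the pixel
    return out
```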
- the encoder can determine the patch information of the i-th view and the color deviation fitting function. Then, the encoder encodes the patch information and the color deviation fitting function to obtain a bitstream.
- in some embodiments, the color deviation fitting function is conditionally encoded. That is, the encoding end writes the color deviation fitting function into the bitstream only when it determines that the color deviation fitting function of the image block is valid, that is, when the fitting error of the color deviation fitting function is less than or equal to the second preset value.
- the encoding end skips the step of encoding the color deviation fitting function if the fitting error of the color deviation fitting function is greater than a second preset value.
- the embodiment of the present application does not limit the specific position at which the color deviation fitting function is carried in the code stream.
- in some embodiments, the encoder writes the color deviation fitting function into the patch data unit.
- in this way, the decoder can obtain the color deviation fitting function by decoding the patch data unit.
- for example, the patch data unit includes a patch data unit MIV extended syntax, and the color deviation fitting function is carried in the patch data unit MIV extended syntax.
- the above describes the encoding process of the i-th view among the N views by the encoder.
- the encoder can refer to the encoding process of the i-th view to encode other non-basic views among the N views, thereby completing the encoding of the N views.
- for example, when encoding, the encoder encodes two groups of views separately, wherein view V4 and view V6 are basic views, views V4, V1, V3, and V0 are encoded as one group of views, and views V6, V8, V7, and V5 are encoded as another group of views.
- the encoder first determines the first pruning mask map of the i-th view.
- specifically, view V1 is pixel-pruned. That is, for each pixel in view V1, it is determined whether the difference between the brightness value of the pixel and the brightness value of the projected pixel of that pixel in the basic view V4 is less than a preset value. If the brightness difference is greater than the preset value, the mask value of the pixel is set to 1. If the brightness difference is less than or equal to the preset value, the second stage of pruning is performed: it is determined whether the deviation between the chromaticity value of the pixel and the chromaticity value of the projected pixel is greater than a preset value. If the deviation between the chromaticities is greater than the preset value, the mask value of the pixel is set to 1; if the deviation between the chromaticities is less than the preset value, the method of the embodiment of the present application is executed. Specifically, based on the above steps, the first pruning mask map of view V1 can be determined, and view V1 is divided into M image blocks based on a preset block division method; for the j-th image block among the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and chromaticity fitting is performed on the unpruned pixels in the j-th image block to obtain the color deviation fitting function of the j-th image block; the unpruned pixels in the j-th image block are pruned based on the color deviation fitting function to obtain the patch information of view V1; and the patch information and the color deviation fitting function are encoded to obtain a code stream. Views V3 and V0 in the first group can be encoded in the same manner.
- for views V6, V8, V7, and V5 in the second group of views, the same method can be used to determine the patch information and color deviation fitting functions corresponding to V6, V8, V7, and V5, and the patch information and the color deviation fitting functions are then encoded to obtain a bitstream.
- the video encoding method provided in the embodiment of the present application when encoding a multi-view video, first determines the first pruning mask map of the i-th view in N views based on the pixel pruning method. Then, based on the preset block division method, the i-th view is divided into M image blocks, and for the j-th image block in the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and the unpruned pixels in the j-th image block are chromatically fitted to obtain the color deviation fitting function of the j-th image block.
- the unpruned pixels in the j-th image block are pruned based on the color deviation fitting function to obtain the patch information of the i-th view.
- the above patch information and the color deviation fitting function are encoded to obtain a code stream. That is, the embodiment of the present application prunes the first pruning mask map obtained by pixel pruning again based on the color deviation fitting function to further reduce the number of unpruned pixels, reduce the amount of data required to be encoded at the encoding end, and thus reduce the encoding cost of the encoding end, and improve the encoding efficiency of the multi-view video.
- Figures 8 to 16 are merely examples of the present application and should not be construed as limitations to the present application.
- the size of the sequence number of each process does not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
- the term "and/or” is merely a description of the association relationship of associated objects, indicating that three relationships may exist. Specifically, A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone.
- the character "/" in the present application generally indicates that the associated objects before and after are in an "or" relationship.
- FIG. 17 is a schematic block diagram of a video decoding device provided in an embodiment of the present application.
- the video decoding device 10 is applied to the above-mentioned video decoder.
- the video decoding device 10 includes:
- a determining unit 11 is used to decode a bitstream for an i-th view among N views, determine patch information of the i-th view, and generate a patch image of the i-th view based on the patch information, wherein the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
- a dividing unit 12 configured to divide the patch image into M image blocks according to a preset block dividing method, where M is a positive integer;
- a decoding unit 13 is used to decode the code stream for a j-th image block among the M image blocks to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer;
- the fitting unit 14 is configured to perform pixel fitting on the cropped pixels in the j-th image block using the P color deviation fitting functions to obtain a reconstructed block of the j-th image block.
- the fitting unit 14 is specifically used to determine a directed acyclic graph of the pruning hierarchy of the N views; based on the directed acyclic graph, determine P reconstructed images corresponding to the P parent nodes of the i-th view, where P is a positive integer; based on the P reconstructed images, use the P color deviation fitting functions to perform pixel fitting on the cropped pixels in the j-th image block to obtain a reconstructed block of the j-th image block.
- the fitting unit 14 is specifically used to determine the synthetic views of the P parent nodes based on the P reconstructed images, and determine the P synthetic view blocks of the j-th image block in the synthetic views of the P parent nodes; based on the P synthetic view blocks, use the P color deviation fitting functions to fit the pruned pixels in the j-th image block to obtain the P color deviation reconstructed blocks of the j-th image block; and perform weighted fusion on the P color deviation reconstructed blocks to obtain the reconstructed block of the j-th image block.
- the fitting unit 14 is specifically used to determine, for the k-th cropped pixel in the j-th image block, the P synthetic pixels corresponding to the k-th cropped pixel in the P synthetic view blocks, where k is a positive integer; based on the P synthetic pixels and the pixels surrounding the P synthetic pixels, use the color deviation fitting function to perform pixel fitting on the k-th cropped pixel to obtain a color deviation reconstruction value of the k-th cropped pixel; and, based on the color deviation reconstruction values of the cropped pixels in the j-th image block, obtain a color deviation reconstruction block of the j-th image block.
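As a concrete illustration of the fitting and weighted fusion performed by the fitting unit 14, here is a minimal Python sketch. It assumes each decoded fitting function is a linear (a, b) pair, as in the encoder sketch above, and for simplicity fits each pruned pixel from its synthetic pixel alone, omitting the surrounding-pixel refinement described above.

```python
import numpy as np

def reconstruct_pruned_block(decoded_patch, synth_blocks, fit_params,
                             pruned_mask, weights=None):
    """Hypothetical decoder-side fitting of the pruned pixels of a block.

    decoded_patch: (H, W, 2) chroma of the block as decoded (unpruned
    pixels are valid, pruned positions are to be filled in).
    synth_blocks:  list of P (H, W, 2) chroma blocks, one per parent node.
    fit_params:    list of P (a, b) pairs decoded from the bitstream.
    pruned_mask:   (H, W) bool, True where the pixel was pruned.
    """
    P = len(synth_blocks)
    if weights is None:
        weights = np.full(P, 1.0 / P)        # uniform weighted fusion
    fused = np.zeros_like(synth_blocks[0], dtype=np.float64)
    for w, blk, (a, b) in zip(weights, synth_blocks, fit_params):
        fused += w * (a * blk + b)           # per-parent color deviation block
    out = decoded_patch.astype(np.float64)
    out[pruned_mask] = fused[pruned_mask]    # only pruned pixels are fitted
    return out
```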
- the decoding unit 13 is also used to decode the code stream to obtain a first flag before dividing the patch image into M image blocks according to a preset block division method, and the first flag is used to indicate whether to turn on the color deviation fitting tool; if the first flag indicates to turn on the color deviation fitting tool, the patch image is divided into M image blocks according to the preset block division method.
- the decoding unit 13 is further configured to skip the step of dividing the patch image into M image blocks according to the preset block division method if the first flag indicates that the color deviation fitting tool is not enabled.
- the fitting unit 14 is further used to determine the synthetic views of the P parent nodes based on the P reconstructed images, and determine the P synthetic view blocks of the j-th image block in the synthetic views of the P parent nodes; based on the P synthetic view blocks, obtain the reconstructed block of the j-th image block.
- the fitting unit 14 is specifically configured to perform weighted processing on the P synthetic view blocks to obtain a reconstructed block of the j-th image block.
- the decoding unit 13 is specifically configured to decode the bitstream to obtain P color deviation fitting functions of the j-th image block if the j-th image block includes pruned pixels.
- the decoding unit 13 is specifically configured to decode the bitstream to obtain the P color deviation fitting functions of the j-th image block if the number of pruned pixels included in the j-th image block is greater than or equal to a preset value.
- the N views are views corresponding to the N different viewpoints at the same time.
- the N views are the first image of the kth GOP of each viewpoint among the N viewpoints, and k is a positive integer.
- the fitting unit 14 is further configured to determine the color deviation fitting function corresponding to the i-th view as the color deviation fitting function of other views in the GOP where the i-th view is located.
- the code stream includes a patch data unit
- the decoding unit 13 is specifically configured to decode the patch data unit to obtain P color deviation fitting functions of the j-th image block.
- the i-th view is a non-base view among the N views.
- the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, no further description is given here.
- the device 10 shown in FIG. 17 can execute the decoding method of the decoding end of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 10 are respectively for implementing the corresponding processes in each method such as the decoding method of the decoding end, and for the sake of brevity, no further description is given here.
- FIG18 is a schematic block diagram of a video encoding device provided in an embodiment of the present application, and the video encoding device is applied to the above-mentioned encoder.
- the video encoding device 20 may include:
- a determining unit 21 configured to determine, for an i-th view among N views, a first pruning mask map of the i-th view, wherein the first pruning mask map is a pruning mask map obtained by performing pixel pruning on the i-th view, wherein the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
- a dividing unit 22 configured to divide the i-th view into M image blocks based on a preset block dividing method, where M is a positive integer;
- a fitting unit 23 configured to determine, for a j-th image block among the M image blocks, unpruned pixels in the j-th image block based on the first pruning mask image, and perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer;
- a pruning unit 24 configured to prune unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain patch information of the i-th view;
- the encoding unit 25 is used to encode the patch information and the color deviation fitting function to obtain a code stream.
- the determination unit 21 is specifically used to generate a directed acyclic graph of the pruning hierarchy of the N views; based on the directed acyclic graph, determine P synthetic views of the P parent nodes of the i-th view; based on the P synthetic views, perform pixel pruning on the i-th view to obtain the first pruning mask map.
- the determination unit 21 is specifically used to prune the i-th view based on the difference between the luminance component of the i-th view and the luminance component of the P synthetic views to obtain a second pruning mask map of the i-th view; based on the difference between the chrominance component of the i-th view and the chrominance component of the P synthetic views, query the non-pruned pixels among the pruned pixels included in the second pruning mask map to obtain the first pruning mask map.
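For illustration, a minimal Python sketch of this two-stage mask construction follows; the thresholds t_luma and t_chroma and the array layout are assumptions of the sketch, and a mask value of True marks an unpruned pixel.

```python
import numpy as np

def two_stage_pruning_mask(view_yuv, synth_yuv, t_luma=10, t_chroma=5):
    """Hypothetical two-stage pruning mask for one view.

    Stage 1 prunes on the luminance difference (the second pruning mask);
    stage 2 re-examines the pruned pixels and keeps those whose
    chrominance still deviates (yielding the first pruning mask).
    """
    d_luma = np.abs(view_yuv[..., 0].astype(np.int16)
                    - synth_yuv[..., 0].astype(np.int16))
    second_mask = d_luma > t_luma                       # unpruned by luma
    d_chroma = np.abs(view_yuv[..., 1:].astype(np.int16)
                      - synth_yuv[..., 1:].astype(np.int16)).max(axis=-1)
    recovered = (~second_mask) & (d_chroma > t_chroma)  # chroma says: keep
    return second_mask | recovered                      # first pruning mask
```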
- before performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function of the j-th image block, the fitting unit 23 is also used to determine whether the j-th image block includes unpruned pixels; if the j-th image block includes unpruned pixels, chromaticity fitting is performed on them to obtain the P color deviation fitting functions.
- the fitting unit 23 is specifically used to perform chromaticity fitting on the unpruned pixels in the j-th image block if the number of unpruned pixels included in the j-th image block is greater than or equal to a first preset value, to obtain the color deviation fitting function.
- the fitting unit 23 is specifically used to determine the P corresponding image blocks of the j-th image block in the P synthetic views; and, based on the chromaticity values of the pixels included in the P corresponding image blocks and the chromaticity values of the pixels included in the j-th image block, perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
- the fitting unit 23 is specifically used to perform chromaticity fitting on the unpruned pixels in the jth image block if the number of unpruned pixels included in the jth image block is greater than or equal to a first preset value to obtain the P color deviation fitting functions.
- the fitting unit 23 is specifically used to determine P synthetic view blocks of the j-th image block in the P synthetic views; based on the chromaticity values of the pixels included in the P synthetic view blocks and the chromaticity values of the pixels included in the j-th image block, perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain the P color deviation fitting functions.
- the fitting unit 23 is specifically used to determine, for any one of the P synthetic view blocks, the synthetic pixels in the synthetic view block corresponding to the unpruned pixels in the j-th image block; and determine the color deviation fitting function based on the chromaticity values of the synthetic pixels and the chromaticity values of the unpruned pixels in the j-th image block.
- the fitting unit 23 is specifically configured to perform weighted least squares fitting based on the chromaticity values of the synthetic pixels and the chromaticity values of the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
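To make the weighted least squares step concrete, the following closed-form Python sketch fits view = a·synth + b over the unpruned pixels of a block; the per-pixel weights (e.g., a confidence from view synthesis) are an assumption of the sketch.

```python
import numpy as np

def weighted_ls_fit(synth_c, view_c, weights):
    """Hypothetical weighted least-squares fit of view = a * synth + b.

    synth_c, view_c, weights: 1-D arrays over the unpruned pixels of the
    block (chromaticity of the synthetic pixel, chromaticity of the
    original pixel, and a per-pixel weight).
    Minimizes sum_k w_k * (a * x_k + b - y_k)^2 in closed form.
    """
    w = weights / weights.sum()
    mx, my = (w * synth_c).sum(), (w * view_c).sum()
    cov = (w * (synth_c - mx) * (view_c - my)).sum()
    var = (w * (synth_c - mx) ** 2).sum()
    a = cov / var if var > 0 else 1.0        # degenerate block: identity fit
    b = my - a * mx
    return a, b
```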
- before the pruning unit 24 prunes the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the patch information of the i-th view, it is also used to perform pixel fitting on the k-th unpruned pixel in the j-th image block based on the P color deviation fitting functions to determine the fitting error of the k-th unpruned pixel, where k is a positive integer; determine the fitting errors of the P color deviation fitting functions based on the fitting errors of the unpruned pixels in the j-th image block; and, if the fitting errors of the P color deviation fitting functions are less than or equal to a second preset value, prune the unpruned pixels in the j-th image block based on the color deviation fitting functions to obtain the patch information of the i-th view.
- the pruning unit 24 is specifically used to prune the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain a third pruning mask map of the j-th image block; and determine the patch information of the i-th view based on the third pruning mask maps of the image blocks in the i-th view.
- the pruning unit 24 is specifically used to use the P color deviation fitting functions to perform chromaticity fitting on the kth unpruned pixel to obtain the fitted chromaticity value of the kth unpruned pixel; based on the fitted chromaticity value of the kth unpruned pixel and the chromaticity value of the kth unpruned pixel, determine the fitting error of the kth unpruned pixel.
- the pruning unit 24 is specifically used to use the color deviation fitting function to perform chromaticity fitting on the k-th unpruned pixel to obtain the fitted chromaticity value of the k-th unpruned pixel; and, based on the fitted chromaticity value of the k-th unpruned pixel and the chromaticity value of the k-th unpruned pixel, determine the fitting error of the k-th unpruned pixel.
- the pruning unit 24 is specifically configured to determine the sum of the fitting errors of the unpruned pixels in the j-th image block as the fitting errors of the P color deviation fitting functions.
- the pruning unit 24 is specifically configured to determine an average value of fitting errors of unpruned pixels in the j-th image block as the fitting errors of the P color deviation fitting functions.
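The two aggregation choices above (sum or average of the per-pixel errors) can be sketched as follows; the linear (a, b) function form is the same illustrative assumption used in the earlier sketches.

```python
import numpy as np

def block_fit_error(view_c, synth_c, a, b, reduce="mean"):
    """Hypothetical fitting error of one color deviation function over the
    unpruned pixels of a block: per-pixel |fitted - original| chromaticity,
    aggregated as a sum or an average."""
    per_pixel = np.abs(a * synth_c + b - view_c)
    return per_pixel.sum() if reduce == "sum" else per_pixel.mean()
```

The encoder would then prune and encode the function only when this error is less than or equal to the second preset value, and otherwise skip encoding it, as the surrounding embodiments describe.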
- the encoding unit 25 is further used to determine a first flag before dividing the i-th view into M image blocks based on the preset block division method, where the first flag is used to indicate whether to turn on the color deviation fitting tool; if the first flag indicates to turn on the color deviation fitting tool, the i-th view is divided into M image blocks based on the preset block division method.
- the encoding unit 25 is further configured to write the first flag into the bit stream.
- the encoding unit 25 is further configured to skip the step of encoding the color deviation fitting function if the fitting errors of the P color deviation fitting functions are greater than the second preset value.
- the N views are views generated by the N different viewpoints at the same time.
- the N views are the first image of the kth GOP of each viewpoint among the N viewpoints, and k is a positive integer.
- the fitting unit 23 is further configured to determine the color deviation fitting function corresponding to the i-th view as the color deviation fitting function of other views in the GOP where the i-th view is located.
- the encoding unit 25 is specifically configured to write the color deviation fitting function into the patch data unit.
- the i-th view is a non-base view among the N views.
- the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, it will not be repeated here.
- the device 20 shown in Figure 18 may correspond to the corresponding subject in the encoding method of the encoding end of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the device 20 are respectively for implementing the corresponding processes in each method such as the encoding method of the encoding end, and for the sake of brevity, it will not be repeated here.
- The functional units may be implemented in hardware, by instructions in software, or by a combination of hardware and software units.
- The steps of the method embodiments of the present application may be completed by integrated logic circuits in hardware and/or by instructions in software in the processor; the steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of the hardware and software units in a decoding processor.
- The software unit may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
- the storage medium is located in a memory, and the processor reads the information in the memory, and completes the steps in the above method embodiment in conjunction with its hardware.
- FIG. 19 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
- the electronic device 30 may be a video encoder or a video decoder as described in the embodiment of the present application, and the electronic device 30 may include:
- a memory 33 and a processor 32, where the memory 33 is used to store the computer program 34 and transmit its program code to the processor 32.
- the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
- the processor 32 may be configured to execute the steps in the above method according to the instructions in the computer program 34 .
- the processor 32 may include, but is not limited to: a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), and so on.
- the memory 33 includes, but is not limited to, volatile and/or non-volatile memory.
- Non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
- Volatile memory may be random access memory (RAM), which is used as an external cache; for example, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic RAM (DDR SDRAM), enhanced synchronous dynamic RAM (ESDRAM), synchronous link DRAM (SLDRAM), or direct Rambus RAM.
- the computer program 34 may be divided into one or more units, which are stored in the memory 33 and executed by the processor 32 to complete the method provided by the present application.
- the one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30.
- the electronic device 30 may further include:
- the transceiver 33 may be connected to the processor 32 or the memory 33 .
- the processor 32 may control the transceiver 33 to communicate with other devices, specifically, to send information or data to other devices, or to receive information or data sent by other devices.
- the transceiver 33 may include a transmitter and a receiver.
- the transceiver 33 may further include an antenna, and the number of antennas may be one or more.
- The bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus.
- FIG. 20 is a schematic block diagram of a video encoding and decoding system provided in an embodiment of the present application.
- the video encoding and decoding system 40 may include: a video encoder 41 and a video decoder 42 , wherein the video encoder 41 is used to execute the video encoding method involved in the embodiment of the present application, and the video decoder 42 is used to execute the video decoding method involved in the embodiment of the present application.
- the present application also provides a computer storage medium on which a computer program is stored, and when the computer program is executed by a computer, the computer can perform the method of the above method embodiment.
- An embodiment of the present application also provides a computer program product containing instructions; when the instructions are executed by a computer, the computer can perform the method of the above method embodiments.
- the present application also provides a code stream, which is generated according to the above encoding method.
- the computer program product includes one or more computer instructions.
- the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (e.g., infrared, radio, or microwave) manner.
- The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
- the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
- the disclosed systems, devices and methods can be implemented in other ways.
- The device embodiments described above are merely illustrative.
- The division of units is merely a logical functional division.
- Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
- each functional unit in each embodiment of the present application may be integrated into a processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
Description
The present application relates to the field of video coding and decoding technology, and in particular to a video coding and decoding method, device, equipment, system, and storage medium.
In three-dimensional application scenarios, such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), multi-viewpoint videos are used to provide users with an immersive experience.
In the current multi-view video encoding process, redundant information between the multi-view videos is pruned by pixel pruning to reduce the amount of encoded data. However, the current pixel pruning method has the problem of incomplete pruning, which keeps the encoding cost high.
Summary of the invention
The embodiments of the present application provide a video encoding and decoding method, apparatus, device, system, and storage medium, which can achieve complete cropping of pixels, thereby reducing the encoding cost and improving the encoding and decoding efficiency of multi-viewpoint videos.
In a first aspect, the present application provides a video decoding method, applied to a decoder, comprising:
for an i-th view among N views, decoding a bitstream, determining patch information of the i-th view, and generating a patch image of the i-th view based on the patch information, wherein the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
dividing the patch image into M image blocks according to a preset block division method, where M is a positive integer;
for a j-th image block among the M image blocks, decoding the bitstream to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer;
using the P color deviation fitting functions, performing pixel fitting on the cropped pixels in the j-th image block to obtain a reconstructed block of the j-th image block.
In a second aspect, an embodiment of the present application provides a video encoding method, applied to an encoder, comprising:
for an i-th view among N views, determining a first pruning mask map of the i-th view, where the first pruning mask map is a pruning mask map obtained by performing pixel pruning on the i-th view, wherein the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
dividing the i-th view into M image blocks based on a preset block division method, where M is a positive integer;
for a j-th image block among the M image blocks, determining unpruned pixels in the j-th image block based on the first pruning mask map, and performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer;
pruning unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain patch information of the i-th view;
encoding the patch information and the color deviation fitting function to obtain a bit stream.
In a third aspect, the present application provides a video decoding device, configured to execute the method in the above first aspect or its respective implementations. Specifically, the device includes functional units for executing the method in the above first aspect or its respective implementations.
In a fourth aspect, the present application provides a video encoding device, configured to execute the method in the above second aspect or its respective implementations. Specifically, the device includes functional units for executing the method in the above second aspect or its respective implementations.
In a fifth aspect, the present application provides a video decoder, comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above first aspect or its implementations.
In a sixth aspect, the present application provides a video encoder, comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above second aspect or its implementations.
In a seventh aspect, the present application provides a video coding and decoding system, including a video encoder and a video decoder, wherein the video decoder is used to execute the method in the above first aspect or its respective implementations, and the video encoder is used to execute the method in the above second aspect or its respective implementations.
In an eighth aspect, the present application provides a chip for implementing the method in any one of the above first to second aspects or their respective implementations. Specifically, the chip includes a processor, configured to call and run a computer program from a memory, so that a device equipped with the chip executes the method in any one of the above first to second aspects or their respective implementations.
In a ninth aspect, the present application provides a computer-readable storage medium for storing a computer program, wherein the computer program enables a computer to execute the method in any one of the above first to second aspects or their respective implementations.
In a tenth aspect, the present application provides a computer program product, comprising computer program instructions, which enable a computer to execute the method in any one of the above first to second aspects or their respective implementations.
In an eleventh aspect, the present application provides a computer program which, when run on a computer, enables the computer to execute the method in any one of the above first to second aspects or their respective implementations.
Based on the above technical solution, when encoding a multi-view video, the first pruning mask map of the i-th view among the N views is first determined based on the pixel pruning method. Then, based on a preset block division method, the i-th view is divided into M image blocks; for the j-th image block among the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and chromaticity fitting is performed on them to obtain the color deviation fitting function of the j-th image block. Next, the unpruned pixels in the j-th image block are pruned based on the color deviation fitting function to obtain the patch information of the i-th view. Finally, the above patch information and color deviation fitting function are encoded to obtain a bitstream. That is, the embodiments of the present application further prune, based on the color deviation fitting function, the first pruning mask map obtained by pixel pruning, so as to further reduce the number of unpruned pixels and the amount of data that needs to be encoded at the encoding end, thereby reducing the encoding cost of the encoding end and improving the encoding and decoding efficiency of the multi-view video.
FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application;
FIG. 2A is a schematic block diagram of a video encoder according to an embodiment of the present application;
FIG. 2B is a schematic block diagram of a video decoder according to an embodiment of the present application;
FIG. 3A is a schematic diagram of three degrees of freedom;
FIG. 3B is a schematic diagram of three degrees of freedom+;
FIG. 3C is a schematic diagram of six degrees of freedom;
FIG. 4A and FIG. 4B are schematic diagrams of the principle of MIV technology;
FIG. 5A is a schematic diagram of a multi-view video encoding process;
FIG. 5B is a schematic diagram of a multi-view video decoding process;
FIG. 6 is a schematic diagram of the encoding process of TMIV14;
FIG. 7A is a schematic diagram of a view list;
FIG. 7B is a schematic diagram of aggregation of pruned views;
FIG. 7C is a schematic diagram of patch packing;
FIG. 8 is a schematic flowchart of a video decoding method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a patch image;
FIG. 10 is a schematic diagram of dividing a patch image into blocks;
FIG. 11 is a schematic diagram of a directed acyclic graph;
FIG. 12 is a schematic diagram of decoding;
FIG. 13 is a schematic flowchart of a video encoding method provided by an embodiment of the present application;
FIG. 14A is a schematic diagram of views from different viewpoints;
FIG. 14B is a schematic diagram of pixel projection;
FIG. 14C is a schematic diagram of pixel pruning;
FIG. 15 and FIG. 16 are schematic diagrams of an encoding process involved in an embodiment of the present application;
FIG. 17 is a schematic block diagram of a video decoding device provided by an embodiment of the present application;
FIG. 18 is a schematic block diagram of a video encoding device provided by an embodiment of the present application;
FIG. 19 is a schematic block diagram of an electronic device provided in an embodiment of the present application;
FIG. 20 is a schematic block diagram of a video encoding and decoding system provided in an embodiment of the present application.
The present application can be applied to the fields of image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated-circuit video coding and decoding, real-time video coding and decoding, and so on. For example, the solution of the present application can be combined with the audio video coding standard (AVS), the H.264/audio video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard. Alternatively, the solution of the present application may operate in combination with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the technology of the present application is not limited to any particular codec standard or technology.
A high-degree-of-freedom immersive coding system can be roughly divided into the following stages along its task line: data collection, data organization and expression, data encoding and compression, data decoding and reconstruction, and data synthesis and rendering, finally presenting the target data to the user.
The encoding involved in the embodiments of the present application is mainly video encoding and decoding. For ease of understanding, the video encoding and decoding system involved in the embodiments of the present application is first introduced in conjunction with FIG. 1.
FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video encoding and decoding system of the embodiments of the present application includes, but is not limited to, what is shown in FIG. 1. As shown in FIG. 1, the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is used to encode (which can be understood as compress) video data to generate a code stream, and transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
The encoding device 110 of the embodiments of the present application can be understood as a device with a video encoding function, and the decoding device 120 as a device with a video decoding function; that is, the encoding device 110 and the decoding device 120 encompass a wide range of devices, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
In some embodiments, the encoding device 110 may transmit the encoded video data (e.g., a code stream) to the decoding device 120 via a channel 130. The channel 130 may include one or more media and/or devices capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time. In this example, the encoding device 110 can modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio frequency spectrum; optionally, the communication media may also include wired communication media, such as one or more physical transmission lines.
In another example, the channel 130 includes a storage medium that can store the video data encoded by the encoding device 110. The storage media include a variety of locally accessible data storage media, such as optical discs, DVDs, flash memory, etc. In this example, the decoding device 120 can obtain the encoded video data from the storage medium.
In another example, the channel 130 may include a storage server that can store the video data encoded by the encoding device 110. In this example, the decoding device 120 can download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and transmit it to the decoding device 120; it may be, for example, a web server (e.g., for a website) or a file transfer protocol (FTP) server.
In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, the encoding device 110 may further include a video source 111 in addition to the video encoder 112 and the output interface 113.
The video source 111 may include at least one of a video acquisition device (e.g., a video camera), a video archive, a video input interface, and a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
The video encoder 112 encodes the video data from the video source 111 to generate a code stream. The video data may include one or more pictures or a sequence of pictures. The code stream contains the encoding information of a picture or a sequence of pictures in the form of a bitstream. The encoding information may include encoded picture data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS), and other syntax structures. An SPS may contain parameters applied to one or more sequences. A PPS may contain parameters applied to one or more pictures. A syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data may also be stored in a storage medium or on a storage server for subsequent reading by the decoding device 120.
In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122.
In some embodiments, the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122.
The input interface 121 includes a receiver and/or a modem. The input interface 121 can receive the encoded video data through the channel 130.
The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123.
The display device 123 displays the decoded video data. The display device 123 may be integrated with the decoding device 120 or external to it. The display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
In addition, FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1; for example, the technology of the present application can also be applied to one-sided video encoding or one-sided video decoding.
The video encoding framework involved in the embodiments of the present application is introduced below.
FIG. 2A is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of images or for lossless compression of images. The lossless compression can be visually lossless compression or mathematically lossless compression.
The video encoder 200 can be applied to image data in the luminance-chrominance (YCbCr, YUV) format. For example, the YUV ratio can be 4:2:0, 4:2:2, or 4:4:4, where Y represents luminance (Luma), Cb (U) represents blue chrominance, Cr (V) represents red chrominance, and U and V together represent chrominance (Chroma), which describes color and saturation. For example, in terms of color format, 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
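To illustrate the sample counts these samplings imply, a short Python sketch (the helper name is ours) computes the luma and chroma plane sizes:

```python
def plane_sizes(width, height, fmt):
    """Luma/chroma plane sizes for common YUV samplings: 4:2:0 halves the
    chroma resolution horizontally and vertically, 4:2:2 halves it
    horizontally only, and 4:4:4 keeps full chroma resolution."""
    sub = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}[fmt]
    cw, ch = width // sub[0], height // sub[1]
    return (width, height), (cw, ch), (cw, ch)   # Y, Cb, Cr

# plane_sizes(1920, 1080, "4:2:0") -> ((1920, 1080), (960, 540), (960, 540))
```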
For example, the video encoder 200 reads video data and, for each frame of the video data, divides the frame into a number of coding tree units (CTUs). In some examples, a CTU may be referred to as a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB). Each CTU may be associated with a pixel block of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chroma) samples. Therefore, each CTU may be associated with one luminance sample block and two chrominance sample blocks. The size of a CTU is, for example, 128×128, 64×64, or 32×32. A CTU may be further divided into a number of coding units (CUs) for encoding, and a CU may be a rectangular or square block. A CU can be further divided into a prediction unit (PU) and a transform unit (TU), so that encoding, prediction, and transform are separated, making processing more flexible. In one example, a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
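The quadtree division of a CTU into CUs can be sketched as a simple recursion; the split decision callback stands in for the encoder's rate-distortion choice and is an assumption of the sketch.

```python
def quadtree_split(x, y, size, min_cu, decide_split):
    """Hypothetical recursive quadtree partition of a CTU into CUs.

    decide_split(x, y, size) models the encoder decision; the leaves of
    the recursion are the resulting coding units (x, y, size).
    """
    if size <= min_cu or not decide_split(x, y, size):
        return [(x, y, size)]                # this square becomes one CU
    half = size // 2
    cus = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        cus += quadtree_split(x + dx, y + dy, half, min_cu, decide_split)
    return cus

# Example: split a 64x64 CTU once, into four 32x32 CUs:
# quadtree_split(0, 0, 64, 32, lambda x, y, s: s > 32)
```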
The video encoder and video decoder may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, the video encoder and video decoder may support PU sizes of 2N×2N or N×N for intra-frame prediction, and support symmetric PUs of 2N×2N, 2N×N, N×2N, N×N, or similar sizes for inter-frame prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter-frame prediction.
In some embodiments, as shown in FIG. 2A, the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image buffer 270, and an entropy coding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.
Optionally, in the present application, the current block may be referred to as the current coding unit (CU) or the current prediction unit (PU). A prediction block may also be referred to as a predicted image block or an image prediction block, and a reconstructed image block may also be referred to as a reconstructed block or an image reconstruction block.
In some embodiments, the prediction unit 210 includes an inter-frame prediction unit 211 and an intra-frame estimation unit 212. Since there is a strong correlation between adjacent pixels within a frame of a video, intra-frame prediction is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in a video, inter-frame prediction is used to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
The inter-frame prediction unit 211 can be used for inter-frame prediction. Inter-frame prediction may include motion estimation and motion compensation, and may refer to image information of different frames; it uses motion information to find a reference block in a reference frame and generates a prediction block based on the reference block, eliminating temporal redundancy. The frames used for inter-frame prediction may be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames. The motion information includes the reference frame list in which the reference frame is located, a reference frame index, and a motion vector. The motion vector may be of integer-pixel or sub-pixel precision; if the motion vector is of sub-pixel precision, interpolation filtering is required in the reference frame to produce the needed sub-pixel block. Here, the integer-pixel or sub-pixel block found in the reference frame according to the motion vector is called the reference block. Some technologies directly use the reference block as the prediction block, while others further process the reference block to generate the prediction block; the latter can also be understood as using the reference block as the prediction block and then processing it to generate a new prediction block.
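As a minimal illustration of fetching a reference block with a motion vector, the following sketch handles only integer-pixel vectors; sub-pixel vectors would additionally interpolate the reference with a filter, as noted above.

```python
import numpy as np

def motion_compensate(ref, x, y, w, h, mvx, mvy):
    """Hypothetical integer-pel motion compensation: fetch the w x h block
    of reference frame `ref` displaced from (x, y) by the motion vector
    (mvx, mvy), clamped to the frame border."""
    rx = max(0, min(x + mvx, ref.shape[1] - w))
    ry = max(0, min(y + mvy, ref.shape[0] - h))
    return ref[ry:ry + h, rx:rx + w]
```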
The intra-frame estimation unit 212 refers only to information within the same frame to predict the pixel information in the current image block, eliminating spatial redundancy. The frames used for intra-frame prediction may be I frames.
There are multiple prediction modes for intra-frame prediction. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes. The intra-frame prediction modes used by HEVC are Planar, DC, and 33 angular modes, for a total of 35 prediction modes. The intra-frame modes used by VVC are Planar, DC, and 65 angular modes, for a total of 67 prediction modes.
It should be noted that as the number of angular modes increases, intra-frame prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
The residual unit 220 may generate a residual block of the CU based on the pixel blocks of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate a residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel blocks of the CU and the corresponding sample in the prediction blocks of the PUs of the CU.
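The residual computation, and its mirror image at the decoder, amount to a per-sample subtraction and addition; a minimal sketch:

```python
import numpy as np

def residual_block(original, prediction):
    """Encoder side: residual = original minus prediction, per sample;
    a signed type keeps negative differences."""
    return original.astype(np.int16) - prediction.astype(np.int16)

def reconstruct_samples(prediction, residual, bit_depth=8):
    """Decoder side: reconstruction = prediction plus decoded residual,
    clipped back to the valid sample range."""
    rec = prediction.astype(np.int16) + residual
    return np.clip(rec, 0, (1 << bit_depth) - 1).astype(np.uint8)
```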
The transform/quantization unit 230 may quantize the transform coefficients. The transform/quantization unit 230 may quantize the transform coefficients associated with the TUs of the CU based on a quantization parameter (QP) value associated with the CU. The video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
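In H.264/HEVC-style codecs the quantization step roughly doubles for every increase of 6 in QP; a scalar quantization sketch under that assumed relation:

```python
def quantize(coeffs, qp):
    """Hypothetical scalar quantization of transform coefficients using
    the approximate relation Qstep = 2 ** ((QP - 4) / 6): a larger QP
    means a larger step and coarser quantization."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qp):
    """Inverse scaling performed at the decoder (and in the encoder's
    reconstruction path)."""
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return [lv * qstep for lv in levels]
```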
反变换/量化单元240可分别将逆量化及逆变换应用于量化后的变换系数,以从量化后的变换系数重建残差块。The inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
重建单元250可将重建后的残差块的采样加到预测单元210产生的一个或多个预测块的对应采样,以产生与TU相关联的重建图像块。通过此方式重建CU的每一个TU的采样块,视频编码器200可重建CU的像素块。The reconstruction unit 250 may add the samples of the reconstructed residual block to the corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this manner, the video encoder 200 may reconstruct the pixel blocks of the CU.
环路滤波单元260用于对反变换与反量化后的像素进行处理,弥补失真信息,为后续编码像素提供更好的参考,例如可执行消块滤波操作以减少与CU相关联的像素块的块效应。The loop filter unit 260 is used to process the inverse transformed and inverse quantized pixels to compensate for distortion information and provide a better reference for subsequent coded pixels. For example, a deblocking filter operation may be performed to reduce the blocking effect of the pixel blocks associated with the CU.
在一些实施例中,环路滤波单元260包括去块滤波单元和样点自适应补偿/自适应环路滤波(SAO/ALF)单元,其中去块滤波单元用于去方块效应,SAO/ALF单元用于去除振铃效应。In some embodiments, the loop filter unit 260 includes a deblocking filter unit and a sample adaptive offset/adaptive loop filter (SAO/ALF) unit, wherein the deblocking filter unit is used to remove the block effect, and the SAO/ALF unit is used to remove the ringing effect.
解码图像缓存270可存储重建后的像素块。帧间预测单元211可使用含有重建后的像素块的参考图像来对其它图像的PU执行帧间预测。另外,帧内估计单元212可使用解码图像缓存270中的重建后的像素块来对在与CU相同的图像中的其它PU执行帧内预测。The decoded image buffer 270 may store the reconstructed pixel blocks. The inter prediction unit 211 may use the reference image containing the reconstructed pixel blocks to perform inter prediction on PUs of other images. In addition, the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded image buffer 270 to perform intra prediction on other PUs in the same image as the CU.
熵编码单元280可接收来自变换/量化单元230的量化后的变换系数。熵编码单元280可对量化后的变换系数执行一个或多个熵编码操作以产生熵编码后的数据。The entropy encoding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230. The entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy-encoded data.
FIG. 2B is a schematic block diagram of a video decoder according to an embodiment of the present application.
As shown in FIG. 2B, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transform unit 330, a reconstruction unit 340, a loop filter unit 350, and a decoded image buffer 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.
The video decoder 300 may receive a bitstream. The entropy decoding unit 310 may parse the bitstream to extract syntax elements from it; as part of this parsing, it may parse the entropy-encoded syntax elements in the bitstream. The prediction unit 320, the inverse quantization/transform unit 330, the reconstruction unit 340, and the loop filter unit 350 may decode the video data according to the syntax elements extracted from the bitstream, that is, produce decoded video data.
In some embodiments, the prediction unit 320 includes an inter prediction unit 321 and an intra estimation unit 322.
The intra estimation unit 322 may perform intra prediction to produce a prediction block for a PU. It may use an intra prediction mode to produce the prediction block for the PU based on pixel blocks of spatially neighboring PUs, and may determine the intra prediction mode of the PU according to one or more syntax elements parsed from the bitstream.
The inter prediction unit 321 may construct a first reference image list (list 0) and a second reference image list (list 1) according to syntax elements parsed from the bitstream. In addition, if a PU is inter-coded, the entropy decoding unit 310 may parse the motion information of the PU. The inter prediction unit 321 may determine one or more reference blocks of the PU according to the motion information of the PU, and may produce the prediction block of the PU from those reference blocks.
The inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) the transform coefficients associated with a TU, and may use the QP value associated with the CU of the TU to determine the degree of quantization.
After inversely quantizing the transform coefficients, the inverse quantization/transform unit 330 may apply one or more inverse transforms to them in order to produce a residual block associated with the TU.
The reconstruction unit 340 uses the residual blocks associated with the TUs of a CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add samples of a residual block to corresponding samples of a prediction block to reconstruct a pixel block of the CU, obtaining a reconstructed image block.
The loop filter unit 350 may perform a deblocking filter operation to reduce the blocking artifacts of the pixel blocks associated with the CU.
The video decoder 300 may store the reconstructed image of the CU in the decoded image buffer 360, and may use the reconstructed images in the decoded image buffer 360 as reference images for subsequent prediction, or transmit them to a display device for presentation.
The basic process of video encoding and decoding is as follows. At the encoding end, a frame of an image is divided into blocks; for the current block, the prediction unit 210 produces a prediction block of the current block using intra prediction or inter prediction. The residual unit 220 may compute a residual block based on the prediction block and the original block of the current block, i.e., the difference between the prediction block and the original block; this residual block may also be called residual information. Through the transform and quantization processes of the transform/quantization unit 230, the residual block can be stripped of information to which the human eye is not sensitive, thereby eliminating visual redundancy. Optionally, the residual block before transform and quantization by the transform/quantization unit 230 may be called a time-domain residual block, and the time-domain residual block after transform and quantization may be called a frequency residual block or frequency-domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, entropy-encodes them, and outputs a bitstream; for example, the entropy encoding unit 280 may eliminate character redundancy according to a target context model and the probability information of the binary bitstream.
At the decoding end, the entropy decoding unit 310 may parse the bitstream to obtain the prediction information, the quantized coefficient matrix, and so on of the current block; based on the prediction information, the prediction unit 320 produces a prediction block of the current block using intra prediction or inter prediction. The inverse quantization/transform unit 330 performs inverse quantization and inverse transformation on the quantized coefficient matrix obtained from the bitstream to obtain a residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block. The reconstructed blocks form a reconstructed image, and the loop filter unit 350 performs loop filtering on the reconstructed image, either per image or per block, to obtain a decoded image. The encoding end requires operations similar to those of the decoding end to obtain the decoded image. The decoded image may also be called a reconstructed image, and the reconstructed image may serve as a reference frame for inter prediction of subsequent frames.
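To make the residual path above concrete, the following is a minimal, non-normative Python sketch of the hybrid-coding round trip (prediction, residual, quantization, dequantization, reconstruction). The identity "transform" and the uniform quantizer are simplifying assumptions for illustration, not the actual transform/quantization of units 230 and 330.

```python
import numpy as np

def encode_block(original, prediction, qstep=8):
    """Toy residual path: residual -> (identity 'transform') -> uniform quantizer."""
    residual = original.astype(np.int32) - prediction.astype(np.int32)
    return np.round(residual / qstep).astype(np.int32)  # quantized coefficients

def decode_block(coeffs, prediction, qstep=8):
    """Mirror of the encoder: dequantize, add the prediction, clip to pixel range."""
    residual = coeffs * qstep
    recon = prediction.astype(np.int32) + residual
    return np.clip(recon, 0, 255).astype(np.uint8)

pred = np.full((4, 4), 120, np.uint8)                # prediction block
orig = pred + np.random.randint(-10, 10, (4, 4))     # "current block"
rec = decode_block(encode_block(orig, pred), pred)   # reconstructed block
```

The lossy step is the quantizer: the larger qstep is, the cheaper the coefficients are to entropy-code and the larger the reconstruction error.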
It should be noted that the block division information determined by the encoding end, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, and so on, is carried in the bitstream when necessary. By parsing the bitstream and analyzing the available information, the decoding end determines the same block division information and the same prediction, transform, quantization, entropy coding, and loop filtering mode or parameter information as the encoding end, thereby ensuring that the decoded image obtained at the encoding end is identical to the decoded image obtained at the decoding end.
The above is the basic flow of a video codec under the block-based hybrid coding framework. As technology develops, some modules or steps of this framework or flow may be optimized. The present application is applicable to the basic flow of a video codec under the block-based hybrid coding framework, but is not limited to this framework and flow.
The embodiments of the present application involve the encoding and decoding of multi-view video. Relevant concepts involved in the embodiments of the present application are introduced below.
Multi-view video, also known as free-viewpoint video, is immersive media video captured by multiple cameras, containing different viewpoints and supporting 3DoF+ or 6DoF user interaction.
Degree of freedom (DoF): in a mechanical system, this refers to the number of independent coordinates; besides translational degrees of freedom, there are also rotational and vibrational degrees of freedom. In the embodiments of the present application, it refers to the degrees of freedom of motion supported, and of content interaction generated, when a user watches immersive media.
Three degrees of freedom (3DoF) refers to the three degrees of freedom of the user's head rotating around the X, Y, and Z axes. FIG. 3A schematically shows three degrees of freedom: at a fixed point, the head can rotate about all three axes, i.e., the user can turn the head, tilt it up and down, and tilt it side to side. With a three-degree-of-freedom experience, the user can be immersed in a scene through 360 degrees. If the content is static, it can be understood as a panoramic picture; if the panoramic picture is dynamic, it is a panoramic video, i.e., a VR video. However, VR video has certain limitations: the user cannot move and cannot choose an arbitrary position from which to watch.
3DoF+: on the basis of three degrees of freedom, the user additionally has the freedom to make limited movements along the X, Y, and Z axes; this may also be called constrained six degrees of freedom, and the corresponding media bitstream may be called a constrained six-degree-of-freedom media bitstream. FIG. 3B schematically shows 3DoF+.
6DoF: on the basis of three degrees of freedom, the user additionally has the freedom to move freely along the X, Y, and Z axes; the corresponding media bitstream may be called a six-degree-of-freedom media bitstream. FIG. 3C schematically shows six degrees of freedom. Here, 6DoF media refers to six-degree-of-freedom video, i.e., video that offers the user a high-freedom viewing experience in which the viewpoint can be moved freely along the X, Y, and Z axes of three-dimensional space and rotated freely around those axes. 6DoF media is a combination of videos of a space from different viewpoints captured by a camera array. To facilitate the expression, storage, compression, and processing of 6DoF media, the 6DoF media data is expressed as a combination of the following information: texture maps captured by multiple cameras, the depth maps corresponding to those texture maps, and the corresponding 6DoF media content description metadata, which contains the parameters of the multiple cameras as well as description information such as the stitching layout and edge protection of the 6DoF media. At the encoding end, the texture map information of the multiple cameras and the corresponding depth map information are stitched, and the description data of the stitching method is written into the metadata according to the defined syntax and semantics. The stitched multi-camera depth map and texture map information is encoded by planar video compression and transmitted to the terminal, where it is decoded and the 6DoF virtual viewpoint requested by the user is synthesized, thereby providing the user with a 6DoF media viewing experience.
Depth map: as a way of expressing three-dimensional scene information, the grayscale value of each pixel in a depth map can represent the distance from a point in the scene to the camera.
In some embodiments, multi-view video is encoded and decoded using MPEG (Moving Picture Experts Group) Immersive Video (MIV) technology.
MIV technology is introduced below.
MIV technology: in order to reduce the transmitted pixel rate while retaining as much scene information as possible, so that there is enough information for rendering the target view, MPEG-I adopts the scheme shown in FIG. 4A. A limited number of viewpoints are selected as basic viewpoints so as to cover as much of the visible range of the scene as possible; the basic viewpoints are transmitted as complete images, and the redundant pixels between the remaining non-basic viewpoints and the basic viewpoints are removed, i.e., only effective information that is not expressed repeatedly is retained. The sub-block images extracted from the effective information are then reorganized together with the basic viewpoint images to form a larger rectangular image, called a stitched image; FIG. 4A and FIG. 4B illustrate the process of generating the stitched image. The stitched image is sent to the codec for compression and reconstruction, and the auxiliary data related to the stitching information of the sub-block images is also sent to the encoder to form a bitstream.
In some embodiments, multi-view video is encoded and decoded using the frame packing technology of Visual Volumetric Video-based Coding (V3C).
Frame packing technology is introduced below.
Taking multi-view video as an example, and illustratively as shown in FIG. 5A, the encoding end includes the following steps:
Step 1: when encoding the acquired multi-view video, after some pre-processing, patches of the multi-view video are generated; the patches are then organized to generate a multi-view video mosaic.
For example, as shown in FIG. 5A, the multi-view video is input into TMIV for packing, and a multi-view video mosaic is output. TMIV is reference software for MIV. The packing in the embodiments of the present application can be understood as stitching.
The multi-view video mosaic includes a multi-view video texture mosaic and a multi-view video geometry mosaic, i.e., it contains only multi-view video sub-blocks.
Step 2: the multi-view video mosaic is input into the frame packer, and a multi-view video mixed mosaic is output.
The multi-view video mixed mosaic includes a multi-view video texture mixed mosaic, a multi-view video geometry mixed mosaic, and a multi-view video texture-and-geometry mixed mosaic.
Specifically, as shown in FIG. 5A, frame packing is applied to the multi-view video mosaics to generate a multi-view video mixed mosaic, with each multi-view video mosaic occupying one region of the mixed mosaic. Accordingly, a flag pin_region_type_id_minus2 is transmitted for each region in the bitstream; this flag records whether the current region belongs to a multi-view video texture mosaic or a multi-view video geometry mosaic, and this information is needed at the decoding end.
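As a rough illustration of this region-based packing, the sketch below packs a texture atlas and a geometry atlas into one mixed picture and records a per-region type tag mirroring the role of pin_region_type_id_minus2 (V3C_AVD for texture, V3C_GVD for geometry). The layout (simple vertical stacking) and the region dictionaries are assumptions for illustration, not the actual V3C packing syntax.

```python
import numpy as np

V3C_AVD, V3C_GVD = "V3C_AVD", "V3C_GVD"   # attribute (texture) / geometry video data

def frame_pack(texture_atlas, geometry_atlas):
    """Stack the two atlases vertically; record one type tag per region."""
    packed = np.vstack([texture_atlas, geometry_atlas])
    regions = [
        {"y": 0, "h": texture_atlas.shape[0], "type": V3C_AVD},
        {"y": texture_atlas.shape[0], "h": geometry_atlas.shape[0], "type": V3C_GVD},
    ]
    return packed, regions

def frame_unpack(packed, regions):
    """Split the mixed picture back into per-type atlases using the region tags."""
    out = {}
    for r in regions:
        out.setdefault(r["type"], []).append(packed[r["y"]:r["y"] + r["h"]])
    return out

tex = np.zeros((64, 128), np.uint8)
geo = np.full((32, 128), 200, np.uint8)
packed, regions = frame_pack(tex, geo)
atlases = frame_unpack(packed, regions)   # {'V3C_AVD': [...], 'V3C_GVD': [...]}
```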
Step 3: a video encoder is used to encode the multi-view video mixed mosaic to obtain a bitstream.
Illustratively, as shown in FIG. 5B, the decoding end includes the following steps:
Step 1: when decoding the multi-view video, the acquired bitstream is input into a video decoder for decoding to obtain a reconstructed multi-view video mixed mosaic.
Step 2: the reconstructed multi-view video mixed mosaic is input into the frame depacker, and a reconstructed multi-view video mosaic is output.
Specifically, the flag pin_region_type_id_minus2 is first obtained from the bitstream. If it is determined that pin_region_type_id_minus2 indicates V3C_AVD, the current region is a multi-view video texture mosaic, and the current region is split and output as a reconstructed multi-view video texture mosaic.
If it is determined that pin_region_type_id_minus2 indicates V3C_GVD, the current region is a multi-view video geometry mosaic, and the current region is split and output as a reconstructed multi-view video geometry mosaic.
Step 3: the reconstructed multi-view video mosaic is decoded to obtain the reconstructed multi-view video.
Specifically, the multi-view video texture mosaic and the multi-view video geometry mosaic are decoded to obtain the reconstructed multi-view video.
The encoding process of TMIV14 is introduced below.
The TMIV encoder has a "group-based" encoder, which calls the "single group" encoder shown in FIG. 6 to encode each group.
The multi-view video encoder shown in FIG. 6 is introduced below:
As shown in FIG. 6, the encoder operates on the selected source views of a given group and has the following stages:
1. Acquiring source views.
The input to the TMIV encoder consists of a list of source views (FIG. 7A). A source view represents a projection of a 3D real or virtual scene; it may be an equirectangular, perspective, or orthographic projection. Each source view should at least have view parameters (camera intrinsics, camera extrinsics, geometry quantization, etc.). A source view may have a geometry component with range/invalid sample values, and may additionally have a texture attribute component. Other optional attributes of each source view are an entity map and a transparency attribute component. The set of components must be the same for all source views.
2. Automatic parameter selection.
This sets the atlas parameters, i.e., the number of atlases and the frame size of each atlas. Some parameters of the TMIV encoder are computed automatically based on the camera configuration or, at most, the first frame of the source views.
3. Entity layer separation.
The views are separated into entity layers. When entity maps are provided for the source views, TMIV has the ability to operate in entity coding mode. In this mode, the patches extracted and packed into the atlases have active pixels belonging to a single entity per patch, so each patch can be labeled with its associated entity ID. This enables entities to be selectively encoded and/or decoded individually when needed, saving bandwidth and improving quality. If entity coding mode is selected, the source views (texture attributes and geometry information), including the basic views, are sliced into multiple layers such that each layer contains the content belonging to one entity. The following encoding stages are then invoked independently for each entity, so that the layers from all views belonging to the same entity are pruned, aggregated, and clustered together. The patches of all entities are packed together into one set of atlases.
4. Pixel pruning.
That is, redundant information is pruned. A multi-view representation of a scene is inherently redundant across views. The pruner selects which regions of the views can safely be pruned. The pruner operates frame by frame: it receives multiple views with texture attributes, geometry information, and camera parameters, and outputs a mask image of the same size for each source view. For non-basic views, masked pixels are "pruned" and unmasked pixels are "retained"; for basic views, all pixels are "retained".
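A plausible core of such a pruning decision is a depth-consistency test: a pixel of a non-basic view is prunable if it reprojects onto a parent view whose depth agrees. The sketch below assumes a hypothetical `reproject(u, v, d)` callback standing in for the actual camera-parameter-based warp; it is not the TMIV pruner itself.

```python
import numpy as np

def prune_mask(view_depth, parent_depth, reproject, tol=0.05):
    """Return a mask per pixel: 1 = retained, 0 = pruned (redundant with parent).

    reproject(u, v, d) maps pixel (u, v) of the child view, at depth d, to
    integer coordinates (u', v') in the parent view; it is a placeholder for
    the real projection using camera intrinsics/extrinsics.
    """
    h, w = view_depth.shape
    mask = np.ones((h, w), np.uint8)
    for v in range(h):
        for u in range(w):
            pu, pv = reproject(u, v, view_depth[v, u])
            if 0 <= pu < w and 0 <= pv < h:
                # prune if the parent already represents this surface point
                if abs(parent_depth[pv, pu] - view_depth[v, u]) < tol * view_depth[v, u]:
                    mask[v, u] = 0
    return mask
```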
5. Pruning mask aggregation.
The pruning mask of each entity is aggregated frame by frame by activating, in the aggregated pruning mask, the active samples of the per-frame pruning masks. The mask is reset at the start of each intra period, and the process completes by outputting the final aggregation result at the end of the intra period. FIG. 7B illustrates the pruned view at frame i and the aggregation of the active samples (drawn in white) between frame i and frame i+k over the intra period; it can be seen that, to account for motion in the scene, the contours of the changing parts of the geometry component become increasingly thick.
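This aggregation amounts to a logical OR of the per-frame masks over one intra period, reset at each intra-period boundary; a minimal sketch:

```python
import numpy as np

def aggregate_masks(per_frame_masks):
    """OR together the pruning masks of one intra period: a sample stays active
    if it is active in any frame, which is why moving regions grow thicker."""
    agg = np.zeros_like(per_frame_masks[0])   # reset at the intra-period start
    for m in per_frame_masks:                 # masks are 0/1 integer arrays
        agg |= m
    return agg
```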
6. Clustering the active pixels.
A cluster is a connected set of pixels that are active in the aggregated mask of an entity. The connectivity criterion for a pixel is that at least one of its eight neighbors is also active.
In one example of clustering, each cluster of a pruned view is represented by a specific false color. The clusters are then sorted in order of decreasing size.
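The eight-neighbor rule above is ordinary 8-connected component labeling; a small flood-fill (BFS) sketch that also sorts the clusters by decreasing size:

```python
from collections import deque
import numpy as np

def clusters_8conn(mask):
    """Label 8-connected sets of active pixels; return clusters sorted by size."""
    h, w = mask.shape
    labels = -np.ones((h, w), int)
    clusters = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] < 0:
                q, pixels = deque([(y, x)]), []
                labels[y, x] = len(clusters)
                while q:
                    cy, cx = q.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):          # visit all eight neighbors
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] < 0:
                                labels[ny, nx] = len(clusters)
                                q.append((ny, nx))
                clusters.append(pixels)
    return sorted(clusters, key=len, reverse=True)
```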
7. Cluster splitting.
To reduce the spatial redundancy of the data in the atlases, irregularly shaped clusters are split. A cluster is split into two smaller clusters if the total area of the bounding boxes of the two resulting clusters is smaller than the area of the bounding box of the initial cluster by a threshold. To decide how to split a cluster, the total area of the bounding boxes of the two sub-clusters is minimized; the split is performed along a line parallel to the short side of the cluster's bounding box. This approach allows L-shaped clusters to be divided.
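The split criterion can be sketched as follows: try every cut parallel to the short side of the bounding box, keep the cut minimizing the two sub-boxes' total area, and split only when the saving crosses a threshold. The `ratio` threshold and the exhaustive search over cut positions are assumptions for illustration.

```python
def bbox_area(pixels):
    ys = [p[0] for p in pixels]; xs = [p[1] for p in pixels]
    return (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)

def best_split(pixels, ratio=0.9):
    """Split a cluster along a line parallel to the short side of its bounding
    box if the two resulting boxes cover less than `ratio` of the original."""
    ys = [p[0] for p in pixels]; xs = [p[1] for p in pixels]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    axis = 1 if (x1 - x0) >= (y1 - y0) else 0      # cut across the long dimension
    lo, hi = (x0, x1) if axis == 1 else (y0, y1)
    full = (y1 - y0 + 1) * (x1 - x0 + 1)
    best = None
    for cut in range(lo + 1, hi + 1):
        a = [p for p in pixels if p[axis] < cut]
        b = [p for p in pixels if p[axis] >= cut]
        area = bbox_area(a) + bbox_area(b)
        if best is None or area < best[0]:
            best = (area, a, b)
    if best and best[0] < ratio * full:
        return best[1], best[2]
    return None                                     # keep the cluster whole
```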
8. Patch packing.
As shown in FIG. 7C, the packing process packs each cluster into an atlas in order.
9. Patch mean value modification.
After the patches are packed into the atlases, all attribute patch values are modified to reduce the number and size of the edges between neighboring patches, and between occupied and unoccupied regions, in the attribute atlases. The mean value of each component of a patch is set to the neutral color, 2^(m-1) for m-bit video. To restore the original attribute values at the decoder side, a patch attribute offset is added and sent in the atlas data.
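For example, for 10-bit video the neutral value is 2^9 = 512. A minimal sketch of removing and restoring the offset (the rounding convention is an assumption):

```python
import numpy as np

BITDEPTH = 10
NEUTRAL = 1 << (BITDEPTH - 1)          # 2^(m-1), e.g. 512 for 10-bit video

def remove_patch_offset(patch):
    """Shift the patch so its mean sits at the neutral color; the offset must
    be transmitted in the atlas data so the decoder can undo the shift."""
    offset = int(round(patch.mean())) - NEUTRAL
    return patch - offset, offset

def restore_patch(shifted, offset):
    """Decoder side: add the signaled patch attribute offset back."""
    return shifted + offset

p = np.random.randint(0, 1 << BITDEPTH, (8, 8))
shifted, off = remove_patch_offset(p)
assert np.array_equal(restore_patch(shifted, off), p)
```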
10. Color correction.
TMIV has the ability to align the different color characteristics of the source views. If the optional color correction is enabled, the color characteristics of each source view are aligned with those of a reference view, which corresponds to the view captured by the camera closest to the center of the camera rig.
11. Video data generation.
The last operation within the single-group encoder is writing the patches into the buffers (geometry and attribute information) of the atlases they are assigned to. Note that, in entity coding mode, the content of a given patch is extracted from the associated entity view generated by the entity separator based on the patch's entity ID. This ensures that patches with the correct entity content (texture attributes and geometry) are written into the formed atlases.
12. Geometry quantization.
Atlas values are either "invalid/unoccupied" or geometry values expressed in meters, with the maximum geometry value set to 1 km. The MIV specification [2] specifies how occupancy information is encoded in the geometry atlases when occupancy is not explicitly present. Decoding is based on the normalized disparity range, geometry thresholds, and so on; these values are signaled per view or even per patch.
13. Geometry scaling.
To maximize the efficiency of geometry coding, the entire range of min-max normalized depth values is divided into a predefined number of equal intervals (set in the encoder configuration), and each interval is scaled adaptively according to the importance of the geometry samples in that interval; that is, the original geometry values in each interval are mapped to the corresponding scaled geometry range.
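A sketch of such an adaptive piecewise mapping, with per-interval importance weights as an assumed input (how importance is measured is left to the encoder configuration):

```python
import numpy as np

def scale_geometry(depth, weights):
    """Map normalized depth in [0, 1] through equal input intervals whose
    output spans are proportional to the per-interval importance weights."""
    n = len(weights)
    edges_in = np.linspace(0.0, 1.0, n + 1)
    w = np.asarray(weights, float)
    edges_out = np.concatenate([[0.0], np.cumsum(w / w.sum())])
    idx = np.clip((depth * n).astype(int), 0, n - 1)
    t = (depth - edges_in[idx]) * n                 # position inside the interval
    return edges_out[idx] + t * (edges_out[idx + 1] - edges_out[idx])

d = np.array([0.05, 0.4, 0.9])
print(scale_geometry(d, weights=[4, 1, 1, 1]))      # near range gets more codewords
```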
14. Occupancy scaling.
If the TMIV encoder is configured to output occupancy video data (rather than embedding the occupancy information in the geometry video data), the full-resolution occupancy map is scaled down by a configurable scaling factor; the default factor yields the native resolution of the occupancy map. For entity-based coding, a higher resolution is recommended. The decoder reconstructs the full-resolution occupancy map by upscaling with nearest-neighbor interpolation. Note that for complete atlases (e.g., atlases containing only basic views), the occupancy map may not be output, since all pixels are occupied.
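A sketch of the scaling pair; marking a downscaled block occupied when any of its samples is occupied is an assumption (a conservative choice), while the nearest-neighbor upscaling matches the behavior described above:

```python
import numpy as np

def downscale_occupancy(occ, factor):
    """Encoder: one sample per factor x factor block (occupied if any sample is)."""
    h, w = occ.shape
    blocks = occ[:h - h % factor, :w - w % factor]
    blocks = blocks.reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3))

def upscale_occupancy(small, factor):
    """Decoder: nearest-neighbor upscaling back to full resolution."""
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

occ = (np.random.rand(16, 16) > 0.5).astype(np.uint8)
rec = upscale_occupancy(downscale_occupancy(occ, 4), 4)
```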
Through the above encoding process, atlas data, geometry video data, attribute video data, occupancy video data, and so on are generated.
Exemplarily, the V3C unit header semantics are shown in Table 1:
Table 1
As can be seen from the above, in the current multi-view video encoding process, in order to reduce the amount of encoded data, the redundant information between the multi-view videos is pruned by pixel pruning. However, the current pixel pruning method suffers from incomplete pruning, so the encoding cost remains high.
To solve the above technical problem, in the embodiments of the present application, when encoding multi-view video, a first pruning mask map of the i-th view among the N views is first determined based on pixel pruning. Next, based on a preset block division method, the i-th view is divided into M image blocks; for the j-th image block among these M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and chromaticity fitting is performed on those unpruned pixels to obtain the color deviation fitting functions of the j-th image block. Then, the unpruned pixels in the j-th image block are pruned based on the color deviation fitting functions to obtain the patch information of the i-th view. Finally, the patch information and the color deviation fitting functions are encoded to obtain a bitstream. That is, based on the color deviation fitting functions, the embodiments of the present application prune again the first pruning mask map obtained by pixel pruning, so as to further reduce the number of unpruned pixels and reduce the amount of data the encoding end needs to encode, thereby reducing the encoding cost at the encoding end and improving the encoding and decoding efficiency of multi-view video.
With reference to FIG. 8, the video decoding method provided by the embodiments of the present application is introduced below, taking the decoding end as an example.
FIG. 8 is a schematic flowchart of a video decoding method provided by an embodiment of the present application; this embodiment is applied to the video decoders shown in FIG. 1 and FIG. 3. As shown in FIG. 8, the method of this embodiment includes:
S101: for the i-th view among N views, decode the bitstream, determine the patch information of the i-th view, and generate a patch image of the i-th view based on the patch information.
In the embodiments of the present application, the N views are views from N different viewpoints; that is, the viewpoints corresponding to the N views are all different.
N is a positive integer greater than 1. That is, the N views in the embodiments of the present application may be 2 views, 3 views, or any larger number of views.
In some embodiments, the N views are views corresponding to N different viewpoints at the same moment. For example, the N views are all views obtained by the N viewpoints at time t, e.g., images captured at time t by cameras at the N viewpoints from different angles of the same environment.
In some embodiments, the N views are the first images of the k-th GOP of each of the N viewpoints, where k is a positive integer.
For example, suppose N is 3; the three viewpoints are denoted as the first viewpoint, the second viewpoint, and the third viewpoint, and the three views are denoted as the first view, the second view, and the third view. Suppose the video data of each of the first, second, and third viewpoints includes 100 images. When encoding the video data of each viewpoint, the video data of that viewpoint is first divided into multiple GOPs. Suppose the 100 images of the first viewpoint are divided into 5 groups, denoted GOP11, GOP12, GOP13, GOP14, and GOP15, each containing 20 images. Similarly, the 100 images of the second viewpoint are divided into 5 groups, denoted GOP21, GOP22, GOP23, GOP24, and GOP25, each containing 20 images, and the 100 images of the third viewpoint are divided into 5 groups, denoted GOP31, GOP32, GOP33, GOP34, and GOP35, each containing 20 images. In this case, the three views may include the first view in GOP11, the first view in GOP21, and the first view in GOP31; or the first views in GOP12, GOP22, and GOP32; or the first views in GOP13, GOP23, and GOP33; or the first views in GOP14, GOP24, and GOP34; or the first views in GOP15, GOP25, and GOP35.
That is to say, in the embodiments of the present application, the decoding end determines the color deviation fitting functions only for the first image in each GOP of the N different viewpoint videos, and the other views in the GOP reuse the color deviation fitting functions of the first image in that GOP. This reduces the number of color deviation fitting functions that the bitstream needs to carry, thereby saving codewords.
The decoding end decodes the N views. The decoding process is essentially the same for every non-basic view among the N views; for ease of description, the decoding process of the i-th view is taken as an example, where i is a positive integer less than or equal to N.
For the i-th view, the decoding end first decodes the bitstream to obtain the patch information of the i-th view. In some embodiments, the patch information is also called patch data.
Next, the decoding end generates the patch image of the i-th view based on the patch information. For example, using the position information of each patch included in the patch information of the i-th view, the decoding end maps each patch of the i-th view into a blank image, obtaining the patch image of the i-th view.
Optionally, the size of the blank image is the same as the size of the i-th view; that is, the patch image of the i-th view has the same size as the view.
In one example, the i-th view includes 3 patches; mapping these 3 patches into the blank image yields the patch image shown in FIG. 9.
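A minimal sketch of this step, assuming each decoded patch entry carries its pixels and a top-left position (x, y) in the view (the dictionary layout is hypothetical):

```python
import numpy as np

def build_patch_image(view_size, patches):
    """Paste every decoded patch into a blank image of the view's size."""
    h, w = view_size
    canvas = np.zeros((h, w), np.uint8)
    for p in patches:
        ph, pw = p["pixels"].shape
        canvas[p["y"]:p["y"] + ph, p["x"]:p["x"] + pw] = p["pixels"]
    return canvas

patches = [{"x": 4, "y": 2, "pixels": np.full((3, 5), 255, np.uint8)}]
img = build_patch_image((16, 16), patches)   # patch image of the i-th view
```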
As can be seen from the TMIV14 encoding process above, the multi-view video is grouped and each group is encoded separately; the above N views can be understood as one group of views to be encoded. As described above, when the N views are encoded, basic views are selected from the N views, the other views are compared against the basic views to eliminate redundant information, and the patch information remaining after the redundancy elimination is encoded, while the basic views are encoded in full. In this way, the decoding end can obtain the basic views directly by decoding the bitstream, and then restore the other views based on the basic views and the patch information of the other views. It follows that, in the embodiments of the present application, the i-th view is a non-basic view among the N views.
S102: divide the patch image into M image blocks according to a preset block division method.
Here, M is a positive integer.
The embodiments of the present application do not limit the specific preset block division method.
In some embodiments, the preset block division method may be to divide the patch image evenly into M image blocks.
In some embodiments, the patch image is divided into M image blocks according to a preset block size.
In one example, as shown in FIG. 10, the patch image shown in FIG. 9 is divided evenly into 4 image blocks and each image block is processed separately; in this case M equals 4.
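A sketch of one such preset division rule, using a fixed block size with edge tiles keeping whatever pixels remain (the block size of 64 is an assumption):

```python
import numpy as np

def split_blocks(img, block=64):
    """Divide an image into a grid of block x block tiles, row by row."""
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            tiles.append(img[y:y + block, x:x + block])
    return tiles                      # M = len(tiles)

img = np.zeros((128, 96), np.uint8)
blocks = split_blocks(img, 64)        # M = 4 blocks of up to 64 x 64 pixels
```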
In some embodiments, the encoding end uses a first flag to indicate whether the color deviation fitting tool is enabled. Therefore, before the decoding end divides the patch image into M image blocks according to the preset block division method, i.e., before performing S102 above, it first needs to decode the bitstream to obtain the first flag, and then determine, based on the first flag, whether to perform S102. Specifically, if the first flag indicates that the decoding end enables the color deviation fitting tool, the bitstream may include color deviation fitting functions; in this case the decoding end divides the patch image into M image blocks according to the preset block division method and performs step S103 below.
If the first flag indicates that the decoding end does not enable the color deviation fitting tool, the bitstream does not include color deviation fitting functions; in this case the decoding end skips step S102 above and does not perform steps S103 and S104 below.
The embodiments of the present application do not limit the specific form of the first flag, which may be any information capable of indicating whether the color deviation fitting tool is enabled.
In one example, the field asme_cdpu_enabled_flag is used as the first flag. For example, if asme_cdpu_enabled_flag takes a first value (e.g., 1), it indicates that the decoding end enables the color deviation fitting tool; if asme_cdpu_enabled_flag takes a second value (e.g., 0), it indicates that the decoding end does not enable the color deviation fitting tool.
The embodiments of the present application do not limit the specific position at which the first flag is carried in the bitstream.
In one possible implementation, the first flag asme_cdpu_enabled_flag is carried in the atlas sequence parameter set MIV extension syntax.
Exemplarily, the atlas sequence parameter set MIV extension syntax is shown in Table 2:
Table 2
In one example, the first flag asme_cdpu_enabled_flag may also be used to indicate whether the bitstream includes color deviation fitting functions. For example, asme_cdpu_enabled_flag equal to 1 indicates that the bitstream may include the color deviation function pdu_cdpu_params, and asme_cdpu_enabled_flag equal to 0 indicates that the bitstream does not include the color deviation function pdu_cdpu_params. When asme_cdpu_enabled_flag is not present, its value is inferred to be 0.
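The parsing dependency can be sketched as a simple gate; `reader.read_params()` is a hypothetical accessor for the coded parameters, not an actual API:

```python
def parse_patch_extension(reader, asme_cdpu_enabled_flag):
    """Parse pdu_cdpu_params only when the enable flag is 1; otherwise fall
    back to the default identity transform, which is never present in the
    bitstream."""
    if asme_cdpu_enabled_flag:
        return reader.read_params()   # the P color deviation fitting functions
    return "identity"
```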
S103: for the j-th image block among the M image blocks, decode the bitstream to obtain P color deviation fitting functions of the j-th image block.
Here, the color deviation fitting functions are obtained by performing chromaticity fitting on the unpruned pixels in the j-th image block; the image blocks are obtained by dividing the first pruning mask map into blocks according to the preset block division method; the first pruning mask map is obtained by performing pixel pruning on the i-th view; and j is a positive integer less than or equal to M.
In the embodiments of the present application, when encoding the i-th view among the N views, the encoding end first determines the first pruning mask map of the i-th view; that is, using pixel pruning, it determines the pixels in the i-th view that can be pruned and sets the mask value of those prunable pixels to a first value (e.g., 0), and it determines the pixels in the i-th view that cannot be pruned and sets their mask value to a second value (e.g., 1), thereby obtaining the first pruning mask map of the i-th view. Next, the encoding end divides the i-th view into M image blocks according to the preset block division method. Then, for each of the M image blocks, e.g., the j-th image block, the encoding end performs chromaticity fitting on the j-th image block; specifically, it determines the unpruned pixels in the j-th image block based on the first pruning mask map and performs chromaticity fitting on those unpruned pixels, obtaining the P color deviation fitting functions of the j-th image block. The encoding end then writes the P color deviation fitting functions of the j-th image block into the bitstream.
It should be noted that the way the decoding end divides the patch image of the i-th view into M image blocks is the same as the way the encoding end divides the i-th view into M image blocks.
Based on this, the decoding end can obtain the color deviation fitting functions of the j-th image block by decoding the bitstream.
The embodiments of the present application do not limit the specific position at which the color deviation function is carried in the bitstream.
In one possible implementation, the color deviation fitting functions are carried in a patch data unit. In this case, the decoding end can obtain the color deviation fitting functions of the j-th image block by decoding the patch data unit.
In one example, the patch data unit includes patch data unit MIV extension syntax, and the color deviation fitting functions are carried in the patch data unit MIV extension syntax.
Exemplarily, the patch data unit MIV extension syntax is shown in Table 3:
Table 3
In Table 3 above, pdu_cdpu_params[tileID][p] are the parameters of the color deviation fitting function of the j-th image block. When asme_cdpu_enabled_flag is not present, the value of pdu_cdpu_params defaults to an identity transform matrix, but this value is not stored in the bitstream.
In some embodiments, if the first flag indicates that the decoding end enables the color deviation fitting tool but the decoding end does not decode P color deviation fitting functions for the j-th image block from the bitstream, i.e., the j-th image block cannot be reconstructed using color deviation fitting functions, the method by which the decoding end reconstructs the j-th image block includes the following steps 1 and 2:
Step 1: determine the synthesized views of the P parent nodes based on the P reconstructed images, and determine the P synthesized view blocks of the j-th image block in the synthesized views of the P parent nodes;
Step 2: obtain the reconstruction value of the j-th image block based on the P synthesized view blocks.
Specifically, for each of the P parent nodes, the decoding end projects the reconstructed image of that parent node into the viewport of the i-th view to obtain the synthesized view of that parent node, and determines, in that synthesized view, the image block corresponding to the j-th image block, denoted a synthesized view block. Thus, P synthesized view blocks can be obtained for the P parent nodes. Then, the reconstruction value of the j-th image block is obtained based on the P synthesized view blocks; for example, weighted processing is applied to the P synthesized view blocks to obtain the reconstruction value of the j-th image block.
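A sketch of the fusion in step 2, assuming equal weights when none are given (a real decoder might weight, e.g., by viewpoint proximity):

```python
import numpy as np

def fuse_synthesized_blocks(synth_blocks, weights=None):
    """Weighted fusion of the P synthesized view blocks into one reconstruction."""
    blocks = np.stack([b.astype(np.float64) for b in synth_blocks])
    if weights is None:
        weights = np.ones(len(synth_blocks)) / len(synth_blocks)
    fused = np.tensordot(np.asarray(weights), blocks, axes=1)
    return np.clip(np.round(fused), 0, 255).astype(np.uint8)
```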
In some embodiments, when the decoding end divides the patch image of the i-th view into blocks, the resulting j-th image block falls into one of the following cases:
Case 1: none of the pixels in the j-th image block have been pruned.
Case 2: some pixels in the j-th image block have been pruned and some have not.
Case 3: all pixels in the j-th image block have been pruned.
Based on the above three cases, for the j-th image block in Case 1, the j-th image block has no color deviation fitting functions. Accordingly, before performing S103 above, the decoding end first determines whether the j-th image block includes pruned pixels. If the decoding end determines that the j-th image block includes pruned pixels, it performs step S103 above, decoding the bitstream to obtain the color deviation fitting functions corresponding to the j-th image block. If the j-th image block does not include pruned pixels, the decoding end skips step S103 above and decodes the j-th image block using a related method, for example the method of steps 1 and 2 above.
In some embodiments, for the j-th image block in Case 2, if the j-th image block includes few unpruned pixels, the encoding end, out of consideration of the encoding cost, does not determine color deviation fitting functions for the j-th image block. Accordingly, before performing S103 above, the decoding end first determines whether the number of pruned pixels included in the j-th image block is greater than or equal to a preset value. If so, it performs step S103 above, decoding the bitstream to obtain the color deviation fitting functions of the j-th image block. If the number of pruned pixels included in the j-th image block is less than the preset value, the decoding end skips step S103 above and decodes the j-th image block using a related method, for example the method of steps 1 and 2 above.
After the decoding end determines the color deviation fitting functions of the j-th image block based on the above steps, it performs step S104 below.
S104: use the P color deviation fitting functions to perform pixel fitting on the pruned pixels in the j-th image block, obtaining the reconstructed block of the j-th image block.
In the embodiments of the present application, after obtaining the color deviation fitting functions of the j-th image block based on the above steps, the decoding end uses the color deviation fitting functions to perform pixel fitting on the pruned pixels in the j-th image block, obtaining the reconstructed block of the j-th image block.
The embodiments of the present application do not limit the specific way in which the decoding end uses the color deviation fitting functions to perform pixel fitting on the pruned pixels in the j-th image block to obtain the reconstructed block of the j-th image block.
In some embodiments, the color deviation fitting functions may have been obtained by the encoding end by fitting the j-th image block against the image block corresponding to the j-th image block in a basic view among the N views. Correspondingly, the decoding end may use the color deviation fitting functions to perform pixel fitting on the image block corresponding to the j-th image block in the basic view among the N views, obtaining the reconstructed block of the j-th image block.
In some embodiments, S104 above includes the following steps S104-A to S104-C:
S104-A: determine a directed acyclic graph of the pruning hierarchy of the N views;
S104-B: based on the directed acyclic graph, determine the P reconstructed images corresponding to the P parent nodes of the i-th view, where P is a positive integer;
S104-C: based on the P reconstructed images, use the color deviation fitting functions to perform pixel fitting on the pruned pixels in the j-th image block, obtaining the reconstructed block of the j-th image block.
In this embodiment, the decoding end first determines a directed acyclic graph of the pruning hierarchy of the N views. Each node in the directed acyclic graph corresponds to one of the N views, and the view corresponding to a child node in the directed acyclic graph refers, when pruned, to the views of the parent nodes of that child node. In this way, when decoding the i-th view among the N views, the decoding end can determine the parent nodes of the i-th view from the directed acyclic graph determined above. In the directed acyclic graph of the embodiments of the present application, the i-th view may have one or more parent nodes; for ease of description these are denoted as P parent nodes, where P is a positive integer. It should be noted that, since the decoding end decodes the N views from the root node toward the child nodes based on the directed acyclic graph, by the time the i-th view is decoded, the views at the P parent nodes of the i-th view have all been decoded or reconstructed. Thus, when decoding the i-th view, the decoding end can directly obtain the reconstructed images of the views corresponding to the P parent nodes of the i-th view, i.e., obtain P reconstructed images. Based on these P reconstructed images, the decoding end can use the color deviation fitting functions to fit the pruned pixels in the j-th image block and obtain the reconstructed block of the j-th image block, achieving accurate reconstruction of the j-th image block.
The specific process by which the decoding end determines the directed acyclic graph of the pruning hierarchy of the N views is introduced below.
The embodiments of the present application do not limit the specific way in which the decoding end determines the directed acyclic graph of the pruning hierarchy of the N views.
In one possible implementation, the decoding end and the encoding end determine a default directed acyclic graph as the directed acyclic graph of the pruning hierarchy of the N views.
In another possible implementation, the encoding end determines the directed acyclic graph of the pruning hierarchy of the N views and indicates it to the decoding end. For example, the encoding end writes information related to the determined directed acyclic graph of the pruning hierarchy of the N views into the bitstream, so that the decoding end can obtain the directed acyclic graph of the pruning hierarchy of the N views by decoding the bitstream.
For example, if the directed acyclic graph of the pruning hierarchy of the N views determined by the encoding end is as shown in FIG. 11, the encoding end may write into the bitstream the information that view V4 is a basic view, the information that view V1 is encoded based on view V4, the information that view V3 is encoded based on views V4 and V1, and the information that view V0 is encoded based on views V4, V1, and V3. By decoding the bitstream, the decoding end can determine the reference relationships among views V0, V1, V3, and V4, and then restore the directed acyclic graph shown in FIG. 11 based on those reference relationships.
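A sketch of rebuilding the graph from the signaled parent lists and deriving a parents-first decode order with a topological sort, using the FIG. 11 relationships as example data:

```python
from graphlib import TopologicalSorter

# Parent lists recovered from the bitstream for the FIG. 11 example:
# V4 is a basic view; V1 references V4; V3 references V4, V1; V0 references all three.
parents = {"V4": [], "V1": ["V4"], "V3": ["V4", "V1"], "V0": ["V4", "V1", "V3"]}

# Decode root-first so every parent is reconstructed before its children.
order = list(TopologicalSorter(parents).static_order())
print(order)   # ['V4', 'V1', 'V3', 'V0']
```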
After determining the directed acyclic graph of the pruning hierarchy of the N views based on the above steps, the decoder performs step S104-B above: based on the directed acyclic graph, determine the P reconstructed images corresponding to the P parent nodes of the i-th view.
Continuing with the directed acyclic graph shown in FIG. 11 as an example: when the i-th view is view V1, the parent node of view V1 is view V4, the basic view, so view V1 refers to basic view V4 during decoding. As another example, if the i-th view is view V3, view V3 has two parent nodes, namely views V4 and V1, so view V3 refers to basic view V4 and view V1 during decoding. As yet another example, if the i-th view is view V0, view V0 has three parent nodes, namely views V4, V1, and V3, so view V0 refers to views V4, V1, and V3 during decoding.
After determining the P parent nodes of the i-th view based on the above steps, the decoder can obtain the reconstructed image corresponding to each of the P parent nodes. Since each parent node corresponds to one view, P reconstructed images are obtained for the P parent nodes.
After determining the P reconstructed images corresponding to the i-th view, the decoder uses the color deviation fitting functions obtained by decoding, based on the P reconstructed images, to perform pixel fitting on the pruned pixels in the j-th image block, obtaining the reconstructed block of the j-th image block.
The embodiments of the present application do not limit the specific manner in which the decoder, based on the P reconstructed images, uses the decoded color deviation fitting functions to perform pixel fitting on the pruned pixels in the j-th image block to obtain the reconstructed block of the j-th image block.
In some embodiments, the decoder uses the color deviation fitting functions to perform pixel fitting on the P reconstructed images, obtaining one fitted image, and determines the image block in the fitted image corresponding to the j-th image block as the reconstructed block of the j-th image block.
In some embodiments, the above S104-C includes the following steps S104-C1 to S104-C3:
S104-C1: determine synthetic views of the P parent nodes based on the P reconstructed images, and determine P synthetic view blocks of the j-th image block in the synthetic views of the P parent nodes;
S104-C2: based on the P synthetic view blocks, fit the pruned pixels in the j-th image block using the P color deviation fitting functions, obtaining P color deviation reconstruction blocks of the j-th image block;
S104-C3: perform weighted fusion on the P color deviation reconstruction blocks to obtain the reconstructed block of the j-th image block.
In this embodiment, for each of the P parent nodes, the decoder projects the reconstructed image of that parent node onto the viewport of the i-th view to obtain the synthetic view of that parent node, and determines, in that synthetic view, the image block corresponding to the j-th image block, denoted a synthetic view block. In this way, P synthetic view blocks are obtained for the P parent nodes. For example, one synthetic view block of the j-th image block is determined in the synthetic view of the first parent node, another is determined in the synthetic view of the second parent node, and so on, yielding P synthetic view blocks.
After determining the P synthetic view blocks of the j-th image block in the P synthetic views based on the above steps, the decoder performs step S104-C2 above: based on the P synthetic view blocks, fit the pruned pixels in the j-th image block using the P color deviation fitting functions to obtain the reconstructed block of the j-th image block.
The embodiments of the present application do not limit the specific manner in which the decoder, based on the P synthetic view blocks, fits the pruned pixels in the j-th image block using the P color deviation fitting functions to obtain the reconstructed block of the j-th image block.
In some embodiments, with the image block as the fitting unit, the P color deviation fitting functions are used to fit the P synthetic view blocks respectively, obtaining P color deviation reconstruction blocks of the j-th image block. For example, for synthetic view block 1 among the P synthetic view blocks, the color deviation fitting function corresponding to synthetic view block 1 is used to fit it, obtaining color deviation reconstruction block 1 of the j-th image block; for synthetic view block 2, the corresponding color deviation fitting function is used to fit it, obtaining color deviation reconstruction block 2; and so on, yielding the P color deviation reconstruction blocks of the j-th image block. These P color deviation reconstruction blocks are then fused by weighting to obtain one fused reconstruction block. The pixel values of the pruned pixels in the j-th image block are determined from this fused reconstruction block: for example, for pruned pixel 1 in the j-th image block, the reconstruction value of the corresponding pixel in the fused reconstruction block is taken as the reconstruction value of pixel 1, so that the reconstruction value of every pruned pixel in the j-th image block can be determined. The reconstruction values of the pruned pixels and the patch values of the unpruned pixels in the j-th image block are then combined to obtain the reconstructed block of the j-th image block.
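For concreteness, here is a minimal sketch of this block-level fitting and weighted fusion, assuming each color deviation fitting function is a per-channel affine map (scale, offset) and assuming equal fusion weights; the actual functional form and weighting are signalled or configured by the codec and may differ.

```python
import numpy as np

def apply_fit(block, fit):
    scale, offset = fit                      # hypothetical affine parameters
    return block * scale + offset

def reconstruct_block(patch_block, pruned_mask, synth_blocks, fits, weights=None):
    """patch_block: HxWxC decoded patch values; pruned_mask: HxW bool, True where pruned."""
    if weights is None:
        weights = np.full(len(synth_blocks), 1.0 / len(synth_blocks))
    fused = sum(w * apply_fit(b, f)
                for w, b, f in zip(weights, synth_blocks, fits))
    out = patch_block.copy()
    out[pruned_mask] = fused[pruned_mask]    # pruned pixels come from the fused fit;
    return out                               # unpruned pixels keep their patch values

# Toy usage: two parent views, 4x4 blocks, one color channel.
rng = np.random.default_rng(0)
patch = rng.random((4, 4, 1)); mask = rng.random((4, 4)) > 0.5
synth = [rng.random((4, 4, 1)) for _ in range(2)]
fits = [(1.02, -0.01), (0.98, 0.02)]
rec = reconstruct_block(patch, mask, synth, fits)
```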
In some embodiments, the decoder restores the pruned pixels in the j-th image block pixel by pixel. In this case, the above S104-C2 includes the following steps S104-C21 to S104-C23:
S104-C21: for the k-th pruned pixel in the j-th image block, determine the P synthetic pixels corresponding to the k-th pruned pixel in the P synthetic view blocks, where k is a positive integer;
S104-C22: based on the P synthetic pixels and the pixels surrounding them, perform pixel fitting on the k-th pruned pixel using the color deviation fitting functions to obtain the reconstruction value of the k-th pruned pixel;
S104-C23: obtain the reconstructed block of the j-th image block based on the reconstruction values of the pruned pixels in the j-th image block.
In this embodiment, the decoder determines the reconstruction value of every pruned pixel in the j-th image block by the same process; for ease of description, the k-th pruned pixel in the j-th image block is taken as an example.
For the k-th pruned pixel in the j-th image block, the decoder first determines, in each of the P synthetic view blocks corresponding to the j-th image block, the pixel corresponding to the k-th pruned pixel, denoted a synthetic pixel. For example, one corresponding pixel of the k-th pruned pixel is determined in the first synthetic view block, another is determined in the second synthetic view block, and so on, so that P corresponding pixels of the k-th pruned pixel are determined in the P synthetic view blocks, denoted P synthetic pixels.
These P synthetic pixels have already been reconstructed, so the decoder can perform pixel fitting on the k-th pruned pixel based on the reconstruction values of the P synthetic pixels, using the P color deviation fitting functions, to obtain the reconstruction value of the k-th pruned pixel. For example, for each of the P synthetic pixels, the color deviation fitting function corresponding to that synthetic pixel is used to fit the reconstruction value of that synthetic pixel together with its surrounding pixels (for example, the pixels in a 3x3 region), yielding one color deviation fitting value; the P synthetic pixels thus yield P color deviation fitting values, and these P values are fused by weighting to determine the color deviation reconstruction value of the k-th pruned pixel.
The embodiments of the present application do not limit the specific form of the color deviation fitting function.
In a possible implementation, the color deviation fitting function is a weighted least squares fitting function, so that the decoder can perform weighted least squares fitting on the reconstruction value of the synthetic pixel and its surrounding pixels to obtain the reconstruction value of the k-th pruned pixel.
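A minimal sketch of this per-pixel step follows, assuming (a) the decoded color deviation fitting function is a per-channel affine map with parameters (a, b) parsed from the bitstream, and (b) the surrounding pixels enter through a Gaussian-weighted average over the 3x3 neighborhood. Both are illustrative assumptions rather than the normative procedure.

```python
import numpy as np

def gaussian_weights_3x3(sigma=1.0):
    yy, xx = np.mgrid[-1:2, -1:2]
    w = np.exp(-(yy**2 + xx**2) / (2 * sigma**2))
    return w / w.sum()

def fit_pruned_pixel(synth_3x3, a, b, weights=None):
    """Reconstruct one pruned pixel from the co-located synthetic 3x3 patch."""
    if weights is None:
        weights = gaussian_weights_3x3()
    smoothed = float((synth_3x3 * weights).sum())   # neighborhood-aware synthetic value
    return a * smoothed + b                          # decoded affine color deviation fit

def fuse(values, view_weights=None):
    """Weighted fusion of the P per-parent fitting values into one reconstruction."""
    values = np.asarray(values, dtype=float)
    if view_weights is None:
        view_weights = np.full(len(values), 1.0 / len(values))
    return float((values * view_weights).sum())

# Toy usage with P = 2 parent views:
rng = np.random.default_rng(1)
v1 = fit_pruned_pixel(rng.random((3, 3)), a=1.05, b=-0.02)
v2 = fit_pruned_pixel(rng.random((3, 3)), a=0.97, b=0.01)
print(fuse([v1, v2]))
```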
Based on the above steps, the decoder can determine the reconstruction value of every pruned pixel in the j-th image block, and thereby obtain the reconstructed block of the j-th image block.
The above embodiments describe the process by which the decoder determines the reconstruction values of the j-th image block in the patch image of the i-th view. The reconstruction values of the other image blocks in the patch image can be determined by referring to the process for the j-th image block, and the reconstructed image of the i-th view is then obtained.
The above describes the decoding process of the i-th view among the N views. The decoder can decode the other non-basic views among the N views by referring to the decoding process of the i-th view, thereby completing the decoding of the N views.
Exemplarily, as shown in FIG. 12, the encoder merges the patch information of two groups of views during encoding, where views V4 and V6 are basic views, views V4, V1, V3, and V0 form one group of views that is decoded independently, and views V6, V8, V7, and V5 form another group of views that is decoded independently. When decoding the first group of views V4, V1, V3, and V0, the decoder first decodes the bitstream to obtain the patch information shown in FIG. 12 and extracts from it the patch information of the first group of views V4, V1, V3, and V0. Since view V4 is the basic view, all of its information is encoded, so basic view V4 can be reconstructed from the patch information. The decoder determines the directed acyclic graph of the pruning hierarchy of the first group of views V4, V1, V3, and V0, as shown at the upper left of FIG. 12. Based on this directed acyclic graph and basic view V4, view V1 is decoded: specifically, a patch image of view V1 is generated based on the patch information of view V1. Next, the patch image of view V1 is divided into M image blocks. For each of the M image blocks, for example the j-th image block, the bitstream is decoded to obtain the color deviation fitting function of that image block. The color deviation fitting function is then used to perform pixel fitting on the pruned pixels in the j-th image block to obtain the reconstructed block of the j-th image block; for the specific process, refer to the description of the above embodiments, for example, performing pixel fitting on the pruned pixels in the j-th image block based on basic view V4 using the color deviation fitting function. Repeating this process decodes each of the M image blocks included in the patch image of view V1, yielding the decoded image of view V1. Next, view V3 is decoded based on the decoded images of basic view V4 and view V1, yielding the reconstructed image of view V3. View V0 is then decoded based on the decoded images of basic view V4, view V1, and view V3, yielding the reconstructed image of view V0.
For views V6, V8, V7, and V5 in the second group, the same method can be used to decode the reconstructed images corresponding to V6, V8, V7, and V5, respectively.
In the video decoding method provided by the embodiments of the present application, when decoding a multi-view video, the decoder, for the i-th view among N views, decodes the bitstream, determines the patch information of the i-th view, and generates the patch image of the i-th view based on the patch information, where the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N; divides the patch image into M image blocks according to a preset block division manner, where M is a positive integer; for the j-th image block among the M image blocks, decodes the bitstream to obtain the P color deviation fitting functions of the j-th image block; and uses these P color deviation fitting functions to perform pixel fitting on the pruned pixels in the j-th image block to obtain the reconstructed block of the j-th image block. That is, the embodiments of the present application fit and reconstruct the pruned pixels in the i-th view based on the color deviation fitting functions, which can reduce the number of pixels the encoder needs to encode, thereby reducing the encoding cost of the encoder and improving the decoding efficiency of multi-view video.
The above describes the multi-view video decoding method of the present application from the perspective of the decoder; the following describes the method from the perspective of the encoder.
FIG. 13 is a schematic flowchart of a video encoding method provided by an embodiment of the present application; this embodiment applies to the video encoders shown in FIG. 1 and FIG. 2. As shown in FIG. 13, the method of this embodiment includes:
S201: for the i-th view among N views, determine a first pruning mask map of the i-th view.
When encoding multi-view video, current pixel-pruning-based methods, in determining the pixels to be pruned, do not prune blocks whose color difference is too large but retain them instead. For example, FIG. 14 shows views captured from four different viewpoints. Owing to camera parameters and other factors, the image blocks inside the boxes captured by cameras at different viewpoints exhibit large color differences. Current pixel pruning methods retain these regions of large color difference, which increases the encoding burden and encoding cost of the encoder and thus reduces encoding efficiency.
To solve this technical problem, the embodiments of the present application fit a chromaticity fitting function to regions with large color differences, so that the encoder does not need to retain such a region but instead encodes the region's color deviation fitting function into the bitstream. The decoder can then obtain the region's color deviation fitting function by decoding the bitstream and reconstruct the region by fitting based on that function. In this way, on the premise of ensuring decoding accuracy, the encoding cost of the encoder is reduced and the encoding efficiency of multi-view video is improved.
In the embodiments of the present application, the N views are views from N different viewpoints; that is, the viewpoints corresponding to these N views are all different.
Here N is a positive integer greater than 1. That is, the N views in the embodiments of the present application may be 2 views, 3 views, or any larger number of views.
In some embodiments, the above N views are the views corresponding to N different viewpoints at the same moment. For example, the N views are all obtained at time t from the N viewpoints, such as images captured at time t by cameras at the N viewpoints of the same environment from different angles.
In some embodiments, the above N views are the first images of the k-th GOP of each of the N viewpoints, where k is a positive integer.
As an example, suppose N is 3, with three viewpoints denoted the first, second, and third viewpoints, and three views denoted the first, second, and third views. Suppose the video data of each of the three viewpoints includes 100 images. When encoding the video data of each viewpoint, the video data of that viewpoint is first divided into multiple GOPs. Suppose the 100 images of the first viewpoint are divided into 5 groups, denoted GOP11, GOP12, GOP13, GOP14, and GOP15, each containing 20 images. Likewise, the 100 images of the second viewpoint are divided into 5 groups, denoted GOP21, GOP22, GOP23, GOP24, and GOP25, each containing 20 images, and the 100 images of the third viewpoint are divided into 5 groups, denoted GOP31, GOP32, GOP33, GOP34, and GOP35, each containing 20 images. In this case, the above 3 views may be the first images of GOP11, GOP21, and GOP31; or the first images of GOP12, GOP22, and GOP32; or the first images of GOP13, GOP23, and GOP33; or the first images of GOP14, GOP24, and GOP34; or the first images of GOP15, GOP25, and GOP35.
That is, in the embodiments of the present application, the encoder determines the color deviation fitting functions only for the first image of each GOP of the N viewpoint videos, and the other views in that GOP reuse the color deviation fitting functions of the GOP's first image. This reduces the number of color deviation fitting functions the bitstream must carry and thereby saves codewords.
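As a tiny illustration of this reuse, assuming a fixed GOP length and per-GOP affine fitting parameters (both assumptions for demonstration only):

```python
# Sketch of reusing the color deviation fitting functions signalled only for
# the first frame of each GOP. Frame indexing and GOP length are assumptions.

GOP_LEN = 20

def fitting_functions_for_frame(frame_idx, per_gop_functions):
    """All frames in a GOP reuse the functions of that GOP's first frame."""
    gop_idx = frame_idx // GOP_LEN
    return per_gop_functions[gop_idx]

# Toy usage: 100 frames, 5 GOPs, one hypothetical (a, b) function per GOP.
per_gop = [(1.0 + 0.01 * g, -0.01 * g) for g in range(5)]
assert fitting_functions_for_frame(0, per_gop) == fitting_functions_for_frame(19, per_gop)
assert fitting_functions_for_frame(20, per_gop) == per_gop[1]
```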
As can be seen from the TMIV14 encoding process described above, a multi-view video is divided into groups and each group is encoded independently. The above N views can be understood as one group of views to be encoded. As described above, when the N views are encoded, a basic view is selected from them; the other views are compared with the basic view to eliminate redundant information, and the patch information remaining after redundancy elimination is encoded, while the basic view is encoded in full. The decoder can therefore obtain the basic view directly by decoding the bitstream, and then restore the other views based on the basic view and the patch information of the other views. Accordingly, the embodiments of the present application mainly describe the encoding process of the views other than the basic view among the N views; that is, the above i-th view is a non-basic view among the N views.
The encoding process at the encoder is essentially the same for any non-basic view among the N views; for ease of description, the encoding process of the i-th view is taken as an example, where i is a positive integer less than or equal to N.
For the i-th view, the encoder first determines the first pruning mask map of the i-th view, where the first pruning mask map is a pruning mask map obtained by performing pixel pruning on the i-th view. That is, the encoder uses pixel pruning to determine which pixels in the i-th view can be pruned. If a pixel is determined to be prunable, the mask value of that pixel is set to a first value (for example, 0); if a pixel is determined not to be prunable, the mask value of that pixel is set to a second value (for example, 1).
The specific process by which the encoder determines the first pruning mask map of the i-th view is described below.
The embodiments of the present application do not limit the specific method by which the encoder determines the first pruning mask map of the i-th view.
In some embodiments, the encoder may project the i-th view onto the basic view and, based on the difference (or similarity) between the i-th view and the basic view, determine the pixels in the i-th view that can be hidden, thereby obtaining the first pruning mask map of the i-th view. For example, for a certain pixel 1 in the i-th view, the encoder projects pixel 1 onto the basic view using the camera intrinsic and extrinsic parameters and the depth information; suppose the projection of pixel 1 in the basic view is pixel 2. Whether pixel 1 is pruned is determined from the similarity between pixel 1 and pixel 2. For example, if the similarity between pixel 1 and pixel 2 is greater than or equal to a preset value, pixel 1 is determined to be prunable and its mask value is set to 0; if the similarity is less than the preset value, pixel 1 is determined not to be prunable and its mask value is set to 1. Based on this step, the first pruning mask map of the i-th view can be determined.
In some embodiments, the encoder may determine the first pruning mask map of the i-th view through the following steps S201-A to S201-C:
S201-A: generate a directed acyclic graph of the pruning hierarchy of the N views;
S201-B: based on the directed acyclic graph, determine the P synthetic views of the P parent nodes of the i-th view;
S201-C: based on the P synthetic views, perform pixel pruning on the i-th view to obtain the first pruning mask map.
In this embodiment, the encoder first generates a directed acyclic graph of the pruning hierarchy of the N views. Each node in the directed acyclic graph corresponds to one of the N views, and the view corresponding to a child node refers, during pruning, to the views of that child node's parent nodes. In this way, when encoding the i-th view among the N views, the encoder can determine the parent nodes of the i-th view from the directed acyclic graph determined above; in the directed acyclic graph of the embodiments of the present application, the i-th view may have one or more parent nodes, denoted P parent nodes for ease of description, where P is a positive integer. It should be noted that, since the encoder encodes the N views from the root node toward the child nodes based on the directed acyclic graph, when encoding the i-th view, the views at the P parent nodes of the i-th view are each projected onto the viewport of the i-th view, yielding P synthetic views. The encoder can then perform pixel pruning on the i-th view based on these P synthetic views to obtain the first pruning mask map.
The specific process by which the encoder determines the directed acyclic graph of the pruning hierarchy of the N views is described below:
In the embodiments of the present application, a basic view, for example N0, is determined from the N views, and the views other than the basic view are denoted additional views, for example N1, N2, and N3. Creating the directed acyclic graph includes the following steps:
Step 1: take basic view N0 as the root node of the directed acyclic graph.
Step 2: project all pixels of basic view N0 onto each additional view and determine the pruning mask map of each additional view. That is, evaluate the similarity between each pixel in the additional view and its projected pixel: if the similarity is large, set the mask value of that pixel to 0; if the similarity is small, set the mask value to 1. Repeating this step yields the pruning mask map of each additional view.
Step 3: based on the pruning mask map of each additional view, select from the additional views the one with the smallest mask area, that is, the one with the largest number of retained pixels.
Step 4: make the selected additional view a child node of all nodes in the directed acyclic graph. If all views have been assigned to nodes in the directed acyclic graph, stop; otherwise, perform step 5.
Step 5: project all retained pixels (that is, the unpruned pixels) of the selected view onto the remaining additional views.
Step 6: update the pruning mask for each remaining additional view.
Step 7: go back to step 4.
After the above steps, the directed acyclic graph shown in FIG. 11 can be obtained; a sketch of this greedy construction is given below.
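The following Python sketch mirrors the greedy selection of steps 1 to 7, abstracting the projection-and-similarity test behind a callback. The callback, the view labels, and the retained-pixel counts in the toy usage are assumptions for demonstration, not part of the described method.

```python
# Sketch of the greedy construction of the pruning DAG. `prune_against` must
# return the number of pixels a candidate view retains after the already
# selected views are projected onto it; its implementation (projection,
# similarity thresholds) is assumed to exist elsewhere.

def build_dag(basic_view, additional_views, prune_against):
    """Return (order, parents): selection order and parent lists per view."""
    selected = [basic_view]                  # step 1: basic view is the root
    parents = {}
    remaining = list(additional_views)
    while remaining:                         # steps 2-7: pick, attach, re-prune
        # steps 2-3: choose the view with the most retained (unpruned) pixels
        best = max(remaining, key=lambda v: prune_against(v, selected))
        # step 4: the selected view becomes a child of every node chosen so far
        parents[best] = list(selected)
        selected.append(best)
        remaining.remove(best)
    return selected, parents

# Toy usage with a stand-in similarity model: retained pixels shrink as more
# views are selected, and differ per view (purely illustrative numbers).
retained = {"V1": 300, "V3": 200, "V0": 100}
order, parents = build_dag("V4", ["V0", "V1", "V3"],
                           lambda v, sel: retained[v] // len(sel))
print(order)    # ['V4', 'V1', 'V3', 'V0']
print(parents)  # {'V1': ['V4'], 'V3': ['V4', 'V1'], 'V0': ['V4', 'V1', 'V3']}
```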
In some embodiments, the encoder indicates the directed acyclic graph of the pruning hierarchy of the N views to the decoder. For example, suppose the directed acyclic graph determined by the encoder is the one shown in FIG. 11. The encoder can write into the bitstream information indicating that view V4 is the basic view, that view V1 is encoded based on view V4, that view V3 is encoded based on views V4 and V1, and that view V0 is encoded based on views V4, V1, and V3. By decoding the bitstream, the decoder can determine the reference relationships among views V0, V1, V3, and V4, and then restore the directed acyclic graph shown in FIG. 11 based on these reference relationships.
After generating the directed acyclic graph of the pruning hierarchy of the N views based on the above steps, the encoder performs step S201-B above: based on the directed acyclic graph, determine the P synthetic views of the P parent nodes of the i-th view.
Continuing with the directed acyclic graph shown in FIG. 11 as an example: when the i-th view is view V1, the parent node of view V1 is view V4, the basic view, so view V1 refers to basic view V4 during encoding. As another example, if the i-th view is view V3, view V3 has two parent nodes, namely views V4 and V1, so view V3 refers to basic view V4 and view V1 during encoding. As yet another example, if the i-th view is view V0, view V0 has three parent nodes, namely views V4, V1, and V3, so view V0 refers to views V4, V1, and V3 during encoding.
After determining the P parent nodes of the i-th view based on the above steps, the encoder can project the view of each of the P parent nodes onto the viewport of the i-th view to obtain the synthetic view of that parent node. Since each parent node corresponds to one synthetic view, P synthetic views are obtained for the P parent nodes.
After determining the P synthetic views, the encoder performs step S201-C above: based on the P synthetic views, perform pixel pruning on the i-th view to obtain the first pruning mask map.
The embodiments of the present application do not limit the specific manner in which the encoder performs pixel pruning on the i-th view based on the P synthetic views to obtain the first pruning mask map.
In some embodiments, the encoder determines the similarity between each pixel in the i-th view and the corresponding pixels in the P synthetic views, thereby obtaining the first pruning mask map.
In some embodiments, the above S201-C includes the following steps S201-C1 to S201-C2:
S201-C1: prune the i-th view based on the difference between the luma component of the i-th view and the luma components of the P synthetic views, obtaining a second pruning mask map of the i-th view;
S201-C2: based on the difference between the chroma components of the i-th view and the chroma components of the P synthetic views, search the pruned pixels included in the second pruning mask map for pixels that should not be pruned, obtaining the first pruning mask map.
In this embodiment, the pixel pruning (Prune Pixel) module detects the parts that repeat between views so that the repeated pixels can be pruned in subsequent processing. As shown in FIG. 14B, an arbitrary pixel 1 of the i-th view is mapped, using the camera intrinsic and extrinsic parameters and the depth information, onto a pixel 2 of one of the P synthetic views and its 3x3 neighborhood. Whether pixel 1 is pruned is decided by evaluating the degree of similarity between pixel 1 and pixel 2. Specifically, as shown in FIG. 14C, pixel pruning mainly includes two stages.
In the first stage of pixel pruning, the following two criteria determine whether a pixel pair is similar:
First, the difference between the depth values of pixel 1 and pixel 2 should be less than a first threshold t1. The depth value comparison is performed in a pixel-to-pixel manner.
Second, the minimum difference between the luma value of pixel 1 and the luma values of all pixels in the 3x3 pixel block should be less than a second threshold t2; the luma value comparison is performed in a pixel-to-block manner.
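Taken together, the two first-stage criteria might look like the following minimal sketch. The threshold values and the projection that pairs pixel 1 with pixel 2 are outside this snippet and assumed given.

```python
import numpy as np

def first_stage_similar(depth1, depth2, luma1, luma_block_3x3, t1, t2):
    """True if pixel 1 is a pruning candidate against pixel 2's 3x3 block."""
    depth_ok = abs(depth1 - depth2) < t1                      # pixel-to-pixel depth test
    luma_ok = np.abs(luma_block_3x3 - luma1).min() < t2       # pixel-to-block luma test
    return depth_ok and luma_ok

# Toy usage with illustrative values:
block = np.array([[120, 122, 119], [121, 118, 123], [120, 121, 122]], dtype=float)
print(first_stage_similar(depth1=10.0, depth2=10.4, luma1=121.5,
                          luma_block_3x3=block, t1=1.0, t2=4.0))   # True
```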
After the first-stage detection, the pixel pairs suspected of being similar enter the second-pass process of the second stage:
The second stage mainly updates the pruning mask created during the initial pruning stage and re-identifies, among the pixels initially determined to be pruned, the pixels that should not be pruned. The main purpose of this process is to account for the global color component differences that may exist between different views. The process, applied to each pruning pair, is as follows. For the pixels determined to be pruned, the pixel-wise color differences between the P synthetic views and the i-th view are computed. Using the least squares method, a fitting function that best models these color differences is calculated. Pixels that conform to this fitting function within a specific threshold-defined range are judged to be inlier pixels and remain pixels to be pruned, while outliers are updated in the pruning mask so that they are not pruned. After the second-pass process ends, the final set of similar pixel pairs to be pruned is obtained, and the first pruning mask map of the i-th view is thereby obtained.
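A minimal sketch of this second pass follows, assuming the per-pixel color difference is modelled as an affine function of the synthetic-view value and fitted by ordinary least squares over the initially pruned pixels; `inlier_thresh` plays the role of the threshold-defined range in the text. The model order and threshold are assumptions.

```python
import numpy as np

def second_pass(synth_vals, view_vals, pruned, inlier_thresh):
    """Return an updated pruning mask: outliers of the color-difference fit
    are re-marked as not pruned."""
    s, v = synth_vals[pruned], view_vals[pruned]
    diff = v - s                                          # per-pixel color difference
    A = np.stack([s, np.ones_like(s)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, diff, rcond=None)     # best-fit difference model
    residual = np.abs(diff - (a * s + b))
    updated = pruned.copy()
    updated[pruned] = residual <= inlier_thresh           # inliers stay pruned
    return updated

# Toy usage: pixels whose difference defies the global color shift are kept.
rng = np.random.default_rng(2)
synth = rng.random(100); view = 1.1 * synth + 0.05
view[:5] += 0.5                                           # 5 genuinely different pixels
pruned = np.ones(100, dtype=bool)
print(second_pass(synth, view, pruned, inlier_thresh=0.05).sum())  # approx. 95
```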
After determining the first pruning mask map of the i-th view based on the above steps, the encoder performs the following step S202.
S202: divide the i-th view into M image blocks based on a preset block division manner.
Here, M is a positive integer.
The embodiments of the present application do not limit the specific preset block division manner.
In some embodiments, the preset block division manner may be to divide the i-th view evenly into M image blocks.
In some embodiments, the i-th view is divided into M image blocks according to a preset block size.
In some embodiments, before performing the above S202, the encoder first determines a first flag, which indicates whether the encoder enables the color deviation fitting tool. If the first flag indicates that the encoder enables the color deviation fitting tool, the encoder performs the above S202 and divides the i-th view into M image blocks based on the preset block division manner.
In some embodiments, the encoder writes the first flag into the bitstream.
For example, if the first flag indicates that the color deviation fitting tool is enabled, the first flag is set to a first value (for example, 1).
For example, if the first flag indicates that the color deviation fitting tool is not enabled, the first flag is set to a second value (for example, 0).
The embodiments of the present application do not limit the specific form of the first flag; any information capable of indicating whether the color deviation fitting tool is enabled may be used.
In one example, the field asme_cdpu_enabled_flag represents the first flag. For example, if asme_cdpu_enabled_flag is a first value (for example, 0), the color deviation fitting tool is enabled; if asme_cdpu_enabled_flag is a second value (for example, 1), the color deviation fitting tool is not enabled.
The embodiments of the present application do not limit the specific position at which the first flag is carried in the bitstream.
In a possible implementation, the first flag asme_cdpu_enabled_flag is carried in the atlas sequence parameter set MIV extension syntax.
Exemplarily, the atlas sequence parameter set MIV extension syntax is shown in Table 2.
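As a hedged illustration of how such a flag might be read, the following sketch parses asme_cdpu_enabled_flag as a single bit inside the extension. The surrounding syntax elements of Table 2 are not reproduced here, so their order and this parsing style are assumptions, not the normative syntax.

```python
class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def u(self, n: int) -> int:              # read n bits, MSB first
        val = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return val

def parse_asme_extension(reader: BitReader) -> dict:
    ext = {}
    ext["asme_cdpu_enabled_flag"] = reader.u(1)   # 1 bit: color deviation tool on/off
    # ... the remaining asme_* syntax elements would follow per Table 2 ...
    return ext

print(parse_asme_extension(BitReader(b"\x80")))   # {'asme_cdpu_enabled_flag': 1}
```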
S203: for the j-th image block among the M image blocks, determine the unpruned pixels in the j-th image block based on the first pruning mask map, and perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting functions of the j-th image block.
Here, j is a positive integer less than or equal to M.
In the embodiments of the present application, the encoder uses the first pruning mask map obtained by pixel pruning: based on the first pruning mask map, the unpruned pixels in each of the M image blocks of the i-th view can be determined, and the unpruned pixels in each image block are then pruned again to reduce the amount of encoded data and the encoding cost. Specifically, the encoder determines the color deviation fitting functions of each of the M image blocks and, based on those functions, prunes the unpruned pixels in that image block again to further reduce the encoding cost.
In the embodiments of the present application, the method by which the encoder determines the chromaticity fitting functions is essentially the same for each of the M image blocks; for ease of description, the j-th image block is again taken as an example.
The embodiments of the present application do not limit the specific manner in which the encoder determines the color deviation fitting functions of the j-th image block.
In some embodiments, the encoder determines, in the basic view, the pixels corresponding to the unpruned pixels in the j-th image block, determines P color deviation fitting functions based on these corresponding pixels and the unpruned pixels in the j-th image block, and takes the P color deviation fitting functions as the color deviation fitting functions of the j-th image block.
In some embodiments, the above S203 includes the following steps S203-A to S203-B:
S203-A: determine the P synthetic view blocks of the j-th image block in the P synthetic views;
S203-B: based on the chroma values of the pixels included in the P synthetic view blocks and the chroma values of the pixels included in the j-th image block, perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions.
In this embodiment, the encoder has determined the P synthetic views of the i-th view based on the above steps, so the P color deviation fitting functions of the j-th image block can be determined based on these P synthetic views.
Specifically, the encoder first determines, in each of the P synthetic views determined above, the image block corresponding to the j-th image block, denoted a synthetic view block, thereby obtaining P synthetic view blocks. For example, one synthetic view block of the j-th image block is determined in the first synthetic view, another is determined in the second synthetic view, and so on, yielding P synthetic view blocks.
After determining the P synthetic view blocks of the j-th image block in the P synthetic views based on the above steps, the encoder performs step S203-B above: based on the chroma values of the pixels included in the P synthetic view blocks and the chroma values of the pixels included in the j-th image block, perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain the P color deviation fitting functions.
The embodiments of the present application do not limit the specific manner in which the encoder performs this chromaticity fitting on the unpruned pixels in the j-th image block, based on the chroma values of the pixels included in the P synthetic view blocks and the chroma values of the pixels included in the j-th image block, to obtain the P color deviation fitting functions.
In some embodiments, for each of the P synthetic view blocks, the encoder performs least squares fitting (for example, weighted least squares fitting) based on the chroma values of that synthetic view block and the chroma values of the j-th image block, obtaining one color deviation fitting function.
In some embodiments, for any one of the P synthetic view blocks, the encoder determines, in that synthetic view block, the synthetic pixels corresponding to the unpruned pixels in the j-th image block, and determines a color deviation fitting function based on the chroma values of the synthetic pixels and the chroma values of the unpruned pixels in the j-th image block. For example, for each unpruned pixel in the j-th image block, say pixel 1, the synthetic pixel 2 corresponding to pixel 1 is determined in the synthetic view block, and the chroma value of pixel 1 is fitted based on the chroma values of synthetic pixel 2 and the pixels around synthetic pixel 2 (for example, the pixels in a 3x3 region); fitting the unpruned pixels in the j-th image block by this method yields one color deviation fitting function.
The embodiments of the present application do not limit the specific fitting method used in the color deviation fitting process.
In one example, the encoder performs weighted least squares fitting based on the chroma values of the synthetic pixels and the chroma values of the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
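An encoder-side sketch of this fitting step follows, assuming the color deviation fitting function is affine per chroma channel and, for simplicity, uniform weights; a practical implementation might weight samples by reliability instead. The function name and parameterization are assumptions for illustration.

```python
import numpy as np

def fit_color_deviation(synth_chroma, view_chroma, unpruned, weights=None):
    """Fit view ~ a*synth + b over the unpruned pixels of one image block."""
    s, v = synth_chroma[unpruned], view_chroma[unpruned]
    if weights is None:
        weights = np.ones_like(s)
    W = np.sqrt(weights)
    A = np.stack([s * W, W], axis=1)          # weighted design matrix
    (a, b), *_ = np.linalg.lstsq(A, v * W, rcond=None)
    return a, b

# Toy usage: recover a synthetic color shift from 64 unpruned pixels.
rng = np.random.default_rng(3)
synth = rng.random((8, 8)); view = 0.9 * synth + 0.07
mask = np.ones((8, 8), dtype=bool)
print(fit_color_deviation(synth, view, mask))   # approx. (0.9, 0.07)
```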
In some embodiments, before the encoder performs chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function of the j-th image block, the embodiments of the present application further include: determining whether the j-th image block includes unpruned pixels; and if the j-th image block includes unpruned pixels, performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
In some embodiments, if the j-th image block includes unpruned pixels, performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function includes: if the number of unpruned pixels included in the j-th image block is greater than or equal to a first preset value, performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
After determining the P color deviation fitting functions of the j-th image block based on the above steps, the encoder performs the following step S204.
S204: prune the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the patch information of the i-th view.
In the embodiments of the present application, the encoder determines the P color deviation fitting functions of the j-th image block based on the above steps and uses them to prune the unpruned pixels in the j-th image block, further reducing the encoding cost of the encoder.
The embodiments of the present application do not limit the specific manner of pruning the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the patch information of the i-th view.
In some embodiments, for each unpruned pixel in the j-th image block, the P color deviation fitting functions are used to determine the fitted pixel value of that unpruned pixel; the fitted pixel value is then compared with the original pixel value of that pixel to determine a fitting error, and if the fitting error is less than a certain preset value, the unpruned pixel is determined to be prunable and is pruned.
In some embodiments, before using the P color deviation fitting functions to prune the unpruned pixels in the j-th image block, the encoder first needs to judge whether the color deviation fitting functions are valid, that is, whether the fitting error of the P color deviation fitting functions is less than or equal to a second preset value. If the fitting error of the P color deviation fitting functions is less than or equal to the second preset value, the P color deviation fitting functions are valid, and they are used to prune the unpruned pixels in the j-th image block. If the P color deviation fitting functions are invalid, the encoder determines the patch information of the i-th view based on the first pruning mask map.
The specific process of determining the fitting errors of the P color deviation fitting functions is described below.
In the embodiments of the present application, the encoder can determine the fitting errors of the P color deviation fitting functions through the following steps 3 and 4.
Step 3: for the k-th unpruned pixel in the j-th image block, perform pixel fitting on the k-th unpruned pixel based on the P color deviation fitting functions to determine the fitting error of the k-th unpruned pixel, where k is a positive integer;
Step 4: determine the fitting errors of the P color deviation fitting functions based on the fitting errors of the unpruned pixels in the j-th image block.
In this embodiment, the encoder first determines, based on the P color deviation fitting functions, the fitting error of each unpruned pixel in the j-th image block. The process of determining the fitting error is essentially the same for each unpruned pixel; for ease of description, the k-th unpruned pixel is taken as an example.
The embodiments of the present application do not limit the specific manner of performing pixel fitting on the k-th unpruned pixel based on the P color deviation fitting functions to determine the fitting error of the k-th unpruned pixel.
In some embodiments, pixel fitting is performed on the k-th unpruned pixel based on the P color deviation fitting functions to obtain the fitted pixel value of the k-th unpruned pixel, and the fitting error of the k-th unpruned pixel is determined based on the fitted pixel value and the original pixel value of the k-th unpruned pixel. The pixel value here includes the YUV values.
In some embodiments, performing pixel fitting on the k-th unpruned pixel based on the P color deviation fitting functions in the above step 3 to determine the fitting error of the k-th unpruned pixel includes the following steps:
Step 31: use the P color deviation fitting functions to perform chromaticity fitting on the k-th unpruned pixel to obtain the fitted chroma value of the k-th unpruned pixel;
Step 32: determine the fitting error of the k-th unpruned pixel based on the fitted chroma value of the k-th unpruned pixel and the chroma value of the k-th unpruned pixel.
In this embodiment, a color deviation error is determined. Specifically, the P color deviation fitting functions are used to perform chromaticity fitting on the k-th unpruned pixel to obtain its fitted chroma value, and the fitting error of the k-th unpruned pixel is determined based on its fitted chroma value and its chroma value.
Based on the above steps, the encoder can determine the fitting error of each unpruned pixel in the j-th image block, and then performs the above step 4.
In the embodiments of the present application, the implementations of the above step 4 include but are not limited to the following:
Way 1: determine the sum of the fitting errors of the unpruned pixels in the j-th image block as the fitting error of the color deviation fitting function.
Way 2: determine the average of the fitting errors of the unpruned pixels in the j-th image block as the fitting error of the color deviation fitting function.
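A minimal sketch of steps 3 and 4 plus the validity test follows, assuming the per-pixel fitting error is the absolute chroma difference between the fitted and original values; the aggregation mode (sum, way 1, or mean, way 2) and the second preset value are treated as parameters, which is an assumption about encoder configuration.

```python
import numpy as np

def fitting_errors(fitted_chroma, orig_chroma, unpruned):
    """Step 3: per-pixel fitting error over the unpruned pixels of a block."""
    return np.abs(fitted_chroma - orig_chroma)[unpruned]

def function_is_valid(errors, second_preset, mode="mean"):
    """Step 4 and validity check: aggregate by way 1 (sum) or way 2 (mean)."""
    agg = errors.sum() if mode == "sum" else errors.mean()
    return agg <= second_preset

# Toy usage:
rng = np.random.default_rng(4)
orig = rng.random((8, 8)); fitted = orig + rng.normal(0, 0.01, (8, 8))
errs = fitting_errors(fitted, orig, np.ones((8, 8), dtype=bool))
print(function_is_valid(errs, second_preset=0.02))   # True for small noise
```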
Based on the above steps, the encoder determines the fitting errors of the P color deviation fitting functions of the j-th image block. If the fitting errors of the P color deviation fitting functions are less than or equal to the second preset value, the P color deviation fitting functions are used to prune the unpruned pixels in the j-th image block to obtain the patch information of the i-th view. If the P color deviation fitting functions are invalid, the encoder determines the patch information of the i-th view based on the first pruning mask map.
The specific process, in the above S204, of pruning the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the patch information of the i-th view is described below.
The embodiments of the present application do not limit the specific manner in which the encoder prunes the unpruned pixels in the j-th image block based on the color deviation fitting functions to obtain the patch information of the i-th view.
Based on the above steps, the encoder processes each of the M image blocks in the first pruning mask map, which leads to the following cases:
情况1、M个图像块均具有有效色偏拟合函数。Case 1: All M image blocks have valid color cast fitting functions.
情况2、M个图像块中部分图像块具有有效色偏拟合函数。Case 2: Some of the M image blocks have valid color cast fitting functions.
情况3、M个图像块均不具有有效色偏拟合函数。Case 3: None of the M image blocks has a valid color cast fitting function.
对于情况3,编码端采用相关技术进行编码,本申请实施例主要对上述情况1和情况2这两个情况进行编码。For situation 3, the encoding end adopts relevant technology for encoding, and the embodiment of the present application mainly encodes the above-mentioned situation 1 and situation 2.
在一些实施例中,编码端对于具有有效色偏拟合函数的图像块,将该图像块中所有未被修剪的像素点均被修剪掉,基于具有有效色偏拟合函数的图像块中所有未被修剪的像素点均被修剪掉后的图像块,确定第i个视图的补丁信息。In some embodiments, the encoding end prunes all unpruned pixels in an image block having a valid color deviation fitting function, and determines patch information of the i-th view based on the image block after all unpruned pixels in the image block having a valid color deviation fitting function are pruned.
In some embodiments, S204 includes the following steps S204-A and S204-B:
S204-A: Prune the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain a third cropping mask map of the j-th image block.
S204-B: Determine the patch information of the i-th view based on the third cropping mask maps of the image blocks in the i-th view.
In this embodiment, the encoder performs pixel-by-pixel pruning on the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third cropping mask map of the j-th image block. In this way, a third cropping mask map can be determined for each image block in the i-th view that has valid color deviation fitting functions. The first pruning mask map is updated based on these third cropping mask maps, and the patch information of the i-th view is determined based on the updated first pruning mask map.
The following describes the specific manner in S204-A of pruning the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third cropping mask map of the j-th image block.
The embodiments of the present application do not limit the specific manner in which the encoder prunes the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third cropping mask map of the j-th image block.
In some embodiments, the encoder prunes all unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the third cropping mask map of the j-th image block.
In some embodiments, for each unpruned pixel in the j-th image block, for example the k-th unpruned pixel, if the fitting error of the k-th unpruned pixel is less than or equal to a third preset value, the encoder prunes the k-th unpruned pixel of the j-th image block to obtain the third cropping mask map of the j-th image block. If the fitting error of the k-th unpruned pixel is greater than the third preset value, the encoder leaves the k-th unpruned pixel of the j-th image block unpruned, and so obtains the third cropping mask map of the j-th image block.
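The per-pixel decision above amounts to a mask update, sketched below; a mask value of 1 marks an unpruned pixel, and the third preset value is an illustrative placeholder.

```python
import numpy as np

def third_cropping_mask(first_mask, fit_err, third_preset=1.0):
    # first_mask == 1 marks pixels left unpruned by the first pruning
    # stage; fit_err holds the fitting error of every pixel. A pixel
    # whose error is small enough is well predicted by the fitting
    # functions, so it is pruned (its mask value is cleared to 0);
    # a pixel with a larger error stays in the patch.
    mask = first_mask.copy()
    mask[(first_mask == 1) & (fit_err <= third_preset)] = 0
    return mask
```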
S205: Encode the patch information and the color deviation fitting functions to obtain a bitstream.
Based on the above steps, the encoder can determine the patch information of the i-th view and the color deviation fitting functions. The encoder then encodes the patch information and the color deviation fitting functions to obtain a bitstream.
In some embodiments, the color deviation fitting functions are encoded if the first flag is set to a first value. That is, the encoder writes a color deviation fitting function into the bitstream only after determining that the color deviation fitting function of the image block is valid, i.e., that its fitting error is less than or equal to the second preset value.
In some embodiments, if the fitting error of a color deviation fitting function is greater than the second preset value, the encoder skips the step of encoding that color deviation fitting function.
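The gating of S205 can be sketched as follows; the bitstream is modeled as a plain list of syntax elements, and the flag and threshold values are assumptions for illustration.

```python
def write_block_side_info(bitstream, first_flag, block_error, fit_fn_params,
                          second_preset=2.0):
    # The fitting functions are written only when the color deviation
    # fitting tool is enabled (first flag set to the first value, here 1)
    # and the block-level fitting error passes the second preset value;
    # otherwise encoding of the fitting functions is skipped.
    if first_flag == 1 and block_error <= second_preset:
        bitstream.append(("color_deviation_fit", fit_fn_params))
```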
The embodiments of the present application do not limit the specific position at which the color deviation fitting function is carried in the bitstream.
In one possible implementation, the encoder writes the color deviation fitting function into a patch data unit. The decoder can then obtain the color deviation fitting function by decoding the patch data unit.
In one example, the patch data unit includes patch data unit MIV extension syntax, and the color deviation fitting function is carried in the patch data unit MIV extension syntax.
The above describes the encoder's encoding process for the i-th view among the N views. The encoder can encode the other non-basic views among the N views with reference to the encoding process of the i-th view, thereby completing the encoding of the N views.
Exemplarily, as shown in FIG. 15 and FIG. 16, the encoder encodes two groups of views separately, where view V4 and view V6 are basic views: views V4, V1, V3, and V0 form one group that is encoded on its own, and views V6, V8, V7, and V5 form another group that is encoded on its own. When encoding the first group of views V4, V1, V3, and V0, the encoder first determines the first pruning mask map of the i-th view. Taking view V1 as an example, pixel pruning is performed on view V1 based on the basic view V4: for each pixel in view V1, it is determined whether the difference between the luminance value of that pixel and the luminance value of its projected pixel in the basic view V4 is less than a preset value. If the luminance difference is greater than the preset value, the mask value of the pixel is set to 1; if the luminance difference is less than or equal to the preset value, second-stage pruning is performed. Specifically, it is determined whether the deviation between the chromaticity value of the pixel and the chromaticity value of the projected pixel is greater than a preset value. If the chromaticity deviation is greater than the preset value, the mask value of the pixel is set to 1. If the chromaticity deviation does not exceed the preset value, the method of the embodiments of the present application is executed, as sketched below. Specifically, based on the above steps, the first pruning mask map of view V1 can be determined; the i-th view is divided into M image blocks based on a preset block division method; for the j-th image block among the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and chromaticity fitting is performed on the unpruned pixels in the j-th image block to obtain the color deviation fitting functions of the j-th image block; the unpruned pixels in the j-th image block are pruned based on the color deviation fitting functions to obtain the patch information of the i-th view; and the patch information and the color deviation fitting functions are encoded to obtain a bitstream. Based on the above steps, V3 and V0 in the first group of views can be encoded.
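The two-stage per-pixel test described for view V1 against basic view V4 can be sketched as below; the thresholds are illustrative, and the chroma-deviation measure (sum of absolute U and V differences) is an assumption, since the embodiments only speak of a deviation between chromaticity values.

```python
def first_stage_mask_value(pix_yuv, proj_yuv, luma_th=10.0, chroma_th=6.0):
    # Returns 1 when the pixel is kept in the mask (mask value set to 1)
    # and 0 when it becomes a candidate for color-deviation fitting.
    y, u, v = pix_yuv
    py, pu, pv = proj_yuv
    if abs(y - py) > luma_th:                   # stage 1: luma difference
        return 1
    if abs(u - pu) + abs(v - pv) > chroma_th:   # stage 2: chroma deviation
        return 1
    return 0  # small luma and chroma differences: fitting stage applies
```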
For V6, V8, V7, and V5 in the second group of views, the same method can be used to obtain the patch information and color deviation fitting functions corresponding to each of V6, V8, V7, and V5, and the patch information and color deviation fitting functions are then encoded to obtain the bitstream.
In the video encoding method provided by the embodiments of the present application, when encoding a multi-view video, the first pruning mask map of the i-th view among the N views is first determined based on pixel pruning. Then, based on a preset block division method, the i-th view is divided into M image blocks; for the j-th image block among the M image blocks, the unpruned pixels in the j-th image block are determined based on the first pruning mask map, and chromaticity fitting is performed on the unpruned pixels in the j-th image block to obtain the color deviation fitting functions of the j-th image block. Next, the unpruned pixels in the j-th image block are pruned based on the color deviation fitting functions to obtain the patch information of the i-th view. Finally, the patch information and the color deviation fitting functions are encoded to obtain a bitstream. In other words, the embodiments of the present application use the color deviation fitting functions to prune, a second time, the first pruning mask map obtained by pixel pruning, so as to further reduce the number of unpruned pixels, reduce the amount of data the encoder needs to encode, and thereby lower the encoding cost and improve the encoding efficiency of multi-view video.
It should be understood that FIG. 8 to FIG. 16 are merely examples of the present application and should not be construed as limiting the present application.
The preferred embodiments of the present application are described in detail above in conjunction with the accompanying drawings. However, the present application is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present application, various simple modifications can be made to the technical solution of the present application, and these simple modifications all fall within the protection scope of the present application. For example, the specific technical features described in the above specific embodiments can be combined in any suitable manner where no contradiction arises; to avoid unnecessary repetition, the present application does not further describe the various possible combinations. As another example, the various embodiments of the present application can also be combined arbitrarily, and as long as such combinations do not depart from the idea of the present application, they should likewise be regarded as content disclosed in the present application.
It should also be understood that, in the various method embodiments of the present application, the magnitudes of the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. In addition, in the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. Specifically, A and/or B can represent three cases: A exists alone, A and B exist simultaneously, and B exists alone. Furthermore, the character "/" in the present application generally indicates an "or" relationship between the preceding and following associated objects.
The method embodiments of the present application are described in detail above in conjunction with FIG. 8 to FIG. 16; the apparatus embodiments of the present application are described in detail below in conjunction with FIG. 17 to FIG. 20.
FIG. 17 is a schematic block diagram of a video decoding apparatus provided by an embodiment of the present application. The video decoding apparatus 10 is applied to the above-described video decoder.
As shown in FIG. 17, the video decoding apparatus 10 includes:
a determining unit 11, configured to, for an i-th view among N views, decode a bitstream, determine patch information of the i-th view, and generate a patch image of the i-th view based on the patch information, where the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
a dividing unit 12, configured to divide the patch image into M image blocks according to a preset block division method, where M is a positive integer;
a decoding unit 13, configured to, for a j-th image block among the M image blocks, decode the bitstream to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer; and
a fitting unit 14, configured to perform pixel fitting on the cropped pixels in the j-th image block using the P color deviation fitting functions to obtain a reconstructed block of the j-th image block.
In some embodiments, the fitting unit 14 is specifically configured to: determine a directed acyclic graph of the pruning hierarchy of the N views; based on the directed acyclic graph, determine P reconstructed images corresponding to P parent nodes of the i-th view, where P is a positive integer; and, based on the P reconstructed images, perform pixel fitting on the cropped pixels in the j-th image block using the P color deviation fitting functions to obtain the reconstructed block of the j-th image block.
In some embodiments, the fitting unit 14 is specifically configured to: determine synthesized views of the P parent nodes based on the P reconstructed images, and determine P synthesized view blocks of the j-th image block in the synthesized views of the P parent nodes; based on the P synthesized view blocks, fit the pruned pixels in the j-th image block using the P color deviation fitting functions to obtain P color deviation reconstruction blocks of the j-th image block; and perform weighted fusion on the P color deviation reconstruction blocks to obtain the reconstructed block of the j-th image block.
In some embodiments, the fitting unit 14 is specifically configured to: for a k-th pruned pixel in the j-th image block, determine P synthesized pixels corresponding to the k-th pruned pixel in the P synthesized view blocks, where k is a positive integer; based on the P synthesized pixels and the pixels surrounding the P synthesized pixels, perform pixel fitting on the k-th pruned pixel using the color deviation fitting functions to obtain a color deviation reconstruction value of the k-th pruned pixel; and obtain one color deviation reconstruction block of the j-th image block based on the color deviation reconstruction values of the pruned pixels in the j-th image block.
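The decoder-side weighted fusion performed by the fitting unit 14 can be sketched as follows; equal weights are assumed when none are supplied, which is only one possible choice and is not mandated by the embodiments.

```python
import numpy as np

def fuse_color_deviation_blocks(recon_blocks, weights=None):
    # recon_blocks: list of P arrays of identical shape, one color
    # deviation reconstruction block per parent node.
    blocks = np.stack(recon_blocks, axis=0)          # (P, H, W, C)
    if weights is None:
        weights = np.full(len(recon_blocks), 1.0 / len(recon_blocks))
    weights = np.asarray(weights, dtype=blocks.dtype)
    weights = weights.reshape((-1,) + (1,) * (blocks.ndim - 1))
    return np.sum(weights * blocks, axis=0)          # weighted fusion
```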
In some embodiments, before the patch image is divided into M image blocks according to the preset block division method, the decoding unit 13 is further configured to decode the bitstream to obtain a first flag, where the first flag indicates whether a color deviation fitting tool is enabled; if the first flag indicates that the color deviation fitting tool is enabled, the patch image is divided into M image blocks according to the preset block division method.
In some embodiments, the decoding unit 13 is further configured to skip the step of dividing the patch image into M image blocks according to the preset block division method if the first flag indicates that the color deviation fitting tool is not enabled.
In some embodiments, if the first flag indicates that the color deviation fitting tool is enabled and the P color deviation fitting functions of the j-th image block are not decoded from the bitstream, the fitting unit 14 is further configured to: determine the synthesized views of the P parent nodes based on the P reconstructed images, and determine the P synthesized view blocks of the j-th image block in the synthesized views of the P parent nodes; and obtain the reconstructed block of the j-th image block based on the P synthesized view blocks.
In some embodiments, the fitting unit 14 is specifically configured to perform weighted processing on the P synthesized view blocks to obtain the reconstructed block of the j-th image block.
In some embodiments, the decoding unit 13 is specifically configured to decode the bitstream to obtain the P color deviation fitting functions of the j-th image block if the j-th image block includes pruned pixels.
In some embodiments, the decoding unit 13 is specifically configured to decode the bitstream to obtain the P color deviation fitting functions of the j-th image block if the number of pruned pixels included in the j-th image block is greater than or equal to a preset value.
In some embodiments, the N views are views corresponding to the N different viewpoints at the same moment.
In some embodiments, the N views are the first image of the k-th GOP of each of the N viewpoints, where k is a positive integer.
In some embodiments, the fitting unit 14 is further configured to determine the color deviation fitting functions corresponding to the i-th view as the color deviation fitting functions of the other views in the GOP in which the i-th view is located.
In some embodiments, the bitstream includes a patch data unit, and the decoding unit 13 is specifically configured to decode the patch data unit to obtain the P color deviation fitting functions of the j-th image block.
In some embodiments, the i-th view is a non-basic view among the N views.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, they are not repeated here. Specifically, the apparatus 10 shown in FIG. 17 can execute the decoding method of the decoder side of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the apparatus 10 respectively implement the corresponding processes in the methods such as the decoding method of the decoder side; for brevity, they are not repeated here.
FIG. 18 is a schematic block diagram of a video encoding apparatus provided by an embodiment of the present application. The video encoding apparatus is applied to the above-described encoder.
As shown in FIG. 18, the video encoding apparatus 20 may include:
a determining unit 21, configured to, for an i-th view among N views, determine a first pruning mask map of the i-th view, where the first pruning mask map is a pruning mask map obtained by performing pixel pruning on the i-th view, the N views are views from N different viewpoints, N is a positive integer greater than 1, and i is a positive integer less than or equal to N;
a dividing unit 22, configured to divide the i-th view into M image blocks based on a preset block division method, where M is a positive integer;
a fitting unit 23, configured to, for a j-th image block among the M image blocks, determine unpruned pixels in the j-th image block based on the first pruning mask map, and perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain P color deviation fitting functions of the j-th image block, where j is a positive integer less than or equal to M, and P is a positive integer;
a pruning unit 24, configured to prune the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain patch information of the i-th view; and
an encoding unit 25, configured to encode the patch information and the color deviation fitting functions to obtain a bitstream.
In some embodiments, the determining unit 21 is specifically configured to: generate a directed acyclic graph of the pruning hierarchy of the N views; determine, based on the directed acyclic graph, P synthesized views of P parent nodes of the i-th view; and perform pixel pruning on the i-th view based on the P synthesized views to obtain the first pruning mask map.
In some embodiments, the determining unit 21 is specifically configured to: prune the i-th view based on the difference between the luminance component of the i-th view and the luminance components of the P synthesized views to obtain a second pruning mask map of the i-th view; and, based on the difference between the chrominance component of the i-th view and the chrominance components of the P synthesized views, search, among the pruned pixels included in the second pruning mask map, for pixels that are not to be pruned, to obtain the first pruning mask map.
In some embodiments, before performing chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting functions of the j-th image block, the fitting unit 23 is further configured to determine whether the j-th image block includes unpruned pixels; if the j-th image block includes unpruned pixels, chromaticity fitting is performed on the unpruned pixels in the j-th image block to obtain the P color deviation fitting functions.
In some embodiments, the fitting unit 23 is specifically configured to perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain the color deviation fitting functions if the number of unpruned pixels included in the j-th image block is greater than or equal to a first preset value.
In some embodiments, the fitting unit 23 is specifically configured to: determine P corresponding image blocks of the j-th image block in the P synthesized views; and perform chromaticity fitting on the unpruned pixels in the j-th image block based on the chromaticity values of the pixels included in the P corresponding image blocks and the chromaticity values of the pixels included in the j-th image block to obtain the color deviation fitting functions.
In some embodiments, the fitting unit 23 is specifically configured to perform chromaticity fitting on the unpruned pixels in the j-th image block to obtain the P color deviation fitting functions if the number of unpruned pixels included in the j-th image block is greater than or equal to the first preset value.
In some embodiments, the fitting unit 23 is specifically configured to: determine P synthesized view blocks of the j-th image block in the P synthesized views; and perform chromaticity fitting on the unpruned pixels in the j-th image block based on the chromaticity values of the pixels included in the P synthesized view blocks and the chromaticity values of the pixels included in the j-th image block to obtain the P color deviation fitting functions.
In some embodiments, the fitting unit 23 is specifically configured to: for any one of the P synthesized view blocks, determine, in that synthesized view block, the synthesized pixels corresponding to the unpruned pixels in the j-th image block; and determine the color deviation fitting function based on the chromaticity values of the synthesized pixels and the chromaticity values of the unpruned pixels in the j-th image block.
In some embodiments, the fitting unit 23 is specifically configured to perform weighted least squares fitting based on the chromaticity values of the synthesized pixels and the chromaticity values of the unpruned pixels in the j-th image block to obtain the color deviation fitting function.
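The weighted least squares fit can be sketched as follows; the affine mapping from synthesized chroma to original chroma is an assumed function form, since the embodiments only state that weighted least squares fitting is used.

```python
import numpy as np

def fit_color_deviation_wls(synth_uv, orig_uv, weights):
    # synth_uv, orig_uv: (K, 2) chroma samples over the K unpruned pixels
    # of the block; weights: length-K non-negative fitting weights.
    # Assumed model: orig ~= synth @ A + b, solved jointly for U and V.
    X = np.hstack([synth_uv, np.ones((synth_uv.shape[0], 1))])  # (K, 3)
    sw = np.sqrt(weights)[:, None]
    # Weighted LS: minimize || sw * (X @ theta - orig_uv) ||^2.
    theta, *_ = np.linalg.lstsq(sw * X, sw * orig_uv, rcond=None)
    A, b = theta[:2], theta[2]   # A: (2, 2) matrix, b: (2,) offset
    return A, b
```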
In some embodiments, before pruning the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain the patch information of the i-th view, the pruning unit 24 is further configured to: for a k-th unpruned pixel in the j-th image block, perform pixel fitting on the k-th unpruned pixel based on the P color deviation fitting functions and determine a fitting error of the k-th unpruned pixel, where k is a positive integer; determine a fitting error of the P color deviation fitting functions based on the fitting errors of the unpruned pixels in the j-th image block; and, if the fitting error of the P color deviation fitting functions is less than or equal to a second preset value, prune the unpruned pixels in the j-th image block based on the color deviation fitting functions to obtain the patch information of the i-th view.
In some embodiments, the pruning unit 24 is specifically configured to: prune the unpruned pixels in the j-th image block based on the P color deviation fitting functions to obtain a third cropping mask map of the j-th image block; and determine the patch information of the i-th view based on the third cropping mask maps of the image blocks in the i-th view.
In some embodiments, the pruning unit 24 is specifically configured to: perform chromaticity fitting on the k-th unpruned pixel using the P color deviation fitting functions to obtain a fitted chromaticity value of the k-th unpruned pixel; and determine the fitting error of the k-th unpruned pixel based on the fitted chromaticity value of the k-th unpruned pixel and the chromaticity value of the k-th unpruned pixel.
In some embodiments, the pruning unit 24 is specifically configured to: perform chromaticity fitting on the k-th unpruned pixel using the color deviation fitting function to obtain a fitted chromaticity value of the k-th unpruned pixel; and determine the fitting error of the k-th unpruned pixel based on the fitted chromaticity value of the k-th unpruned pixel and the chromaticity value of the k-th unpruned pixel.
In some embodiments, the pruning unit 24 is specifically configured to determine the sum of the fitting errors of the unpruned pixels in the j-th image block as the fitting error of the P color deviation fitting functions.
In some embodiments, the pruning unit 24 is specifically configured to determine the average of the fitting errors of the unpruned pixels in the j-th image block as the fitting error of the P color deviation fitting functions.
In some embodiments, before dividing the i-th view into M image blocks based on the preset block division method, the encoding unit 25 is further configured to determine a first flag, where the first flag indicates whether a color deviation fitting tool is enabled; if the first flag indicates that the color deviation fitting tool is enabled, the i-th view is divided into M image blocks based on the preset block division method.
In some embodiments, the encoding unit 25 is further configured to write the first flag into the bitstream.
In some embodiments, the encoding unit 25 is further configured to skip the step of encoding the color deviation fitting functions if the fitting error of the P color deviation fitting functions is greater than the second preset value.
In some embodiments, the N views are views generated by the N different viewpoints at the same time.
In some embodiments, the N views are the first image of the k-th GOP of each of the N viewpoints, where k is a positive integer.
In some embodiments, the fitting unit 23 is further configured to determine the color deviation fitting functions corresponding to the i-th view as the color deviation fitting functions of the other views in the GOP in which the i-th view is located.
In some embodiments, the encoding unit 25 is specifically configured to write the color deviation fitting functions into a patch data unit.
In some embodiments, the i-th view is a non-basic view among the N views.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, they are not repeated here. Specifically, the apparatus 20 shown in FIG. 18 may correspond to the subject performing the encoding method of the encoder side of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the apparatus 20 respectively implement the corresponding processes in the methods such as the encoding method of the encoder side; for brevity, they are not repeated here.
The apparatus and system of the embodiments of the present application are described above from the perspective of functional units in conjunction with the accompanying drawings. It should be understood that the functional units may be implemented in the form of hardware, in the form of instructions in software, or by a combination of hardware and software units. Specifically, the steps of the method embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor. Optionally, a software unit may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
FIG. 19 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
As shown in FIG. 19, the electronic device 30 may be the video encoder or the video decoder described in the embodiments of the present application, and may include:
a memory 33 and a processor 32, where the memory 33 is configured to store a computer program 34 and transmit the program code 34 to the processor 32. In other words, the processor 32 can call and run the computer program 34 from the memory 33 to implement the methods of the embodiments of the present application.
For example, the processor 32 may be configured to execute the steps of the above methods according to instructions in the computer program 34.
In some embodiments of the present application, the processor 32 may include, but is not limited to:
a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.
In some embodiments of the present application, the memory 33 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch-link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program 34 may be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the methods provided by the present application. The one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments describe the execution process of the computer program 34 in the electronic device 30.
As shown in FIG. 19, the electronic device 30 may further include:
a transceiver 33, where the transceiver 33 may be connected to the processor 32 or the memory 33.
The processor 32 may control the transceiver 33 to communicate with other devices; specifically, it may send information or data to other devices, or receive information or data sent by other devices. The transceiver 33 may include a transmitter and a receiver, and may further include one or more antennas.
It should be understood that the components of the electronic device 30 are connected by a bus system, where the bus system includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
FIG. 20 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
As shown in FIG. 20, the video encoding and decoding system 40 may include a video encoder 41 and a video decoder 42, where the video encoder 41 is configured to execute the video encoding method of the embodiments of the present application, and the video decoder 42 is configured to execute the video decoding method of the embodiments of the present application.
The present application further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer is enabled to perform the methods of the above method embodiments. In other words, the embodiments of the present application further provide a computer program product containing instructions; when the instructions are executed by a computer, the computer performs the methods of the above method embodiments.
The present application further provides a bitstream, where the bitstream is generated according to the above encoding method.
When implemented in software, the above may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed in the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is merely a logical functional division, and there may be other division manners in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. For example, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
The above is merely the specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/106426 WO2025010558A1 (en) | 2023-07-07 | 2023-07-07 | Video encoding method and apparatus, video decoding method and apparatus, device, system, and storage medium |
| CN202380095468.0A CN120858579A (en) | 2023-07-07 | 2023-07-07 | Video encoding and decoding method, device, equipment, system, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025010558A1 true WO2025010558A1 (en) | 2025-01-16 |