
WO2018200293A1 - Image encoding - Google Patents

Image encoding

Info

Publication number
WO2018200293A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
encoding
target
determining
pixel
Prior art date
Legal status
Ceased
Application number
PCT/US2018/028217
Other languages
English (en)
Inventor
Jizheng Xu
Yiming Li
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Publication of WO2018200293A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/115: Selection of the code volume for a coding unit prior to coding
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/12: Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/124: Quantisation
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]
    • H04N19/176: Coding unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/182: Coding unit being a pixel
    • H04N19/19: Adaptive coding using optimisation based on Lagrange multipliers
    • H04N19/597: Predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • the omnidirectional video is generally composed of a plurality of spherical images.
  • since the traditional image and/or video encoding system usually takes rectangular images as input, it is difficult for the omnidirectional video composed of a plurality of spherical images to be compressed. Therefore, to compress the omnidirectional video, it is usually necessary to map the spherical images into rectangular images. However, such mapping may introduce considerable distortion to the image.
  • different parts of a moving object may have different geometric distortions, which may reduce the encoding performance significantly.
  • there is provided an encoding method, an encoding device and a computer program product.
  • the method determines an impact of the target pixel on the mapping based on the source and target pixels.
  • the method also determines a distortion caused by encoding of the target image at least based on the determined impact.
  • the method determines at least part of encoding parameters for the encoding of the target image.
  • FIG. 1 shows a block diagram of a system 100 in which implementations of the subject matter as described herein can be implemented;
  • FIG. 2 shows a flowchart of an encoding method 200 according to implementations of the subject matter as described herein;
  • FIG. 3 shows a schematic diagram of mapping the spherical image into the rectangular image by equirectangular projection; and
  • FIG. 4 shows a block diagram of an example computing system/server 400 in which one or more implementations of the subject matter described herein can be implemented.
  • the term "includes" and its variants are to be read as open terms that mean "includes, but is not limited to."
  • the term "based on" is to be read as "based at least in part on."
  • the terms "one implementation" and "an implementation" are to be read as "at least one implementation."
  • the term "another implementation" is to be read as "at least one other implementation."
  • the terms "first," "second," and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below. A definition of a term is consistent throughout the description unless the context clearly indicates otherwise.
  • FIG. 1 shows a block diagram of a system 100 in which implementations of the subject matter as described herein can be implemented.
  • the system 100 may be used for processing an image and/or a video. Particularly, the system 100 may be used for processing an omnidirectional video.
  • the system 100 in general may be divided into an encoding sub-system 110 and a decoding sub-system 120. It is to be understood that the structure and functionality of the system 100 are described only for the purpose of illustration without suggesting any limitations to the scope of the subject matter described herein.
  • the subject matter described herein can be embodied with a different structure and/or functionality.
  • some or all of the modules included in the system 100 can be implemented by software, hardware, firmware, and/or any suitable combination of the foregoing.
  • the encoding sub-system 110 may include a capturing module 111, a stitching module 112, a mapping module 113 and an encoding module 114.
  • these modules can be distributed on different devices. Alternatively, in some other implementations, these modules can be implemented on the same device. In addition, some of these modules can be incorporated into other modules.
  • the mapping module 113 can be incorporated into the encoding module 114.
  • the capturing module 111 may be implemented by a camera array which can cover all the directions of the light illuminating the camera array.
  • the capturing module 111 may be used to capture video signals from different directions.
  • the stitching module 112 may receive image signals captured by the capturing module 111 from different directions and apply a stitching process to derive a source image.
  • the stitching module 112 may generate the source image from the image signals captured by the capturing module 111 from different directions.
  • the source image may be a non-planar image.
  • the source image for instance, may include a spherical image which records visual signals from all the directions observed from the center of the sphere.
  • the stitching module 112 can also combine a plurality of spherical images as a spherical image sequence.
  • the combined spherical image sequence constitutes an omnidirectional video signal.
  • the omnidirectional video signal represented by the spherical image sequence may treat video signals from every direction equally, which is beneficial to the subsequent video rendering.
  • a mapping module 113 may be utilized to map the source image in the omnidirectional video.
  • the mapping module 113 may map the source image into a target image.
  • the mapping module 113 may map the spherical image in the spherical image sequence into a rectangular image.
  • the mapping module 113 may perform the above mapping in many ways. For example, in some implementations, to map the spherical image into a rectangular image, the mapping module 113 may perform the above mapping by means of equirectangular projection, which is widely adopted in head-mounted displays.
  • equirectangular projection may refer to a process by which longitudes of the sphere with a constant gap are mapped into vertical lines with a constant gap, and latitude lines of the sphere with a constant gap are mapped into horizontal lines with a constant gap.
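  • As a minimal sketch of the equirectangular projection just described (the function name and coordinate conventions below are illustrative assumptions, not taken from the patent):

        import math

        def equirect_project(lat, lon, width, height):
            # Map a point on the unit sphere (latitude in [-pi/2, pi/2],
            # longitude in [-pi, pi]) to pixel coordinates in a width x height
            # rectangular image: equally spaced longitudes become equally
            # spaced columns, and equally spaced latitudes become equally
            # spaced rows.
            j = (lon + math.pi) / (2.0 * math.pi) * width   # column index
            i = (math.pi / 2.0 - lat) / math.pi * height    # row index (north pole maps to the top line)
            return i, j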
  • the encoding module 114 can be employed to compress the image and/or video signal into image and/or video bitstream.
  • the compressed image and/or video bitstream can be utilized for transmission to different devices.
  • the decoding sub-system 120 may perform an inverse process of the above process performed by the encoding sub-system 110 to reconstruct the image and/or video signal from the received image and/or video bitstream. Particularly, when the received bitstream is a compressed omnidirectional video signal, the decoding sub-system 120 can reconstruct the omnidirectional video signal from the received video bitstream. As shown in FIG. 1, the decoding sub-system 120 may include a decoding module 121, an unmapping module 122 and a rendering module 123. In some implementations, these modules can be distributed on different devices. Alternatively, in some implementations, these modules can be implemented on the same device. Moreover, some of these modules can be incorporated into other modules. For example, in some implementations, the unmapping module 122 can be incorporated into the decoding module 121.
  • the decoding module 121 may decode the received image and/or video bitstream into the image and/or video. Then, the unmapping module 122 may perform an inverse process of the above process performed by the mapping module 113, and map the target image (such as the rectangular image) in the image and/or video back into the source image (such as the spherical image), thereby reconstructing the image and/or video signal.
  • the rendering module 123 may display the reconstructed image and/or video signal on the corresponding display device (such as a head-mounted display).
  • the process for mapping the source image (such as the spherical image), which is probably non-planar, to the target image (such as the rectangular image) may introduce a considerable distortion to the image.
  • a point at the north pole of the sphere may be mapped into the topmost line of the rectangular image, while a point at the equator of the sphere may remain the same before and after mapping.
  • Such a geometric distortion may become greater as a point becomes farther from the equator.
  • the distortion may also impact the motion.
  • an object with a rigid motion may have a much more complicated motion after the mapping because different parts of the object may have different geometric distortions.
  • the term "rigid motion" refers to a movement of an object in which the distance between any two points of the object remains constant before and after the movement.
  • the traditional video encoding system is usually designed to be adapted for a linear motion (namely, the trajectory of the motion is linear) and/or a rigid motion. Therefore, such a geometric distortion can reduce the video encoding performance significantly.
  • implementations of the subject matter described herein provide an encoding scheme which can determine the different impacts of different points on the mapping process from the source image to the target image, and which can be adapted to any mapping method (including but not limited to equirectangular projection) currently known or to be developed in the future.
  • the scheme can adjust measurement for the distortion caused by encoding based on the determined impact, and determine at least part of encoding parameters for the encoding of the image based on the adjusted measurement for the distortion so as to improve the encoding performance.
  • the encoding scheme according to the implementations of the subject matter described herein can be used to encode an image and/or a video.
  • the encoding scheme can be used to encode an omnidirectional video to improve the encoding performance for the omnidirectional video.
  • FIG. 2 shows a flowchart of an encoding method 200 according to the implementations of the subject matter described herein.
  • method 200 can be implemented by the encoding sub-system 110 as shown in FIG. 1. It is to be understood that method 200 may further include additional actions not shown and/or omit the shown actions. The scope of the subject matter described herein is not limited in this aspect.
  • the encoding sub-system 110 determines, in response to a source image being mapped into a target image and at least based on at least one source pixel in the source image and at least one target pixel in the target image, an impact of the at least one target pixel on the mapping.
  • the at least one source pixel in the source image is mapped into the at least one target pixel in the target image.
  • the term "source image" may include, for example, a planar image or a non-planar image of any type.
  • the source image may be an image in a video.
  • the source image can be any of a plurality of spherical images composing an omnidirectional video.
  • the term "target image", for instance, may refer to an image of any type that the source image is mapped into. For instance, when the source image is a spherical image, the target image may be a rectangular image.
  • the term "source pixel" may refer to a pixel in the source image, while the term "target pixel" may refer to the pixel corresponding to the source pixel in the target image that the source image is mapped into.
  • a spherical image is taken as an example for the source image and a rectangular image is taken as an example for the target image.
  • this is only for the purpose of illustration, without suggesting any limitations to the scope of subject matter as described herein in any manner.
  • FIG. 3 shows a schematic diagram of mapping the spherical image into the rectangular image by equirectangular projection.
  • a spherical image 310 may be mapped into a rectangular image 320.
  • a source pixel 311 in the spherical image 310 is mapped into a target pixel 321 in the rectangular image 320.
  • a point at the north pole of the spherical image 310 may be mapped into the topmost line of the rectangular image 320 by equirectangular projection, while a relative distance between two points at the equator in the spherical image 310 remains the same before and after mapping.
  • although mapping the spherical image 310 into the rectangular image 320 may facilitate the encoding, such mapping may introduce a geometric distortion that significantly impacts the encoding performance and results in different parts of the image having different impacts on the geometric distortion.
  • although each source pixel in the spherical image 310 is equally important, the respective target pixels in the rectangular image 320 may be of different significance. In other words, different target pixels in the rectangular image 320 may have different impacts on the mapping.
  • the encoding sub-system 110 may determine the impact of the target pixel 321 on the mapping based on the number of pixels included in the source pixel 311 in the spherical image 310 (also referred to as the "first number") and the number of pixels included in the target pixel 321 in the rectangular image 320 (also referred to as the "second number").
  • the impact may be represented as a weight associated with the target pixel 321.
  • the weight may be quantified as, for example, a ratio between the first number and the second number. It is to be understood that this is only for the purpose of illustration, without suggesting any limitations to the scope of subject matter as described herein in any manner.
  • any suitable calculation can be applied to the first and second numbers, and the calculated result can be used as a quantitative representation of the above impact, as in the sketch below.
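  • For instance, a minimal sketch of the ratio-based quantification mentioned above (names illustrative):

        def mapping_weight(first_number, second_number):
            # Impact of a target pixel expressed as the ratio between the
            # number of source pixels mapped into it (the first number) and
            # the number of pixels in the target pixel itself (the second number).
            return first_number / second_number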
  • a pixel, for example the target pixel 321, which is the j-th pixel of the rectangular image 320 in the horizontal direction and the i-th pixel in the vertical direction, may be represented as I[i,j], where i ∈ [0, H−1] and j ∈ [0, W−1], with H and W being the height and width of the rectangular image 320.
  • the weight associated with the target pixel I[i, j] may be represented as A[i,j].
  • a smaller A[i,j] may indicate that the target pixel I[i,j] corresponds to a smaller area of the spherical image (namely, fewer source pixels), and thus the target pixel I[i,j] may have a smaller impact on the distortion.
  • a larger A[i,j] may indicate that the target pixel I[i,j] corresponds to a larger area of the spherical image (namely, more source pixels), and thus the target pixel I[i,j] may have a larger impact on the distortion.
  • A[i,j] may be determined based on the value of i, for instance according to equation (1); see the sketch below.
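  • The sketch below assumes the common cosine-of-latitude row weight for equirectangular projection (the spherical area covered by a pixel shrinks toward the poles); this is an assumed form of equation (1), not a quotation of it:

        import math

        def row_weight(i, height):
            # Assumed a[i, j] for row i of an H-row equirectangular image:
            # the cosine of the latitude that row i maps back to on the
            # sphere. It depends only on i, consistent with the text above.
            latitude = (i + 0.5 - height / 2.0) * math.pi / height
            return math.cos(latitude)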
  • each of source pixels in the spherical image 310 may have different importance.
  • each of the source pixels in the spherical image 310 may have a degree of importance associated with the encoding.
  • the degree of importance may be represented as a weight reflecting the importance of the source pixel per se. For example, a higher weight may indicate that the source pixel has greater impact on the encoding, while a lower weight may indicate that the source pixel has smaller impact on the encoding.
  • the encoding sub-system 110 may also determine the impact of the target pixel on the mapping based on the source pixel, the target pixel corresponding to the source pixel and the degree of importance associated with the source pixel (such as the weight reflecting the importance of the source pixel per se).
  • the target pixel I[i, j] (such as the target pixel 321) in the rectangular image 320 may correspond to at least one source pixel (such as the source pixel 311) in the spherical image 310, and each of the at least one source pixel may have a respective weight reflecting its importance.
  • the encoding sub-system 110 may determine the weight associated with the at least one source pixel based on a respective weight of each of the at least one source pixel (for example, by averaging or by other proper means).
  • the weight associated with the at least one source pixel is represented as w[i,j], and the weight determined based on the number of pixels in the at least one source pixel and the number of pixels in the target pixel I[i,j] is represented as a[i,j] (for instance, according to equation (1)).
  • the weight A[i,j] associated with the target pixel I[i,j] may be determined as, for instance, a product of w[i,j] and a[i,j].
  • the target image may be divided into a plurality of image blocks. If an image block is relatively small, it may be considered approximately that different target pixels in the image block have the same impact on the mapping. In other words, different target pixels in the image block may have the same weight. In this case, for example, the weight associated with one target pixel in the image block can be assigned to other pixels in the same image block to simplify some of the follow-up processing, as in the sketch below.
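  • A sketch combining the two weights and sharing one weight per image block, under the assumptions above (the numpy layout and block handling are illustrative):

        import numpy as np

        def combined_weights(w, block_size=16):
            # A[i, j] = w[i, j] * a[i, j]: per-pixel importance weight times
            # the assumed equirectangular projection weight, after which every
            # pixel in a block inherits the weight of the block's top-left
            # pixel to simplify follow-up processing.
            height, width = w.shape
            a = np.fromfunction(
                lambda i, j: np.cos((i + 0.5 - height / 2.0) * np.pi / height),
                (height, width))
            A = w * a
            for bi in range(0, height, block_size):
                for bj in range(0, width, block_size):
                    A[bi:bi + block_size, bj:bj + block_size] = A[bi, bj]
            return A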
  • the implementations of the subject matter described herein can determine different impact of different pixels on the mapping during the mapping process from the source image to the target image in a quantitative manner.
  • the implementations of the subject matter described herein are not limited to a particular mapping method but can be adapted to any of mapping methods currently known or to be developed in the future.
  • the encoding sub-system 110 determines a distortion caused by encoding of the target image based on the impact of the at least one target pixel (such as the target pixel 321) on the mapping.
  • the encoding sub-system 110 may determine the distortion by generating an error measuring the distortion and then estimate corresponding image and/or video encoding parameters based on the generated error in the subsequent actions.
  • the error for measuring the distortion caused by the encoding of the target image may include, for example, Sum of Squares for Error (SSE) between the rectangular image and the reconstructed image.
  • the term "reconstructed image" may refer to an image derived by applying, at least in part, an inverse process of the encoding to the encoded target image.
  • the encoding module 114 may usually perform a reconstruction to the encoded image.
  • the encoding sub-system 110 may, for example, obtain a reconstructed image of the target image from the encoding module 114 to calculate SSE between the target image and the reconstructed image.
  • the error for measuring the distortion caused by the encoding of the target image may further include Sum of Absolute Difference (SAD) or Sum of Absolute Transformation Difference (SATD) between the target image and the reconstructed image.
  • the encoding sub-system 110 may obtain the reconstructed image of the target image (such as from the encoding module 114) to calculate SAD or SATD.
  • different errors for measuring the distortion caused by the encoding of the target image can be applied to different modules (such as a motion estimation module, a filtering module and so on) of the image and/or video encoder, for example, so as to estimate respective encoding parameters for the different modules.
  • in the computation of SATD, H(·) represents the Hadamard transform.
  • the encoding subsystem 110 may determine the distortion based on the target image, the reconstructed image of the target image and different impact of different pixels in the target image on the mapping. For example, the encoding sub-system 110 may adjust distortion measurement of the traditional solutions based on different impact of different pixels in the target image on the mapping.
  • the encoding sub-system 110 may generate, based on the different impacts of different pixels in the target image on the mapping, a Sum of Squares for Error (hereinafter referred to as "SSE'") between the target image and its reconstructed image, a Sum of Absolute Difference (hereinafter referred to as "SAD'") and/or a Sum of Absolute Transformation Difference (hereinafter referred to as "SATD'") for use in the estimation of image and/or video encoding parameters in the subsequent actions.
  • SSE', SAD' and SATD' may be determined according to equations (5)-(7), respectively, by weighting the corresponding per-pixel error terms with A[i,j]; see the sketch below.
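  • A sketch of one consistent reading of equations (5)-(7), with per-pixel weighting for SSE' and SAD' and per-block weighting for SATD' (the placement of the weight in SATD' is our assumption; scipy's Hadamard matrix stands in for the patent's H(·)):

        import numpy as np
        from scipy.linalg import hadamard

        def weighted_sse(target, recon, A):
            # SSE' = sum over (i, j) of A[i, j] * (I[i, j] - I_rec[i, j])^2
            return float(np.sum(A * (target - recon) ** 2))

        def weighted_sad(target, recon, A):
            # SAD' = sum over (i, j) of A[i, j] * |I[i, j] - I_rec[i, j]|
            return float(np.sum(A * np.abs(target - recon)))

        def weighted_satd(target, recon, A, block=8):
            # SATD': Hadamard-transform each residual block, sum the absolute
            # transform coefficients, and weight each block by its
            # (block-shared) A value. Assumes dimensions divisible by `block`.
            H = hadamard(block)
            residual = target - recon
            total = 0.0
            for bi in range(0, residual.shape[0], block):
                for bj in range(0, residual.shape[1], block):
                    r = residual[bi:bi + block, bj:bj + block]
                    total += A[bi, bj] * np.sum(np.abs(H @ r @ H.T))
            return total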
  • the implementations of the subject matter described herein can adjust the measurement of the distortion for the image and/or video encoding based on different impact of different pixels on the mapping.
  • the encoding sub-system 110 determines at least part of encoding parameters for the encoding of the target image based on the distortion determined at block 220.
  • for example, for an image block in the target image to be encoded, there may exist a plurality of encoding parameters for the encoding of the image block.
  • the encoding parameters may include one or more of the following: a quantization parameter (QP) corresponding to a respective quantization step size and determining the degree of quantization of the image and/or video encoding; an encoding mode for the image block (for example, intra-frame prediction or inter-frame prediction, as well as the prediction direction of intra-frame prediction and/or inter-frame prediction in video encoding); a manner for further dividing the image block into sub-image blocks; a type of transform applied to the image block, such as a discrete cosine transform or a directional linear transform; and the like.
  • a rate distortion optimization process may be applied by the encoding sub-system 110 to compare different values of encoding parameters, so as to determine at least part of encoding parameters with the optimal encoding performance.
  • the encoding sub-system 110 may compare different values of all of the encoding parameters to determine values of all of the encoding parameters with the optimal encoding performance.
  • the encoding sub-system 110 may also compare different values of part of the encoding parameters to determine values of part of encoding parameters with the optimal encoding performance. For example, in some cases, one or more encoding parameters might have been determined. Therefore, the encoding sub-system may apply an optimization process to other encoding parameters other than the one or more encoding parameters.
  • rate distortion optimization process refers to a process of determining encoding parameters with the optimal encoding performance based on the rate distortion theory.
  • the main goals of the rate distortion optimization process are: 1) to reduce the distortion of the image and/or video under a given coding rate constraint; and 2) to reduce the encoding rate as much as possible when a certain degree of distortion is allowed.
  • the rate distortion optimization process may be transformed into a Lagrangian optimization process.
  • equation (9) can be simplified as: minimize (D + λR) (10). According to equation (10), the Lagrangian optimization process can be interpreted as determining the encoding parameters such that the value of (D + λR) is minimized (namely, determining the encoding parameters with the optimal encoding performance).
  • the encoding sub-system 110 may determine any of SSE', SAD' and SATD' according to equations (5)-(7) and use it as D in equation (10), so as to determine the encoding parameters with the optimal encoding performance. More generally, in some implementations, the encoding sub-system 110 may also determine another distortion measurement based on equation (8) and use it as D in equation (10), so as to determine the encoding parameters with the optimal encoding performance.
  • in some implementations, the Lagrangian operator may need to be changed from λ to a correspondingly adjusted value, so as to determine the encoding parameters with the optimal encoding performance; a sketch of the decision follows.
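  • A sketch of the Lagrangian decision of equation (10) using the adjusted distortion (the candidate tuple layout is an illustrative assumption, not the patent's interface):

        def choose_parameters(candidates, lagrangian):
            # Pick the encoding parameters minimizing the rate-distortion
            # cost D + lambda * R, where D is the adjusted distortion (e.g.
            # SSE') and R is the bit cost of encoding with those parameters.
            # `candidates` is an iterable of (params, distortion, rate) tuples.
            best = min(candidates, key=lambda c: c[1] + lagrangian * c[2])
            return best[0]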
  • the implementations of the subject matter described herein can determine, based on the adjusted distortion measurement, the optimal encoding parameters for the encoding of the image and/or video (such as the omnidirectional video) without requiring significant changes to the current image and/or video encoder, thereby improving the encoding performance.
  • the encoding parameters to be determined may include the QP, which corresponds to a respective quantization step size and determines the degree of quantization of the image and/or video encoding.
  • an adjusted quantization parameter QP' can then be determined from the weight associated with the image block and the Lagrangian operator.
  • that is, the encoding sub-system 110 may determine, based only on the weight associated with the image block and the Lagrangian operator, the optimal quantization parameter as the at least part of the encoding parameters for the encoding of the target image, thereby further improving the encoding performance for the omnidirectional video; see the sketch below.
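  • The sketch below assumes the common HEVC-style relation between the Lagrangian operator and QP, with lambda proportional to 2^((QP - 12) / 3), under which weighting a block's distortion by w is equivalent to dividing lambda by w and therefore shifting QP by -3*log2(w); this is our reconstruction, not necessarily the patent's exact QP' formula:

        import math

        def adjusted_qp(qp, block_weight):
            # Shift QP by -3 * log2(w) for a block whose distortion is
            # weighted by w: blocks with larger weights get a finer
            # quantization step, blocks with smaller weights a coarser one.
            return int(round(qp - 3.0 * math.log2(block_weight)))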
  • the method 200 may further include actions not shown in FIG. 2.
  • the encoding sub-system 110 (such as the mapping module 113) may map the source image into the target image.
  • the encoding sub-system 110 may map the spherical image 310 into the rectangular image 320 by means of equirectangular projection.
  • the encoding sub-system (such as the encoding module 114) may use the encoding parameters determined at block 230 to encode the target image. It is to be understood that this is merely for the purpose of simplification, without suggesting any limitations to the scope of subject matter as described herein in any manner.
  • FIG. 4 shows a block diagram of an example computing system/server 400 in which one or more implementations of the subject matter described herein can be implemented.
  • the encoding sub-system 110 as shown in FIG. 1 can be implemented by the computing system/server 400.
  • the computing system/server 400 as shown in FIG. 4 is only an example, which should not be construed as any limitation to the function and scope of use of the implementations of the subject matter described herein.
  • the computing system/server 400 is in the form of a general-purpose computing device.
  • Components of the computing system/server 400 may include, but are not limited to, one or more processors or processing units 400, a memory 420, one or more input devices 430, one or more output devices 440, storage 450, and one or more communication units 460.
  • the processing unit 400 may be a real or a virtual processor and is capable of performing various processes in accordance with a program stored in the memory 420. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power.
  • the computing system/server 400 typically includes a variety of machine readable media. Such media may be any available media that are accessible by the computing system/server 400, including volatile and non-volatile media, removable and non-removable media.
  • the memory 420 may be volatile memory (e.g., registers, cache, a random-access memory (RAM)), non-volatile memory (e.g., a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a flash memory), or some combination thereof.
  • the storage 450 may be removable or non-removable, and may include machine readable medium such as flash drives, magnetic disks or any other medium which can be used to store information and which can be accessed within the computing system/server 400.
  • the computing system/server 400 may further include other removable/nonremovable, volatile/non-volatile computing system storage medium.
  • a disk drive for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk")
  • an optical disk drive for reading from or writing to a removable, non-volatile optical disk.
  • each drive can be connected to the bus by one or more data medium interfaces.
  • the memory 420 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of various implementations described herein. For instance, when one or more modules in the system 100 are implemented as software modules, they can be stored in the memory 420, and when accessed and operated by the processing unit 400, they can implement the functions and/or methods described herein, such as the method 200.
  • the input unit(s) 430 may be one or more of various different input devices.
  • the input unit(s) 430 may include a user device such as a mouse, keyboard, trackball, etc.
  • the communication unit(s) 460 enables communication over communication medium to another computing entity.
  • functionality of the components of the computing system/server 400 may be implemented in a single computing machine or in multiple computing machines that are able to communicate over communication connections.
  • the computing system/server 400 may operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another common network node.
  • communication media include wired or wireless networking techniques.
  • the computing system/server 400 may also communicate, as required, with one or more external devices (not shown) such as a storage device, a display device, and the like, with one or more devices that enable a user to interact with the computing system/server 400, and/or with any device (e.g., a network card, a modem, etc.) that enables the computing system/server 400 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interface(s) (not shown).
  • implementations of the subject matter described herein can determine different impact of different points on the mapping process from the source image to the target image and can be adapted to any mapping methods (including but not limited to equirectangular projection) currently known or to be developed in the future.
  • the implementations of the subject matter described herein can adjust measurement for the distortion caused by the encoding based on the determined impact, and determine at least part of encoding parameters for the encoding of the image based on the adjusted measurement for the distortion so as to improve the encoding performance.
  • the functionally described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include Field-Programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine readable medium may be any tangible medium that may contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine readable medium may be a machine readable signal medium or a machine readable storage medium.
  • a machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • more specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • an encoding device comprising a processing unit and a memory.
  • the memory is coupled to the processing unit and stores instructions for execution by the processing unit.
  • the instructions when executed by the processing unit, cause the device to perform actions comprising: determining, in response to a source image being mapped to a target image and at least based on at least one source pixel in the source image and at least one target pixel in the target image, an impact of the at least one target pixel on the mapping, the at least one source pixel being mapped to the at least one target pixel; determining, at least based on the impact of the at least one target pixel on the mapping, a distortion caused by encoding of the target image; and determining, based on the distortion, at least part of encoding parameters for the encoding of the target image.
  • determining the impact comprises: determining the impact based on a first number of pixels in the at least one source pixel and a second number of pixels in the at least one target pixel.
  • the at least one source pixel is associated with at least one degree of importance on the encoding
  • determining the impact comprises: determining the impact based on the at least one source pixel, the at least one target pixel and the at least one degree of importance.
  • the target image is associated with a plurality of image blocks.
  • the actions further comprise: assigning the determined impact associated with the at least one target pixel to a pixel located in a same image block of the plurality of image blocks as the at least one target pixel.
  • determining the distortion comprises: obtaining a reconstructed image of the target image, the reconstructed image being derived from at least in part applying an inverse process of the encoding to the encoded target image; and determining the distortion based on the target image, the reconstructed image and the impact.
  • determining the distortion comprises: generating an error measuring the distortion.
  • generating the error comprises: generating, based on the impact, at least one of Sum of Squares for Error (SSE), Sum of Absolute Difference (SAD) and Sum of Absolute Transformation Difference (SATD) between the target image and the reconstructed image.
  • determining the at least part of the encoding parameters comprises: determining, based on the distortion, the at least part of the encoding parameters through a Lagrangian optimization process.
  • determining the at least part of the encoding parameters further comprises: determining, based on the impact and a Lagrangian operator employed in the Lagrangian optimization process, a quantization parameter as the at least part of the encoding parameters.
  • the source image is included in a video to be encoded by the device, the source image comprises a spherical image, the target image comprises a rectangular image and the actions further comprise: mapping the spherical image to the rectangular image by equirectangular projection before determining the impact associated with the at least one target pixel.
  • an encoding method includes: determining, in response to a source image being mapped to a target image and at least based on at least one source pixel in the source image and at least one target pixel in the target image, an impact of the at least one target pixel on the mapping, the at least one source pixel being mapped to the at least one target pixel; determining, at least based on the impact of the at least one target pixel on the mapping, a distortion caused by encoding of the target image; and determining, based on the distortion, at least part of encoding parameters for the encoding of the target image.
  • determining the impact comprises: determining the impact based on a first number of pixels in the at least one source pixel and a second number of pixels in the at least one target pixel.
  • the at least one source pixel is associated with at least one degree of importance on the encoding
  • determining the impact comprises: determining the impact based on the at least one source pixel, the at least one target pixel and the at least one degree of importance.
  • the target image is associated with a plurality of image blocks.
  • the method further comprises: assigning the determined impact associated with the at least one target pixel to a pixel located in a same image block of the plurality of image blocks as the at least one target pixel.
  • determining the distortion comprises: obtaining a reconstructed image of the target image, the reconstructed image being derived from at least in part applying an inverse process of the encoding to the encoded target image; and determining the distortion based on the target image, the reconstructed image and the impact.
  • determining the distortion comprises: generating an error measuring the distortion.
  • generating the error comprises: generating, based on the impact, at least one of Sum of Squares for Error (SSE), Sum of Absolute Difference (SAD) and Sum of Absolute Transformation Difference (SATD) between the target image and the reconstructed image.
  • determining the at least part of the encoding parameters comprises: determining, based on the distortion, the at least part of the encoding parameters through a Lagrangian optimization process.
  • determining the at least part of the encoding parameters further comprises: determining, based on the impact and a Lagrangian operator employed in the Lagrangian optimization process, a quantization parameter as the at least part of the encoding parameters.
  • the source image is included in a video to be encoded by the device, the source image comprises a spherical image, the target image comprises a rectangular image and the method further comprises: mapping the spherical image to the rectangular image by equirectangular projection before determining the impact associated with the at least one target pixel.
  • a computer program product which is tangibly stored on a non-transient machine-readable medium and comprising machine- executable instructions.
  • the machine-executable instructions, when executed by a device, cause the device to perform actions of the method according to the second aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method, a device and a computer program product for encoding images and/or videos are provided. According to the method, in response to a source image being mapped into a target image, an impact of at least one target pixel on the mapping is determined based on at least one source pixel in the source image and the at least one target pixel in the target image. The at least one source pixel is mapped into the at least one target pixel. A distortion caused by the encoding of the target image is then determined based on the impact of the at least one target pixel on the mapping. Subsequently, at least part of the encoding parameters for the encoding of the target image is determined based on the distortion.
PCT/US2018/028217 2017-04-28 2018-04-19 Image encoding Ceased WO2018200293A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710294882.0 2017-04-28
CN201710294882.0A CN109246407B (zh) Image encoding

Publications (1)

Publication Number Publication Date
WO2018200293A1 (fr)

Family

ID=62116996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/028217 Ceased WO2018200293A1 (fr) Image encoding

Country Status (2)

Country Link
CN (1) CN109246407B (fr)
WO (1) WO2018200293A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020107288A1 (fr) * 2018-11-28 2020-06-04 Oppo广东移动通信有限公司 Video encoding optimization method and apparatus, and computer storage medium
CN112365401A (zh) * 2020-10-30 2021-02-12 北京字跳网络技术有限公司 Image generation method, apparatus, device and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602495A (zh) * 2019-08-20 2019-12-20 深圳市盛世生物医疗科技有限公司 Medical image encoding method and apparatus

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018017599A1 (fr) * 2016-07-19 2018-01-25 Vid Scale, Inc. System and method for quality evaluation of 360-degree video

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102165772B (zh) * 2008-09-16 2013-07-24 杜比实验室特许公司 Adaptive video encoder control
CN103139554B (zh) * 2011-11-22 2016-12-21 浙江大学 Three-dimensional video rate-distortion optimization method and optimization apparatus
CN103813149B (zh) * 2012-11-15 2016-04-13 中国科学院深圳先进技术研究院 Image and video reconstruction method for a coding/decoding system
US9628803B2 (en) * 2014-11-25 2017-04-18 Blackberry Limited Perceptual image and video coding

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018017599A1 (fr) * 2016-07-19 2018-01-25 Vid Scale, Inc. System and method for quality evaluation of 360-degree video

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LI YIMING ET AL: "Spherical domain rate-distortion optimization for 360-degree video coding", 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), IEEE, 10 July 2017 (2017-07-10), pages 709 - 714, XP033146709, DOI: 10.1109/ICME.2017.8019492 *
SULLIVAN G J; WIEGAND T: "Rate-distortion optimization for video compression", IEEE SIGNAL PROCESSING MAGAZINE., vol. 15, no. 6, 1 November 1998 (1998-11-01), US, pages 74 - 90, XP055480988, ISSN: 1053-5888, DOI: 10.1109/79.733497 *
YULE SUN ET AL: "[FTV-AHG] WS-PSNR for 360 video quality evaluation", 115. MPEG MEETING; 30-5-2016 - 3-6-2016; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), no. m38551, 27 May 2016 (2016-05-27), XP030066907 *
YULE SUN ET AL: "AHG8: Stretching ratio based adaptive quantization for 360 video", 6. JVET MEETING; 31-3-2017 - 7-4-2017; HOBART; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/, no. JVET-F0072, 30 March 2017 (2017-03-30), XP030150744 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020107288A1 (fr) * 2018-11-28 2020-06-04 Oppo广东移动通信有限公司 Video encoding optimization method and apparatus, and computer storage medium
CN112655212A (zh) * 2018-11-28 2021-04-13 Oppo广东移动通信有限公司 Video encoding optimization method and apparatus, and computer storage medium
CN112365401A (zh) * 2020-10-30 2021-02-12 北京字跳网络技术有限公司 Image generation method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN109246407A (zh) 2019-01-18
CN109246407B (zh) 2020-09-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18723148

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18723148

Country of ref document: EP

Kind code of ref document: A1