WO2015056712A1 - Moving image encoding method, moving image decoding method, moving image encoding apparatus, moving image decoding apparatus, moving image encoding program, and moving image decoding program - Google Patents
- Publication number
- WO2015056712A1 (PCT/JP2014/077436, JP2014077436W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- region
- motion information
- prediction
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/174—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0085—Motion estimation from stereoscopic image signals
Definitions
- the present invention relates to a moving image encoding method, a moving image decoding method, a moving image encoding device, a moving image decoding device, a moving image encoding program, and a moving image decoding program for encoding and decoding multi-viewpoint moving images.
- This application claims priority based on Japanese Patent Application No. 2013-216526, filed in Japan on October 17, 2013, the content of which is incorporated herein.
- A multi-view image is composed of a plurality of images obtained by photographing the same subject and background with a plurality of cameras. Moving images taken by a plurality of cameras in this way are called multi-view moving images (or multi-view videos).
- In the following description, an image (moving image) captured by one camera is referred to as a "two-dimensional image (two-dimensional moving image)", and a group of two-dimensional images (two-dimensional moving images) obtained by photographing the same subject and background from cameras at different positions and orientations (hereinafter referred to as viewpoints) is referred to as a "multi-view image (multi-view moving image)".
- A two-dimensional moving image has a strong correlation in the time direction, and the encoding efficiency can be increased by using that correlation. In a multi-view image, frames captured at the same time by different cameras show the same subject from different viewpoints and therefore have a strong correlation between cameras, and the encoding efficiency can likewise be increased by using this correlation.
- In many conventional two-dimensional video encoding schemes, such as the international encoding standard H.264, high-efficiency encoding is performed using techniques such as motion-compensated prediction, orthogonal transform, quantization, and entropy encoding.
- For example, in H.264, encoding that uses temporal correlation between the frame to be encoded and a plurality of past or future frames is possible.
- The details of the motion-compensated prediction technique used in H.264 are described in Non-Patent Document 1, for example.
- An outline of the motion-compensated prediction technique used in H.264 is as follows.
- Motion-compensated prediction in H.264 divides the encoding target frame into blocks of various sizes and allows each block to have a different motion vector and a different reference frame. By using a different motion vector for each block, highly accurate prediction that compensates for the different motion of each subject is realized. By using a different reference frame for each block, highly accurate prediction that takes into account occlusions caused by temporal change is also realized.
- The difference between the multi-view image encoding method and the multi-view video encoding method is that a multi-view video has temporal correlation in addition to the correlation between cameras. However, the correlation between cameras can be used in the same way in either case. Therefore, a method used in encoding multi-view video is described here.
- FIG. 8 is a conceptual diagram illustrating the parallax that occurs between the cameras (the first camera and the second camera).
- In FIG. 8, the image planes of cameras whose optical axes are parallel are viewed from directly above. The positions at which the same point on the subject is projected onto the image planes of different cameras are generally called corresponding points.
- In disparity-compensated prediction, each pixel value of the encoding target frame is predicted from the reference frame based on this correspondence relationship, and the prediction residual and the disparity information indicating the correspondence relationship are encoded. Since the disparity changes for each pair of target cameras and for each position, it is necessary to encode the disparity information for each region in which disparity-compensated prediction is performed. In fact, in the H.264 multi-view video encoding scheme, a vector representing the disparity information is encoded for each block that uses disparity-compensated prediction.
- By using camera parameters and the epipolar geometric constraint, the correspondence given by the disparity information can be represented by a one-dimensional quantity indicating the three-dimensional position of the subject instead of a two-dimensional vector.
- There are various expressions for information indicating the three-dimensional position of the subject, but the distance from the reference camera to the subject, or the coordinate value on an axis that is not parallel to the image plane of the camera, is often used. In some cases, the reciprocal of the distance is used instead of the distance. Since the reciprocal of the distance is information proportional to the disparity, two reference cameras may also be set and the three-dimensional position expressed as the amount of disparity between the images captured by those cameras. Because there is no essential difference regardless of which expression is used, in the following, information indicating the three-dimensional position is referred to as depth without distinguishing between these expressions.
- FIG. 9 is a conceptual diagram of epipolar geometric constraints.
- According to the epipolar geometric constraint, the point on the image of another camera that corresponds to a point on the image of one camera is constrained to lie on a straight line called the epipolar line.
- If the depth of the pixel is known, the corresponding point is uniquely determined on the epipolar line.
- For example, the corresponding point in the second camera image for the subject projected at position m in the first camera image lies on the epipolar line at position m′ when the subject position in real space is M′.
- When the subject position in real space is M″, it is projected at position m″ on the epipolar line.
- In Non-Patent Document 2, using this property, a synthesized image for the encoding target frame is generated from the reference frame in accordance with the three-dimensional information of each subject given by the depth map (distance image) for the reference frame.
- By using the synthesized image as a candidate for the predicted image in each region, highly accurate prediction is achieved and efficient multi-view video encoding is realized.
- a composite image generated based on this depth is called a viewpoint composite image, a viewpoint interpolation image, or a parallax compensation image.
- According to such a method, highly efficient prediction can be realized by a viewpoint-synthesized image subjected to high-precision disparity compensation using the three-dimensional information of the subject obtained from the depth map.
- In addition, by selecting for each region whether the existing prediction or the prediction based on the viewpoint-synthesized image is used as the predicted image, it is possible to prevent the code amount from increasing even when a viewpoint-synthesized image of partially low accuracy is generated due to the quality of the depth map or occlusions.
- However, in the method of Non-Patent Document 2, when the accuracy of the three-dimensional information represented by the depth map is low, the disparity can be compensated only with lower accuracy than in disparity-compensated prediction using an ordinary disparity vector, so there is a problem that highly efficient prediction cannot be realized.
- The present invention has been made in view of such circumstances, and an object thereof is to provide a moving image encoding method, a moving image decoding method, a moving image encoding device, a moving image decoding device, a moving image encoding program, and a moving image decoding program capable of realizing highly accurate prediction and highly efficient encoding even when the accuracy of the disparity expressed by the depth map is low.
- One aspect of the present invention is a moving image encoding apparatus that, when encoding one frame of a multi-view video composed of videos from a plurality of different viewpoints, encodes each encoding target region, which is a region obtained by dividing the encoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the encoding target image, and a depth map for the subject in the multi-view video.
- The moving image encoding apparatus includes: a corresponding region setting unit that sets a corresponding region on the depth map for the encoding target region; a region dividing unit that sets prediction regions, which are regions obtained by dividing the encoding target region; a disparity vector generation unit that generates, for each prediction region, a disparity vector for the reference viewpoint using the depth information of the region corresponding to the prediction region within the corresponding region; a motion information generation unit that generates motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and a predicted image generation unit that generates a predicted image for the prediction region using the motion information for the prediction region.
- In the moving image encoding apparatus of the present invention, the disparity vector generation unit may further generate a disparity vector for the depth map for the encoding target region, and the corresponding region setting unit may set the region indicated by the disparity vector for the depth map as the corresponding region.
- In the moving image encoding apparatus of the present invention, the disparity vector generation unit may set the disparity vector for the depth map using a disparity vector used when encoding a region adjacent to the encoding target region.
- the region dividing unit may set region division for the encoding target region based on depth information in the corresponding region.
- In the moving image encoding apparatus of the present invention, the disparity vector generation unit may set, for each prediction region, a representative depth from the depth information of the region corresponding to the prediction region within the corresponding region, and set the disparity vector for the reference viewpoint based on the representative depth.
- In the moving image encoding apparatus of the present invention, the motion information generation unit may obtain a corresponding position in the reference viewpoint using a pixel position predetermined for the prediction region and the disparity vector for the reference viewpoint, and use the motion information given to the region including the corresponding position in the reference viewpoint motion information as the motion information for the prediction region.
- The moving image encoding apparatus of the present invention may further include a reference image setting unit that sets, as a reference image for the encoding target image, an already encoded frame at the encoding target viewpoint that is different from the encoding target image, and the motion information generation unit may generate the motion information for the prediction region by scaling the motion information obtained from the reference viewpoint motion information according to the time interval between the encoding target image and the reference image.
- In the moving image encoding apparatus of the present invention, the predicted image generation unit may generate the predicted image for the prediction region using a first predicted image generated using the motion information for the prediction region and a second predicted image generated using the disparity vector for the reference viewpoint and the reference viewpoint image.
- One aspect of the present invention is a moving image decoding apparatus that, when decoding a decoding target image of a multi-view video composed of videos from a plurality of different viewpoints, decodes each decoding target region, which is a region obtained by dividing the decoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the decoding target image, and a depth map for the subject in the multi-view video.
- The moving image decoding apparatus includes: a corresponding region setting unit that sets a corresponding region on the depth map for the decoding target region; a region dividing unit that sets prediction regions, which are regions obtained by dividing the decoding target region; a disparity vector generation unit that generates, for each prediction region, a disparity vector for the reference viewpoint using the depth information of the region corresponding to the prediction region within the corresponding region; a motion information generation unit that generates motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and a predicted image generation unit that generates a predicted image for the prediction region using the motion information for the prediction region.
- In the moving image decoding apparatus of the present invention, the disparity vector generation unit may further generate a disparity vector for the depth map for the decoding target region, and the corresponding region setting unit may set the region indicated by the disparity vector for the depth map as the corresponding region.
- the disparity vector generation unit may set the disparity vector for the depth map using a disparity vector used when decoding a region adjacent to the decoding target region.
- the area dividing unit may set area division for the decoding target area based on depth information in the corresponding area.
- In the moving image decoding apparatus of the present invention, the disparity vector generation unit may set, for each prediction region, a representative depth from the depth information of the region corresponding to the prediction region within the corresponding region, and set the disparity vector for the reference viewpoint based on the representative depth.
- In the moving image decoding apparatus of the present invention, the motion information generation unit may obtain a corresponding position in the reference viewpoint using a pixel position predetermined for the prediction region and the disparity vector for the reference viewpoint, and use the motion information given to the region including the corresponding position in the reference viewpoint motion information as the motion information for the prediction region.
- The moving image decoding apparatus of the present invention may further include a reference image setting unit that sets, as a reference image for the decoding target image, an already decoded frame at the decoding target viewpoint that is different from the decoding target image, and the motion information generation unit may generate the motion information for the prediction region by scaling the motion information obtained from the reference viewpoint motion information according to the time interval between the decoding target image and the reference image.
- In the moving image decoding apparatus of the present invention, the predicted image generation unit may generate the predicted image for the prediction region using a first predicted image generated using the motion information for the prediction region and a second predicted image generated using the disparity vector for the reference viewpoint and the reference viewpoint image.
- One aspect of the present invention is a moving image encoding method that, when encoding one frame of a multi-view video composed of videos from a plurality of different viewpoints, encodes each encoding target region, which is a region obtained by dividing the encoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the encoding target image, and a depth map for the subject in the multi-view video.
- The moving image encoding method includes: a corresponding region setting step of setting a corresponding region on the depth map for the encoding target region; a region division step of setting prediction regions, which are regions obtained by dividing the encoding target region; a disparity vector generation step of generating, for each prediction region, a disparity vector for the reference viewpoint using the depth information of the region corresponding to the prediction region within the corresponding region; a motion information generation step of generating motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and a predicted image generation step of generating a predicted image for the prediction region using the motion information for the prediction region.
- One aspect of the present invention is a moving image decoding method that, when decoding a decoding target image of a multi-view video composed of videos from a plurality of different viewpoints, decodes each decoding target region, which is a region obtained by dividing the decoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the decoding target image, and a depth map for the subject in the multi-view video.
- The moving image decoding method includes: a corresponding region setting step of setting a corresponding region on the depth map for the decoding target region; a region division step of setting prediction regions, which are regions obtained by dividing the decoding target region; a disparity vector generation step of generating, for each prediction region, a disparity vector for the reference viewpoint using the depth information of the region corresponding to the prediction region within the corresponding region; a motion information generation step of generating motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and a predicted image generation step of generating a predicted image for the prediction region using the motion information for the prediction region.
- One aspect of the present invention is a moving image encoding program for causing a computer to execute the moving image encoding method.
- One aspect of the present invention is a moving picture decoding program for causing a computer to execute the moving picture decoding method.
- According to the present invention, even when the accuracy of the disparity expressed by the depth map is low, prediction with fractional-pixel accuracy can be realized, and a multi-view video can be encoded with a small code amount.
- FIG. 7 is a block diagram showing the hardware configuration when the moving image decoding apparatus 200 shown in FIG. 4 is configured by a computer and a software program. FIG. 8 is a conceptual diagram showing the parallax that arises between cameras. FIG. 9 is a conceptual diagram of the epipolar geometric constraint.
- Hereinafter, a moving image encoding apparatus and a moving image decoding apparatus according to an embodiment of the present invention will be described with reference to the drawings.
- In the following description, it is assumed that a multi-view video captured by two cameras, a first camera (camera A) and a second camera (camera B), is encoded, with the viewpoint of camera A as the reference viewpoint, and that one frame of the moving image of camera B is encoded or decoded.
- It is assumed that the information necessary for obtaining the disparity from the depth information is given separately. Specifically, this information consists of external parameters representing the positional relationship between camera A and camera B, and internal parameters representing the projection onto the image plane of each camera, but other information may be given instead as long as the disparity can be obtained from it.
- FIG. 1 is a block diagram showing a configuration of a moving picture coding apparatus according to this embodiment.
- the moving image encoding apparatus 100 includes an encoding target image input unit 101, an encoding target image memory 102, a reference viewpoint motion information input unit 103, a depth map input unit 104, a motion information generation unit 105, A viewpoint composite image generation unit 106, an image encoding unit 107, an image decoding unit 108, and a reference image memory 109 are provided.
- the encoding target image input unit 101 inputs an image to be encoded.
- the image to be encoded is referred to as an encoding target image.
- the moving image of the camera B is input frame by frame.
- the viewpoint (here, camera B) that captured the encoding target image is referred to as an encoding target viewpoint.
- the encoding target image memory 102 stores the input encoding target image.
- the reference viewpoint motion information input unit 103 inputs motion information (such as a motion vector) for the reference viewpoint moving image.
- the motion information input here is referred to as reference viewpoint motion information.
- Here, the motion information of camera A is input as the reference viewpoint motion information.
- the depth map input unit 104 inputs a depth map to be referred to when generating a viewpoint composite image.
- a depth map for an encoding target image is input, but a depth map for another viewpoint such as a reference viewpoint may be used.
- the depth map represents the three-dimensional position of the subject shown in each pixel of the corresponding image.
- the depth map may be any information as long as the three-dimensional position can be obtained by information such as camera parameters given separately. For example, a distance from the camera to the subject, a coordinate value with respect to an axis that is not parallel to the image plane, and a parallax amount with respect to another camera (for example, camera A) can be used.
- a parallax map that directly represents the parallax amount may be used instead of the depth map.
- the depth map is assumed to be delivered in the form of an image, but it may not be in the form of an image as long as similar information can be obtained.
- the motion information generation unit 105 generates motion information for the encoding target image using the reference viewpoint motion information and the depth map.
- the viewpoint composite image generation unit 106 generates a viewpoint composite image for the encoding target image from the reference image based on the generated motion information.
- the image encoding unit 107 predictively encodes the encoding target image while using the viewpoint synthesized image.
- the image decoding unit 108 decodes the bit stream of the encoding target image.
- the reference image memory 109 stores an image obtained when the bit stream of the encoding target image is decoded.
- FIG. 2 is a flowchart showing the operation of the video encoding apparatus 100 shown in FIG.
- the encoding target image input unit 101 receives the encoding target image Org, and stores the input encoding target image Org in the encoding target image memory 102 (step S101).
- Next, the reference viewpoint motion information input unit 103 inputs the reference viewpoint motion information and outputs it to the motion information generation unit 105, and the depth map input unit 104 inputs the depth map and outputs it to the motion information generation unit 105 (step S102).
- Note that the reference viewpoint motion information and the depth map input in step S102 are assumed to be the same as those obtainable on the decoding side, such as information that has already been decoded. This is in order to suppress the occurrence of coding noise such as drift by using exactly the same information as that obtained by the decoding apparatus. However, when the generation of such coding noise is acceptable, information that can be obtained only on the encoding side, such as information before encoding, may be input.
- As the depth map, in addition to one that has already been decoded, a depth map estimated by applying stereo matching or the like to multi-view video decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, and the like, can also be used as information that can be obtained identically on the decoding side.
- The reference viewpoint motion information may be the motion information used when encoding the image for the reference viewpoint, or may be information that has been separately encoded for the reference viewpoint. It is also possible to use motion information obtained by decoding the moving image for the reference viewpoint and estimating motion from it.
- When the input of the encoding target image, the reference viewpoint motion information, and the depth map is finished, the encoding target image is divided into regions of a predetermined size, and the video signal of the encoding target image is encoded for each divided region (steps S103 to S109).
- That is, assuming that the encoding target region index is blk and the total number of encoding target regions in one frame is represented by numBlks, blk is initialized to 0 (step S103), and then the following processing (steps S104 to S107) is repeated, adding 1 to blk (step S108), until blk reaches numBlks (step S109).
- In general encoding, the image is divided into processing unit blocks of 16 × 16 pixels called macroblocks, but it may be divided into blocks of other sizes as long as they are the same as those on the decoding side.
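- The per-region loop of steps S103 to S109 can be summarized by the following minimal Python sketch. It only illustrates the control flow described above: the `steps` object and its four callables (standing in for steps S104 to S107) are hypothetical placeholders, not functions named in the original text.

```python
# Skeleton of the per-region encoding loop (steps S103-S109).
# The `steps` object and its callables are hypothetical placeholders for
# steps S104-S107; they do not correspond to named functions in the text.

def encode_frame(org, block_size, steps):
    """Encode one frame region by region and return the per-region bitstream chunks.

    org:        encoding target image, an array-like with .shape == (height, width)
    block_size: side of the square processing unit (16 for the macroblocks above)
    steps:      object providing motion(blk), synthesize(blk, motion),
                encode(blk, syn) and decode(blk, syn, bits)
    """
    height, width = org.shape[:2]
    chunks = []
    for top in range(0, height, block_size):        # loop over regions blk
        for left in range(0, width, block_size):    # (steps S103, S108, S109)
            blk = (top, left, block_size, block_size)
            motion = steps.motion(blk)               # step S104: motion information
            syn = steps.synthesize(blk, motion)      # step S105: viewpoint composite image
            bits = steps.encode(blk, syn)            # step S106: predictive encoding
            steps.decode(blk, syn, bits)             # step S107: local decode into memory
            chunks.append(bits)
    return chunks
```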
- the motion information generation unit 105 generates motion information in the encoding target area blk (step S104). This process will be described later in detail.
- Next, when the motion information in the encoding target region blk is obtained, the viewpoint composite image generation unit 106 generates a viewpoint composite image Syn for the encoding target region blk from the images stored in the reference image memory 109 according to the motion information (step S105). Specifically, for each pixel p included in the encoding target region blk, the viewpoint composite image generation unit 106 copies the pixel value of the corresponding point on the reference image indicated by the motion information, as expressed by the following equation:
- Syn[p] = Dec_Ref(p)[p + mv(p)]
- Here, mv(p) and Ref(p) denote the motion vector indicated by the motion information for the pixel p and its time interval, and Dec_T denotes the image (reference image) stored in the reference image memory 109 at time interval T with respect to the encoding target image. If the corresponding point p + mv(p) is not at an integer pixel position, the pixel value of the nearest integer pixel position may be used as the pixel value of the corresponding point, or the pixel value of the corresponding point may be generated by applying a filter to the group of integer pixels around the corresponding point. However, the pixel value of the corresponding point must be generated by the same method as the processing on the decoding side.
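- As an illustration of the per-pixel synthesis of step S105, the following Python sketch (using NumPy) fills each pixel of the region with Dec_Ref(p)[p + mv(p)], handling non-integer corresponding points by nearest-integer rounding. The array layouts and the function name are assumptions made for this sketch, not part of the original text.

```python
import numpy as np

def synthesize_view(dec_frames, mv, ref, blk):
    """Per-pixel motion compensation: Syn[p] = Dec_Ref(p)[p + mv(p)] for region blk.

    dec_frames: dict mapping time interval T -> reference image Dec_T (2-D array)
    mv:         (H, W, 2) array of motion vectors (dy, dx) assigned to each pixel
    ref:        (H, W) array of time intervals Ref(p) selecting the reference image
    blk:        (top, left, height, width) of the target region
    Non-integer corresponding points are rounded to the nearest integer pixel;
    an interpolation filter could be used instead, as noted in the text.
    """
    top, left, h, w = blk
    syn = np.zeros((h, w))
    for dy in range(h):
        for dx in range(w):
            y, x = top + dy, left + dx
            ref_img = dec_frames[int(ref[y, x])]
            cy = int(round(y + mv[y, x, 0]))
            cx = int(round(x + mv[y, x, 1]))
            cy = min(max(cy, 0), ref_img.shape[0] - 1)  # clamp to the image
            cx = min(max(cx, 0), ref_img.shape[1] - 1)
            syn[dy, dx] = ref_img[cy, cx]
    return syn
```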
- When a plurality of pieces of motion information are obtained for the pixel p, the viewpoint composite image may be generated based on their average value. That is, when the number of pieces of motion information for the pixel p is denoted by N(p) and the index of the motion information is n, the viewpoint composite image is expressed by the following equation:
- Syn[p] = ( Σ_n Dec_Ref_n(p)[p + mv_n(p)] ) / N(p)
- Note that this equation does not consider rounding to an integer in the division, but an offset may be added so that the result is rounded to the nearest integer; specifically, N(p)/2 is added before dividing by N(p). When there are three or more pieces of motion information, prediction may be performed using the median value instead of the average value. However, the same processing as on the decoding side must be performed. Here, the viewpoint composite image is generated for each pixel, but when the same motion information is shared within each small region, the processing can also be performed for each small region.
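- The integer rounding mentioned above can be illustrated by the following small sketch, which adds the offset N(p)/2 before dividing by N(p); it is only one example of the rounding rule described in the text.

```python
def average_predictions(samples):
    """Average N(p) motion-compensated sample values with integer rounding.

    Adding len(samples) // 2 before the integer division rounds the average
    to the nearest integer, as described in the text for the case where
    several pieces of motion information exist for one pixel.
    """
    n = len(samples)
    return (sum(samples) + n // 2) // n

# Two candidate values 100 and 103 -> (203 + 1) // 2 = 102
assert average_predictions([100, 103]) == 102
```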
- Next, the image encoding unit 107 encodes the video signal (pixel values) of the encoding target image in the encoding target region blk while using the viewpoint composite image as the predicted image (step S106).
- the bit stream obtained as a result of encoding is the output of the video encoding apparatus 100.
- any method may be used as the encoding method.
- In general coding such as MPEG-2 and H.264/AVC, encoding is performed by sequentially applying frequency transform such as DCT (Discrete Cosine Transform), quantization, binarization, and entropy coding to the difference signal between the video signal of the block blk and the predicted image.
- Next, the image decoding unit 108 decodes the video signal for the block blk using the bit stream and the viewpoint composite image, and stores the decoded image Dec[blk] as the decoding result in the reference image memory 109 (step S107).
- a method corresponding to the method used at the time of encoding is used.
- In the case of general encoding such as MPEG-2 and H.264/AVC, entropy decoding, inverse binarization, inverse quantization, and inverse frequency transform such as IDCT (Inverse Discrete Cosine Transform) are sequentially applied to the code data, the predicted image is added to the obtained two-dimensional signal, and finally the video signal is decoded by clipping to the pixel value range.
- Note that the data immediately before the point at which the processing on the encoding side becomes lossless, together with the predicted image, may be received, and decoding may then be performed by simplified processing. That is, in the above example, the values obtained by the quantization processing at the time of encoding and the predicted image may be received, inverse quantization and inverse frequency transform may be sequentially applied to the quantized values, the predicted image may be added to the obtained two-dimensional signal, and the video signal may be decoded by clipping to the pixel value range.
- Next, the process in which the motion information generation unit 105 shown in FIG. 1 generates the motion information in the encoding target region blk (step S104 shown in FIG. 2) will be described in detail.
- the motion information generation unit 105 sets an area on the depth map for the encoding target area blk (corresponding area on the depth map for the encoding target area) (step S1401).
- Here, since the depth map for the encoding target image is input, the depth map at the same position as the encoding target region blk is set. If the resolutions of the encoding target image and the depth map differ, a region scaled according to the resolution ratio is set.
- When a depth map for a viewpoint (depth viewpoint) different from the encoding target viewpoint is used, the disparity DV between the encoding target viewpoint and the depth viewpoint in the encoding target region blk (the disparity vector for the depth map) is obtained, and the depth map at blk + DV is set.
- When the resolutions of the encoding target image and the depth map differ, the position and size are scaled according to the resolution ratio.
- the parallax DV between the encoding target viewpoint and the depth viewpoint in the encoding target region blk may be obtained using any method as long as it is the same method as that on the decoding side.
- For example, it is possible to use the disparity vector used when encoding a neighboring region of the encoding target region blk, a global disparity vector set for the entire encoding target image or for a partial image including the encoding target region, or a disparity vector that is separately set and encoded for the encoding target region.
- Alternatively, disparity vectors used for the encoding target region blk or for previously encoded images may be accumulated, and an accumulated disparity vector may be used.
- Furthermore, a disparity vector obtained by converting the depth value at the same position as the encoding target region in a previously encoded depth map for the encoding target viewpoint may be used.
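- The setting of the corresponding region on the depth map (step S1401) can be illustrated by the following sketch, which shifts the target region by the disparity vector for the depth map and scales position and size by the resolution ratio. The parameter names and the rounding used here are assumptions made for this sketch.

```python
def depth_corresponding_region(blk, dv=(0, 0), scale_y=1.0, scale_x=1.0):
    """Return the region on the depth map corresponding to the target region blk.

    blk:       (top, left, height, width) of the encoding target region
    dv:        (dy, dx) disparity vector between the target viewpoint and the
               depth viewpoint; (0, 0) when the depth map is for the target image
    scale_y/x: resolution ratios of the depth map to the target image, used to
               scale position and size when the resolutions differ
    """
    top, left, h, w = blk
    dy, dx = dv
    return (int(round((top + dy) * scale_y)),
            int(round((left + dx) * scale_x)),
            int(round(h * scale_y)),
            int(round(w * scale_x)))
```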
- the motion information generation unit 105 determines a method for dividing the encoding target region blk (step S1402).
- division may be performed by a predetermined method, or a division method may be determined by analyzing a set depth map. However, the division method is set in the same manner as the decoding side.
- As a method of setting the division with a fixed block size, for example, there is a method of dividing into blocks of 4 × 4 pixels or blocks of 8 × 8 pixels. There is also a method of determining the block size according to the size of the encoding target region (using MAX(a, b), which represents the maximum value of a and b, on the region's dimensions): for example, dividing into blocks of 8 × 8 pixels when the size of the encoding target region is larger than 16 × 16 pixels, and into blocks of 4 × 4 pixels when the size of the encoding target region is 16 × 16 pixels or less.
- As a method of determining the division by analyzing the depth map, there is, for example, a method using the result of clustering based on depth values, or a method of recursively dividing the region in a quadtree manner so that the variance of the depth values within each generated division is equal to or less than a threshold value. Instead of the variance of the depth values within each generated division, the division method may be determined by comparing the depth values at the four vertices of the corresponding region on the depth map for the encoding target region.
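- One of the division rules mentioned above, recursive quadtree splitting until the depth variance of each division falls below a threshold, could look like the following sketch (using NumPy); the concrete threshold and the minimum block size are illustrative choices, not values taken from the text.

```python
import numpy as np

def split_by_depth(depth, top, left, size, threshold, min_size=4):
    """Recursively split a square region while its depth variance exceeds a threshold.

    depth is the depth-map patch set for the encoding target region; the function
    returns a list of (top, left, size) prediction regions relative to that patch.
    """
    patch = depth[top:top + size, left:left + size]
    if size <= min_size or np.var(patch) <= threshold:
        return [(top, left, size)]
    half = size // 2
    regions = []
    for oy in (0, half):
        for ox in (0, half):
            regions += split_by_depth(depth, top + oy, left + ox,
                                      half, threshold, min_size)
    return regions
```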
- When the division method is determined, motion information is generated for each sub-region (prediction region) generated according to the division method (steps S1403 to S1409). That is, assuming that the sub-region index is sblk and the number of sub-regions in the encoding target region blk is represented by numSBlks_blk, sblk is initialized to 0 (step S1403), and then the following processing (steps S1404 to S1407) is repeated, adding 1 to sblk (step S1408), until sblk reaches numSBlks_blk (step S1409).
- In the processing repeated for each sub-region, first, the motion information generation unit 105 determines a representative depth value for the sub-region sblk from the set depth map (the depth information of the region corresponding to the prediction region within the corresponding region) (step S1404).
- the representative depth value may be determined using any method, but it is necessary to use the same method as that on the decoding side. For example, there is a method using an average value, median value, maximum value, minimum value, or the like of the depth map for the sub-region sblk. Further, an average value, a median value, a maximum value, a minimum value, or the like of depth values for some pixels may be used instead of all the pixels of the depth map for the sub-region sblk. As some pixels, four vertices or four vertices and the center may be used. Further, there is a method of using a depth value for a predetermined position such as the upper left or the center with respect to the sub-region sblk.
- the motion information generation unit 105 obtains a disparity vector dv sblk (disparity vector with respect to the reference viewpoint) from the representative depth value using information on the positional relationship between the encoding target viewpoint and the reference viewpoint (step S1405).
- Specific methods include a method of obtaining the disparity vector dv_sblk by back-projection and re-projection using camera parameters, a method of obtaining it by conversion using a homography matrix, and a method of obtaining it by referring to a lookup table of disparity vectors for depth values created in advance.
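- The back-projection and re-projection method can be sketched as follows for a pinhole camera model; the exact form depends on how the depth value and the camera parameters are defined, so this is an assumed example rather than the patent's normative conversion.

```python
import numpy as np

def disparity_from_depth(pixel, depth, K_cur, K_ref, R, t):
    """Back-project a pixel at the representative depth and re-project it into
    the reference viewpoint; the disparity vector is the position difference.

    pixel:        (x, y) representative point of the sub-region in the target image
    depth:        representative depth value (distance along the optical axis here)
    K_cur, K_ref: 3x3 intrinsic matrices of the target and reference cameras
    R, t:         rotation (3x3) and translation (3,) from the target camera
                  coordinate system to the reference camera coordinate system
    """
    x, y = pixel
    p3d = depth * (np.linalg.inv(K_cur) @ np.array([x, y, 1.0]))  # back-projection
    q = K_ref @ (R @ p3d + t)                                     # re-projection
    q = q[:2] / q[2]
    return q - np.array([x, y], dtype=float)  # disparity vector dv_sblk as (dx, dy)
```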
- Next, the motion information generation unit 105 obtains a corresponding position in the reference viewpoint (step S1406). Specifically, the corresponding position is obtained by adding the disparity vector dv_sblk to the point P_sblk representing the sub-region sblk (a pixel position determined in advance for the prediction region). As the point representing the sub-region, a predetermined position such as the upper left or the center of the sub-region can be used. Note that the choice of representative point must be the same as on the decoding side.
- Next, the motion information generation unit 105 sets the reference viewpoint motion information stored for the region including the corresponding point P_sblk + dv_sblk in the reference viewpoint as the motion information (motion information for the prediction region) for the sub-region sblk (step S1407). If no motion information is stored for the region including the corresponding point P_sblk + dv_sblk, "no motion information" may be set, default motion information such as a zero vector may be set, or the region closest to P_sblk + dv_sblk for which motion information is stored may be identified and the motion information stored for that region may be set. However, the motion information must be set according to the same rules as on the decoding side.
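- Putting steps S1404 to S1407 together, a minimal sketch of the per-sub-region derivation might look as follows. The representative-depth rule (median), the representative point (upper-left pixel), and the clipping of the corresponding position are illustrative choices that would simply have to match the decoding side; the depth-to-disparity conversion is passed in as a callable such as the sketch above.

```python
import numpy as np

def derive_motion_information(depth_blk, subregions, ref_motion_field,
                              depth_to_disparity, rep=np.median):
    """Derive motion information for each prediction region (steps S1404-S1407).

    depth_blk:          depth-map patch set for the encoding target region
    subregions:         list of (top, left, size) prediction regions in the patch
    ref_motion_field:   (H, W, 2) motion vectors stored for the reference viewpoint
    depth_to_disparity: callable mapping ((x, y), representative_depth) -> (dx, dy)
    rep:                rule for the representative depth (median here; mean, max,
                        min or a fixed position could be used instead)
    """
    results = []
    h, w = ref_motion_field.shape[:2]
    for top, left, size in subregions:
        # Step S1404: representative depth of the sub-region
        rep_depth = rep(depth_blk[top:top + size, left:left + size])
        # Step S1405: disparity vector to the reference viewpoint
        anchor = (left, top)  # upper-left pixel chosen as the representative point
        dx, dy = depth_to_disparity(anchor, rep_depth)
        # Step S1406: corresponding position in the reference viewpoint
        cx = int(np.clip(round(anchor[0] + dx), 0, w - 1))
        cy = int(np.clip(round(anchor[1] + dy), 0, h - 1))
        # Step S1407: take over the motion information stored at that position
        # (a zero vector could serve as the default when none is stored)
        results.append(((top, left, size), ref_motion_field[cy, cx].copy(), (dx, dy)))
    return results
```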
- In the above description, the reference viewpoint motion information is set as the motion information as it is. However, a time interval may be set in advance, the motion information may be scaled according to that predetermined time interval and the time interval in the reference viewpoint motion information, and motion information obtained by replacing the time interval in the reference viewpoint motion information with the predetermined time interval may be set.
- In this way, the motion information generated for different regions all has the same time interval, the reference image used when generating the viewpoint composite image is unified (one already encoded frame at the encoding target viewpoint that is different from the encoding target image is set as the reference image), and the memory space to be accessed can be limited.
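- The scaling of the motion vector to a predetermined time interval can be illustrated by the following small sketch; treating a zero source interval as producing a zero vector is an assumption of this example, not a rule from the text.

```python
def scale_motion_vector(mv, src_interval, dst_interval):
    """Scale a motion vector taken from the reference viewpoint motion information
    so that it corresponds to a predetermined time interval.

    mv:           (dy, dx) motion vector associated with time interval src_interval
    src_interval: time interval stored with the reference viewpoint motion information
    dst_interval: predetermined time interval (target image to the chosen reference image)
    """
    if src_interval == 0:
        return (0.0, 0.0)  # assumption of this sketch: no usable temporal motion
    factor = dst_interval / src_interval
    return (mv[0] * factor, mv[1] * factor)

# A vector of (4, -2) defined over 2 frames, reused over 1 frame -> (2, -1)
assert scale_motion_vector((4, -2), 2, 1) == (2.0, -1.0)
```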
- In the above description, the viewpoint composite image (first predicted image) is generated using only the motion information generated from the reference viewpoint motion information for each sub-region. However, in addition to the motion information, the disparity vector dv_sblk obtained for each sub-region may also be used.
- For example, with the image obtained by disparity compensation from the reference viewpoint image as a second predicted image, a viewpoint composite image may be generated as the integer-rounded average of the first predicted image and the second predicted image, which then serves as the prediction for the prediction region; a sketch of such a combination follows.
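- A possible form of that combination is sketched below; the "+ 1" offset realizes one particular integer-rounding rule and is an assumption of this example, since only agreement with the decoding side matters.

```python
import numpy as np

def combine_predictions(pred_motion, pred_disparity):
    """Combine the first predicted image (temporal motion compensation) and the
    second predicted image (disparity compensation from the reference viewpoint
    image) by an integer-rounded average.

    The "+ 1" offset is one particular rounding rule; only agreement with the
    processing on the decoding side matters.
    """
    p1 = pred_motion.astype(np.int32)
    p2 = pred_disparity.astype(np.int32)
    return ((p1 + p2 + 1) // 2).astype(pred_motion.dtype)
```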
- Alternatively, a viewpoint composite image may be generated while selecting either the motion information or the disparity vector for each sub-region or pixel; furthermore, one or both of the motion information and the disparity vector may be selected. Any selection method may be used as long as it is the same as on the decoding side. For example, there is a method of generating the viewpoint composite image using the disparity vector when the reference viewpoint motion information set in step S1407 does not exist for the corresponding point obtained in step S1406, or when a prediction other than motion-compensated prediction was used in the region including the corresponding point at the time of encoding the moving image for the reference viewpoint, and generating the viewpoint composite image using the motion information in the other cases.
- As another method, the prediction residual ResIVMC obtained when motion-compensated prediction using the motion information generated for sblk is performed on the region sblk + dv_sblk of the encoded video at the reference viewpoint may be generated and used. Specifically, there is a method of generating the viewpoint composite image using the disparity vector when the magnitude of this residual exceeds a separately set threshold, and generating the viewpoint composite image using the motion information when it is equal to or less than the threshold.
- ResIVMC = DecIV[sblk + dv_sblk] − DecIV_Ref(sblk)[sblk + dv_sblk + mv(sblk)]
- Here, DecIV denotes the decoded image of the reference viewpoint video for the same time as the encoding target image, and DecIV_T denotes the decoded image of the reference viewpoint video at time interval T. In addition to ResIVMC, there is also a method of generating and using the following prediction residual ResPastIV. Specifically,
- ResPastIV = Dec_Ref(sblk)[sblk + mv(sblk)] − DecIV_Ref(sblk)[sblk + dv_sblk + mv(sblk)]
- When ResPastIV is used, for example, |ResIVMC| and |ResPastIV| may be compared and the information corresponding to the smaller residual used. In addition, a threshold value may be set, each residual may be compared with the set threshold value, and the viewpoint composite image may be generated using only the information corresponding to a residual smaller than the threshold value; when both residuals are larger than the threshold value, only the information corresponding to the smaller residual may be used according to the above rule, or both may be used.
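- One possible reading of this selection rule is sketched below; associating a small ResIVMC with using the motion information and a small ResPastIV with using the disparity vector, as well as measuring magnitude by the sum of absolute values, are interpretations made for this example.

```python
import numpy as np

def choose_synthesis(res_ivmc, res_past_iv=None, threshold=None):
    """Select how to synthesize the prediction for a sub-region from residuals.

    res_ivmc:    ResIVMC patch (motion-compensated residual on the reference viewpoint)
    res_past_iv: optional ResPastIV patch (inter-view residual at the reference time)
    threshold:   optional limit on the residual magnitude (sum of absolute values)
    Returns a set drawn from {'motion', 'disparity'}.
    """
    a = float(np.abs(res_ivmc).sum())
    if res_past_iv is None:
        # single-residual rule: a large ResIVMC falls back to the disparity vector
        return {'disparity'} if threshold is not None and a > threshold else {'motion'}
    b = float(np.abs(res_past_iv).sum())
    chosen = set()
    if threshold is not None:
        if a <= threshold:
            chosen.add('motion')     # small ResIVMC: motion information is reliable
        if b <= threshold:
            chosen.add('disparity')  # small ResPastIV: disparity vector is reliable
        if chosen:
            return chosen
    # both residuals exceed the threshold (or no threshold was set): keep the smaller
    return {'motion'} if a < b else {'disparity'}
```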
- a viewpoint synthesized image or its candidate may be generated according to the following mathematical formula.
- Syn[p] = Dec_Ref(p)[p + mv(p)] + w0 × (DecIV[p + dv(p)] − DecIV_Ref(p)[p + dv(p) + mv(p)])
- Syn[p] = DecIV[p + dv(p)] + w1 × (Dec_Ref(p)[p + mv(p)] − DecIV_Ref(p)[p + dv(p) + mv(p)])
- w0 and w1 are weighting factors determined separately, and may be determined in any way as long as they are the same values as those on the decoding side.
- a predetermined value may be used.
- As a candidate for the viewpoint composite image, an image generated according to the above equations may be used in place of the viewpoint composite image generated using only the motion information generated from the reference viewpoint motion information (that is, without the disparity vector obtained for each sub-region), or such an image may be added as an additional option.
- FIG. 4 is a block diagram showing the configuration of the moving picture decoding apparatus according to this embodiment.
- the moving image decoding apparatus 200 includes a bit stream input unit 201, a bit stream memory 202, a reference viewpoint motion information input unit 203, a depth map input unit 204, a motion information generation unit 205, and a viewpoint composite image generation unit. 206, an image decoding unit 207, and a reference image memory 208.
- the bit stream input unit 201 inputs a moving image bit stream to be decoded.
- a decoding target image indicates one frame of the moving image of the camera B.
- the viewpoint (here, camera B) that captured the decoding target image is referred to as a decoding target viewpoint.
- the bit stream memory 202 stores a bit stream for the input decoding target image.
- the reference viewpoint motion information input unit 203 inputs motion information (such as a motion vector) for the reference viewpoint moving image.
- Hereinafter, the motion information input here is referred to as reference viewpoint motion information; here, the motion information of camera A is input.
- the depth map input unit 204 inputs a depth map to be referred to when generating a viewpoint composite image.
- a depth map for a decoding target image is input, but a depth map for another viewpoint such as a reference viewpoint may be used.
- the depth map represents the three-dimensional position of the subject shown in each pixel of the corresponding image.
- the depth map may be any information as long as the three-dimensional position can be obtained by information such as camera parameters given separately. For example, a distance from the camera to the subject, a coordinate value with respect to an axis that is not parallel to the image plane, and a parallax amount with respect to another camera (for example, camera A) can be used.
- a parallax map that directly represents the parallax amount may be used instead of the depth map.
- the depth map is assumed to be delivered in the form of an image, but it may not be in the form of an image as long as similar information can be obtained.
- the motion information generation unit 205 uses the reference viewpoint motion information and the depth map to generate motion information for the decoding target image.
- the viewpoint composite image generation unit 206 generates a viewpoint composite image for the decoding target image from the reference image based on the generated motion information.
- the image decoding unit 207 decodes and outputs the decoding target image from the bitstream using the viewpoint synthesized image.
- the reference image memory 208 stores the obtained decoding target image for subsequent decoding.
- FIG. 5 is a flowchart showing the operation of the video decoding device 200 shown in FIG.
- the bit stream input unit 201 inputs a bit stream obtained by encoding a decoding target image, and stores the input bit stream in the bit stream memory 202 (step S201).
- Next, the reference viewpoint motion information input unit 203 inputs the reference viewpoint motion information and outputs it to the motion information generation unit 205, and the depth map input unit 204 inputs the depth map and outputs it to the motion information generation unit 205 (step S202).
- the reference viewpoint motion information and the depth map input in step S202 are the same as those used on the encoding side. This is to suppress the occurrence of coding noise such as drift by using exactly the same information as that obtained by the moving picture coding apparatus. However, if such encoding noise is allowed to occur, a different one from that used at the time of encoding may be input.
- As the depth map, in addition to one that has been separately decoded, a depth map estimated by applying stereo matching or the like to multi-view images decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, and the like, may also be used.
- The reference viewpoint motion information may be the motion information used when decoding the image for the reference viewpoint, or may be information that has been separately encoded for the reference viewpoint. It is also possible to use motion information obtained by decoding the moving image for the reference viewpoint and estimating motion from it.
- the decoding target image is divided into regions of a predetermined size, and the video signal of the decoding target image is decoded from the bit stream for each divided region.
- That is, assuming that the decoding target region index is blk and the total number of decoding target regions in one frame is represented by numBlks, blk is initialized to 0 (step S203), and then the following processing (steps S204 to S206) is repeated, adding 1 to blk (step S207), until blk reaches numBlks (step S208).
- it is divided into processing unit blocks called macroblocks of 16 pixels ⁇ 16 pixels, but may be divided into blocks of other sizes as long as they are the same as those on the encoding side.
- In the process repeated for each decoding target region, first, the motion information generation unit 205 generates motion information in the decoding target region blk (step S204). This process is the same as step S104 described above.
- Next, when the motion information for the decoding target region blk is obtained, the viewpoint composite image generation unit 206 generates a viewpoint composite image Syn for the decoding target region blk from the images stored in the reference image memory 208 according to the motion information (step S205).
- the process here is the same as step S105 described above.
- the image decoding unit 207 decodes the decoding target image from the bitstream using the viewpoint composite image as the predicted image (step S206).
- the obtained decoding target image is stored in the reference image memory 208 and is output from the moving image decoding apparatus 200.
- a method corresponding to the method used at the time of encoding is used.
- When general coding such as MPEG-2 or H.264/AVC is used, entropy decoding, inverse binarization, inverse quantization, and inverse frequency transform such as IDCT (Inverse Discrete Cosine Transform) are sequentially applied to the code data, the predicted image is added to the obtained two-dimensional signal, and finally the video signal is decoded by clipping to the pixel value range.
- In the above description, the motion information and the viewpoint composite image are generated for each region obtained by dividing the encoding target image or the decoding target image. However, the motion information, or both the motion information and the viewpoint composite image, may be generated in advance for the entire encoding target image or decoding target image.
- In that case, a memory for buffering the generated motion information is required, and when the viewpoint composite image is also generated in advance, a memory for storing the generated viewpoint composite image is required.
- In addition, the processing of the present technique has been described as a process of encoding or decoding the entire image, but the processing can also be applied to only a part of the image. In this case, whether or not the processing is to be applied may be determined and a flag indicating whether or not to apply it may be encoded or decoded, or the necessity of application may be designated by some other means. For example, a method of expressing whether or not the processing is applied as one of the modes indicating how the predicted image for each region is generated may be used.
- In this way, the pixel values are not predicted directly between viewpoints; instead, a motion vector is predicted between viewpoints and the pixel values are then predicted temporally. Therefore, even when the accuracy of the disparity expressed by the depth map is low, prediction with fractional-pixel accuracy can be realized, and the multi-view video can be encoded with a small code amount.
- FIG. 6 is a block diagram showing a hardware configuration when the above-described moving image encoding apparatus 100 is configured by a computer and a software program.
- The system shown in FIG. 6 is configured by connecting, via a bus: a CPU (Central Processing Unit) 50 that executes a program; a memory 51 such as a RAM (Random Access Memory) that stores the programs and data accessed by the CPU 50; an encoding target image input unit 52 that inputs the encoding target image signal from a camera or the like (which may be a storage unit, such as a disk device, that stores the image signal); a reference viewpoint motion information input unit 53 that inputs the reference viewpoint motion information from a memory or the like (which may be a storage unit, such as a disk device, that stores the motion information); a depth map input unit 54 that inputs, from a depth camera or the like, the depth map for the viewpoint from which the encoding target image was captured (which may be a storage unit, such as a disk device, that stores the depth map); a program storage device 55 that stores an image encoding program 551, which is a software program that causes the CPU 50 to execute the moving image encoding process; and a bit stream output unit 56 that outputs, for example via a network, the bit stream generated by the CPU 50 executing the image encoding program 551 loaded into the memory 51 (which may be a storage unit, such as a disk device, that stores the bit stream).
- FIG. 7 is a block diagram showing a hardware configuration when the above-described moving picture decoding apparatus 200 is configured by a computer and a software program.
- The system shown in FIG. 7 is configured by connecting, via a bus: a CPU 60 that executes a program; a memory 61 such as a RAM that stores the programs and data accessed by the CPU 60; a bit stream input unit 62 that inputs the bit stream encoded by the moving image encoding apparatus according to this method (which may be a storage unit, such as a disk device, that stores the bit stream); a reference viewpoint motion information input unit 63 that inputs the motion information of the reference viewpoint from a memory or the like (which may be a storage unit, such as a disk device, that stores the motion information); a depth map input unit 64 that inputs, from a depth camera or the like, the depth map for the viewpoint from which the decoding target was captured (which may be a storage unit, such as a disk device, that stores the depth information); a program storage device 65 that stores an image decoding program 651, which is a software program that causes the CPU 60 to execute the image decoding process; and a decoding target image output unit 66 that outputs, to a playback device or the like, the decoding target image obtained by the CPU 60 executing the image decoding program 651 loaded into the memory 61 to decode the bit stream (which may be a storage unit, such as a disk device, that stores the image signal).
- the moving picture encoding apparatus 100 and the moving picture decoding apparatus 200 in the above-described embodiment may be realized by a computer.
- In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed, thereby realizing the moving picture encoding apparatus 100 and the moving picture decoding apparatus 200.
- The “computer system” here includes an OS (Operating System) and hardware such as peripheral devices.
- The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD (Compact Disc)-ROM, or to a storage device such as a hard disk built into a computer system.
- Furthermore, the “computer-readable recording medium” may also include a medium that dynamically holds a program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case.
- The program may realize only a part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.
- The moving image encoding device 100 and the moving image decoding device 200 may also be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).
- The present invention can be applied, for example, to uses in which high coding efficiency is to be achieved when prediction for the viewpoint of the encoding (decoding) target image is performed using an image captured from a viewpoint different from that of the encoding (decoding) target image and a depth map for the subject in that image.
- DESCRIPTION OF SYMBOLS
100 ... Moving image encoding device
101 ... Encoding target image input unit
102 ... Encoding target image memory
103 ... Reference viewpoint motion information input unit
104 ... Depth map input unit
105 ... Motion information generation unit
106 ... Viewpoint composite image generation unit
107 ... Image encoding unit
108 ... Image decoding unit
109 ... Reference image memory
200 ... Moving image decoding device
201 ... Bitstream input unit
202 ... Bitstream memory
203 ... Reference viewpoint motion information input unit
204 ... Depth map input unit
205 ... Motion information generation unit
206 ... Viewpoint composite image generation unit
207 ... Image decoding unit
208 ... Reference image memory
Abstract
Description
This application claims priority based on Japanese Patent Application No. 2013-216526, filed in Japan on October 17, 2013, the contents of which are incorporated herein by reference.
Syn[p]=DecRef(p)[p+mv(p)]
Syn[p]=(DecRef(p)[p+mv(p)]+DecIV[p+dv(p)]+1)/2
Here, dv(p) denotes the disparity vector for the sub-region containing pixel p.
ResIVMC=DecIV[sblk+dv_sblk]-DecIVRef(sblk)[sblk+dv_sblk+mv(sblk)]
ResPastIV=DecRef(sblk)[sblk+mv(sblk)]-DecIVRef(sblk)[sblk+dv_sblk+mv(sblk)]
Syn[p]=DecRef(p)[p+mv(p)]+w0×(DecIV[p+dv(p)]-DecIVRef(p)[p+dv(p)+mv(p)])
Syn[p]=DecIV[p+dv(p)]+w1×(DecRef(p)[p+mv(p)]-DecIVRef(p)[p+dv(p)+mv(p)])
Here, w0 and w1 are separately determined weight coefficients, and they may be determined in any way as long as they take the same values on the decoding side. For example, predetermined values may be used. When an image generated according to the above formulas is used as a candidate for the viewpoint composite image, it may be used in place of the viewpoint composite image generated using only the motion information derived from the reference viewpoint motion information described above, or in place of the viewpoint composite image generated using only the disparity vector obtained for each sub-region, or it may be added to the options as another candidate.
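As a worked illustration of the last two formulas, the sketch below evaluates the first weighted combination for a single pixel, assuming integer-precision motion and disparity vectors and frames stored as NumPy arrays indexed [row, column]; the argument names simply mirror the symbols in the formulas, and the default w0 = 0.5 is purely illustrative, since the text only requires the weight to match the decoder side.

```python
def synthesize_pixel(p, mv, dv, DecRef, DecIV, DecIVRef, w0=0.5):
    """Syn[p] = DecRef(p)[p+mv(p)] + w0*(DecIV[p+dv(p)] - DecIVRef(p)[p+dv(p)+mv(p)])"""
    y, x = p
    my, mx = mv
    dy, dx = dv
    motion_comp   = DecRef[y + my, x + mx]               # DecRef(p)[p + mv(p)]
    inter_view    = DecIV[y + dy, x + dx]                 # DecIV[p + dv(p)]
    inter_view_mc = DecIVRef[y + dy + my, x + dx + mx]    # DecIVRef(p)[p + dv(p) + mv(p)]
    return motion_comp + w0 * (inter_view - inter_view_mc)
```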
Claims (20)
- A moving image encoding apparatus that, when encoding one frame of a multi-view moving image consisting of moving images from a plurality of different viewpoints, performs encoding for each encoding target region, which is a region obtained by dividing an encoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the encoding target image, and a depth map for a subject in the multi-view moving image, the apparatus comprising:
a corresponding region setting unit that sets, for the encoding target region, a corresponding region on the depth map;
a region division unit that sets prediction regions, which are regions obtained by dividing the encoding target region;
a disparity vector generation unit that generates, for each prediction region, a disparity vector for the reference viewpoint using depth information of the region within the corresponding region that corresponds to the prediction region;
a motion information generation unit that generates motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and
a predicted image generation unit that generates a predicted image for the prediction region using the motion information for the prediction region.
- The moving image encoding apparatus according to claim 1, wherein the disparity vector generation unit further generates, for the encoding target region, a disparity vector for the depth map, and the corresponding region setting unit sets the region indicated by the disparity vector for the depth map as the corresponding region.
- The moving image encoding apparatus according to claim 2, wherein the disparity vector generation unit sets the disparity vector for the depth map using a disparity vector used when encoding a region adjacent to the encoding target region.
- The moving image encoding apparatus according to any one of claims 1 to 3, wherein the region division unit sets the region division for the encoding target region based on the depth information within the corresponding region.
- The moving image encoding apparatus according to any one of claims 1 to 4, wherein the disparity vector generation unit sets, for each prediction region, a representative depth from the depth information of the region within the corresponding region that corresponds to the prediction region, and sets the disparity vector for the reference viewpoint based on the representative depth.
- The moving image encoding apparatus according to any one of claims 1 to 5, wherein the motion information generation unit obtains a corresponding position in the reference viewpoint using a pixel position predetermined for the prediction region and the disparity vector for the reference viewpoint, and uses, as the motion information for the prediction region, the motion information given, in the reference viewpoint motion information, to the region containing the corresponding position.
- The moving image encoding apparatus according to any one of claims 1 to 6, further comprising a reference image setting unit that sets, as a reference image for the encoding target image, an already encoded frame at the encoding target viewpoint that is different from the encoding target image, wherein the motion information generation unit generates the motion information for the prediction region by scaling the motion information obtained from the reference viewpoint motion information in accordance with the time interval between the encoding target image and the reference image.
- The moving image encoding apparatus according to any one of claims 1 to 7, wherein the predicted image generation unit generates the predicted image for the prediction region using a first predicted image generated using the motion information for the prediction region and a second predicted image generated using the disparity vector for the reference viewpoint and the reference viewpoint image.
- A moving image decoding apparatus that, when decoding a decoding target image from code data of a multi-view moving image consisting of moving images from a plurality of different viewpoints, performs decoding for each decoding target region, which is a region obtained by dividing the decoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the decoding target image, and a depth map for a subject in the multi-view moving image, the apparatus comprising:
a corresponding region setting unit that sets, for the decoding target region, a corresponding region on the depth map;
a region division unit that sets prediction regions, which are regions obtained by dividing the decoding target region;
a disparity vector generation unit that generates, for each prediction region, a disparity vector for the reference viewpoint using depth information of the region within the corresponding region that corresponds to the prediction region;
a motion information generation unit that generates motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and
a predicted image generation unit that generates a predicted image for the prediction region using the motion information for the prediction region.
- The moving image decoding apparatus according to claim 9, wherein the disparity vector generation unit further generates, for the decoding target region, a disparity vector for the depth map, and the corresponding region setting unit sets the region indicated by the disparity vector for the depth map as the corresponding region.
- The moving image decoding apparatus according to claim 10, wherein the disparity vector generation unit sets the disparity vector for the depth map using a disparity vector used when decoding a region adjacent to the decoding target region.
- The moving image decoding apparatus according to any one of claims 9 to 11, wherein the region division unit sets the region division for the decoding target region based on the depth information within the corresponding region.
- The moving image decoding apparatus according to any one of claims 9 to 12, wherein the disparity vector generation unit sets, for each prediction region, a representative depth from the depth information of the region within the corresponding region that corresponds to the prediction region, and sets the disparity vector for the reference viewpoint based on the representative depth.
- The moving image decoding apparatus according to any one of claims 9 to 13, wherein the motion information generation unit obtains a corresponding position in the reference viewpoint using a pixel position predetermined for the prediction region and the disparity vector for the reference viewpoint, and uses, as the motion information for the prediction region, the motion information given, in the reference viewpoint motion information, to the region containing the corresponding position.
- The moving image decoding apparatus according to any one of claims 9 to 14, further comprising a reference image setting unit that sets, as a reference image for the decoding target image, an already decoded frame at the decoding target viewpoint that is different from the decoding target image, wherein the motion information generation unit generates the motion information for the prediction region by scaling the motion information obtained from the reference viewpoint motion information in accordance with the time interval between the decoding target image and the reference image.
- The moving image decoding apparatus according to any one of claims 9 to 15, wherein the predicted image generation unit generates the predicted image for the prediction region using a first predicted image generated using the motion information for the prediction region and a second predicted image generated using the disparity vector for the reference viewpoint and the reference viewpoint image.
- A moving image encoding method that, when encoding one frame of a multi-view moving image consisting of moving images from a plurality of different viewpoints, performs encoding for each encoding target region, which is a region obtained by dividing an encoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the encoding target image, and a depth map for a subject in the multi-view moving image, the method comprising:
a corresponding region setting step of setting, for the encoding target region, a corresponding region on the depth map;
a region division step of setting prediction regions, which are regions obtained by dividing the encoding target region;
a disparity vector generation step of generating, for each prediction region, a disparity vector for the reference viewpoint using depth information of the region within the corresponding region that corresponds to the prediction region;
a motion information generation step of generating motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and
a predicted image generation step of generating a predicted image for the prediction region using the motion information for the prediction region.
- A moving image decoding method that, when decoding a decoding target image from code data of a multi-view moving image consisting of moving images from a plurality of different viewpoints, performs decoding for each decoding target region, which is a region obtained by dividing the decoding target image, while predicting between different viewpoints using reference viewpoint motion information, which is motion information of a reference viewpoint image for a reference viewpoint different from that of the decoding target image, and a depth map for a subject in the multi-view moving image, the method comprising:
a corresponding region setting step of setting, for the decoding target region, a corresponding region on the depth map;
a region division step of setting prediction regions, which are regions obtained by dividing the decoding target region;
a disparity vector generation step of generating, for each prediction region, a disparity vector for the reference viewpoint using depth information of the region within the corresponding region that corresponds to the prediction region;
a motion information generation step of generating motion information for the prediction region from the reference viewpoint motion information based on the disparity vector for the reference viewpoint; and
a predicted image generation step of generating a predicted image for the prediction region using the motion information for the prediction region.
- A moving image encoding program for causing a computer to execute the moving image encoding method according to claim 17.
- A moving image decoding program for causing a computer to execute the moving image decoding method according to claim 18.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020167007560A KR101750421B1 (ko) | 2013-10-17 | 2014-10-15 | 동화상 부호화 방법, 동화상 복호 방법, 동화상 부호화 장치, 동화상 복호 장치, 동화상 부호화 프로그램, 및 동화상 복호 프로그램 |
| JP2015542635A JPWO2015056712A1 (ja) | 2013-10-17 | 2014-10-15 | 動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像符号化プログラム、及び動画像復号プログラム |
| US15/029,553 US10911779B2 (en) | 2013-10-17 | 2014-10-15 | Moving image encoding and decoding method, and non-transitory computer-readable media that code moving image for each of prediction regions that are obtained by dividing coding target region while performing prediction between different views |
| CN201480056611.6A CN105612748B (zh) | 2013-10-17 | 2014-10-15 | 活动图像编码方法、活动图像解码方法、活动图像编码装置、活动图像解码装置、活动图像编码程序、以及活动图像解码程序 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2013216526 | 2013-10-17 | ||
| JP2013-216526 | 2013-10-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015056712A1 true WO2015056712A1 (ja) | 2015-04-23 |
Family
ID=52828154
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2014/077436 Ceased WO2015056712A1 (ja) | 2013-10-17 | 2014-10-15 | 動画像符号化方法、動画像復号方法、動画像符号化装置、動画像復号装置、動画像符号化プログラム、及び動画像復号プログラム |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US10911779B2 (ja) |
| JP (1) | JPWO2015056712A1 (ja) |
| KR (1) | KR101750421B1 (ja) |
| CN (1) | CN105612748B (ja) |
| WO (1) | WO2015056712A1 (ja) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112602325A (zh) * | 2018-12-27 | 2021-04-02 | Kddi 株式会社 | 图像解码装置、图像编码装置、程序和图像处理系统 |
| JP2021510251A (ja) * | 2018-01-05 | 2021-04-15 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | 画像データビットストリームを生成するための装置及び方法 |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6758977B2 (ja) * | 2016-07-22 | 2020-09-23 | キヤノン株式会社 | 画像処理装置、画像処理方法およびプログラム |
| WO2020059616A1 (ja) * | 2018-09-21 | 2020-03-26 | 日本放送協会 | 画像符号化装置、画像復号装置、及びプログラム |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013001813A1 (ja) * | 2011-06-29 | 2013-01-03 | パナソニック株式会社 | 画像符号化方法、画像復号方法、画像符号化装置および画像復号装置 |
| JP2013074303A (ja) * | 2011-09-26 | 2013-04-22 | Nippon Telegr & Teleph Corp <Ntt> | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム及び画像復号プログラム |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5619256A (en) * | 1995-05-26 | 1997-04-08 | Lucent Technologies Inc. | Digital 3D/stereoscopic video compression technique utilizing disparity and motion compensated predictions |
| US5612735A (en) * | 1995-05-26 | 1997-03-18 | Luncent Technologies Inc. | Digital 3D/stereoscopic video compression technique utilizing two disparity estimates |
| EP2512139B1 (en) * | 2006-10-30 | 2013-09-11 | Nippon Telegraph And Telephone Corporation | Video encoding method and decoding method, apparatuses therefor, programs therefor, and storage media which store the programs |
| US9066075B2 (en) * | 2009-02-13 | 2015-06-23 | Thomson Licensing | Depth map coding to reduce rendered distortion |
| KR101628383B1 (ko) * | 2010-02-26 | 2016-06-21 | 연세대학교 산학협력단 | 영상 처리 장치 및 방법 |
| CN103404154A (zh) * | 2011-03-08 | 2013-11-20 | 索尼公司 | 图像处理设备、图像处理方法以及程序 |
| JP5932666B2 (ja) * | 2011-07-19 | 2016-06-08 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 画像符号化装置とその集積回路、および画像符号化方法 |
| CN104081414B (zh) * | 2011-09-28 | 2017-08-01 | Fotonation开曼有限公司 | 用于编码和解码光场图像文件的系统及方法 |
| EP2727366B1 (en) * | 2011-10-11 | 2018-10-03 | MediaTek Inc. | Method and apparatus of motion and disparity vector derivation for 3d video coding and hevc |
| CN102510500B (zh) | 2011-10-14 | 2013-12-18 | 北京航空航天大学 | 一种基于深度信息的多视点立体视频错误隐藏方法 |
| US9253486B2 (en) * | 2012-09-28 | 2016-02-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for motion field backward warping using neighboring blocks in videos |
- 2014-10-15 CN CN201480056611.6A patent/CN105612748B/zh active Active
- 2014-10-15 JP JP2015542635A patent/JPWO2015056712A1/ja active Pending
- 2014-10-15 KR KR1020167007560A patent/KR101750421B1/ko active Active
- 2014-10-15 WO PCT/JP2014/077436 patent/WO2015056712A1/ja not_active Ceased
- 2014-10-15 US US15/029,553 patent/US10911779B2/en active Active
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013001813A1 (ja) * | 2011-06-29 | 2013-01-03 | パナソニック株式会社 | 画像符号化方法、画像復号方法、画像符号化装置および画像復号装置 |
| JP2013074303A (ja) * | 2011-09-26 | 2013-04-22 | Nippon Telegr & Teleph Corp <Ntt> | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム及び画像復号プログラム |
Non-Patent Citations (3)
| Title |
|---|
| GERHARD TECH ET AL.: "3D-HEVC Test Model 1", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSION DEVELOPMENT OF ITU-T SG 16 WP3 AND ISO/IEC JTC1/SC29/WG11 JCT3V-A1005_D0, ITU-T, 20 September 2013 (2013-09-20), pages 12 - 21 * |
| SHIN'YA SHIMIZU: "Depth Map o Mochiita Sanjigen Eizo Fugoka no Kokusai Hyojunka Doko", IPSJ SIG NOTES, AUDIO VISUAL AND MULTIMEDIA INFORMATION PROCESSING (AVM), INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 2013 -AV, 5 September 2013 (2013-09-05), pages 1 - 6 * |
| YU-LIN CHANG ET AL.: "3D-CE2.h related: Simplified DV derivation for DoNBDV and BVSP", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSIONS OF ITU-T SG 16 WP 3 AND ISO/IEC JTC1/SC29/WG11 JCT3V-D0138, ITU-T, 13 April 2013 (2013-04-13), pages 1 - 4 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021510251A (ja) * | 2018-01-05 | 2021-04-15 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | 画像データビットストリームを生成するための装置及び方法 |
| JP7252238B2 (ja) | 2018-01-05 | 2023-04-04 | コーニンクレッカ フィリップス エヌ ヴェ | 画像データビットストリームを生成するための装置及び方法 |
| CN112602325A (zh) * | 2018-12-27 | 2021-04-02 | Kddi 株式会社 | 图像解码装置、图像编码装置、程序和图像处理系统 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN105612748B (zh) | 2019-04-02 |
| US10911779B2 (en) | 2021-02-02 |
| US20160255370A1 (en) | 2016-09-01 |
| KR20160045864A (ko) | 2016-04-27 |
| CN105612748A (zh) | 2016-05-25 |
| KR101750421B1 (ko) | 2017-06-23 |
| JPWO2015056712A1 (ja) | 2017-03-09 |
| US20170055000A2 (en) | 2017-02-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP6232076B2 (ja) | 映像符号化方法、映像復号方法、映像符号化装置、映像復号装置、映像符号化プログラム及び映像復号プログラム | |
| JP5947977B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム及び画像復号プログラム | |
| JP6307152B2 (ja) | 画像符号化装置及び方法、画像復号装置及び方法、及び、それらのプログラム | |
| JP6027143B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム | |
| US20150249839A1 (en) | Picture encoding method, picture decoding method, picture encoding apparatus, picture decoding apparatus, picture encoding program, picture decoding program, and recording media | |
| JP6232075B2 (ja) | 映像符号化装置及び方法、映像復号装置及び方法、及び、それらのプログラム | |
| WO2014050830A1 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 | |
| CN104429077A (zh) | 图像编码方法、图像解码方法、图像编码装置、图像解码装置、图像编码程序、图像解码程序以及记录介质 | |
| JP5926451B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム | |
| KR101750421B1 (ko) | 동화상 부호화 방법, 동화상 복호 방법, 동화상 부호화 장치, 동화상 복호 장치, 동화상 부호화 프로그램, 및 동화상 복호 프로그램 | |
| WO2015098827A1 (ja) | 映像符号化方法、映像復号方法、映像符号化装置、映像復号装置、映像符号化プログラム及び映像復号プログラム | |
| JPWO2015141549A1 (ja) | 動画像符号化装置及び方法、及び、動画像復号装置及び方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14854836; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| | ENP | Entry into the national phase | Ref document number: 2015542635; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 20167007560; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 15029553; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 15029553; Country of ref document: US |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 14854836; Country of ref document: EP; Kind code of ref document: A1 |