WO2012060156A1 - Multi-view image encoding device and multi-view image decoding device - Google Patents
- Publication number
- WO2012060156A1 (PCT/JP2011/070641)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- viewpoint
- viewpoint image
- depth
- encoding
- Legal status: Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
Definitions
- the present invention relates to a multi-view image encoding device that encodes images taken from a plurality of viewpoints and depth images thereof, and a multi-view image decoding device that decodes them.
- the currently mainstream stereoscopic image displays are binocular stereoscopic displays; one common method is the active-shutter method.
- an active-shutter 3D display shows the left-eye image and the right-eye image alternately on the screen in a time-division manner, while LCD shutter glasses worn by the viewer open and close in synchronization with that timing so that each eye sees only its intended image.
- MVC (Multi-view Video Coding) is specified in Annex H of H.264/AVC (Advanced Video Coding, ISO/IEC 14496-10).
- MVC is an encoding method for efficiently compressing multi-viewpoint moving images.
- MVC has been adopted as the encoding method for storing stereoscopic image works such as 3D movies on Blu-ray Disc.
- the multi-view stereoscopic image display includes an optical mechanism that controls the direction of light emitted from the screen surface, so that the left-eye image is seen only by the left eye and the right-eye image only by the right eye. For this reason, the glasses used in the active-shutter method are not required; moreover, images with more than two viewpoints can be displayed, that is, the observed image changes as the observation position moves.
- such optical mechanisms include the parallax barrier and the lenticular lens, and the number of observable viewpoints is determined by their structure and the definition of the processing. At present, multi-view stereoscopic image displays with 5 viewpoints, 8 viewpoints, and so on are in practical use.
- because the viewpoints are not limited to two, there is the advantage that the degree of freedom and the naturalness of viewing a stereoscopic image improve.
- however, because the amount of image data required grows with the number of viewpoints, there is the problem that the cost of recording and transmission increases.
- Patent Document 1 encodes a multi-viewpoint image, encodes depth information generated from the multi-viewpoint image, and generates an encoded stream containing both.
- the depth information generated and encoded there is used to generate the image signal of a desired virtual viewpoint, one that does not physically exist, from the image signal of an existing viewpoint. That is, when it is difficult to capture, transmit, or store images of all viewpoints because their number is large, a large number of viewpoint images can be generated from images of a smaller number of viewpoints.
- a parallax predictive coding scheme is used to improve the coding efficiency of multi-viewpoint images.
- in the parallax predictive coding method, the parallax produced by the differing shooting viewpoints is extracted from the images, and predictive coding is performed while compensating for that parallax; just as predictive coding in the time direction reduces redundancy in conventional single-viewpoint image coding, this reduces the redundancy between viewpoints.
- the disparity predictive coding method is a technique introduced in the above-described MVC.
- FIG. 7 is a block diagram showing a functional configuration of a conventional single-view video encoding apparatus 700.
- the moving image coding apparatus 700 includes a DCT quantization unit 701.
- the DCT quantization unit 701 first performs DCT and quantization processing, which are transform coding processes, on the data obtained by subtracting a past or future image from the input image.
- the moving image encoding apparatus 700 is provided with a subtracter 709.
- the subtractor 709 gives data obtained by subtracting a past or future image from the input image to the DCT quantization unit 701.
- the moving picture encoding apparatus 700 includes an inverse quantization inverse DCT unit 702.
- the inverse quantization inverse DCT unit 702 performs inverse quantization inverse DCT processing on the data subjected to DCT and quantization processing by the DCT quantization unit 701 and supplies the result to the adder 710.
- the adder 710 adds the prediction result to the data subjected to the inverse quantization inverse DCT processing by the inverse quantization inverse DCT unit 702, and restores the image signal.
- the restored image signal is further stored in the reference memory 704 after block distortion caused by DCT is reduced by the deblocking filter 703.
- the moving picture encoding apparatus 700 is provided with a motion vector detection unit 705.
- the motion vector detection unit 705 searches and detects a motion vector from the input image in order to reduce redundancy between images in the time direction.
- the moving picture coding apparatus 700 includes a motion compensation / prediction unit 706.
- the motion compensation / prediction unit 706 performs motion compensation / prediction with the reference image based on the motion vector detected by the motion vector detection unit 705.
- the moving picture encoding apparatus 700 is provided with an in-screen prediction unit 707.
- the intra-screen prediction unit 707 performs intra-screen prediction based on the input image in order to reduce spatial redundancy in the image.
- the moving image encoding apparatus 700 includes a switch 708.
- the switch 708 supplies the processing result of the motion compensation / prediction unit 706 or the processing result of the intra-screen prediction unit 707 to the subtracter 709 and the adder 710. As described above, the processing result of either the motion compensation / prediction unit 706 or the intra-screen prediction unit 707 is used as predicted image data.
- the moving image encoding apparatus 700 is provided with an entropy encoding unit 711.
- the entropy encoding unit 711 converts the image data subjected to DCT and quantization processing by the DCT quantization unit 701 and the motion vector detected by the motion vector detection unit 705 into variable-length codes, and outputs them as an encoded stream to the outside of the moving image encoding apparatus 700.
- a parallax predictive coding method is a method in which this motion compensation / prediction, that is, prediction coding in the time direction, is applied between images of a plurality of viewpoints.
- the above-described multi-view image encoding method MVC selects one of the above-described intra-frame prediction, temporal direction prediction, and parallax prediction in a predetermined encoding processing unit to improve the total encoding efficiency.
- as in conventional predictive coding in the temporal direction, extraction of disparity in the disparity predictive coding method is performed mainly by block matching.
- Block matching is a process of searching for a block having the highest degree of matching in comparison with a reference image for each block obtained by dividing the target image by a predetermined size.
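as an illustration of the cost this implies, an exhaustive block match with a sum-of-absolute-differences (SAD) criterion can be sketched as follows (a minimal NumPy sketch; the function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def find_disparity_vector(target, reference, bx, by, bs=8, search=16):
    """Exhaustive block matching: find the (dx, dy) shift into the
    reference image that best matches the target block at (bx, by)."""
    block = target[by:by + bs, bx:bx + bs].astype(np.int32)
    best, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > reference.shape[0] or x + bs > reference.shape[1]:
                continue  # candidate window falls outside the reference image
            cand = reference[y:y + bs, x:x + bs].astype(np.int32)
            sad = np.abs(block - cand).sum()  # error evaluated over ALL pixels of the block
            if best is None or sad < best:
                best, best_vec = sad, (dx, dy)
    return best_vec, best
```

note that the inner SAD touches every pixel of the block for every candidate position, so the per-block cost grows with (block area) x (search area); this is the high calculation cost the document refers to.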
- Patent Document 2 discloses an encoding method in which disparity vectors are detected by applying block matching between a plurality of images.
- Japanese Patent Publication No. 2010-157823 (published July 15, 2010)
- Japanese Patent Publication No. 6-98312 (published April 8, 1994)
- the method of encoding using a depth image as described in Patent Document 1 is highly effective for reducing the number of viewpoints to be encoded and thus recording and transmitting efficiently.
- however, because reducing the number of viewpoint images requires a separate process for encoding the depth image, a sufficient efficiency improvement is not always obtained in terms of processing amount and generated code amount.
- the block matching process used when extracting disparity vectors in disparity predictive coding between viewpoint images performs error evaluation on all pixels in the block, as illustrated in the figure.
- its calculation cost is therefore high, which is one of the factors that make real-time processing of image encoding difficult.
- the present invention has been made in view of the above-described problems, and an object of the present invention is to improve viewpoint image encoding efficiency by using depth images when encoding a plurality of viewpoint images and depth images.
- a further object is to provide a multi-view image encoding device and a multi-view image decoding device that can reduce the amount of calculation required to generate a disparity vector.
- a multi-view image encoding device according to the present invention receives and encodes a first viewpoint image captured from a first viewpoint, and restores the encoded first viewpoint image back to a first viewpoint image.
- because disparity information between the first viewpoint and the second viewpoint is generated based on the depth image restored by the depth image encoding means and the imaging condition information for capturing the first viewpoint image and the second viewpoint image, prediction processing can be performed using this disparity information instead of a process with high calculation cost such as block matching between viewpoint images, and the amount of encoding processing can be reduced.
- a multi-view image decoding apparatus includes: first viewpoint image decoding means for receiving and decoding an encoded first viewpoint image; depth image decoding means for receiving and decoding an encoded depth image; imaging condition information decoding means for receiving and decoding encoded imaging condition information; disparity information generating means for generating disparity information between a first viewpoint and a second viewpoint based on the depth image decoded by the depth image decoding means and the imaging condition information decoded by the imaging condition information decoding means; and second viewpoint image decoding means for receiving and decoding an encoded second viewpoint image based on the first viewpoint image decoded by the first viewpoint image decoding means and the disparity information generated by the disparity information generating means.
- because disparity information between the first viewpoint and the second viewpoint is generated based on the depth image decoded by the depth image decoding means and the imaging condition information decoded by the imaging condition information decoding means, the disparity vector can be generated on the decoding device side without being transmitted from the encoding device. The amount of code transmitted from the encoding device to the decoding device can thus be reduced, increasing the utilization efficiency of the transmission path and of the recording medium when transmitting and recording the encoded data.
- because the multi-viewpoint image encoding device includes disparity information generating means that generates disparity information between the first viewpoint and the second viewpoint based on the depth image restored by the depth image encoding means and the imaging condition information for capturing the first viewpoint image and the second viewpoint image, prediction processing can be performed using that disparity information, and the amount of calculation of the encoding processing can be reduced.
- FIG. 1 is a block diagram showing the configuration of the multi-view image encoding device according to Embodiment 1.
- FIG. 2 is a block diagram showing the configuration of the parallax information generation unit provided in the multi-view image encoding device.
- FIG. 3 is a conceptual diagram of the representative depth value determination process by the representative depth value determination unit provided in the parallax information generation unit.
- FIG. 4 is a conceptual diagram showing the relationship between depth values and parallax values.
- FIG. 5 shows the shooting distance in parallel-method and crossing-method shooting.
- FIG. 6 is a block diagram showing the configuration of the multi-view image decoding apparatus according to Embodiment 2.
- FIG. 7 is a block diagram showing the configuration of a conventional image encoding apparatus.
- FIG. 1 is a block diagram showing a configuration of multi-view image encoding apparatus 1 according to Embodiment 1.
- the input data supplied to the multi-viewpoint image encoding device 1 consists of a viewpoint image captured from the reference viewpoint (reference viewpoint image), a viewpoint image captured from a non-reference viewpoint (non-reference viewpoint image), a depth image corresponding to the non-reference viewpoint image, and shooting condition information for capturing the reference viewpoint image and the non-reference viewpoint image.
- while the reference viewpoint image is limited to an image from a single viewpoint, a plurality of images from a plurality of viewpoints may be input as non-reference viewpoint images.
- the depth image may be one depth image corresponding to the non-reference viewpoint image, or a plurality of depth images corresponding to all the non-reference viewpoint images may be input.
- Each viewpoint image and depth image may be a still image or a moving image.
- the multi-view image encoding device 1 includes a reference viewpoint image encoding unit 4.
- the reference viewpoint image encoding unit 4 receives and encodes the reference viewpoint image captured from the reference viewpoint, and further restores the encoded reference viewpoint image to the reference viewpoint image again.
- the multi-view image encoding device 1 is provided with a depth image encoding unit 5.
- the depth image encoding unit 5 encodes the depth image corresponding to the non-reference viewpoint image captured from the non-reference viewpoint, and further restores the encoded depth image to the depth image again.
- the multi-view image encoding device 1 includes an imaging condition information encoding unit 6.
- the imaging condition information encoding unit 6 receives and encodes imaging condition information for imaging the reference viewpoint image and the non-reference viewpoint image.
- the multi-view image encoding device 1 includes a parallax information generation unit 2.
- the disparity information generation unit 2 generates disparity information between the reference viewpoint and the non-reference viewpoint based on the depth image restored by the depth image encoding unit 5 and the shooting condition information for capturing the reference viewpoint image and the non-reference viewpoint image.
- the multi-view image encoding device 1 is provided with a non-reference viewpoint image encoding unit 3.
- the non-reference viewpoint image encoding unit 3 receives the non-reference viewpoint image and encodes it based on the reference viewpoint image restored by the reference viewpoint image encoding unit 4 and the disparity information generated by the disparity information generation unit 2.
- for each block of the non-reference viewpoint image corresponding to a block obtained by dividing the depth image restored by the depth image encoding unit 5, the non-reference viewpoint image encoding unit 3 performs predictive encoding using the pixel values of the reference viewpoint image at the position indicated by the disparity information generated by the disparity information generation unit 2.
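the per-block prediction just described might be sketched as follows, assuming one horizontal-only integer disparity per block (a simplified illustration; all names are hypothetical, and clipping at the image border is one possible policy, not the patent's):

```python
import numpy as np

def predict_block(reference, bx, by, bs, dx):
    """Prediction for the block at (bx, by): the reference-viewpoint
    pixels displaced by the disparity dx (horizontal-only here)."""
    x = np.clip(bx + dx, 0, reference.shape[1] - bs)
    return reference[by:by + bs, x:x + bs]

def encode_residual(nonref, reference, disparities, bs=8):
    """Residual of disparity-compensated prediction, block by block.
    `disparities[row][col]` holds the per-block disparity derived from
    the co-located block of the restored depth image."""
    residual = np.zeros_like(nonref, dtype=np.int32)
    for by in range(0, nonref.shape[0], bs):
        for bx in range(0, nonref.shape[1], bs):
            pred = predict_block(reference, bx, by, bs, disparities[by // bs][bx // bs])
            residual[by:by + bs, bx:bx + bs] = (
                nonref[by:by + bs, bx:bx + bs].astype(np.int32) - pred)
    return residual
```

when the disparity is accurate, the residual is near zero and cheap to transform-encode, which is the efficiency gain the scheme targets.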
- the reference viewpoint image encoding unit 4 compresses and encodes the reference viewpoint image by the intra-view prediction encoding method.
- the intra-view prediction encoding method is a prediction encoding method in which image data is compression-encoded based only on image data within a single viewpoint.
- intra-frame prediction and motion compensation are performed to compress and encode image data.
- the reference viewpoint image encoding unit 4 performs reverse processing, that is, decoding on the compression-encoded reference viewpoint image to restore the reference viewpoint image. This is for reference when encoding a non-reference viewpoint image to be described later.
- the reference viewpoint image is used as a reference image in the parallax predictive coding method.
- the non-reference viewpoint image encoded by the disparity prediction encoding method may further be used as a reference image at the time of motion compensation or when performing disparity prediction from another viewpoint image.
- note that the restored images, not the input images, are referenced because the decoding device cannot obtain the same image as the input image (the encoding method here assumes lossy compression).
- the depth image encoding unit 5 compresses and encodes the depth image by the intra-view prediction encoding method, similarly to the reference viewpoint image encoding unit 4. That is, the depth image encoding unit 5 compresses and encodes the depth image based only on the depth image. At the same time, the depth image encoding unit 5 performs reverse processing, that is, decoding on the compression-encoded depth image to restore the data representing the depth image. This is for reference when generating disparity information to be described later.
- the restored depth image, rather than the input depth image, is used to generate the disparity vector because the decoding device side can generate disparity vectors only from the decoded depth image. If the encoder generated its disparity vectors from the input depth image, they would differ from the vectors generated on the image decoding device side; if encoding and decoding continued with such mismatched vectors, a mismatch would arise whenever motion compensation or parallax compensation refers to a previous coding result, and it would propagate as an error.
- the disparity information generation unit 2 generates disparity information based on the depth image restored by the depth image encoding unit 5 and the shooting condition information. Details of disparity information generation will be described later.
- the non-reference viewpoint image encoding unit 3 performs inter-view prediction on the non-reference viewpoint image based on the reference viewpoint image restored by the reference viewpoint image encoding unit 4 and the disparity information generated by the disparity information generation unit 2. Compression encoding is performed by an encoding method.
- the inter-view prediction encoding method is a prediction encoding method in which image data captured from a certain viewpoint is compression encoded using image data captured from another viewpoint.
- the shooting condition information encoding unit 6 performs an encoding process for converting shooting condition information, which is a condition when shooting multiple viewpoint images, into a predetermined code.
- the encoded data of the reference viewpoint image, the non-reference viewpoint image, the depth image, and the shooting condition information are concatenated and rearranged by a code configuration unit (not shown) and output from the multi-view image encoding device 1 to the outside as an encoded stream.
- FIG. 2 is a block diagram illustrating a configuration of the disparity information generation unit 2.
- the disparity information generating unit 2 includes a block dividing unit 7.
- the block dividing unit 7 divides the input depth image into blocks of a predetermined size (for example, 8 × 8 pixels) and supplies the blocks to the representative depth value determining unit 8.
- the representative depth value determining unit 8 determines the representative depth value based on the frequency distribution of the depth values in the blocks divided by the block dividing unit 7.
- the disparity information generation unit 2 is provided with a distance information extraction unit 13.
- the distance information extraction unit 13 extracts information corresponding to the inter-camera distance A and the shooting distance a (FIG. 4) from the shooting condition information, and transmits the information to the parallax information calculation unit 9.
- the disparity information calculation unit 9 calculates disparity information based on the representative depth value determined by the representative depth value determination unit 8 and on the inter-camera distance A and the shooting distance a extracted by the distance information extraction unit 13, and supplies it to the non-reference viewpoint image encoding unit 3.
- FIG. 3 is a conceptual diagram of representative depth value determination processing by the representative depth value determination unit 8 provided in the parallax information generation unit 2.
- the representative depth value determining unit 8 determines a representative value of the depth value for each block divided by the block dividing unit 7. Specifically, a frequency distribution (histogram) of depth values in the block is created, and a depth value having the highest appearance frequency is extracted and determined as a representative depth value.
- a depth image 15 corresponding to the viewpoint image 14 is given.
- the depth image 15 is represented as a monochrome image with luminance only.
- the depth value 18 having the highest appearance frequency is determined as the representative depth value of the block 16.
- the method based on the above histogram was described as an example of determining the representative depth value, but the present invention is not limited to it.
- the representative value may instead be determined, for example, as (a) the median of the depth values in the block; (b) the average weighted by appearance frequency; (c) the value closest to the camera (the maximum depth value in the block); or (d) the value farthest from the camera (the minimum depth value in the block).
- from the representative depth value determined by each of the methods (a) to (d) above, together with the inter-camera distance and the shooting distance, parallax information is calculated according to the method described later. Since the resulting parallax information expresses the image shift of the encoding target block in the non-reference viewpoint image relative to the reference viewpoint image, the difference between the pixel values of the correspondingly shifted image block on the reference viewpoint image and the pixel values of the current encoding target block should be sufficiently small. Because a smaller difference value yields more efficient encoding, it suffices to select the determination method, that is, the representative depth value, that yields the parallax information with the smallest difference value.
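the candidate methods (the histogram mode plus alternatives (a), (c), and (d), and the mean for (b)) could be sketched as follows for an 8-bit depth block (an illustrative sketch; the function and method names are not from the patent):

```python
import numpy as np

def representative_depth(depth_block, method="mode"):
    """Representative depth value of one block.
    'mode'   : most frequent depth value (the histogram method)
    'median' : median of the block's depth values (method (a))
    'mean'   : frequency-weighted average, i.e. the block mean (method (b))
    'near'   : value closest to the camera, i.e. the maximum (method (c))
    'far'    : value farthest from the camera, i.e. the minimum (method (d))"""
    vals = np.asarray(depth_block).ravel()
    if method == "mode":
        hist = np.bincount(vals, minlength=256)  # 8-bit depth values assumed
        return int(hist.argmax())                # ties resolve to the smaller value
    if method == "median":
        return int(np.median(vals))
    if method == "mean":
        return int(round(float(vals.mean())))
    if method == "near":
        return int(vals.max())
    if method == "far":
        return int(vals.min())
    raise ValueError(method)
```

an encoder could evaluate each method on the block, keep the one whose predicted reference block gives the smallest residual, and signal nothing, since the decoder can repeat the same choice on the decoded depth image.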
- the block size when dividing the depth image 15 is not limited to the 8 × 8 pixels described above; it may be 16 × 16 pixels, 4 × 4 pixels, or the like.
- the numbers of vertical and horizontal pixels need not be equal.
- the size may also be 16 × 8, 8 × 16, 8 × 4, or 4 × 8 pixels, or the like.
- an optimum size is selected according to the size of the subject included in the depth image 15 and the corresponding viewpoint image 14, the required compression rate, and the like.
- based on the representative depth value Z determined by the representative depth value determination unit 8 and the information indicating the camera interval A and the shooting distance a extracted from the shooting condition information by the distance information extraction unit 13, the parallax information calculation unit 9 calculates the parallax value (parallax information) v of the corresponding block according to (Equation 1) described later.
- the depth value included in the depth image 15 does not represent the distance from the camera to the subject itself; it expresses the distance range contained in the photographed image within a predetermined numerical range (for example, 0 to 255). For this reason, the depth value is converted into an image distance, an actual distance calculated from the information indicating the distance range at the time of shooting included in the shooting condition information (for example, the minimum and maximum distances from the camera of the subjects included in the image), so that its dimension matches quantities expressing actual distances such as the shooting distance a and the camera interval A.
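for instance, if the shooting condition information carries the minimum and maximum subject distances, the conversion might look like this (a sketch assuming a linear mapping and the convention that a larger depth value means a closer subject, consistent with method (c) above; the patent text does not fix the exact mapping):

```python
def depth_value_to_distance(d, z_min, z_max, d_max=255):
    """Map an 8-bit depth value d (0..d_max) linearly onto the actual
    distance range [z_min, z_max] taken from the shooting condition
    information. Assumed convention: d == d_max is the nearest subject
    (distance z_min), d == 0 the farthest (distance z_max)."""
    return z_max - (z_max - z_min) * (d / d_max)
```

after this conversion the result shares a dimension (actual distance) with the shooting distance a and the camera interval A, so the three can be combined in the parallax formula.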
- the formula for calculating the parallax value is defined as (Equation 1).
- FIG. 4 is a conceptual diagram showing the relationship between the depth value and the parallax value. Assume that the two viewpoints, that is, the cameras 10A and 10B and the two subjects 12A and 12B are in a positional relationship as shown in FIG. At this time, the front points 19A and 19B on the subjects 12A and 12B are projected to positions PL1 and PR1 and positions PL2 and PR2 on the plane 20 that are separated from the cameras 10A and 10B by the photographing distance a.
- the positions PL1 and PR1 mean corresponding points of pixels on the left viewpoint image and the right viewpoint image regarding the point 19A on the subject 12A.
- the positions PL2 and PR2 mean corresponding points of pixels on the left viewpoint image and the right viewpoint image regarding the point 19B on the subject 12B.
- the shooting distance of the cameras 10A and 10B is a
- the distances (representative depth values) to the points 19A and 19B in front of the subjects 12A and 12B are Z1 and Z2.
- the above parameters (camera interval A, shooting distance a, representative depth values Z1 and Z2) are related as expressed by (Equation 1).
- the representative depth values Z1 and Z2 are actual distances from the camera, not the depth values themselves in the depth image.
- the parallax value v is defined as the displacement (a vector) of the corresponding point in the left viewpoint image relative to the corresponding point in the right viewpoint image.
- the parallax value v can be given by the above-described (Equation 1).
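(Equation 1) itself is not reproduced in this text. Under the geometry of FIG. 4 (cameras separated by A, projection plane at the shooting distance a, a point at actual distance Z), similar triangles give the commonly used relation v = A(Z - a)/Z; the sketch below uses this as an assumed stand-in for (Equation 1), with illustrative names throughout:

```python
def parallax_value(A, a, Z):
    """Parallax v of a point at actual distance Z, projected onto the
    plane at shooting distance a, for cameras separated by A.
    Assumed form: v = A * (Z - a) / Z, so v == 0 when Z == a (the
    point lies on the zero-parallax plane) and v < 0 for Z < a."""
    return A * (Z - a) / Z

def parallax_in_pixels(v, plane_width, image_width):
    """Convert a parallax on the plane (same units as A) into pixels,
    given the physical width the image covers at distance a."""
    return v * image_width / plane_width
```

the sign convention matches the definition above: a point in front of the plane (Z < a) yields a negative parallax, a point behind it a positive one.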
- the parallax value v can be calculated in units of predetermined blocks.
- the disparity value generated by the disparity information calculation unit 9 can be handled directly as a disparity vector for the non-reference viewpoint image. That is, when performing parallax predictive encoding, the non-reference viewpoint image encoding unit 3 can use, as it is, the disparity value that the disparity information generation unit 2 calculates according to (Equation 1), instead of performing a process with high calculation cost such as block matching between viewpoint images.
- (a) of FIG. 5 is a figure showing the shooting distance in parallel-method shooting.
- (b) of FIG. 5 is a figure showing the shooting distance in crossing-method shooting.
- the shooting distance a of the cameras 10A and 10B described above is, in the case of parallel-method shooting, that is, when the optical axes of the two cameras 10A and 10B are parallel, the distance that is in focus at the time of shooting (the focal distance) ((a) in FIG. 5).
- in the case of crossing-method shooting, that is, when the optical axes of the two cameras 10A and 10B intersect in front of them, the distance from the cameras 10A and 10B to the intersection (cross point) may be regarded as corresponding to the shooting distance a ((b) in FIG. 5).
- the multi-view image encoding apparatus 1 that is one embodiment of the present invention has been described above.
- FIG. 1 takes as an example the case of two viewpoint images, a reference viewpoint image and a non-reference viewpoint image, together with a depth image; however, a multi-viewpoint image encoding device for three or more viewpoint images can be configured based on the same idea of the present invention.
- for example, when there are three viewpoint images, one viewpoint image is treated as the reference viewpoint and compression-encoded by the intra-view prediction encoding method, and the remaining two viewpoint images are compression-encoded as non-reference viewpoint images.
- for those two images, both may be compression-encoded by the intra-view prediction encoding method, or one of them may be regarded as a reference viewpoint and the other as a non-reference viewpoint and compression-encoded by the intra-view prediction encoding method and the inter-view prediction encoding method, respectively.
- when there are three depth images, the same concept as with three viewpoint images applies; furthermore, the same concept applies when there are more than three viewpoint images and depth images.
- the depth image is treated here as being prepared in advance as input data.
- a depth image can be generated, for example, by estimation from a plurality of viewpoint images using block matching or the like, or by emitting infrared rays, ultrasonic waves, or the like toward the subject and measuring the reflection time.
- FIG. 6 is a block diagram showing a configuration of multi-view image decoding apparatus 50 according to Embodiment 2.
- The data input to the multi-view image decoding device 50 includes the encoded data of the reference viewpoint image, the encoded data of the non-reference viewpoint image, the encoded data of the depth image, and the encoded data of the imaging condition information output from the multi-view image encoding device 1 according to Embodiment 1.
- Each piece of encoded data is obtained by separating and extracting, with a code separation unit (not shown), the data that has been concatenated and transmitted as an encoded stream.
- the multi-view image decoding device 50 includes a reference viewpoint image decoding unit 53.
- the reference viewpoint image decoding unit 53 receives and decodes the encoded reference viewpoint image.
- the multi-viewpoint image decoding device 50 is provided with a depth image decoding unit 54.
- the depth image decoding unit 54 receives and decodes the encoded depth image.
- the multi-viewpoint image decoding device 50 includes an imaging condition information decoding unit 55.
- the imaging condition information decoding unit 55 receives and decodes the encoded imaging condition information.
- the multi-viewpoint image decoding apparatus 50 is provided with a parallax information generation unit 51.
- The disparity information generation unit 51 generates disparity information between the reference viewpoint and the non-reference viewpoint based on the depth image decoded by the depth image decoding unit 54 and the imaging condition information decoded by the imaging condition information decoding unit 55.
- the multi-view image decoding device 50 includes a non-reference viewpoint image decoding unit 52.
- The non-reference viewpoint image decoding unit 52 receives and decodes the encoded non-reference viewpoint image based on the reference viewpoint image decoded by the reference viewpoint image decoding unit 53 and the disparity information generated by the disparity information generation unit 51.
- the disparity information generation unit 51 determines a representative depth value of a block obtained by dividing the decoded depth image.
- The imaging condition information decoded by the imaging condition information decoding unit 55 includes the inter-camera distance between a camera arranged at the reference viewpoint and a camera arranged at the non-reference viewpoint, and the shooting distance between both cameras and the subject.
- the disparity information generation unit 51 calculates disparity information based on the representative depth value of each block, the inter-camera distance, and the shooting distance.
- The non-reference viewpoint image decoding unit 52 decodes each block, corresponding to a block obtained by dividing the depth image restored by the depth image decoding unit 54, using the pixel values in the reference viewpoint image indicated by the disparity information.
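The chain from a block's representative depth value to a decoded block can be sketched as follows. The disparity formula used here (parallax proportional to the camera baseline, vanishing at the shooting distance, and scaled by a hypothetical pixel focal length `focal_px`) is an assumed model for illustration only; the patent does not spell out its exact expression at this point, and `decode_block` stands in for the real prediction-plus-residual decoding path.

```python
def block_disparity(rep_depth, baseline, shoot_dist, focal_px):
    # Hypothetical disparity model: a block at the shooting distance has
    # zero parallax; closer blocks shift more. All inputs come from the
    # decoded imaging condition information and the decoded depth image.
    return round(focal_px * baseline * (1.0 / rep_depth - 1.0 / shoot_dist))

def decode_block(ref_view, x, y, block, disp, residual):
    # Disparity-compensated prediction: take the block of the decoded
    # reference viewpoint image shifted horizontally by `disp`, then add
    # the transmitted residual.
    pred = [row[x + disp:x + disp + block] for row in ref_view[y:y + block]]
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```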
- The reference viewpoint image decoding unit 53 decodes the encoded data that has been compression-encoded by the intra-view prediction encoding method, and restores the reference viewpoint image.
- the restored reference viewpoint image is used for display as it is and also for decoding a non-reference viewpoint image described later.
- the depth image decoding unit 54 decodes the encoded data that has been compression-encoded by a method according to intra-view prediction encoding, and restores the depth image.
- the restored depth image is used to generate and display a non-reference viewpoint image.
- The imaging condition information decoding unit 55 restores, from the encoded data of the imaging condition information, information including the inter-camera distance and the shooting distance at the time of shooting.
- The restored imaging condition information is used, together with the depth image, to generate and display the necessary viewpoint images.
- the disparity information generation unit 51 generates disparity information based on the restored depth image and shooting condition information.
- the disparity information generation method and procedure are the same as those of the disparity information generation unit 2 in the multi-view image encoding device 1 described above.
- The non-reference viewpoint image decoding unit 52 decodes the encoded data of the compression-encoded non-reference viewpoint image by a method according to inter-view prediction encoding, based on the restored reference viewpoint image and the disparity information, and restores the non-reference viewpoint image. Finally, the reference viewpoint image and the non-reference viewpoint image are used as display images as they are, and, as necessary, images of other viewpoints are generated for display based on the depth image and the imaging condition information.
- the viewpoint image generation processing may be performed within the multi-view image decoding device 50 or may be performed outside the multi-view image decoding device 50.
- the encoded data of the non-reference viewpoint image that has been compression-encoded by the inter-view prediction encoding method will be further described.
- In the present embodiment, the disparity vector necessary for the disparity prediction encoding method is generated from the depth image rather than detected from a plurality of viewpoint images.
- On the decoding side, the disparity vector is generated from the restored depth image after the depth image is restored. That is, the disparity vector can be generated on the image decoding device side without being transmitted from the image encoding device. With this configuration, it is possible to reduce the amount of code transmitted from the image encoding device to the image decoding device, and to increase the utilization efficiency of the transmission path and the recording medium when the encoded data is transmitted and recorded.
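The saving described here rests on determinism: both ends run the same derivation on the same restored depth image, so the vectors never appear in the bitstream. A minimal sketch, using a hypothetical disparity expression (the `scale`, `baseline`, and `shoot_dist` parameters are illustrative assumptions):

```python
def derive_disparity_vectors(block_depths, baseline, shoot_dist, scale):
    # Deterministic derivation shared by encoder and decoder: given the
    # same restored per-block depth values and the same imaging condition
    # information, both sides compute identical disparity vectors, so the
    # vectors need not be transmitted.
    return [round(scale * baseline * (1.0 / z - 1.0 / shoot_dist))
            for z in block_depths]
```

Because the encoder uses the locally restored depth image (not the original), the decoder, which holds exactly the same restored depth image, reproduces the same vectors bit for bit.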
- In contrast, in the conventional technique, the encoded data includes a disparity vector (Patent Document 1: paragraph [0105] of the specification).
- As described above, the multi-view image decoding device 50 can generate the disparity vector from the depth image on the decoding device side. Therefore, the multi-view image encoding device 1 according to Embodiment 1 does not need to encode the disparity vector, and the amount of code can be reduced accordingly.
- Each block of the multi-view image encoding device 1 and the multi-view image decoding device 50, in particular the reference viewpoint image encoding unit 4, the depth image encoding unit 5, the disparity information generation unit 2, the non-reference viewpoint image encoding unit 3, the reference viewpoint image decoding unit 53, the depth image decoding unit 54, the disparity information generation unit 51, and the non-reference viewpoint image decoding unit 52, may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (central processing unit).
- In the latter case, the multi-view image encoding device 1 and the multi-view image decoding device 50 include a CPU that executes the instructions of a control program realizing each function, a ROM (read-only memory) that stores the program, a RAM (random access memory) into which the program is expanded, and a storage device (recording medium) such as a memory that stores the program and various data.
- The object of the present invention can also be achieved by supplying, to each of the above-described devices, a recording medium on which the program code (an executable program, an intermediate code program, or a source program) of the control program for the multi-view image encoding device 1 and the multi-view image decoding device 50, which is software realizing the functions described above, is recorded in a computer-readable manner, and by having the computer (or a CPU or an MPU (microprocessor unit)) read and execute the program code recorded on the recording medium.
- Examples of the recording medium include tapes such as a magnetic tape and a cassette tape; magnetic disks such as a floppy (registered trademark) disk and a hard disk; optical discs such as a CD-ROM (compact disc read-only memory), an MO (magneto-optical) disc, an MD (Mini Disc), a DVD (digital versatile disc), and a CD-R (CD Recordable); cards such as an IC card (including a memory card) and an optical card; semiconductor memories such as a mask ROM, an EPROM (erasable programmable read-only memory), an EEPROM (electrically erasable programmable read-only memory), and a flash ROM; and logic circuits such as a PLD (programmable logic device) and an FPGA (field programmable gate array).
- the multi-view image encoding device 1 and the multi-view image decoding device 50 may be configured to be connectable to a communication network, and the program code may be supplied via the communication network.
- the communication network is not particularly limited as long as it can transmit the program code.
- For example, the Internet, an intranet, an extranet, a LAN (local area network), an ISDN (integrated services digital network), a VAN (value-added network), a CATV (community antenna television) communication network, a virtual private network, a telephone line network, a mobile communication network, a satellite communication network, and the like can be used.
- the transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type.
- For example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (asymmetric digital subscriber line) lines, and wireless media such as infrared (IrDA (Infrared Data Association) or remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (high data rate), NFC (near field communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite lines, and terrestrial digital networks can be used.
- Preferably, the disparity information generating means includes representative depth value determining means that determines a representative depth value of each block obtained by dividing the restored depth image, and the representative depth value determining means determines the representative depth value based on a frequency distribution of the depth values in the block.
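Determining the representative depth value from the frequency distribution amounts to taking the mode of the depth values in the block. A minimal sketch (the tie-break behavior is an assumption; a real codec would need a deterministic rule shared by encoder and decoder):

```python
from collections import Counter

def representative_depth(block_depths):
    # Mode of the block's depth-value frequency distribution: the depth
    # value that appears most often is taken as the representative.
    counts = Counter(block_depths)
    # Counter.most_common breaks ties by first-encountered order, which
    # is deterministic here but would need an explicit rule in a codec.
    return counts.most_common(1)[0][0]
```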
- Preferably, the imaging condition information includes the inter-camera distance between the first camera arranged at the first viewpoint and the second camera arranged at the second viewpoint, and the shooting distance between the first and second cameras and the subject, and the disparity information generating means further includes disparity information calculating means that calculates the disparity information based on the representative depth value of each block, the inter-camera distance, and the shooting distance.
- disparity information can be calculated according to a simple mathematical formula.
- Preferably, the second viewpoint image encoding unit performs predictive encoding of each block, corresponding to a block obtained by dividing the depth image restored by the depth image encoding unit, using the pixel values in the first viewpoint image indicated by the disparity information.
- the second viewpoint image can be encoded with a simple configuration.
- Preferably, the multi-view image encoding device further includes imaging condition information encoding means for receiving and encoding the imaging condition information, the first viewpoint image is a reference viewpoint image, that is, a viewpoint image from a reference viewpoint, and the second viewpoint image is a viewpoint image other than the reference viewpoint image.
- the imaging condition information can be encoded and transmitted, and an image captured from a reference viewpoint and an image captured from a viewpoint other than the reference viewpoint can be encoded and transmitted.
- Preferably, the disparity information generation unit determines a representative depth value of each block obtained by dividing the decoded depth image, the imaging condition information decoded by the imaging condition information decoding unit includes the inter-camera distance between the first camera arranged at the first viewpoint and the second camera arranged at the second viewpoint, and the shooting distance between the first and second cameras and the subject, and the disparity information generation unit calculates the disparity information based on the representative depth value of each block, the inter-camera distance, and the shooting distance.
- According to the above configuration, the depth value with the highest appearance frequency is extracted as the representative value, and the disparity information can be calculated according to a simple mathematical expression.
- Preferably, the second viewpoint image decoding unit decodes each block, corresponding to a block obtained by dividing the depth image restored by the depth image decoding unit, using the pixel values in the first viewpoint image indicated by the disparity information.
- the second viewpoint image can be decoded with a simple configuration.
- the present invention can be applied to a multi-view image encoding device that encodes images taken from a plurality of viewpoints and depth images thereof, and a multi-view image decoding device that decodes them.
- 2 Parallax information generation unit (parallax information generation means)
- 3 Non-reference viewpoint image encoding unit (second viewpoint image encoding means)
- 4 Reference viewpoint image encoding unit (first viewpoint image encoding means)
- 5 Depth image encoding unit (depth image encoding means)
- 6 Imaging condition information encoding unit (imaging condition information encoding means)
- 7 Block division unit
- 8 Representative depth value determination unit (representative depth value determination means)
- 9 Parallax information calculation unit
- 10A, 10B Camera (first camera, second camera)
- 12A, 12B Subject
- 13 Distance information extraction unit
- 14 Viewpoint image
- 15 Depth image
- 16 Block
- 17 Frequency distribution
- 18 Depth value
- 19A, 19B Point
- 20 Plane
- PR1, PL1, PR2, PL2 Position
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to a multi-view image encoding device (1) comprising: a reference viewpoint image encoding unit (4); a depth image encoding unit (5); a disparity information generation unit (2) that generates information on the disparity between the reference viewpoint and non-reference viewpoints, based on the depth images restored by the depth image encoding unit (5) and on imaging condition information; and a non-reference viewpoint image encoding unit (3) that receives and encodes non-reference viewpoint images, based on the reference viewpoint images restored by the reference viewpoint image encoding unit (4) and on the disparity information generated by the disparity information generation unit (2).
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2010-245332 | 2010-11-01 | ||
| JP2010245332A JP2012100019A (ja) | 2010-11-01 | 2010-11-01 | 多視点画像符号化装置及び多視点画像復号装置 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012060156A1 true WO2012060156A1 (fr) | 2012-05-10 |
Family
ID=46024277
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2011/070641 Ceased WO2012060156A1 (fr) | 2010-11-01 | 2011-09-09 | Dispositif de codage et dispositif de décodage d'images à points de vue multiples |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP2012100019A (fr) |
| WO (1) | WO2012060156A1 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014183652A1 (fr) * | 2013-05-16 | 2014-11-20 | City University Of Hong Kong | Procédé et appareil de codage vidéo de la profondeur utilisant une distorsion de synthèse de vue acceptable |
| WO2015109373A1 (fr) | 2014-01-24 | 2015-07-30 | Synthetica Holdings Pty Ltd | Appareil perfectionné permettant le nettoyage de gazon artificiel |
| CN104885450A (zh) * | 2012-12-27 | 2015-09-02 | 日本电信电话株式会社 | 图像编码方法、图像解码方法、图像编码装置、图像解码装置、图像编码程序、以及图像解码程序 |
| CN108293110A (zh) * | 2015-11-23 | 2018-07-17 | 韩国电子通信研究院 | 多视点视频编码/解码方法 |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6039178B2 (ja) * | 2011-09-15 | 2016-12-07 | シャープ株式会社 | 画像符号化装置、画像復号装置、並びにそれらの方法及びプログラム |
| JP2013258577A (ja) * | 2012-06-13 | 2013-12-26 | Canon Inc | 撮像装置、撮像方法及びプログラム、画像符号化装置、画像符号化方法及びプログラム |
| WO2014005548A1 (fr) * | 2012-07-05 | 2014-01-09 | Mediatek Inc. | Procédé et appareil pour calculer un vecteur de disparité conjointe dans un codage vidéo 3d |
| CN104813669B (zh) * | 2012-09-21 | 2018-05-22 | 诺基亚技术有限公司 | 用于视频编码的方法和装置 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09289638A (ja) * | 1996-04-23 | 1997-11-04 | Nec Corp | 3次元画像符号化復号方式 |
| JP2007036800A (ja) * | 2005-07-28 | 2007-02-08 | Nippon Telegr & Teleph Corp <Ntt> | 映像符号化方法、映像復号方法、映像符号化プログラム、映像復号プログラム及びそれらのプログラムを記録したコンピュータ読み取り可能な記録媒体 |
| JP2008193530A (ja) * | 2007-02-06 | 2008-08-21 | Canon Inc | 画像記録装置、画像記録方法、及びプログラム |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3733358B2 (ja) * | 1996-04-05 | 2006-01-11 | 松下電器産業株式会社 | 画像伝送装置、送信装置、受信装置、送信方法および受信方法 |
| JP3776595B2 (ja) * | 1998-07-03 | 2006-05-17 | 日本放送協会 | 多視点画像の圧縮符号化装置および伸長復号化装置 |
| JP4780046B2 (ja) * | 2007-06-19 | 2011-09-28 | 日本ビクター株式会社 | 画像処理方法、画像処理装置及び画像処理プログラム |
- 2010-11-01 JP JP2010245332A patent/JP2012100019A/ja active Pending
- 2011-09-09 WO PCT/JP2011/070641 patent/WO2012060156A1/fr not_active Ceased
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09289638A (ja) * | 1996-04-23 | 1997-11-04 | Nec Corp | 3次元画像符号化復号方式 |
| JP2007036800A (ja) * | 2005-07-28 | 2007-02-08 | Nippon Telegr & Teleph Corp <Ntt> | 映像符号化方法、映像復号方法、映像符号化プログラム、映像復号プログラム及びそれらのプログラムを記録したコンピュータ読み取り可能な記録媒体 |
| JP2008193530A (ja) * | 2007-02-06 | 2008-08-21 | Canon Inc | 画像記録装置、画像記録方法、及びプログラム |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104885450A (zh) * | 2012-12-27 | 2015-09-02 | 日本电信电话株式会社 | 图像编码方法、图像解码方法、图像编码装置、图像解码装置、图像编码程序、以及图像解码程序 |
| CN104885450B (zh) * | 2012-12-27 | 2017-09-08 | 日本电信电话株式会社 | 图像编码方法、图像解码方法、图像编码装置、图像解码装置、图像编码程序、以及图像解码程序 |
| US9924197B2 (en) | 2012-12-27 | 2018-03-20 | Nippon Telegraph And Telephone Corporation | Image encoding method, image decoding method, image encoding apparatus, image decoding apparatus, image encoding program, and image decoding program |
| WO2014183652A1 (fr) * | 2013-05-16 | 2014-11-20 | City University Of Hong Kong | Procédé et appareil de codage vidéo de la profondeur utilisant une distorsion de synthèse de vue acceptable |
| US10080036B2 (en) | 2013-05-16 | 2018-09-18 | City University Of Hong Kong | Method and apparatus for depth video coding using endurable view synthesis distortion |
| WO2015109373A1 (fr) | 2014-01-24 | 2015-07-30 | Synthetica Holdings Pty Ltd | Appareil perfectionné permettant le nettoyage de gazon artificiel |
| CN108293110A (zh) * | 2015-11-23 | 2018-07-17 | 韩国电子通信研究院 | 多视点视频编码/解码方法 |
| CN108293110B (zh) * | 2015-11-23 | 2022-07-05 | 韩国电子通信研究院 | 多视点视频编码/解码方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2012100019A (ja) | 2012-05-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9998726B2 (en) | Apparatus, a method and a computer program for video coding and decoding | |
| EP3114643B1 (fr) | Accentuation sensible à la profondeur pour vidéo stéréoscopique | |
| KR101354387B1 (ko) | 2d 비디오 데이터의 3d 비디오 데이터로의 컨버전을 위한 깊이 맵 생성 기술들 | |
| CN102939763B (zh) | 计算三维图像的视差 | |
| JP6042536B2 (ja) | 3dビデオ符号化におけるビュー間候補導出の方法と装置 | |
| AU2013284038B2 (en) | Method and apparatus of disparity vector derivation in 3D video coding | |
| WO2012060156A1 (fr) | Dispositif de codage et dispositif de décodage d'images à points de vue multiples | |
| JP6571646B2 (ja) | マルチビュービデオのデコード方法及び装置 | |
| JP2015525997A5 (fr) | ||
| WO2014075625A1 (fr) | Procédé et appareil de dérivation de vecteur de disparité limité dans un codage vidéo tridimensionnel (3d) | |
| JP5395911B2 (ja) | ステレオ画像符号化装置、方法 | |
| Rusanovskyy et al. | Depth-based coding of MVD data for 3D video extension of H. 264/AVC | |
| CN105247862A (zh) | 三维视频编码中的视点合成预测的方法及装置 | |
| WO2013141075A1 (fr) | Dispositif de codage d'image, dispositif de décodage d'image, procédé de codage d'image, procédé de décodage d'image, et programme | |
| JP2012178818A (ja) | 映像符号化装置および映像符号化方法 | |
| Manasa Veena et al. | Coding structure of JMVDC along saliency mapping: a prespective compression technique | |
| JP2013085064A (ja) | 多視点画像符号化装置、多視点画像復号装置、多視点画像符号化方法及び多視点画像復号方法 | |
| JP2013085063A (ja) | 多視点画像符号化装置、多視点画像復号装置、多視点画像符号化方法、及び多視点画像復号方法 | |
| WO2013159300A1 (fr) | Appareil, procédé et programme informatique de codage et décodage vidéo | |
| BR112016020544B1 (pt) | Realce consciente de profundidade para vídeo estéreo |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11837809 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 11837809 Country of ref document: EP Kind code of ref document: A1 |