WO2018056802A1 - A method for estimating three-dimensional depth value from two-dimensional images - Google Patents
- Publication number: WO2018056802A1 (PCT application PCT/MY2017/050057)
- Authority: WIPO (PCT)
- Prior art date: 2016-09-21
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/579—Depth or shape recovery from multiple images from motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Length Measuring Devices By Optical Means (AREA)
Abstract
The present invention relates to a method for estimating three-dimensional depth value from two-dimensional images, characterised by the steps of placing an object (102) on a rotatable plate (103); acquiring a first view of the object (102) comprising a first image and a second image of the object (102), wherein the first image of the object (102) is captured at an angle between 0° and 360°, and the second image is captured at an angle in a range of 1° to 35° relative to the first image; obtaining two-dimensional feature point coordinates by applying Good Features to Track technique and extracting colour information, simultaneously, from the first image and the second image; filtering two-dimensional feature point coordinates; applying Pyramidal Lucas-Kanade Optical Flow technique to obtain displacement magnitudes from the two-dimensional feature point coordinates; calculating World Coordinate; estimating the three-dimensional depth value of the acquired images; and implementing inverse perspective mapping algorithm on the three-dimensional feature point coordinates.
Description
A METHOD FOR ESTIMATING THREE-DIMENSIONAL DEPTH VALUE FROM
TWO-DIMENSIONAL IMAGES
Background of the Invention
Field of the Invention
This invention relates to a method for estimating depth value from at least two images, and more particularly to a method of three-dimensional depth value estimation from at least two images using optical flow and trigonometry method.
Description of Related Arts
Three-dimensional (3D) modeling of physical objects can be very useful in many areas, such as computer graphics and animation, robot vision, reverse engineering, and medicine. 3D models can be created from scratch using modeling software or digitized from real-world objects. The basic information required for this representation is the x, y and z coordinates. Further manipulation of these coordinates can deduce the object's dimensions (width, height, and depth). Other attributes such as the model's surface colour, texture, lighting, shading and shadow contribute to a more realistic representation. Conventional digitization methods utilize Coordinate Measuring Machines (CMMs) or laser scanners to obtain a coordinate for each good feature of an object as digital data. Nevertheless, both of these devices are very costly and require a certain amount of technical knowledge during usage and maintenance.
For example, US Patent No. 7,342,669 B2 discloses a 3D measurement system that uses laser light and estimates depth information of an object by applying a triangulation method. The system consists of an apparatus comprising a laser projecting device, an image capturing device and a computer. The cited art also provides a method for 3D measurement using said apparatus, comprising projecting a line laser onto an object, the apparatus having light-emitting diodes (LEDs) attached to the line-laser projector for estimating the position and orientation of the laser projecting device; capturing the projected line-laser light and the LEDs at the same time using the image capturing device; calculating, using the computer, a 3D shape of the object from the captured image using the triangulation method; and outputting the calculated 3D shape. By applying the teachings in this cited art, a user can quickly acquire a precise 3D shape without using complex equipment and can also interactively check the regions whose positions are not measured during the measuring process by displaying the acquired 3D positions in real time. Thus, measurement of the 3D shape of the target object becomes efficient. However, said laser projecting device may face problems with objects that have shiny surfaces or objects that do not reflect light, including black-coloured and transparent surfaces. Thus, the user may have difficulty verifying the depth of the object.
Another example, US Patent No. 8,330,803 B2, discloses a method for 3D digitization of an object, in which a plurality of camera images of the object are recorded and assembled to determine the 3D coordinates of the object. An apparatus for performing the method for 3D digitization of the object comprises a projector and one or more cameras. The projector projects a light pattern onto the object, in particular white-light strips. In order to improve the method for 3D digitization of the object, 2D feature points from the plurality of camera images of the object are determined without human intervention. The 2D point correspondences between the 2D feature points of a picture and the 2D feature points of another picture are determined. Several of these 2D point correspondences are selected, and an associated 3D transformation is determined. The quality of this 3D transformation is determined with reference to the transformed 3D coordinates of the 2D feature points, and valid 3D feature points are determined therefrom. For assembling the camera images of the object, the 3D coordinates of the valid 3D feature points are used. However, the cited patent does not disclose in detail how to calculate or determine the depth value of each feature across said image set.
US Patent No. 7,573,475 B2 discloses a method of converting a 2D image to a 3D image. The method includes receiving a first 2D image comprising image data, where the first 2D image is captured from a first camera location. The method also includes projecting at least a portion of the first 2D image onto computer-generated geometry, the image data having depth values associated with the computer-generated geometry. The system includes rendering, using the computer-generated geometry and a second camera location that differs from the first camera location, a second 2D image that is stereoscopically complementary to the first 2D image, and infilling image data that is absent from the second 2D image. The cited patent uses two different images captured from two different locations, for example images from the left and the right of the object. A geometry map is then built for both views and the maps are compared to each other to calculate depth for the geometry map. A geometry map is accurate when the object is simple, but when the object is complicated said method becomes inaccurate in determining the depth of the object.
Accordingly, it can be seen from the prior art that there exists a need for a simple depth estimation technique that merges 2D camera images into a set of 3D surface points with colour information, without using high-cost equipment such as a coordinate measuring machine (CMM) or a laser scanner.
Summary of Invention
It is an objective of the present invention to provide a method of three-dimensional depth value estimation from at least two images using an optical flow and trigonometry method.
It is also an objective of the present invention to provide a method of three-dimensional depth value estimation and colour information extraction from at least two images.
It is yet another objective of the present invention to provide a method of three-dimensional depth value estimation in either a controlled environment or an open-space environment.
Accordingly, these objectives may be achieved by following the teachings of the present invention. The present invention relates to a method for estimating three-dimensional depth value from two-dimensional images, characterised by the steps of placing an object on a rotatable plate; acquiring a first view of the object comprising a first image and a second image of the object, wherein the first image of the object is captured at an angle between 0° and 360°, and the second image is captured at an angle in a range of 1° to 35° relative to the first image; obtaining two-dimensional feature point coordinates by applying Good Features to Track technique and extracting colour information, simultaneously, from the first image and the second image; filtering two-dimensional feature point coordinates; applying Pyramidal Lucas-Kanade Optical Flow technique to obtain displacement magnitudes from the two-dimensional feature point coordinates; calculating World Coordinate; estimating the three-dimensional depth value of the acquired images; and implementing inverse perspective mapping algorithm on the three-dimensional feature point coordinates.
Brief Description of the Drawings
The features of the invention will be more readily understood and appreciated from the following detailed description when read in conjunction with the accompanying drawings of the preferred embodiment of the present invention, in which:
Fig. 1 is a flow chart of a method for estimating three-dimensional depth value from a two-dimensional image; and
Fig. 2 is a diagram showing a preferred arrangement of a kit for estimating three-dimensional depth value.
Detailed Description of the Invention
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting but merely as a basis for claims. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the scope of the present invention as defined by the appended claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include," "including," and "includes" mean including, but not limited to. Further, the words "a" or "an" mean "at least one" and the word "plurality" means one or more, unless otherwise mentioned. Where the abbreviations or technical terms are used, these indicate the commonly accepted meanings as known in the technical field. For ease of reference, common reference numerals will be used throughout the figures when referring to the same or similar features common to the figures. The present invention will now be described with reference to Figs. 1 and 2.
The present invention relates to a method for estimating three-dimensional depth value from two-dimensional images, characterised by the steps of:
placing an object (102) on a rotatable plate (103) at a distance from an image capture apparatus (101);
acquiring a first view of the object (102) comprising a first image and a second image of the object (102), wherein the first image of the object (102) is captured at an angle between 0° and 360°, and the second image is captured at an angle in a range of 1° to 35° relative to the first image;
obtaining two-dimensional feature point coordinates by applying Good Features to Track technique and extracting colour information, simultaneously, from the first image and the second image;
filtering two-dimensional feature point coordinates to eliminate noise using a first filtering means;
applying Pyramidal Lucas-Kanade Optical Flow technique to obtain displacement magnitudes from the two-dimensional feature point coordinates;
calculating World Coordinate for each of the two-dimensional feature point coordinates;
estimating the three-dimensional depth value of the acquired images by calculating each of the two-dimensional feature point coordinates using equation (1), thereby producing three-dimensional feature point coordinates:
wherein,
z_w is the three-dimensional depth value for the two-dimensional feature point coordinate (x_w1, y_w1),
objdist is a distance of center of rotation (COR) from the image capture apparatus (101) (optic center),
(x_w1, y_w1) is a coordinate of the two-dimensional feature point from the first image,
(x_w2, y_w2) is a matching coordinate of the two-dimensional feature point in the second image,
α is a rotation angle; and
implementing an inverse perspective mapping algorithm on the three-dimensional feature point coordinates to remove perspective distortion effects.
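The patent text contains no source code. For illustration only, the following Python sketch (using OpenCV, which the patent does not mention) shows one way the feature-detection and matching steps above could be realised; the depth computation of equation (1) is deliberately omitted here because that equation is not reproduced in this text, and the parameter values are assumptions.

```python
# Illustrative sketch only, not the patent's reference implementation.
import cv2

def detect_and_match(first_img, second_img, max_corners=500):
    """Detect Good Features to Track in the first image and find their
    matching coordinates in the second image with pyramidal Lucas-Kanade
    optical flow, discarding points that could not be tracked."""
    gray1 = cv2.cvtColor(first_img, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(second_img, cv2.COLOR_BGR2GRAY)

    # Good Features to Track (Shi-Tomasi corners); parameter values are assumptions.
    pts1 = cv2.goodFeaturesToTrack(gray1, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=5)

    # Pyramidal Lucas-Kanade optical flow returns the matching positions,
    # a per-point status flag and a tracking error.
    pts2, status, _err = cv2.calcOpticalFlowPyrLK(gray1, gray2, pts1, None,
                                                  winSize=(21, 21), maxLevel=3)

    ok = status.ravel() == 1              # unmatched feature points are discarded
    return pts1[ok].reshape(-1, 2), pts2[ok].reshape(-1, 2)
```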
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, at least two views of the object (102) are acquired to reflect all sides of the object (102).
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, said first image is captured at an angle of 0°, 60°, 120°, 180°, 240°, 360° or a combination thereof.
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the three-dimensional feature point coordinates from all views are merged and collected into a single set of three-dimensional feature point coordinates.
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the views are merged by using an inverse rotation matrix with equation (8):
(x', y', z') is the merged three-dimensional feature point coordinate obtained after applying the inverse rotation matrix to the three-dimensional feature point coordinate (x, y, z).
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, a further filtering step on the single set of three-dimensional feature point coordinates is required to remove redundant points using a second filtering means.
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the colour information is extracted using equations (2), (3) and (4):
wherein,
B_i,j is a blue colour information of pixel (i, j),
G_i,j is a green colour information of pixel (i, j),
R_i,j is a red colour information of pixel (i, j).
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the world coordinate is obtained using equations (5) and (6):
wherein,
objheight is an actual object (102) height measured using linear callipers,
detectedheight is a distance of the highest point and the lowest point of the object (102) detected in the acquired image, measured in pixels,
(x_wi, y_wi) is the world coordinate for the two-dimensional feature point coordinate i, and
(x_pi, y_pi) is a projected coordinate for the two-dimensional feature point coordinate i on the acquired image.
In a preferred embodiment of the method for estimating three-dimensional depth value from a two-dimensional image, the inverse perspective mapping algorithm is implemented using equation (7):
wherein,
(x'_w1, y'_w1) is a corrected coordinate for the two-dimensional feature point coordinate from the first image,
(x_w1, y_w1) is the world coordinate for the two-dimensional feature point coordinate in the first image,
z_1 is an approximated three-dimensional depth value for the two-dimensional feature point coordinate (x_w1, y_w1),
objdist is a distance of center of rotation (COR) from the image capture apparatus (101) (optic center).
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the step of filtering two-dimensional feature point coordinates using the first filtering means is performed by applying a Euclidean distance method.
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the estimation of three-dimensional depth value is carried out in a controlled environment, which includes controlling lighting and background colour.
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, the estimation of three-dimensional depth value is carried out in an open space.
In a preferred embodiment of the method for estimating three-dimensional depth value from two-dimensional images, a further filtering step on the three-dimensional feature point coordinates is carried out to eliminate noise using a third filtering means, before the merging step.
Examples
In an exemplary embodiment of the present invention as illustrated in Figure 2, a kit (100) mainly comprises a rotatable plate (103) and an image capture apparatus (101), where an object (102), a blue toy bird, is placed on the rotatable plate (103) at a distance from the image capture apparatus (101). In a preferred embodiment, the blue toy bird has a width, height and depth of 134.03 mm, 94.90 mm and 35.12 mm respectively. The distance from the image capture apparatus (101) to the centre of the rotatable plate (103) is set at a length that allows the whole image of the object to be captured. An embodiment uses either two 180° half-circle protractors forming a circle or one 360° protractor, placed underneath the rotatable plate (103) for ease of reference when measuring the angle of rotation of the rotatable plate (103).
The kit (100) of the present invention is for estimating the three-dimensional depth value of the object (102) by carrying out a method as shown in Figure 1. In an exemplary embodiment, the kit (100) is placed in a controlled environment, which includes controlling lighting and background colour. The background colour is preferably black to eliminate reflections during image capturing. A first image of the object (102) is acquired by rotating the rotatable plate (103) to a predetermined angle between 0° and 360°. In a preferred embodiment, said predetermined angle is 0°, 60°, 120°, 180°, 240° or 360°. In another preferred embodiment, a combination of predetermined angles may be selected for acquiring multiple images at different views of the object (102), for example 0° and 180° as a pair of angles, and 60° and 240° as a pair of angles. The paired angles are preferably vertically opposite angles. At least two views of the object (102) are acquired, giving at least four images, to reflect all sides of the object (102).
Then, the rotatable plate (103) is rotated to an angle in a range of 1° to 35° relative to the predetermined angle of the first image for capturing a second image. In a preferred embodiment, the first image and the second image are captured using the image capture apparatus (101), wherein the image capture apparatus (101) is preferably a camera and more preferably a webcam.
Two-dimensional feature point coordinates are obtained by applying the Good Features to Track technique, and the colour information is extracted from the two-dimensional feature point coordinates, simultaneously, from the first image and the second image. Said colour information is extracted using equations (2), (3) and (4) as follows:
wherein,
B_i,j is a blue colour information of pixel (i, j),
G_i,j is a green colour information of pixel (i, j),
R_i,j is a red colour information of pixel (i, j).
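Equations (2), (3) and (4) are not reproduced in this text; the sketch below assumes they simply read the blue, green and red channel values at each feature-point pixel of the acquired image.

```python
import numpy as np

def extract_colour(img_bgr, pts):
    """Assumed reading of equations (2)-(4): sample the blue, green and red
    channel values B(i,j), G(i,j), R(i,j) at each feature-point pixel.
    The equations themselves are not reproduced in this text."""
    colours = []
    for x, y in np.rint(pts).astype(int):
        b, g, r = img_bgr[y, x]        # OpenCV images are indexed [row, col] in BGR order
        colours.append((int(r), int(g), int(b)))
    return colours
```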
The two-dimensional feature point coordinates are filtered to eliminate noise using a first filtering means, where detected two-dimensional feature point coordinates located further away from the centre of the images are assumed to have a higher probability of being noise. The two-dimensional feature point coordinates are filtered by applying a Euclidean distance method.
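The text states that the first filtering means applies a Euclidean distance method and that points far from the image centre are more likely to be noise, but it does not give a cut-off. A minimal sketch, with the keep fraction chosen as an assumption:

```python
import numpy as np

def filter_by_centre_distance(pts, image_shape, keep_fraction=0.9):
    """First filtering means (assumed form): compute each feature point's
    Euclidean distance from the image centre and drop the farthest points,
    which are assumed to have a higher probability of being noise.
    keep_fraction is an assumed parameter, not given in the text."""
    h, w = image_shape[:2]
    centre = np.array([w / 2.0, h / 2.0])
    dist = np.linalg.norm(pts - centre, axis=1)
    cutoff = np.quantile(dist, keep_fraction)
    return pts[dist <= cutoff]
```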
The Pyramidal Lucas-Kanade Optical Flow technique is applied to the two-dimensional feature point coordinates to obtain displacement magnitudes and to find matching two-dimensional feature point coordinates between the first image and the second image. If any of the two-dimensional feature point coordinates from the first and second images do not match, the unmatched two-dimensional feature point coordinates are discarded.
Then, a World Coordinate for each of the two-dimensional feature point coordinates is calculated using equations (5) and (6) as follows:
wherein,
objheight is an actual object (102) height measured using linear callipers, and
detectedheight is a distance of the highest point and the lowest point of the object (102) detected in the acquired image, measured in pixels; and, for equation (6),
(x_wi, y_wi) is the world coordinate for the two-dimensional feature point coordinate i, and
(x_pi, y_pi) is a projected coordinate for the two-dimensional feature point coordinate i on the acquired image.
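Equations (5) and (6) are not reproduced here, but the worked example at the end of this section is consistent with a single scale factor objheight / detectedheight applied to the projected pixel coordinates. A sketch on that assumption:

```python
def to_world_coordinates(pts_px, objheight_mm, detectedheight_px):
    """World-coordinate conversion as reconstructed from the worked example:
    scale = objheight / detectedheight (assumed form of eq. (5)), and each
    projected coordinate (x_p, y_p) is multiplied by that scale (eq. (6))."""
    scale = objheight_mm / detectedheight_px
    return [(x * scale, y * scale) for (x, y) in pts_px]
```

With the example values below (objheight = 94.90 mm, detectedheight = 263.2635 pixels) the scale is about 0.3605, which maps Point 1 (12.3456, 123.4567) to roughly (4.45, 44.51), matching the worked example.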
The three-dimensional depth value of the acquired images is estimated by calculating each of the two-dimensional feature point coordinates using equation (1), thereby producing three-dimensional feature point coordinates:
wherein,
z_w is the three-dimensional depth value for the two-dimensional feature point coordinate (x_w1, y_w1),
objdist is a distance of center of rotation (COR) from the image capture apparatus (101) (optic center),
(x_w1, y_w1) is a coordinate of the two-dimensional feature point from the first image,
(x_w2, y_w2) is a matching coordinate of the two-dimensional feature point in the second image,
α is a rotation angle.
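Equation (1) itself is not reproduced in this text, so the depth step cannot be shown exactly. For illustration only, the sketch below uses the standard turntable-parallax relation (a point at lateral offset x and depth z, rotated by α about the centre of rotation, moves laterally to x·cos α + z·sin α); the patent's actual equation (1) may differ, for example by also involving objdist.

```python
import math

def estimate_depth_standin(x_w1, x_w2, alpha_deg):
    """Stand-in for equation (1), which is not reproduced in this text.
    Assumes a rotation by alpha about the vertical axis through the COR:
        x_w2 = x_w1 * cos(alpha) + z_w * sin(alpha)
    so  z_w  = (x_w2 - x_w1 * cos(alpha)) / sin(alpha)."""
    a = math.radians(alpha_deg)
    return (x_w2 - x_w1 * math.cos(a)) / math.sin(a)
```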
Then, an inverse perspective mapping algorithm is implemented on the three-dimensional feature point coordinates to remove perspective distortion effects using equation (7):
wherein,
(x'_w1, y'_w1) is a corrected coordinate for the two-dimensional feature point coordinate from the first image,
(x_w1, y_w1) is the world coordinate for the two-dimensional feature point coordinate in the first image,
z_1 is an approximated three-dimensional depth value for the two-dimensional feature point coordinate (x_w1, y_w1),
objdist is a distance of center of rotation (COR) from the image capture apparatus (101) (optic center).
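Equation (7) is likewise not reproduced, but the worked example below (Point 1 moves from (4.4506, 44.5061) to (4.4359, 44.3592) with z_1 = 0.9883 and objdist = 300 mm) is consistent with scaling x and y by (objdist - z_1) / objdist, about 0.9967, while leaving z unchanged. A sketch on that assumption:

```python
def inverse_perspective_standin(x_w1, y_w1, z1, objdist):
    """Assumed form of equation (7), inferred from the worked example:
    scale the world x and y of a point by (objdist - z1) / objdist, so that
    points with larger z1 are scaled down more, which removes the
    perspective distortion; z1 itself is left unchanged."""
    k = (objdist - z1) / objdist
    return x_w1 * k, y_w1 * k, z1
```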
The three-dimensional feature point coordinates collected from all individual views are merged into a single set of three-dimensional feature point coordinates. The merging step is performed using an inverse rotation matrix with equation (8):
wherein,
(x', y', z') is the merged three-dimensional feature point coordinate obtained after applying the inverse rotation matrix to the three-dimensional feature point coordinate (x, y, z). The single set of three-dimensional feature point coordinates is then filtered to remove redundant points using a second filtering means.
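Equation (8) is not shown in this text; the sketch below assumes the usual convention of rotating each view's points back to the 0° reference frame about the vertical axis by the negative of that view's capture angle.

```python
import math
import numpy as np

def merge_view_standin(points_xyz, view_angle_deg):
    """Assumed form of equation (8): apply the inverse of a rotation about
    the vertical (y) axis by the view's capture angle, so that feature
    points from every view land in a common 0-degree reference frame."""
    t = math.radians(-view_angle_deg)          # inverse rotation
    rot = np.array([[math.cos(t), 0.0, math.sin(t)],
                    [0.0,          1.0, 0.0],
                    [-math.sin(t), 0.0, math.cos(t)]])
    return np.asarray(points_xyz) @ rot.T
```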
In another exemplary embodiment of the present invention, the kit (100) is placed in an open space for carrying out the estimation of three-dimensional depth value of the object (102). If the kit (100) is placed in the open space, a further filtering step is required on the three-dimensional feature point coordinates to eliminate noise using a third filtering means, before the merging step of the present invention.
Example of calculation for three-dimensional depth value approximation
User input (known values):
objdist = 300 mm
α = 3°
objheight = 94.90 mm
Detected two-dimensional feature point pair:
Point 1 (x_p1, y_p1) = (12.3456, 123.4567)
Point 2 (x_p2, y_p2) = (15.1234, 120.3333)
*Point 2 is a matching two-dimensional feature point of Point 1 in the second image after the α° rotation.
A minimum value and a maximum value of y are extracted from the two-dimensional feature points collected to calculate a detected height. For example,
min(y) = 50.1932
max(y) = 313.4567
detectedheight = 313.4567 - 50.1932 = 263.2635
Using equation (5), the scale is objheight / detectedheight = 94.90 / 263.2635 ≈ 0.3605; applying equation (6), x_w1 = 12.3456 × 0.3605 ≈ 4.4506 and y_w1 = 123.4567 × 0.3605 ≈ 44.5061. So, the three-dimensional coordinate before inverse perspective projection for Point 1 is (4.4506, 44.5061, 0.9883).
To remove the perspective distortion, inverse perspective projection is then applied to the point to find the corrected coordinates.
So, the three-dimensional coordinate for Point 1 after inverse perspective projection is (4.4359, 44.3592, 0.9883).
Although the present invention has been described with reference to specific embodiments, also shown in the appended figures, it will be apparent to those skilled in the art that many variations and modifications can be made within the scope of the invention as described in the specification and defined in the following claims.
Description of the reference numerals used in the accompanying drawings according to the present invention:
Claims
I/We claim:
1. A method for estimating three-dimensional depth value from two-dimensional images, characterised by the steps of:
placing an object (102) on a rotatable plate (103) with a distance from an image capture apparatus (101 );
acquiring a first view of the object (102) comprising a first image and a second image of the object (102), wherein the first image of the object (102) is captured at an angle between 0° and 360°, and the second image is captured at an angle in a range of 1° to 35° relative to the first image;
obtaining two-dimensional feature point coordinates by applying Good Features to Track technique and extracting colour information, simultaneously, from the first image and the second image;
filtering two-dimensional feature point coordinates to eliminate noise using first filtering means;
applying Pyramidal Lucas-Kanade Optical Flow technique to obtain displacement magnitudes from the two-dimensional feature point coordinates and to find matching two-dimensional feature point coordinates from the first image and second image;
calculating World Coordinate for each of the two-dimensional feature point coordinates;
estimating the three-dimensional depth value of the acquired images by calculating each of the two-dimensional feature point coordinates using equation (1), thereby producing three-dimensional feature point coordinates:
wherein,
z_w is the three-dimensional depth value for the two-dimensional feature point coordinate (x_w1, y_w1),
objdist is a distance of center of rotation (COR) from the image capture apparatus (101) (optic center),
(x_w1, y_w1) is a coordinate of the two-dimensional feature point from the first image,
(x_w2, y_w2) is a matching coordinate of the two-dimensional feature point in the second image,
α is a rotation angle; and
implementing an inverse perspective mapping algorithm on the three-dimensional feature point coordinates to remove perspective distortion effects.
2. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 , wherein at least two views of the object (102) are acquired to reflect all sides of the object (102).
3. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 and claim 2, wherein said first image is captured at an angle of 0°, 60°, 120°, 180°, 240°, 360° or a combination thereof.
4. The method for estimating three-dimensional depth value from two-dimensional images according to claim 2, wherein the three-dimensional feature point coordinates from all views are merged and collected into a single set of three-dimensional feature point coordinates.
5. The method for estimating three-dimensional depth value from two-dimensional images according to claim 4, wherein the views are merged by using an inverse rotation matrix with equation (8):
wherein,
(x', y', z') is the merged three-dimensional feature point coordinate obtained after applying the inverse rotation matrix to the three-dimensional feature point coordinate (x, y, z).
6. The method for estimating three-dimensional depth value from two-dimensional images according to claim 4, wherein a further filtering step on the single set of three-dimensional feature point coordinates is required to remove redundant points using a second filtering means.
7. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 , wherein the colour information is extracted using equations (2), (3) and (4):
wherein,
B_i,j is a blue colour information of pixel (i, j),
G_i,j is a green colour information of pixel (i, j),
R_i,j is a red colour information of pixel (i, j).
8. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 , wherein the world coordinate is obtained using equations (5) and (6):
wherein,
objheight is an actual object (102) height measured using linear callipers,
detectedheight is a distance of the highest point and the lowest point of the object (102) detected in the acquired image, measured in pixels,
(x_wi, y_wi) is the world coordinate for the two-dimensional feature point coordinate i, and
(x_pi, y_pi) is a projected coordinate for the two-dimensional feature point coordinate i on the acquired image.
9. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 , wherein inverse perspective mapping algorithm is implemented using equation (7):
wherein,
(x'_w1, y'_w1) is a corrected coordinate for the two-dimensional feature point coordinate from the first image,
(x_w1, y_w1) is the world coordinate for the two-dimensional feature point coordinate in the first image,
z_1 is an approximated three-dimensional depth value for the two-dimensional feature point coordinate (x_w1, y_w1), and
objdist is a distance of center of rotation (COR) from the image capture apparatus (101) (optic center).
10. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 , wherein the step of filtering two-dimensional feature point coordinates using the first filtering means is by applying Euclidean distance method.
11. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1, wherein the estimation of three-dimensional depth value is carried out in a controlled environment, which includes controlling lighting and background colour.
12. The method for estimating three-dimensional depth value from two-dimensional images according to claim 1 , wherein the estimation of three-dimensional depth value is carried out in an open space.
13. The method for estimating three-dimensional depth value from two-dimensional images according to claim 4 and claim 12, wherein a further filtering step on the three-dimensional feature point coordinates is carried out to eliminate noise using a third filtering means, before the merging step.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| MYPI2016703427 | 2016-09-21 | ||
| MYPI2016703427 | 2016-09-21 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018056802A1 true WO2018056802A1 (en) | 2018-03-29 |
Family ID: 61689941
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/MY2017/050057 (WO2018056802A1, Ceased) | 2016-09-21 | 2017-09-14 | A method for estimating three-dimensional depth value from two-dimensional images |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018056802A1 (en) |
- 2017-09-14: WO application PCT/MY2017/050057 (published as WO2018056802A1), status not active (Ceased)
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4969036A (en) * | 1989-03-31 | 1990-11-06 | Bir Bhanu | System for computing the self-motion of moving images devices |
| US20090226094A1 (en) * | 2006-09-13 | 2009-09-10 | Pioneer Corporation | Image correcting device and method, and computer program |
| JP2009222568A (en) * | 2008-03-17 | 2009-10-01 | Konica Minolta Sensing Inc | Method, device, and computer program of generating three-dimensional shape data |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109978982A (en) * | 2019-04-02 | 2019-07-05 | 广东电网有限责任公司 | A kind of quick painting methods of point cloud based on inclination image |
| CN112241977A (en) * | 2019-07-16 | 2021-01-19 | 北京京东乾石科技有限公司 | A method and device for depth estimation of feature points |
| CN110533036A (en) * | 2019-08-28 | 2019-12-03 | 湖南长城信息金融设备有限责任公司 | A kind of bill scan image quick slant correction method and system |
| CN110533036B (en) * | 2019-08-28 | 2022-06-07 | 长城信息股份有限公司 | Rapid inclination correction method and system for bill scanned image |
| CN112771576A (en) * | 2020-05-06 | 2021-05-07 | 深圳市大疆创新科技有限公司 | Position information acquisition method, device and storage medium |
| CN114418996A (en) * | 2022-01-19 | 2022-04-29 | 北京林业大学 | Method for analyzing root configuration of populus tremuloides based on three-dimensional spatial angle algorithm |
| CN116522556A (en) * | 2023-04-21 | 2023-08-01 | 国网冀北电力有限公司承德供电公司 | Two-dimensional modeling-based meter loading electricity connection optimal wiring method and terminal |
| CN116522556B (en) * | 2023-04-21 | 2024-05-24 | 国网冀北电力有限公司承德供电公司 | An optimal wiring method and terminal for meter installation and power connection based on two-dimensional modeling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17853513; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17853513; Country of ref document: EP; Kind code of ref document: A1 |