
WO2004012144A1 - System and method for displaying digital images linked together to enable navigation through views - Google Patents

System and method for displaying digital images linked together to enable navigation through views

Info

Publication number
WO2004012144A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
transformation parameters
camera
scenery
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE2003/001237
Other languages
English (en)
Inventor
Sami Niemi
Mikael Persson
Karl-Anders Johansson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Scalado AB
Original Assignee
Scalado AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Scalado AB filed Critical Scalado AB
Priority to AU2003247316A priority Critical patent/AU2003247316A1/en
Publication of WO2004012144A1 publication Critical patent/WO2004012144A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2624 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects for obtaining an image which is composed of whole input images, e.g. splitscreen

Definitions

  • This invention relates to a system and method for displaying digital images linked together and for enabling a user to navigate through the linked digital images.
  • More particularly, this invention relates to a system and method for generating a novel view from digital images or video sequences obtained from one or more fixed camera views during a user's navigation through said digital images or video sequences.
  • A digital image of a particular object, such as an object presented in a digital image on a computer, may be established by linking a series of digital pictures together so as to achieve a wide or panoramic representation of the object.
  • the digital representation comprises vertices, each representing a digital image, and edges binding together a first vertex and a second vertex.
  • An edge represents information on the transition between a first digital image and a second digital image.
  • the first and second digital image comprise a first and second area, respectively, wherein the depictions in the first area substantially correspond to the depictions in the second area.
  • the transition information defines how at least one of the digital images is to be manipulated in order to provide a smooth boundary between the two digital images.
  • The disclosed image-based representation and method require a user to manually assist in identifying the substantially corresponding areas prior to linking the series of digital images. Hence, in representations including a plurality of digital images, it is considerably time-consuming to link each of the digital images together.
  • American patent US 6,337,688 discloses a method and system enabling a user lacking specialized programming skills and training to produce a realistic simulation of a real environment.
  • the simulation is constructed from a series of recorded frames that each contain an image of a real environment.
  • each frame comprises data specifying the associated position and orientation within the real environment.
  • the associated positions are recorded in a camera utilising a position and orientation sensor implemented with inertial sensors that permit the camera to sense positional changes without directly receiving externally generated position information.
  • Canadian patent application CA 2,323,462 discloses a method and system for processing images into a mosaic.
  • the method and system convert both the input image and the mosaic into Laplacian image pyramids, and a real-time alignment process is applied to the levels within the respective pyramids.
  • the method and system use a coarse-to-fine image alignment approach.
  • the result of the alignment process is alignment information that defines the required transformations to achieve alignment.
  • The method and system disclosed in the Canadian patent application, however, apply two-dimensional displacements as the image relation, which are inaccurate for images captured from a fixed camera position. This becomes especially noticeable when working with live video image streams.
  • International patent application WO 98/54674 describes a method, instruction set and apparatus for combining related source images, each represented by a set of digital data, by determining three-dimensional relationships between data sets representing related source images and creating a data set representing an output image by combining the data sets representing the source images in accordance with the determined three-dimensional relationships.
  • a motorized device controlling the pan, tilt and zoom of the camera is a straightforward way to solve the navigation problem.
  • the camera is mounted on a rotation device of which the pan and tilt can be remotely controlled. This enables, if a suitable user interface exists, remote-control navigation of the camera.
  • FIGS. 2a and 2b show typical extensive user interfaces for creating navigation enabled panoramic images with high quality.
  • Some tools for creating panoramic images exist which do not apply relatively complicated mathematical operations. This results in an image of lower quality; however, it seriously reduces the need for a complicated user interface.
  • These tools are often seen bundled with digital cameras, as shown in figures 3a and 3b.
  • FIGS. 4a and 4b show the result of the reversion process needed to convert the image captured by a camera equipped with a fisheye lens into a panoramic image. Since only one camera is used, the entire field of view, which is often more than 180 degrees, is compressed onto the image plane of the camera. This implies that an enormous resolution is required.
  • Images captured by a fisheye lens are compressed in a spherical manner, i.e. the image is not compressed equally over the image plane but is compressed more at the edges.
  • the above referenced prior art technologies perform a single estimation of a displacement between corresponding areas of two images and assume that the single estimation is the correct one. Hence the statistical hit rate of the prior art technology is low.
  • It is an object of the present invention to clearly display the physical relation of the digital images acquired by the cameras in relation to each other. This gives the user a good overview of the scenery to be captured.
  • temporal synchronization errors are identified and eliminated so as to avoid correlation of image points in two images not representing a projection of the same scenery point.
  • a particular feature of the present invention is the provision of a self-optimization procedure enabling an automatic and continuous recalibration of any number of cameras utilised for providing digital images.
  • a particular feature of the present invention is the provision of a clustering technique performing a number of correlations originating from different initial displacements.
  • A first aspect of the present invention is obtained by a method for generating a view of at least part of a scenery from a plurality of images showing said scenery, comprising:
  • navigation is in this context to be construed as a tool for moving through a series of linked images. During navigation it appears as if a camera is moving although all cameras are actually static, which implies that no moving parts are required.
  • image stream is in this context to be construed as a representation of a continuous flow of images, for instance from a network camera. At each instant a single image is available for retrieval.
  • projective transformation is in this context to be construed as the process of projecting points in three-dimensional space onto a plane or, as mostly used in our case, projecting points from one plane onto another. It is further described in "Multiple View Geometry in Computer Vision" by Richard Hartley and Andrew Zisserman, Cambridge University Press, 2000.
  • panoramic images is in this context to be construed as referring to images covering a very wide field of view, usually more than 180 degrees. If only part of the image is shown at one time, navigation such as pan, tilt, rotation and zoom is made possible without actually moving the cameras capturing the images.
  • image stitching is in this context to be construed as referring to the process of creating a panoramic image out of many images captured with a narrow field of view. All images are captured from the same point of view and only differ by viewing direction, which implies that the images are related by a projective transform, or so-called homography, further described in "Multiple View Geometry in Computer Vision" by Richard Hartley and Andrew Zisserman, Cambridge University Press, 2000. If this projective transform can be found, the images can be stitched and blended together into one large mosaic of images making a panoramic image.
  • transition data is in this context to be construed as referring to the computed parameters, in the case of stitching panoramic images the projective transform relating the images, needed to enable navigation.
  • "a" or "an" is in this context to be construed as "one", "one or more", "at least one".
  • a plurality of digital images are linked together according to a geometrical interrelationship between the digital images such as a series of still digital photographs of an object or objects from various angles, video sequences of an object or objects from various angles, or a combination thereof .
  • the method according to the first aspect of the present invention provides means for generating transformation parameters linking a plurality of images forming a scenery and means for enabling a user to view any particular part of the scenery.
  • a particular advantage of the method according to the first aspect of the present invention is the fact that the images do not need to be relocated from the cameras to the processor unit in order for the processor unit to generate transformation parameters.
  • A user may, through a viewer directly connected to the processor unit or connected through a computer network communicating with said viewer, generate a specific view of a scenery captured by the cameras without having to communicate entire images from the cameras but only the particular views.
  • This provides a method which significantly increases the applicability of the present invention, since the amount of transmitted data is reduced.
  • A second aspect of the present invention is obtained by a method for generating one or more transformation parameters interrelating a plurality of images each showing at least a part of a scenery and for displaying said images in accordance with the transformation parameters, said method comprising:
  • the method according to the second aspect of the present invention enables a user to auto-configure how a plurality of images are to be linked together to form a scenery.
  • the method enables a user to re-compute the automatically generated proposal, for instance if the user finds that any of the cameras' fields of view need to be adjusted.
  • the method according to the second aspect of the present invention is particularly advantageous since the configuration of the images is performed automatically. Hence the production time is significantly reduced and the operations needed to produce the appropriate links are simplified.
  • A third aspect of the present invention is obtained by a system for generating one or more transformation parameters interrelating a plurality of images each showing at least a part of a scenery, comprising: a) a first camera for capturing a first image of a first part of said scenery; b) a second camera for capturing a second image of a second part of said scenery, said first part and said
  • the first and second camera according to the third aspect of the present invention may comprise a digital still camera, a network camera, cell phone, mobile phone, a digital video camera, any other device capable of generating said views of said scenery, or any combination thereof.
  • Since the variety of digital imaging devices has multiplied in recent years, the system according to the third aspect of the present invention may comprise any type known to the person skilled in the art.
  • the communication lines according to the third aspect of the present invention may comprise a wired or wireless dedicated line, computer network, television network, telecommunications network or any combination thereof.
  • the communication lines may in fact enable communication not only between the cameras and the processor unit but also between the cameras, the processor unit and further peripherals connecting to the cameras or processor unit.
  • the processor unit may comprise a plurality of processor devices communicating with one another through a communications network.
  • the processor unit is in this context to be construed as any number of processors inter-connected so as to communicate with one another.
  • the cameras may include processors for preliminary handling of the images, and the system may further comprise processors for performing mathematical operations on the images and processors for communicating with a plurality of various clients.
  • the processor unit according to the third aspect of the present invention may comprise a viewer adapted to enable a first user to navigate through the scenery and a camera configuration display adapted to calculate the transformation parameters and enable a second user to store the transformation parameters in a storage device.
  • the processor unit may comprise a server communicating with the first and second camera through the communication lines and adapted to establish a database for the first and second image in a storage device.
  • the server may be adapted to calculate said transformation parameters and/or communicate with a camera configuration display for calculating said transformation parameters and to enable a second user to store the transformation parameters in the storage device.
  • the server may further be adapted to communicate with a viewer for determining a view of at least part of the scenery in accordance with user interaction navigating in the scenery.
  • the processor unit may comprise a processor device in at least one of the first and second cameras, and a viewer communicating through the communication lines with the first and second camera, the viewer determining a view of at least part of the scenery in accordance with user interaction navigating in the scenery.
  • The above listed three alternatives provide solutions fulfilling the requirements of various systems.
  • the first alternative presents a processor unit performing both the viewing operation and the calculation operation needed in order to enable a user to navigate in a scenery consisting of a plurality of images.
  • the processor unit may be implemented on a server communicating with further processor devices in the system and enabling clients (e.g. other processor devices) connecting to the server to utilise the information generated by the server.
  • the second alternative is particularly advantageous for implementation on a computer network such as the Internet.
  • the server may include a capability to connect to mobile processor devices such as mobile or cell phones having multimedia displaying means.
  • the third alternative provides a processor unit integrated as processor devices in the cameras, thus eliminating the need for a server and enabling a mobile system.
  • navigating software may be embedded in a network camera.
  • the system according to the third aspect of the present invention may further comprise a display for displaying the first and second image in accordance with the transformation parameters.
  • the display may be implemented as a multimedia display of a mobile or cell phone, or may be implemented as any monitor communicating with the processor unit.
  • the storage device may be adapted to store the transformation parameters and/or the first image and/or the second image and may be established according to any techniques known to a person skilled in the art.
  • the system according to the third aspect of the present invention may further incorporate any features of the method according to a first aspect and any features of the method according to a second aspect of the present invention.
  • A fourth and a fifth aspect of the present invention are obtained by a computer program comprising code adapted to perform the method according to the first aspect of the present invention and a computer program comprising code adapted to perform the method according to the second aspect of the present invention.
  • the computer program according to the fourth and fifth aspect of the present invention may incorporate features of the method according to the first aspect of the present invention, features of the method according to the second aspect of the present invention, and features of the system according to the third aspect of the present invention.
  • the system and methods according to the first, second and third aspect of the present invention may comprise a user interface used in conjunction with an automatic computation of transformation parameters requiring very little or no user interaction and may easily be embedded in processor devices having limited graphical and memory storage capabilities.
  • the present invention remedies this by enabling the use of multiple inexpensive consumer targeted cameras to be used to allow navigation such as pan, tilt, rotation and zoom. The requirement of an extensive user interface is eliminated as described in the following section.
  • figure 1 shows a photograph of a prior art pan/tilt camera toolkit.
  • the camera is mounted on a device enabling motorized remote control of pan, tilt and zoom;
  • figures 2a and 2b show two screen shots of prior art user interface for a panoramic image creation software;
  • FIGS. 3a and 3b show a prior art user interface of a digital camera and a merge preview window
  • FIGS. 4a and 4b show an image captured by a camera equipped with a fisheye lens and a panoramic image created by reversing the fisheye effect
  • figure 5 shows a camera configuration display according to a first embodiment of the present invention
  • figure 6 shows a camera configuration display according to a second embodiment of the present invention.
  • figure 7 shows a camera configuration display according to a third embodiment of the present invention.
  • figure 8 shows a first graphical user interface of the camera configuration display according to the first, second and third embodiments of the present invention
  • figure 9 shows a second graphical user interface of the camera configuration display illustrating image streams presented in three dimensions
  • figure 10 shows a flow chart of the method for linking image streams according to a fourth embodiment of the present invention
  • figure 11 shows flow and components of an automatic computation of transformation parameters
  • figure 12 shows flow and components of a phase correlation method
  • figure 13 shows an example of displacement of peaks as applied in the phase correlation method
  • FIGS. 14a and 14b show image “A” and image “B”, being slightly displaced relative to one another;
  • figure 15 shows a correlation surface having an arrow indicating one candidate displacement vector
  • figure 16 shows a correlation surface as a periodic signal
  • FIGS. 17a, 17b, 17c, and 17d show first displacement vectors
  • FIGS. 18a, 18b, 18c, and 18d show second displacement vectors
  • figure 19 shows a graph of sites of initial displacement vectors for phase correlation.
  • Figure 5 shows a system according to a first embodiment of the present invention designated in entirety by reference numeral 10 and comprising a number of vital components.
  • A plurality of cameras 12, being any type of device capable of delivering images, provide the system 10 with image streams. There are no special requirements on the cameras; any ordinary off-the-shelf consumer targeted camera can be used.
  • the system 10 handles ordinary still images as well as live video streams. This requires the system 10 to operate well in real-time, since the content of video image streams can continuously change.
  • a database 14 stores parameters and other information required to enable navigation through a mosaic of images.
  • the database 14 may be incorporated in one of the cameras 12, if the cameras 12 are equipped with a memory storage unit.
  • a camera configuration display 16 is the component responsible for computing the parameters required for the navigation and responsible for showing the current camera configuration.
  • the parameters are computed from the camera image streams and user input, and the result is stored in the database 14.
  • a viewer 18, the component performing the actual navigation, can use the pre-computed parameters and camera image streams to enable navigation.
  • the viewer 18 presents the result on for instance a computer screen and enables navigation through images with for instance an attached computer mouse or other pointing device.
  • the system 10 enables a user to open any number of viewers simultaneously and the user can view different parts of the scenery independent of each other.
  • the system 10 operates by having the plurality of cameras 12 provide one or more streams of images to the camera configuration display 16 and the viewer 18.
  • the camera configuration display 16 computes transformation parameters and stores these in the database 14.
  • the viewer 18 receives the stream of images from the plurality of cameras 12, displays them for a user, and enables the user to navigate through the stream of images.
  • the system 10 uses the database 14 as a storage device for the computed parameters.
  • the database 14 is resident on one of the cameras 12, eliminating the use of a server component.
  • the system 10 may be configured in a number of different set-ups.
  • Figure 6 shows a system according to a second embodiment of the present invention, which system is designated in entirety by reference numeral 30.
  • the system 30 comprises a plurality of cameras 32 and an actual server 34 containing a database.
  • the server 34 has the capability to store and manipulate the camera image streams re- ceived from the plurality of cameras 32.
  • the camera image streams are uploaded to the server 34.
  • the server component 34 accesses the images and computes the required transformation parameters.
  • a camera configuration display 36 downloads the images and transformation parameters and displays the camera set-up.
  • the viewer 38 downloads the image streams and in addition the transformation parameters.
  • the server 34 is intelligent so as to enable automatic calibration as well as minimizing the amount of downloaded data.
  • the server can automatically detect and mask out motion in the streams of images. This implies that parts of an image that include motion are detected, masked out and excluded from the computation of the transformation parameters. If pixels containing movement in the images are used, it will result in errors in the transformation parameters. This also implies that the transformation parameters can be re-calibrated even though the image streams contain motion. This is simply implemented by comparing a number of temporally neighbouring images in the image stream; the areas that are constant are included in the mask and used when computing the transformation parameters, while the parts that differ are excluded from the mask.
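The masking rule described above can be sketched in a few lines; this is only an illustration under assumed names and values (frames as NumPy grayscale arrays, an intensity-range threshold), not code taken from the patent.

```python
import numpy as np

def static_mask(frames, threshold=10.0):
    """Mask of pixels that stay (nearly) constant over a number of temporally
    neighbouring frames; only these pixels are used when computing the
    transformation parameters, while parts that differ are excluded."""
    stack = np.stack([f.astype(np.float32) for f in frames])   # shape (N, H, W)
    variation = stack.max(axis=0) - stack.min(axis=0)           # per-pixel intensity range
    return variation <= threshold                                # True = static, usable pixel
```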
  • a server component can apply additional image processing such as correction for lens distortions, colour adjustment and additional compression.
  • Figure 7 shows a system according to a third embodiment of the present invention, which system is designated in entirety by reference numeral 50.
  • the system 50 is the "black box"-case where no user interfaces are required.
  • a plurality of cameras 52 transmit image streams to a viewer 54 that computes all relevant parameters in real-time and no storing capacity is needed. This is ideal for off-the-shelf consumer targeted cameras, where no processing or storing capacity exists.
  • the camera images are fetched by the viewer 54, which initially performs the automatic computation of transformation parameters.
  • the parameters are only stored locally and used for enabling navigation. The parameters can be continuously recomputed to ensure that the link between the images remains correct if the cameras 52 are, for instance, knocked out of place.
  • the camera configuration display user interface is designed to be as effective as possible regarding ease of use, speed and the memory requirements of the platform hosting the interface.
  • the platform may be hosted on a digital camera (video and/or still) , a cell phone, a personal computer, and/or a server accessible through a computer network or a dedicated communication line.
  • a central part of the camera configuration display user interface is the use of a method for automatic linking of images, described below. This implies that all steps required to compute the transition data for the case where the involved cameras approximately share focal point location are performed automatically.
  • the software automatically estimates the region where two images substantially correspond.
  • substantially corresponding regions should in this context be construed as the two areas depicting or showing the same or mainly the same part of the same or at least almost identical objects or sceneries.
  • a first area depicting a front view of a painting substantially corresponds to a second area depicting the painting from a side angle.
  • the second area will show more of the frame of the painting, but the information in the first and the second area will still be corresponding.
  • the purpose of the camera configuration display user interface is to show the placements of the cameras and give the user the opportunity to, while the image streams are continuously updated, interactively move the cameras capturing the images and thus update the transformation parameters, i.e. the link between the images.
  • the interface also works as a backup if an automatic computation of transformation parameters fails.
  • the camera configuration display user interface is designed to effectively and quickly create navigation-enabled content from live video image streams.
  • FIG 8 shows a camera configuration display user interface designated in entirety by reference numeral 60.
  • the camera configuration display user interface 60 is shown working with only a first image stream 62 and a second image stream 64 for illustrative purposes only.
  • the camera configuration display user interface 60 may include any number of image streams. If the user is not content with the result he may order the camera configuration display through the camera configuration display user interface 60 to re-link by activating button 66. When the user is satisfied with the result he may order the camera configuration display through the camera configuration display user interface 60 to store the transformation parameters for image streams by activating a button 68.
  • An alternative set-up is to present the image relations in three dimensions as the cameras are actually physically placed. This implies that a three-dimensional presentation is required. The presentation can be rotated, translated and moved to view different parts of the camera set-up. This form of interface is required for future extensions, if the cameras were to be allowed to be placed arbitrarily, i.e. not required to share focal point.
  • the workflow of the three-dimensional user interface differs from the two-dimensional. The difference is the possi- bility to view the camera set-ups from different angles and positions.
  • the 3D user interface also requires that the complete computation process is executed before the image streams can be related in three dimensions.
  • the set-up process is completed in two steps, initial two-dimensional placement and final optimization. This cannot be done in the three-dimensional case.
  • the three-dimensional user interface is used only to show the relative location and orientation of the cameras .
  • Figure 9 shows a 3D camera configuration display user interface designated in entirety by reference numeral 70.
  • a first live video stream 72 and a second live video stream 74 are projected according to camera position and orientation.
  • the camera configuration display user interface 70 is shown working with only two video streams for illustrative purposes only.
  • the camera configuration display user interface 70 may include any number of image streams.
  • Marker 76 symbolises the camera centre and punctured lines 78 illustrate the projection of the first live video stream 72 and punctured lines 80 illustrate the projection of the second video stream 74.
  • the user may order the camera configuration display through the camera configuration display user interface 70 to re-compute by activating button 82.
  • the user may order the camera configuration display through the camera configuration display user interface 70 to store the transition data linking the image streams by activating a button 84.
  • Figure 10 shows a flowchart of the method for linking image streams according to a fourth embodiment of the present invention.
  • the method is designated in entirety by reference numeral 90.
  • the object of the camera configuration display and thus the method 90 is to define the relative two-dimensional placement among image streams. This placement is required to perform a final step 98 of the method 90.
  • the final step 98 can be quite time consuming and is performed only when the user is satisfied with the camera set-up and the transformation parameters are stored.
  • the relative placement of the image streams is achieved automatically through the first step 92, the initial placement, using a phase correlation step of the automatic computation of transformation parameters. This process is only performed at start-up or when requested by the user and can therefore be computationally expensive, which increases the reliability of the method.
  • This placement consists only of relative two-dimensional displacements.
  • the user may in a second step 94 physically adjust the camera or cameras and view the result.
  • the user may in a third step 96 click, drag and drop the camera images relative to one another. When the user is satisfied with the camera set-up, the user initiates the final step 98, performing the non-linear optimization, and stores the transformation parameters.
  • a stripped down automatic placement method computes new placements relative to the previously computed displacements. This enables the user to adjust the cameras and view the result at interactive frame rates. If the user is satisfied with the camera set-up, the user can initiate the nonlinear optimization and store the parameters. If for some reason the user is unsatisfied and wants to start over, this can be achieved through ordering a re-computation.
  • If the first step 92 of the method 90 fails to perform the initial placement, a texture snap function is available.
  • the user can click-drag the image of a video stream and approximately place it at the correct position and the method 90 will use the texture content of the two video streams to align them correctly. In other words, the user supplies a guess displacement defining the overlapping area that substantially corresponds between the images.
  • the method 90 then computes the best match in texture originating from the user guess. This greatly reduces the time spent on defining the transition data even if the automatic procedure fails. If the user is satisfied with the camera set-up, the user can initiate the non-linear optimization and store the parameters. However, if for some reason the user is unsatisfied and wants to start over, this can be achieved by ordering a re-computation.
  • the quality of the result can vary, and the correctness of the placement is not always immediately obvious to the user.
  • the borders of the video stream images are indicated, using a colouring system, to visualise quality of the resulting video stream mosaic e.g. correctness of the placement.
  • the quality is computed as the inverse of error. If the match error is below a certain threshold the indication system indicates a match, e.g. green border. If the error is above a certain upper threshold the indication system indicates a mismatch, e.g. a red border. This can occur if the images have no overlapping areas and the cameras need to be adjusted. Anything in-between is indicated by a gradient between for instance two colours or shades of one colour.
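Only as an illustrative sketch of this colouring rule: quality is taken as the inverse of the match error, with one colour below the lower threshold, another above the upper threshold, and a gradient in between. The concrete threshold values and the green-to-red colour choice are assumptions, since the text only names the principle.

```python
def border_colour(match_error, low=0.1, high=0.5):
    """Map a placement error to an RGB border colour indicating match quality."""
    if match_error <= low:
        return (0, 255, 0)        # below lower threshold: match (e.g. green)
    if match_error >= high:
        return (255, 0, 0)        # above upper threshold: mismatch (e.g. red)
    t = (match_error - low) / (high - low)
    return (int(255 * t), int(255 * (1 - t)), 0)   # gradient between the two colours
```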
  • This function is not required in the 3D user interface case since the camera set-up can be viewed from any direction.
  • a re-computation function is available through the button 66 or 82 in the user interfaces 60 and 70, respectively.
  • a usual case where this is required is if the images have no overlap when the first initial placement was performed and no computation could be performed.
  • the method 90 performs an optimization that uses the two-dimensional displacements to compute the homographies relating the images, i.e. enabling the generation of the transformation parameters needed to enable navigation.
  • the parameters needed by the viewer to enable navigation are stored in the database, or used directly to view the result.
  • One of the main features of the present invention is the use of a method for automatic linking of images according to a fifth embodiment of the present invention.
  • the fact that a fully automatic method is available enables a number of interesting cases .
  • the method for automatic linking of images works completely without requiring user intervention. This enables integration of the method in units where no graphical user interface is possible, for instance in digital cameras and mobile units.
  • the method simply requires a number of image streams and computes the image relations enabling linking of images . Obviously the above described navigation through linked images may be similarly integrated where no user interface is available.
  • the method for automatic linking of images requires a number of image streams and fully automatically computes the transformation parameters needed to perform the navigation that is the nature of the invention. This could for instance be used to publish navigation-enabled content directly from the digital camera or other mobile unit responsible for acquiring the source image streams.
  • the method for automatic linking of images can be extended to work with images related by different degrees of zoom instead of images related by different orientations. This can be used to create zoomable images with no graphical user interface required.
  • the method requires a number of images related by different degree of zoom and produces a zoomable image.
  • Since the method for automatically linking images works without user intervention, it can be used to automatically and continuously recalibrate the cameras by continuously re-computing the transformation parameters. This prevents the resulting link from becoming distorted if the cameras were to be physically affected, for instance knocked out of place.
  • the transformation parameters may be pre-computed and stored as a part of the image storing structure and may comprise transition data.
  • the transition data are continuously recomputed as the content of the images change (i.e. live video streams) .
  • a coarse version of the transition data are stored prior to display and used to obtain, in real-time, an optimized version of the data suitable for use when displaying the transitions, i.e. self-optimization.
  • the database/server can continuously update the parameters previously computed. This will ensure that the parameters are up to date although the camera positions or orientations are altered.
  • the cameras capturing the images should, at least in theory, share focal point. This is of course almost never possible in practice, and thus some parallax distortion will occur as a result of the focal point approximation. By using different projective transformations for different parts of the image streams, these distortions can be reduced.
  • the method for automatic linking of images comprises automatic calibration on segments of the image individually in order to accomplish this.
  • the purpose of the method for automatic linking of images is, given two images of a scenery captured by two cameras that, often only approximately, share focal point but differ in orientation, to automatically compute the projective transformation (homography) relating the two images.
  • the projective transform is used to relate two image points in two images representing a projection of the same scenery point.
  • a three by three matrix called a homography can describe this relation.
  • An image point in one image (u, v) and the homography matrix (m_0…m_7) can be used to find the corresponding image point in the second image (u′, v′) according to equation 1:
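Equation 1 itself is not reproduced in this text. The standard form of the eight-parameter homography mapping, consistent with the (u, v), (u′, v′) and m_0…m_7 notation above, reads:

$$u' = \frac{m_0 u + m_1 v + m_2}{m_6 u + m_7 v + 1}, \qquad v' = \frac{m_3 u + m_4 v + m_5}{m_6 u + m_7 v + 1}$$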
  • the homography matrix or other similar representations relating image pairs, such as two-dimensional displacements, is part of the transformation parameters used to enable navigation, i.e. the novel view generation.
  • the projection of a scenery point in the first image is related to the projection of the same scenery point in the second image by a projective transformation (homography) that can be described by a three by three matrix. If this projective transformation can be found, navigation in the form of change of view direction from the view direction used when capturing the first image to the view direction used when capturing the second image can be performed.
  • a projective transformation (homography)
  • phase correlation 102 is used to find the estimated two-dimensional motion from a first image 104 to a second image 106.
  • the phase correlation 102 is performed multiple times originating from different displacements of the input images 104 and 106.
  • the second step 108 clusters the displacement estimates obtained in the phase correlation to find the statistically most likely displacement .
  • the two-dimensional displacement found in the two steps works as a starting estimate for the final non-linear optimization process 110.
  • the following sections describe each of the three steps in the method 100.
  • The purpose of the phase correlation 102 is to find the displacement of one signal in relation to another, most commonly a displaced version of the first signal. Below, the one-dimensional case is described, but the process is identical for the two-dimensional case.
  • the signals represent two-dimensional intensity images.
  • a signal is composed of the sum of its frequency components; each frequency has an amplitude and a phase displacement.
  • the standardized tool for extracting frequency and phase information from a signal is the Fourier transform.
  • Figure 12 shows basic components and workflow of the phase correlation 102.
  • the frequency and phase information are extracted from a first 112 and second signal 114 using the Fourier transform 116.
  • The Fourier transform 116 is preferably the Fast Fourier Transform (FFT).
  • the phases from the first 112 and second signal 114 are computed during step 118 and fed as phase information to the phase difference step 120 for the output correlation signal. Then, the amplitudes from the first 112 and second signal 114 are processed in a normalization step 122.
  • the correlation signal 124 is obtained by inverse Fourier transform 126.
  • the result is a set of frequency components that all have normalized amplitude, but have phases corresponding to the difference between the phases of the input signals.
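A compact sketch of this phase correlation step using NumPy's FFT; the function and variable names are illustrative, and the sign convention of the resulting peak depends on which block is taken as the reference.

```python
import numpy as np

def phase_correlate(block_a, block_b):
    """Correlation surface whose peak location (read with wrap-around)
    encodes the relative displacement of the two image blocks."""
    A = np.fft.fft2(block_a)
    B = np.fft.fft2(block_b)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12          # normalise amplitudes, keep only phase differences
    return np.real(np.fft.ifft2(cross))     # inverse transform: peak marks the displacement
```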
  • Figure 13 shows a first signal 130 consisting of three frequency components 132, 134, 136 and a second signal 138 consisting of three frequency components 140, 142 and 144.
  • the phase differences are constantly zero.
  • the frequency components will be aligned such that the position of their global maximum indicates the sought displacement.
  • the peak is central, which indicates no displacement, that is, along axis 148. In other words, if the two signals are not displaced, the normalized frequency components are added with zero phase and produce a single peak in the centre of the inverse transform.
  • the correlation signal will contain phase information since it is obtained from the difference 150 of the phase of the first 130 and second signal 138.
  • the normalized signals 148 of the correlation signal will now reach their local maximum simultaneously at a peak shifted to the left, as illustrated in the bottom right of figure 13. The peak is shifted by the difference 150 moved from the first 130 to the second signal 138.
  • the two-dimensional case is analogous with the one-dimensional case.
  • the two-dimensional case is used when working with intensity images .
  • the correlation signal is represented by an intensity image, where a peak represents a detected motion.
  • a condition such as a thresholding operation, is used to find the motion estimate peak representing the largest object, preferably a displacement of the entire image. The local maximum of the selected peak can be found with sub-pixel accuracy, thus ensuring sub-pixel displacement accuracy as well.
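The text does not spell out how the sub-pixel maximum is obtained; a common choice, shown here only as an assumed example, is a parabolic fit around the integer peak in each direction (indices wrap, which matches the periodic nature of the surface).

```python
import numpy as np

def subpixel_peak(surface):
    """Peak of a correlation surface with sub-pixel accuracy via 1D parabolic fits."""
    h, w = surface.shape
    py, px = np.unravel_index(np.argmax(surface), surface.shape)

    def offset(c_minus, c_zero, c_plus):
        denom = c_minus - 2.0 * c_zero + c_plus
        return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

    dy = offset(surface[py - 1, px], surface[py, px], surface[(py + 1) % h, px])
    dx = offset(surface[py, px - 1], surface[py, px], surface[py, (px + 1) % w])
    return py + dy, px + dx
```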
  • the phase correlation is performed from a specific origin, i.e. a predefined displacement of the images.
  • the area of overlap obtained when displacing the images is used when performing the phase correlation.
  • the two image blocks from the overlapping area are modulated with a window function that ensures there is no high frequency activity near the edges of the image blocks when performing the Fourier transform. This is done to prevent frequency peaks that are not related to the source images, but rather to the forced discontinuity of the signal as the Fourier transform treats the signal as periodic.
  • the window function chosen is shown below as equation 2, where "x" and "y" define a position within a window having height "H" and width "W":
  • the window function preserves the amplitude of relevant features near the edges of the image blocks but still removes the periodicity artefacts.
  • the phase correlation is performed as described earlier and a two-dimensional correlation surface is obtained.
  • the initial displacement is decided to be the zero vector since the two images are not very displaced. In practice, many different initial displacements are tested and combined. By applying phase correlation to the two images above, the correlation surface seen in figure 15 is obtained.
  • Since the correlation surface is the result of two forward Fourier transforms and one inverse transform, it must also be interpreted as having a periodic nature. Thus, for every peak found, as many as three other displacements must be considered, as shown in figure 16. Only four of the infinite number of repeated peaks are valid, since all other peaks represent displacements that would lead to no image overlap.
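A small sketch of how the wrapped alternatives can be enumerated for a detected peak; the helper name and signature are assumptions made for illustration.

```python
def wrapped_candidates(peak_y, peak_x, height, width):
    """For a peak at (peak_y, peak_x) in an H x W correlation surface, return the
    four displacement interpretations implied by the periodicity of the transform."""
    ys = (peak_y, peak_y - height)   # peak as found, or wrapped vertically
    xs = (peak_x, peak_x - width)    # peak as found, or wrapped horizontally
    return [(y, x) for y in ys for x in xs]
```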
  • any vector that has a too large magnitude can be discarded.
  • the number of phase correlation steps performed and the placement of the initial displacements determine the deviation threshold. A large number of phase correlations should lead to a low tolerance.
  • a candidate vector is assigned to each pixel in the correlation surface having a relative value in a top level defined relative to the maximum, such as within the top 5 to 50% or 10 to 40%, for instance the top 5 to 14%, 15 to 24%, 25 to 34%, or 35 to 50%.
  • the correlation surface is scanned for the maximum level, and then it is scanned a second time, creating displacement vectors for each point having a level above the peak level multiplied by 0.7. This makes the method exceptionally stable against level variations. If a displacement vector would result in an overlap area less than 10% of the original image area, or if its magnitude is too large, the vector is immediately discarded.
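As a sketch only, the selection rule just described (candidates above 0.7 times the peak level, periodic interpretation of each candidate, rejection of vectors leaving less than 10% overlap); the overlap estimate for a pure translation used here is a simplifying assumption.

```python
import numpy as np

def candidate_vectors(surface, image_shape, level_factor=0.7, min_overlap=0.10):
    """Candidate displacement vectors from all correlation-surface pixels above
    level_factor * peak, with too-small overlaps discarded immediately."""
    h, w = image_shape
    sh, sw = surface.shape
    peak = surface.max()
    candidates = []
    for py, px in zip(*np.nonzero(surface >= level_factor * peak)):
        for dy in (py, py - sh):               # candidate as found or wrapped vertically
            for dx in (px, px - sw):           # candidate as found or wrapped horizontally
                overlap = max(0, h - abs(dy)) * max(0, w - abs(dx)) / float(h * w)
                if overlap >= min_overlap:
                    candidates.append((int(dy), int(dx)))
    return candidates
```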
  • the next step is to measure the error when using a certain displacement vector.
  • the error metric used is a weighted mean square error where each pixel error is multiplied with a weight map before the summation.
  • the weight maps can be obtained by first applying a Sobel filter to the images and then filtering them using a box filter. The actual weight used for each pixel difference is the largest weight from the two weight maps. By doing this, edges and areas with high contrast are required to be in approximately the same places in the two images, but large uniform areas do not affect the error.
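A sketch of this weighted error metric, assuming scipy.ndimage for the Sobel and box filters; the box-filter size and the normalisation by the weight sum are assumptions not specified in the text.

```python
import numpy as np
from scipy import ndimage

def weight_map(image, box_size=9):
    """Edge-emphasising weight map: Sobel gradient magnitude smoothed by a box filter."""
    img = image.astype(np.float32)
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    return ndimage.uniform_filter(np.hypot(gx, gy), size=box_size)

def weighted_mse(overlap_a, overlap_b):
    """Weighted mean square error over the overlapping regions, using the larger
    of the two per-pixel weights so that edges must roughly coincide."""
    w = np.maximum(weight_map(overlap_a), weight_map(overlap_b))
    diff = overlap_a.astype(np.float32) - overlap_b.astype(np.float32)
    return float(np.sum(w * diff ** 2) / (np.sum(w) + 1e-12))
```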
  • Figures 17a-d and 18a-d show the visualization of two candidate vectors and the weight maps used to calculate the error.
  • the vector with the least error is chosen as the best match.
  • The single candidate obtained by one phase correlation step is not sufficient for a precise approximation of the final displacement by itself. Instead, a number of phase correlation candidates are collected from a number of origins, called sites, and sorted into clusters. The cluster with the most candidates is statistically the correct displacement.
  • the sites' initial displacement vectors for the phase correlation step can be seen in figure 19. Each one of these sites will provide its candidate for the final displacement.
  • Each site's candidate vector is compared to every other site's vector and merged if they are supporting the same displacement within a certain tolerance.
  • their scores are added.
  • the score is the inverse of the vector's error.
  • the score is again converted to an error by taking its inverse.
  • the vector that now has the least error is determined to be the best overall displacement vector for the image pair and is used to create an initial homography matrix for the next step of the method.
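A sketch of the clustering just described: candidates from all sites are merged when they support the same displacement within a tolerance, scores (inverse errors) of merged candidates are added, and the cluster with the highest total score, i.e. the lowest overall error, wins. The tolerance value and the averaging of merged vectors are assumptions.

```python
def best_displacement(candidates, tolerance=2.0):
    """candidates: iterable of (dy, dx, error) tuples, one per site."""
    clusters = []  # each cluster: [sum_dy, sum_dx, count, total_score]
    for dy, dx, err in candidates:
        score = 1.0 / (err + 1e-12)                      # score is the inverse of the error
        for c in clusters:
            cy, cx = c[0] / c[2], c[1] / c[2]
            if abs(cy - dy) <= tolerance and abs(cx - dx) <= tolerance:
                c[0] += dy; c[1] += dx; c[2] += 1; c[3] += score
                break
        else:
            clusters.append([dy, dx, 1, score])
    best = max(clusters, key=lambda c: c[3])             # highest score = least error
    return best[0] / best[2], best[1] / best[2]
```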
  • m_0…m_7 represent the elements of the homography matrix
  • (x′_i, y′_i) are the image point coordinates in image I′
  • (x_i, y_i) are the image point coordinates of image I
  • w′ and w are weighting functions weighting edges and discontinuities in the respective images so that regions of similar intensities do not affect the error function as much as the edges
  • Ĩ′ and Ĩ represent image I′ and image I modified with the respective weighting factor as previously described
  • the partial derivatives, the Hessian matrix A and the weighted gradient vector b are continuously updated for each pixel of overlap between the two images.
  • the motion vector for the homography matrix is computed (equation 7):
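Equation 7 is not reproduced in this text. Given the Hessian matrix A, the weighted gradient vector b and the stabilization parameter λ described here, a Levenberg-Marquardt style update consistent with the surrounding description would be:

$$\Delta m = (A + \lambda I)^{-1} b, \qquad m^{(j+1)} = m^{(j)} + \Delta m$$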
  • the homography is updated from iteration j to the next iteration j +1 .
  • the factor λ is a time-varying stabilization parameter.
  • the stabilization parameter λ is used to slow the descent of the error minimization and reduce the influence of pixel noise. If the global error has decreased, everything is fine: m is updated and another iteration is begun. If not, λ is increased by a factor of 10 and Δm is recomputed.
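Purely as an illustration of the control flow just described (accept the step if the global error decreased, otherwise multiply λ by 10 and recompute Δm), under the assumption of the update form above; compute_A_b and global_error are hypothetical placeholders for the per-pixel accumulation over the overlap.

```python
import numpy as np

def refine_homography(m, compute_A_b, global_error, iterations=20, lam=1e-3):
    """Damped refinement of the 8-element homography parameter vector m."""
    err = global_error(m)
    for _ in range(iterations):
        A, b = compute_A_b(m)                              # Hessian and weighted gradient
        while lam < 1e10:                                  # guard against non-converging steps
            dm = np.linalg.solve(A + lam * np.eye(len(m)), b)
            new_err = global_error(m + dm)
            if new_err < err:                              # global error decreased: accept step
                m, err = m + dm, new_err
                break
            lam *= 10.0                                    # otherwise increase lambda and retry
        else:
            break                                          # no acceptable step found
    return m
```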
  • the complete non-linear optimization consists of the following steps:

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Navigation (AREA)
  • Studio Devices (AREA)

Abstract

This invention relates to a system and a method for displaying digital images linked together so as to enable a user to navigate through these linked digital images. The invention relates more particularly to a system and a method for generating a novel view from digital images or video sequences obtained from one or more fixed camera views during the user's navigation through said digital images or video sequences.
PCT/SE2003/001237 2002-07-31 2003-07-24 System and method for displaying digital images linked together to enable navigation through views Ceased WO2004012144A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003247316A AU2003247316A1 (en) 2002-07-31 2003-07-24 System and method for displaying digital images linked together to enable navigation through views

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE0202342A SE0202342D0 (sv) 2002-07-31 2002-07-31 System and method for displaying digital images linked together to enable navigation through views
SE0202342-2 2002-07-31

Publications (1)

Publication Number Publication Date
WO2004012144A1 true WO2004012144A1 (fr) 2004-02-05

Family

ID=20288657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2003/001237 Ceased WO2004012144A1 (fr) 2002-07-31 2003-07-24 System and method for displaying digital images linked together to enable navigation through views

Country Status (3)

Country Link
AU (1) AU2003247316A1 (fr)
SE (1) SE0202342D0 (fr)
WO (1) WO2004012144A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7262736B2 (en) 2004-05-25 2007-08-28 Nec Corporation Mobile communication terminal
CN102103457A (zh) * 2009-12-18 2011-06-22 深圳富泰宏精密工业有限公司 Presentation operation system and method
US8203597B2 (en) * 2007-10-26 2012-06-19 Hon Hai Precision Industry Co., Ltd. Panoramic camera

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5650814A (en) * 1993-10-20 1997-07-22 U.S. Philips Corporation Image processing system comprising fixed cameras and a system simulating a mobile camera
US6075905A (en) * 1996-07-17 2000-06-13 Sarnoff Corporation Method and apparatus for mosaic image construction
US6173087B1 (en) * 1996-11-13 2001-01-09 Sarnoff Corporation Multi-view image registration with application to mosaicing and lens distortion correction
US6304284B1 (en) * 1998-03-31 2001-10-16 Intel Corporation Method of and apparatus for creating panoramic or surround images using a motion sensor equipped camera


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7262736B2 (en) 2004-05-25 2007-08-28 Nec Corporation Mobile communication terminal
US8203597B2 (en) * 2007-10-26 2012-06-19 Hon Hai Precision Industry Co., Ltd. Panoramic camera
CN102103457A (zh) * 2009-12-18 2011-06-22 深圳富泰宏精密工业有限公司 Presentation operation system and method
CN102103457B (zh) * 2009-12-18 2013-11-20 深圳富泰宏精密工业有限公司 Presentation operation system and method

Also Published As

Publication number Publication date
AU2003247316A1 (en) 2004-02-16
SE0202342D0 (sv) 2002-07-31


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP