WO2025123929A1 - A system and method for position detection of one or more objects by a moving camera
- Publication number: WO2025123929A1
- Application number: PCT/CN2024/126175
- Authority: WIPO (PCT)
- Legal status: Pending
Classifications
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T2207/30232 — Surveillance (indexing scheme for image analysis)
Definitions
- Figure 13 illustrates an example output that can be presented on a display, e.g., display 212 or a mobile device.
- The outputs of the system and method as per the present invention may be a Google Map or other 2D map image from a global positioning system, annotated with the global position of an object of interest.
- A first original image 1302 is shown, as captured by the moving camera at a location.
- A specific object 1310 is illustrated with its global position coordinates.
- The 3D image may be transformed into a 2D image 1304, on which the object 1310 and its position 1306 are illustrated.
- The position is a global position, e.g., defined as geo location coordinates.
- Image 1304 may be presented on a display or a mobile device.
- Image 1402 is an original image captured by the camera after it has moved.
- Image 1404 illustrates a 2D image with the detected objects 1410 and 1412 and their global coordinates 1406 and 1408, after accounting for camera movement. The global positions may be calculated using the method 300.
- Image 1404 may be presented on a display, e.g., on a mobile device.
- The system and method for position detection provide improved position detection.
- The system as described provides a more accurate geo location than GPS locators alone.
- The grid used on the 2D images provides a more accurate way to determine the exact geo location of one or more other objects in relation to the known positions of the reference features.
- The system and method for position detection provide an approach that results in an accurate location of an object or artefact in captured images without having to rely on a GPS device, which can be inaccurate or have accuracy limited to several meters.
- The system and method for position detection are also advantageous as they correct for the perspective error caused by images being captured from a single camera.
- The system can be used with a single camera, making it easier and cheaper to use than systems requiring multiple cameras.
- The system and method described herein also provide accurate positions of objects in images captured by a moving camera.
- The system and method described herein can be used for object tracking or asset management. Some example uses are monitoring fixed infrastructure, road infrastructure, buildings, and/or monitoring an area for various objects. Other uses are contemplated.
- The embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer, or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system.
- While program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality desired herein.
- Any appropriate computing system architecture may be utilised. This includes stand-alone computers, network computers and dedicated hardware devices.
- Where the terms "computing system" and "computing device" are used, these terms are intended to cover any appropriate arrangement of computer hardware capable of implementing the function described.
Abstract
A system and method for position detection of one or more objects in images captured by a moving camera, comprising at least one camera configured to capture one or more images of a scene including one or more objects, wherein the at least one camera is moveable, and a computing apparatus operatively coupled to the at least one camera and arranged in communication with a global positioning system, the computing apparatus configured to: receive one or more images from a camera, wherein each image is an image of a scene including one or more objects; identify one or more reference features in the one or more images; detect camera movement based on a change in a local position of the one or more reference features between consecutive images; calculate a global position of the one or more reference features; and calculate a position of other objects in the one or more images based on the global position of the one or more reference features.
Description
The present invention relates to a system and method for determining a position of one or more objects, and particularly, although not exclusively, to a system and method for determining a geo location of one or more objects in a plurality of images captured by a moving camera.
Systems for detecting one or more objects and determining the positions of the detected objects within one or more captured images, or in a video stream, are known. Current systems that utilise a fixed camera understand the background and extract the foreground. These systems may be used for road surveillance, in parking lots, or in other environments that have fixed cameras. The location of the camera is known and does not move. However, if the camera (or cameras) used to capture the images is moving, there is no context for determining what the background is and how it has changed. Further, there is limited context to differentiate between the camera movement and the movement of objects in the images. Current systems for determining positions of objects cannot determine the exact location of objects in a changing scene.
Cameras capture images of a 3D scene from a single point. This causes perspective error, since the camera is capturing images from a single perspective. The perspective error can increase due to movement of the camera. Global positioning systems, e.g., positioning systems that use satellite navigation networks such as BeiDou, GPS, GLONASS or Galileo, may only be accurate to a few meters under normal civilian or free-to-access licenses. This can often be too inaccurate. The moving camera contributes further inaccuracy, making it challenging to accurately determine the position or location of objects captured in an image.
In accordance with a first aspect, there is provided a system for position detection of one or more objects in images captured by a moving camera, comprising:
at least one camera configured to capture one or more images of a scene including one or more objects, wherein the at least one camera is moveable,
a computing apparatus operatively coupled to the at least one camera, the computing apparatus arranged in communication with a global positioning system,
the computing apparatus configured to:
receive one or more images from a camera, wherein each image is an image of a scene including one or more objects,
identify one or more reference features in the one or more images,
detect camera movement based on a change in a local position of the one or more reference features between consecutive images,
calculate a global position of the one or more reference features,
calculate a position of other objects in the one or more images based on the global position of the one or more reference features.
In one embodiment of the first aspect, the computing apparatus is further configured to:
identify a local position of the one or more reference features in an image previous to the current image,
calculate a local position of the one or more reference features in a current image by applying a movement transform to the local position of the one or more reference features, wherein the movement transform defines camera movement,
wherein the calculated local position of the one or more reference features in the current image corresponds to the change in the local position of the one or more reference features between consecutive images.
In an embodiment of the first aspect, the computing apparatus is configured to:
identify a timestamp of each received image,
arrange the images in a consecutive temporal order, and;
wherein each received image is a 3D perspective image of a scene including one or more objects.
In an embodiment of the first aspect, the computing apparatus is configured to:
apply a perspective transform to each received image to convert the received image into a 2D image,
determine a corresponding global position of the one or more reference features within each 2D image,
apply an inverse perspective transform to each 2D image to convert the 2D image into a 3D perspective image with the appropriate global position of each reference feature on the 3D perspective image.
In an embodiment of the first aspect, the computing apparatus is configured to: determine a global position of each reference feature within the current 2D image and previous 2D image.
In an embodiment of the first aspect, wherein the local position of each reference feature relates to a pixel coordinate of each reference feature within a 2D grid that is superimposed on each 2D image,
the global position is a geo location defined as coordinates, wherein the global position is accessed by the computing apparatus from a global positioning system, wherein the computing apparatus is configured to communicate with a global positioning system, and; wherein each reference feature is a fixed object having a known global position.
In an embodiment of the first aspect, the computing apparatus is configured to:
superimpose a 2D grid on each 2D image, wherein the local position and global position of each reference feature in the 2D image is defined in the 2D grid,
determine a local position of one or more other objects in the scene relative to the reference features in each 2D image based on the distance from reference feature defined in the 2D grid,
determine a global position of the one or more other objects using the relative local position of the one or more other objects to the reference features, and the global position of each reference feature.
In an embodiment of the first aspect, the computing apparatus is configured to:
detect movement of the camera and a direction of travel of the camera,
determine if one or more new reference features is required for a received 3D perspective image, wherein one or more new reference features are required if one or more previously identified reference features are not visible in a new image,
if a new reference feature is required, locate one or more new reference features at an end of the received 3D image in the direction of travel of the camera, wherein the one or more new reference features are identified by applying an object recognition process or an object recognition model to the received image,
apply a perspective transform to the received 3D image with the one or more new reference features identified therein and transforming the received 3D image to a 2D image,
determine a local position of the one or more new reference features in the 2D image, determine a global position of the one or more new reference features in the 2D image.
In an embodiment of the first aspect, wherein the movement and direction of travel of the camera is determined by the computing apparatus by comparing the change in the local position of one or more reference features in consecutive 2D transformed images, wherein the change in the local position of the reference features is indicative of movement and direction of travel of the camera.
In an embodiment of the first aspect, the computing apparatus is configured to:
identify background features in each received image by applying a background identification model or by determining one or more attributes that identify a background feature within the image, wherein background features are objects that are stationary,
identify one or more moving objects within the received images, wherein the moving objects are identified by applying an object recognition process to a plurality of consecutive images or by utilizing edge or color detection in consecutive images, wherein the one or more moving objects are defined as foreground features,
identify global positions of the background features in one or more transformed 2D images,
identify the global positions of the foreground features in one or more transformed 2D images, wherein the changes in the global positions are indicative of the movement of the foreground features.
In accordance with a second aspect, there is provided a computer-implemented method for position detection of one or more objects in images captured by a moving camera, comprising the steps of:
receiving one or more images from a camera, wherein each image is an image of a scene including one or more objects;
identifying one or more reference features in the one or more images,
detecting camera movement based on a change in a local position of the one or more reference features between consecutive images,
calculating a global position of the one or more reference features,
calculating a position of other objects in the one or more images based on the global position of the one or more reference features.
In an embodiment of the second aspect, the method comprises the additional steps of:
identifying a local position of the one or more reference features in an image previous to the current image,
calculating a local position of the one or more reference features in a current image by applying a movement transform to the local position of the one or more reference features, wherein the movement transform defines camera movement,
wherein the calculated local position of the one or more reference features in the current image corresponds to the change in the local position of the one or more reference features between consecutive images.
In an embodiment of the second aspect, the method comprises the steps of:
identifying a timestamp of each received image,
arranging the images in a consecutive temporal order, and;
wherein each received image is a 3D perspective image of a scene including one or more objects.
In an embodiment of the second aspect, the method comprises the steps of:
applying a perspective transform to each received image to convert the received image into a 2D image,
determining a corresponding global position of the one or more reference features within each 2D image,
applying an inverse perspective transform to each 2D image to convert the 2D image into a 3D perspective image with the appropriate global position of each reference feature on the 3D perspective image.
In an embodiment of the second aspect, the method comprises: determining a global position of each reference feature within the current 2D image and previous 2D image.
In an embodiment of the second aspect, wherein the local position of each reference feature relates to a pixel coordinate of each reference feature within a 2D grid that is superimposed on each 2D image, wherein the global position is a geo location defined as coordinates, wherein the global position is identified from a global positioning system, and; wherein each reference feature is a fixed object having a known global position.
In an embodiment of the second aspect, the method comprising the steps of:
superimposing a 2D grid on each 2D image, wherein the local position and global position of each reference feature in the 2D image is defined in the 2D grid,
determining a local position of one or more other objects in the scene relative to the reference features in each 2D image based on the distance from reference feature defined in the 2D grid,
determining a global position of the one or more other objects using the relative local position of the one or more other objects to the reference features, and the global position of each reference feature.
In an embodiment of the second aspect, the method comprising the steps of:
detecting movement of the camera and a direction of travel of the camera,
determining if one or more new reference features is required for a received 3D perspective image, wherein one or more new reference features are required if one or more previously identified reference features are not visible in a new image,
if a new reference feature is required, locating one or more new reference features at an end of the received 3D image in the direction of travel of the camera, wherein the one or more new reference features are identified by applying an object recognition process or an object recognition model to the received image,
applying a perspective transform to the received 3D image with the one or more new reference features identified therein and transforming the received 3D image to a 2D image,
determining a local position of the one or more new reference features in the 2D image, determining a global position of the one or more new reference features in the 2D image.
In an embodiment of the second aspect, wherein the movement and direction of travel of the camera is determined by comparing the change in the local position of one or more reference features in consecutive 2D transformed images, wherein the change in the local position of the reference features is indicative of movement and direction of travel of the camera.
In an embodiment of the second aspect, the method comprising the steps of:
identifying background features in each received image by applying a background identification model or by determining one or more attributes that identify a background feature within the image, wherein background features are objects that are stationary,
identifying one or more moving objects within the received images, wherein the moving objects are identified by applying an object recognition process to a plurality of consecutive images or by utilizing edge or color detection in consecutive images, wherein the one or more moving objects are defined as foreground features,
identifying global positions of the background features in one or more transformed 2D images,
identifying the global positions of the foreground features in one or more transformed 2D images, wherein the changes in the global positions are indicative of the movement of the foreground features.
In an embodiment of the second aspect, wherein the method is continuously repeated for all images captured by the camera to continuously determine the global position of one or more objects detected in each image captured by the camera.
In one example, the method of the second aspect is utilized to detect and identify one or more fixed road assets and their global positions.
In accordance with a third aspect, there is provided a system for position detection of one or more objects in one or more images captured by a moving camera comprising:
a data processing apparatus comprising a processor and a memory operatively coupled to the processor, wherein the memory stores one or more computer readable and executable instructions, and;
wherein the processor is configured to execute the instructions, causing the processor to carry out the method according to the second aspect and its embodiments.
In one embodiment of the third aspect, the system may comprise one or more moveable cameras that are operatively coupled to the data processing apparatus and configured to transmit captured images to the data processing apparatus.
The term “image” as used herein may refer to a still image captured by a camera or may refer to a single frame in a video stream captured by a camera. The system and method for position detection as described herein is configured to be used for one or more still images or may be used on a video stream.
The term “comprising” (and its grammatical variations) as used herein is used in the inclusive sense of “having” or “including” and not in the sense of “consisting only of”.
Embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 illustrates an example embodiment of a system for position detection of one or more objects in an image captured by a moving camera.
Figure 2 illustrates an example computing apparatus of Figure 1.
Figure 3 illustrates a flow chart of a method for position detection of one or more objects in an image captured by a moving camera.
Figure 4 illustrates an example method for detecting camera movement.
Figure 5 illustrates an example method of position detection of one or more objects in a 3D image.
Figure 6 and Figure 7 illustrate examples of the camera movement in 3D and 2D respectively.
Figures 8 to 12 illustrate an application of the method of position detection of one or more objects from a moving camera.
Figure 13 illustrates an example output presented on a display illustrating the global positions of objects detected by a moving camera.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to Figure 1, there is provided an example embodiment of a system 100 for position detection of one or more objects in one or more images captured by a moving camera. The system 100 can be used to identify various fixed objects, e.g., lampposts, windows on buildings, road signs, etc., and their respective positions. The system may identify global positions of these objects, e.g., a geo position or geo location, and/or positions relative to other objects. The system 100 may be used to track the motion of one or more moving objects, e.g., vehicles, with a moving camera.
Referring to Figure 1, the system 100 comprises at least one camera 102. The system 100 may comprise a plurality of cameras. The camera 102 may be a moveable camera; for example, the camera 102 may be mounted on a drone or on a robot, or the camera may be a hand-held camera, a body-worn camera or a PTZ (pan-tilt-zoom) camera. In another example the camera may be a surveillance camera that is moveable about an axis and mounted in a specific location. The camera 102 is configured to capture one or more images of a scene 10 with one or more objects (A, B, C and D) within it.
The system 100 further comprises a computing apparatus 200 that is operatively coupled to the camera 102 and is configured to receive captured images from the camera 102. The system 100 may optionally include a global positioning system 104, e.g., BeiDou, GPS, Galileo or GLONASS, or any other global positioning or geo location system that may use satellite navigation networks, ground based networks, aerial navigation assistance, or any combination thereof.
The computing apparatus 200 is configured to: receive one or more images from a camera, wherein each image is an image of a scene including one or more objects; identify one or more reference features in the one or more images; detect camera movement based on a change in a local position of the one or more reference features between consecutive images; calculate a global position of the one or more reference features; and calculate a position, e.g., a global position, of other objects in the one or more images based on the global position of the one or more reference features.
The system 100 may optionally comprise a display unit or a user device 106, e.g., a mobile device, a tablet or another device that is adapted to communicate with the computing apparatus. The calculated positions of the one or more objects may be displayed on the display unit or the user device. The system 100 is used to determine where relevant objects are relative to each other and also relative to an objective reality, e.g., a global position of the objects, while the camera is moving and the background image keeps changing.
As shown in Figure 2, the computing apparatus 200 is programmed to detect the position of one or more objects in a scene captured by a moving camera, wherein the position may be a global position, e.g., a geo location. In this example embodiment, the computing apparatus 200 (i.e., computer or computing system or data processing apparatus) may be implemented by any computing architecture, including portable computers, tablet computers, stand-alone Personal Computers (PCs), smart devices, Internet of Things (IoT) devices, edge computing devices, client/server architecture, “dumb” terminal/mainframe architecture, cloud-computing based architecture, or any other appropriate architecture. The computing apparatus may be appropriately programmed to implement a method for position detection of one or more objects in images captured by a moving camera.
As shown in Figure 2, the computing apparatus 200 includes suitable components necessary to receive, store and execute appropriate computer instructions. The computing apparatus 200 may be a server that is configured to wirelessly communicate with the at least one camera 102. The components may include a processing unit 202, including a Central Processing Unit (CPU), Math Co-Processing Unit (Math Processor), Graphic Processing Units (GPUs) or Tensor Processing Units (TPUs) for tensor or multi-dimensional array calculations or manipulation operations, read-only memory (ROM) 204, random access memory (RAM) 206, and input/output devices such as disk drives 208 and input devices 210 such as an Ethernet port, a USB port, etc. Optionally, the computing apparatus 200 may comprise a display 212, such as a liquid crystal display, a light emitting display or any other suitable display, and communications links 214. The server 200 (i.e., computing apparatus 200) may include instructions that may be stored in ROM, RAM or disk drives and may be executed by the processing unit 202. There may be provided a plurality of communication links 214 which may variously connect to one or more cameras (e.g., camera 102) and/or a global positioning system. The communication links 214 may further connect to one or more other devices such as personal computers, terminals, wireless or handheld computing devices, Internet of Things (IoT) devices, smart devices, edge computing devices or user devices, e.g., mobile phones or tablets. At least one of the plurality of communication links may be connected through a telephone line or a wireless communication link via a suitable network.
The computing apparatus 200 (i.e., server 200) may include storage devices such as a disk drive 208 which may encompass solid state drives, hard disk drives, optical drives, or remote or cloud-based storage devices. The server 200 may also have a suitable operating system which resides on the disk drive or in the ROM.
The computing apparatus 200 may also provide the necessary computational capabilities to operate or to interface with a machine learning network, such as a neural network, to provide various functions and outputs. The neural network may be implemented locally, or it may also be accessible or partially accessible via a server or cloud-based service. The machine learning network may also be untrained, partially trained or fully trained, and/or may also be retrained, adapted or updated over time. For example, the computing apparatus may comprise a neural network or other model for object recognition in an image, or background and/or foreground recognition in an image. The object recognition models may be applied to images to detect objects, e.g., reference features or other artefacts or objects.
The computing apparatus 200 may be programmed to perform a method of position detection of one or more objects in images captured by a moving camera. The method may be defined as a computer program comprising instructions (i.e., computer readable and executable instructions) which, when the program is executed by the computing apparatus 200 (i.e., computer), causes the computing apparatus to carry out the method of position detection of one or more objects. The computer program may be stored in a non-transitory computer readable medium, e.g., ROM, RAM or a disk drive.
In one example the method of position detection of one or more objects in images captured by a moving camera comprises the steps of: receiving one or more images from a camera, wherein each image is an image of a scene including one or more objects; identifying one or more reference features in the one or more images; detecting camera movement based on a change in a local position of the one or more reference features between consecutive images; calculating a global position of the one or more reference features; and calculating a position of other objects in the one or more images based on the global position of the one or more reference features.
The method may comprise the additional steps of applying a perspective transform to each received image to convert the received image into a 2D image, determining a corresponding global position of the one or more reference features within each 2D image, and applying an inverse perspective transform to each 2D image to convert the 2D image into a 3D perspective image with the appropriate global position of each reference feature on the 3D perspective image.
Figure 3 illustrates a flow diagram of a computer-implemented method 300 for position detection of one or more objects in images captured by a moving camera. The computing apparatus 200 (i.e., data processing apparatus) may comprise a memory that stores one or more computer readable and executable instructions, wherein the processor 202 is configured to execute the instructions causing the processor 202 to carry out the method 300.
Referring to Figure 3, the method comprises receiving a plurality of images from one or more cameras (step 302). The one or more cameras may be moving. Each received image is a 3D perspective image of a scene including one or more objects. The perspective is created due to the camera capturing images from a single point.
Step 304 comprises arranging the received images in a temporal order. The timestamp of each received image may be identified and the images may be arranged in a consecutive temporal order based on the timestamps. Step 306 comprises identifying one or more reference features. The reference features are points in an image with a known global position, e.g., a geo location or geo position. In one example reference features are fixed objects or structures, e.g., buildings, signal lights or other fixed structures having a known geo location. The geo location may be accessed from a suitable global positioning system.
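A minimal sketch of the temporal ordering in step 304, assuming each received frame carries a capture timestamp (the type and field names are hypothetical, not from the patent):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ReceivedImage:
    timestamp: float   # capture time, e.g., seconds since epoch (hypothetical)
    pixels: Any        # the image data

def arrange_temporally(images: list[ReceivedImage]) -> list[ReceivedImage]:
    """Step 304: arrange received images in consecutive temporal order."""
    return sorted(images, key=lambda img: img.timestamp)
```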
Step 308 comprises applying a perspective transform to each received image to convert the received image from a 3D perspective image to a 2D image. The perspective transform may be a perspective matrix that is used to transform points of the 3D perspective image into x and y coordinates of a 2D image. An example of the perspective matrix is illustrated below:
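The matrix in the source document is a figure; as a reconstruction of the standard form (an assumption based on the surrounding description, not a reproduction of the patent's figure), a 3×3 perspective matrix maps a homogeneous image point $(u, v, 1)$ to 2D coordinates:

$$
\begin{bmatrix} x' \\ y' \\ w \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & 1
\end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
\qquad
(x, y) = \left(\frac{x'}{w}, \frac{y'}{w}\right)
$$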
In one example the perspective matrix may be obtained by matching the global position e.g., geo location coordinates of the reference features with the 3D local position e.g., pixel coordinates of the reference features in the 3D image.
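As one possible realization of this matching step (a sketch assuming OpenCV; the patent does not name a library, and all coordinate values are illustrative), four matched reference features can be passed to cv2.getPerspectiveTransform and the resulting matrix used to warp each frame onto the 2D grid:

```python
import cv2
import numpy as np

# Pixel coordinates of four reference features in the 3D perspective image
# (illustrative values only).
pixel_pts = np.float32([[412, 530], [855, 518], [980, 700], [310, 715]])

# The same four features in the flat 2D (top-down) image. Here the 2D image
# is 600 x 400 pixels covering an assumed 30 m x 20 m patch of ground.
grid_pts = np.float32([[0, 0], [600, 0], [600, 400], [0, 400]])

# Perspective matrix M mapping the camera's 3D view onto the 2D grid.
M = cv2.getPerspectiveTransform(pixel_pts, grid_pts)

# Step 308: warp a received frame into the top-down 2D image.
frame = cv2.imread("frame.jpg")
top_down = cv2.warpPerspective(frame, M, (600, 400))
```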
Step 310 comprises determining a local position of each reference feature in each 2D image. Step 312 comprises identifying a global position for each reference feature in each 2D image. The local position is a pixel coordinate. The global position is a geo location defined as a coordinate. In one example, the global position is a latitude and longitude coordinate identified from a global positioning system.
Step 314 comprises detecting camera movement based on a change in the local position of the one or more reference features between temporally consecutive images. Step 314 comprises applying a movement transform to the local position in a previous image to determine the local position of the reference features in the current image.
Step 316 comprises determining a global position of the reference features in the current 2D image after accounting for camera movement. The global position of the reference features remains consistent even if the camera moves; the local position changes. Step 318 comprises calculating a position of other objects in the one or more images based on the global position of the one or more reference features.
In one example, the global position of other objects in the scene may be calculated based on their position relative to the reference features, e.g., the relative distance from the reference features. The method may comprise superimposing a 2D grid on each 2D image, wherein the local position and global position of each reference feature in the 2D image are defined in the 2D grid. The method may comprise determining a local position of one or more other objects in the scene relative to the reference features in each 2D image based on the distance from the reference features defined in the 2D grid. The position of the other objects may be calculated by determining a global position of the one or more other objects using the relative local position of the one or more other objects to the reference features, and the global position of each reference feature. Similarly, the global position of the reference features can be used to calculate the global position or geo location, e.g., a GPS location, of each point in the image. This may be performed using the grid lines on the 2D images. In a further example, the position of other objects may be calculated by using triangulation from a plurality of reference points, using the 2D grid lines as a guide.
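One way this grid-based calculation could be implemented (a sketch under the assumption that latitude and longitude vary approximately linearly over the small area of one scene) is to map an object's pixel coordinate through the perspective matrix and then fit the grid-to-geo relationship from the reference features of known position:

```python
import cv2
import numpy as np

def pixel_to_grid(M, pixel_xy):
    """Map a pixel coordinate through perspective matrix M onto the 2D grid."""
    pt = np.float32([[pixel_xy]])                  # shape (1, 1, 2)
    return cv2.perspectiveTransform(pt, M)[0, 0]

def grid_to_geo(grid_xy, ref_grid, ref_geo):
    """Interpolate a geo location from reference features.

    ref_grid: (N, 2) grid positions of N >= 3 reference features.
    ref_geo:  (N, 2) known (lat, lon) of the same features.
    Assumes the grid is locally linear in lat/lon over the scene.
    """
    # Least-squares affine fit grid -> geo, applied to the query point.
    A = np.hstack([ref_grid, np.ones((len(ref_grid), 1))])
    coeffs, *_ = np.linalg.lstsq(A, ref_geo, rcond=None)
    return np.append(grid_xy, 1.0) @ coeffs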
Step 320 comprises applying an inverse perspective transform to each 2D image to convert the 2D image into a 3D perspective image with the appropriate global position of each reference feature on the 3D perspective image. The inverse perspective transform may be a matrix that is the inverse of the perspective matrix. Step 322 comprises presenting on the mobile device or display the 3D image with global positions of objects in the image. The method 300 is continuously repeated for all images captured by the camera to continuously determine the global position of one or more objects detected in each image captured by the camera.
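A corresponding sketch of step 320, again assuming OpenCV, where M is the perspective matrix from the step 308 example:

```python
import cv2
import numpy as np

def inverse_perspective(top_down_img, M, out_size):
    """Step 320: map the annotated 2D image back into the camera's 3D
    perspective view. M is the perspective matrix from step 308 and
    out_size is the (width, height) of the original frame."""
    return cv2.warpPerspective(top_down_img, np.linalg.inv(M), out_size)
```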
In one example the method 300 may operate based on determining the positions of at least four reference features. The method may, at step 306, identify at least four reference features. At least four reference features provide an adequate number of benchmark points to calculate the positions of other objects in the frame (i.e., image). In other examples, the method 300 may operate using a different plurality of reference points, e.g., 3 or 6 reference points.
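This choice of four features is consistent with the algebra of the perspective transform: a 3×3 perspective matrix with its scale fixed has eight unknowns, and each correspondence between a pixel point and a grid point supplies two equations (one for x, one for y), so four correspondences give the 4 × 2 = 8 equations that determine the matrix exactly. Three points determine only an affine map, while more than four over-determine the matrix and can be solved in a least-squares sense to absorb measurement error.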
Figure 6 and Figure 7 illustrate an example camera movement. Figure 6 illustrates a camera 102 and movement e.g., panning and how the camera would capture a scene in 3D. The reference features P1, P2, P3 and P4 are illustrated on the grid. Figure 7 illustrates the transformed 2D image with camera movement and a grid overlaid on the 2D image. The reference features P1-P4 are shown. The camera movement and the camera capturing 3D images can make it challenging to determine global positions of the reference features and other objects.
The camera movement may be detected based on changes in the local position of the reference features. The reference features appear to move between consecutive captured images due to the movement of the camera. Camera movement may be pivoting about an axis (e.g., about an x, y or z axis), moving along multiple axes, panning, tilting, or zooming in or out.
Figure 4 illustrates an example method 400 for detecting camera movement. Step 402 comprises identifying a local position (e.g., a pixel coordinate) of one or more reference features in a previous image or previous frame. Step 404 comprises detecting a direction of travel of the camera. Step 406 comprises calculating a local position of the one or more reference features in the current image by applying a movement transform to the positions (e.g., pixel coordinates) of the previous image. The movement transform may be a homography matrix. The homography matrix may comprise the following two equations, used in combination with the perspective matrix, to determine the x coordinate and y coordinate shift. Example movement equations for the x and y shift are shown below:
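The two equations are figures in the source document; the standard form such shift equations take for a homography $H = [h_{ij}]$ (a reconstruction, not a verbatim reproduction of the patent's equations) is:

$$
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}},
\qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
$$

where $(x, y)$ is a reference feature's local position in the previous image and $(x', y')$ its shifted position in the current image.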
The output of the movement transform yields the new local position of the reference features in the current image. The new local positions are indicative of or correspond to the moved position and are indicative of camera movement.
In some instances, as the camera moves, e.g., pans or zooms in, original reference features may move out of the field of view of the camera. Camera view cones 602, 604, 606 are illustrated in Figure 6 and Figure 7. Each viewing cone corresponds to the view area of the camera when the camera is in one of the three illustrated orientations. Referring to Figure 6 and Figure 7, as the camera pans, some of the reference features P1-P4 go out of view of the camera. This is particularly noticeable in the 2D image, where perspective error is accounted for. In these instances, the computing apparatus 200 is configured to locate new easily identifiable objects or artefacts. These objects may be identifiable fixed objects, e.g., lampposts, traffic lights, traffic signs, transformers, etc.
The method 400 may comprise the additional steps to determine new reference features. Step 408 comprises determining if one or more new reference features is required for a received 3D perspective image. Step 410 comprises locating one or more new reference features at an end of the received 3D image in the direction of the travel of the camera. The one or more new reference features are identified by applying an object recognition process or an object recognition model to the received image e.g., a neural network for object recognition. For example, the new objects may be detected based on edge detection, color contrast or any combination of attributes of the objects.
Method 400 may be a sub method or sub routine of method 300. Once new reference features are identified, the method may return to step 308 and continue. Once new reference features are identified the computing apparatus 200 may be configured to apply a perspective transform to the received 3D image with the one or more new reference features identified therein to transform the received 3D image to a 2D image, determine a local position of the one or more new reference features in the 2D image and determine a global position of the one or more new reference features in the 2D image.
The movement and direction of travel of the camera are determined by comparing the change in the local position of one or more reference features in consecutive 2D transformed images, wherein the change in the local position of the reference features is indicative of the movement and direction of travel of the camera.
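A minimal sketch of how the direction of travel might be inferred from the reference feature displacements is shown below; the sign conventions are assumptions made for illustration.

```python
import numpy as np

def direction_of_travel(prev_pts, curr_pts):
    """prev_pts/curr_pts: (N, 2) arrays of reference-feature pixel coordinates."""
    shift = np.mean(np.asarray(curr_pts) - np.asarray(prev_pts), axis=0)
    dx, dy = shift
    # Features shifting left in the image imply the camera panned right, and so on.
    horizontal = "right" if dx < 0 else "left"
    vertical = "down" if dy < 0 else "up"
    return shift, horizontal, vertical
```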
Over time, as the camera moves, a plurality (e.g., four) of new reference features may be identified and used to determine the positions, e.g., global positions, of other objects in the captured images. In one example, the computing apparatus 200 may be configured to utilize at least four reference features with known geo locations to determine the positions of other objects or artefacts in the images. The system 100 may utilize at least one reference feature with a known global position to determine the positions of other objects or artefacts. The 2D grid may be a global position grid, e.g., acquired from a global positioning system. For example, the 2D grid may correspond to latitudes and longitudes on earth. In another example, the 2D grid may be customized, or a more granular 2D grid may be overlaid on top of the latitude and longitude grid.
In one example, the computing apparatus 200 may further be configured to detect noise and motion within images. Noise is, essentially, the foreground of the image. The computing apparatus 200 may be configured to remove noise from received images as an initial processing step by applying noise filtering methods such as, for example, RANSAC (Random Sample Consensus). The computing apparatus 200 may be configured to identify everything that moves relative to the background. These moving objects may be masked and/or eliminated from background calculations. Stationary foreground items may not be filtered, as they may be treated as part of the background.
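One possible realization of the RANSAC-based noise filtering is sketched below, assuming matched feature points between consecutive frames (e.g., from ORB matching); the reprojection threshold is an illustrative assumption. Inliers follow the dominant background motion; outliers are treated as noise or moving foreground and masked out of background calculations.

```python
import cv2
import numpy as np

def split_background_foreground(pts_prev, pts_curr):
    """pts_prev/pts_curr: (N, 2) float32 arrays of matched point coordinates."""
    H, mask = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 3.0)
    inliers = mask.ravel().astype(bool)
    background = pts_curr[inliers]    # consistent with the camera's motion
    foreground = pts_curr[~inliers]   # moving objects / noise, to be masked
    return H, background, foreground
```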
The computing apparatus 200 may optionally be configured to identify background features in each received image by applying a background identification model or by determining one or more attributes that identify a background feature within the image. For example, background features may be identified by applying a deep learning model or a neural network trained to detect stationary objects. The computing apparatus 200 may be configured to identify one or more moving objects by applying an object recognition model or process to consecutive images, or by utilizing edge or color detection in consecutive images. The moving objects may be classified as foreground features. The apparatus 200 may further be configured to identify the global positions of the background features in one or more transformed 2D images, and identify the global positions of the foreground features in one or more transformed 2D images. The changes in the global positions are indicative of the movement of the foreground features. The steps described above for background and foreground detection may be applied as part of method 300. The foreground detection can be used to identify moving objects such as, for example, vehicles, people or other moving objects.
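As a hedged illustration, a stock background subtractor can stand in for the background identification model mentioned above; MOG2 is one possible choice, and its parameters below are assumptions rather than values from the specification.

```python
import cv2

# Background model: stationary pixels are absorbed into the background over time.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

def moving_object_mask(frame_bgr):
    # Non-zero pixels mark candidate foreground (moving) regions; stationary
    # foreground items decay into the background model, mirroring the text above.
    fg_mask = subtractor.apply(frame_bgr)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))
    return fg_mask
```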
The system and method as described herein provide an understanding of what is happening in a scene and exactly where relevant objects are, relative to each other and relative to an objective reality, while the camera is moving and the background image keeps changing. The local position provides relative information, and the global position provides both relative position and position relative to an objective reality.
The system and method for position detection as described herein may be used in several applications. One example application is to detect and monitor road assets such as lamp poles, road signs, fire hydrants and/or other infrastructure items of importance. The global positions of these objects can be automatically detected, labelled and tracked. The status and condition of these assets may also be tracked using the system 100. The system and method may be used to track vehicles, e.g., fire engines or other vehicles. The camera 102 may be mounted on a drone or other vehicle to capture images or a video stream. For example, on an initial trip the camera may pass over a path to detect road assets. At the end of the trip, a human may also identify specific assets of interest. On a future trip, the vehicle with the camera may follow a specific path or move to a specific geo location to check on specific identified assets (e.g., user defined assets) and identify any issues with those assets. The system and method may also be used for other applications.
Figure 5 illustrates an example method of position detection of one or more objects in a 3D image, as executed by the computing apparatus 200. Referring to Figure 5, a first 3D image 502 is received. The reference features 520, 522, 524, 526 are identified in the 3D image. A perspective transform (e.g., as described earlier) may be applied to transform the 3D perspective image 502 into a 2D image 504. The local positions of the reference features may be determined in the 3D image, and these local positions may be transformed and calculated in the 2D image, as denoted by the arrows. A 2D grid 510 may be superimposed onto the 2D image, as shown in image 506. The grid lines of grid 510 may correspond to latitudes and longitudes. A global position, e.g., a geo location, of the reference features 520-526 may be determined in the 2D image. The local positions and global positions of the other objects in the images may also be calculated based on their relative distance and relative position to the reference features 520-526. The 2D image is transformed back into the 3D image 508 by applying an inverse perspective transform.
The image 508 may be a 3D image that includes the grid 510 and the global positions of the reference features and other objects.
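A minimal sketch of the Figure 5 flow is given below, assuming four reference features with known geo locations; all pixel coordinates and longitude/latitude values are hypothetical and chosen only for illustration.

```python
import cv2
import numpy as np

# Pixel coordinates of reference features 520-526 in the perspective image.
px = np.array([[210, 150], [850, 160], [900, 560], [160, 540]], dtype=np.float32)
# Their known global positions (hypothetical longitude/latitude pairs).
geo = np.array([[114.1601, 22.2801], [114.1609, 22.2801],
                [114.1609, 22.2795], [114.1601, 22.2795]], dtype=np.float32)

# Perspective transform: maps image pixels onto the 2D geo grid.
M = cv2.getPerspectiveTransform(px, geo)

# Global position of another object, from its pixel coordinates.
obj_px = np.array([[[500.0, 400.0]]], dtype=np.float32)
obj_geo = cv2.perspectiveTransform(obj_px, M)

# Inverse perspective transform: plot geo-grid points back onto the 3D image.
back_px = cv2.perspectiveTransform(obj_geo, np.linalg.inv(M))
print(obj_geo.reshape(2), back_px.reshape(2))
```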
Figures 8 to 12 illustrate an example of the method of position detection, e.g., geo location detection, of one or more objects from a moving camera. Figure 8 illustrates an initial image (i.e., a previous image) 802. The reference features p11, p12, p13 and p14 are illustrated in their pre-set positions. Figure 9 illustrates a current image 902 with the reference features p11-p14 illustrated thereon. The camera has moved between images 802 and 902. The pixel coordinates of p11-p14 can be determined in the previous image, and a movement transform may be applied to calculate the new pixel coordinates. The pixel coordinates have shifted as shown in Figure 9. For clarity, an old coordinate point A has shifted to A’ from image 802 to image 902.
Referring to Figures 8 and 9, the movement transform process may involve connecting the reference features to form a quadrilateral, and applying the movement equations to draw the moved quadrilateral. The movement equations calculate the moved x and y coordinates of features p11-p14. The movement matrix, e.g., the homography matrix, may be obtained by applying SIFT, ORB and optical flow to the previous and current images to compare the images and derive the homography matrix.
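One way to obtain such a movement matrix is sketched below, using SIFT feature matching between the previous and current frames; ORB or optical flow could be substituted as the text suggests, and the ratio-test and RANSAC thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def movement_homography(prev_gray, curr_gray):
    """Estimate the homography between two grayscale frames via SIFT matching."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)

    # Ratio-test matching (Lowe's criterion) to keep reliable correspondences.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```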
The global positions do not change as the camera moves; only the local positions change. Figure 10 illustrates a 2D image of the previous image 1002 with the 2D grid 1010 superimposed thereon. The reference features are not labelled, for clarity. The grid 1010 may be used to determine the global positions, as the grid corresponds to latitudes and longitudes. The global positions may be obtained from a global positioning system. Both Figures 8 and 9 have been transformed from a 3D image to a 2D image by applying the perspective matrix. The perspective matrix may be obtained by converting the changed pixel coordinates of the reference features from the 3D image to a 2D image by substituting the geo location coordinates. An example of the perspective matrix is shown below:

$$\begin{bmatrix} x'' \\ y'' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where $(x, y)$ is a pixel coordinate, and $x''$ and $y''$ are the transformed coordinates after normalization by $w$.
The global positions, e.g., geo location coordinates, may be extracted from the pixel coordinates of specific objects in the moved images, e.g., images 802, 902.
A 2D image 1102 of the current frame is shown in Figure 11. The grid 1010 is superimposed onto image 1102. The global positions of the reference features p11-p14 are obtained. The image may be transformed, by applying an inverse perspective transform, into a 3D image 1202 as shown in Figure 12. The obtained coordinates of other objects may be plotted onto the 3D image 1202 or may be plotted on a 2D image, e.g., a Google Maps image. The perspective matrix allows global positions to be obtained even with a moving camera.
Figure 13 illustrates an example output that can be presented on a display, e.g., display 212 or a mobile device. The output of the system and method as per the present invention may be a Google Map or other 2D map image from a global positioning system, showing the global position of an object of interest. Referring to Figure 13, there is shown, for a location, a first original image 1302 as captured by the moving camera. A specific object 1310 is illustrated with its global position coordinates. The 3D image may be transformed into a 2D image 1304, with the object 1310 and its position 1306 illustrated on image 1304. The position is a global position, e.g., defined as geo location coordinates. Image 1304 may be presented on a display or a mobile device.
Image 1402 is an original image captured by the camera after it has moved. Image 1404 illustrates a 2D image with the detected objects 1410, 1412 and their global coordinates 1406, 1408, after accounting for camera movement. The global positions may be calculated using the method 300. Image 1404 may be presented on a display, e.g., on a mobile device.
The system and method for position detection provide improved position detection. The system as described provides a more accurate geo location than GPS locators alone. The grid used on the 2D images provides a more accurate way to determine the exact geo location of one or more other objects in relation to the known positions of the reference features. The system and method for position detection provide an approach that yields an accurate location of an object or artefact in captured images without having to rely on a GPS device, which can be inaccurate or limited to an accuracy of several meters.
The system and method for position detection are also advantageous in that they correct for the perspective error that arises because images are captured from a single camera. The system can be used with a single camera, making it easier and cheaper to deploy than systems with multiple cameras. The system and method described herein also provide accurate positions of objects in images captured by a moving camera. The system and method described herein can be used for object tracking or asset management. Some example uses are monitoring fixed infrastructure, road infrastructure, buildings, and/or monitoring an area for various objects. Other uses are contemplated.
Although not required, the embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality desired herein.
It will also be appreciated that where the methods and systems of the present invention are either wholly implemented by computing system or partly implemented by computing systems then any appropriate computing system architecture may be utilised. This will include stand alone computers, network computers and dedicated hardware devices. Where the terms “computing system” and “computing device” are used, these terms are intended to cover any appropriate arrangement of computer hardware capable of implementing the function described.
Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Claims (20)
- A system for position detection of one or more objects in images captured by a moving camera, comprising:at least one camera configured to capture one or more images of a scene including one or more objects, wherein the at least one camera is moveable,a computing apparatus operatively coupled to the at least one camera, the computing apparatus arranged in communication with a global positioning system,the computing apparatus configured to:receive one or more images from a camera, wherein each image is an image of a scene including one or more objects,identify one or more reference features in the one or more images,detect camera movement based on a change in a local position of the one or more reference features between consecutive images,calculate a global position of the one or more reference features,calculate a position of other objects in the one or more images based on the global position of the one or more reference features.
- The system of claim 1 wherein the computing apparatus is further configured to:identify a local position of the one or more reference features in a previous image to the current image,calculate a local position of the one or more reference features in a current image by applying a movement transform to the local position of the one or more reference features, wherein the movement transform defines camera movement,wherein the calculated local position of the one or more reference features in the current image corresponds to the change in the local position of the one or more reference features between consecutive images.
- The system of claim 2 wherein the computing apparatus is configured to:identify a timestamp of each received image,arrange the images in a consecutive temporal order, and;wherein each received image is a 3D perspective image of a scene including one or more objects.
- The system of claim 2 wherein the computing apparatus is configured to:apply a perspective transform to each received image to convert the received image into a 2D image, wherein the perspective transform comprises a perspective matrix,determine a corresponding global position of the one or more reference features within each 2D image,apply an inverse perspective transform to each 2D image to convert the 2D image into a 3D perspective image with the appropriate global position of each reference feature on the 3D perspective image.
- The system of claim 4 wherein the computing apparatus is configured to:determine a global position of each reference feature within the current 2D image and previous 2D image.
- The system of claim 5 wherein the local position of each reference feature relates to a pixel coordinate of each reference feature within a 2D grid that is superimposed on each 2D image,the global position is a geo location defined as coordinates, wherein the global position is accessed by the computing apparatus from a global positioning system, wherein the computing apparatus is configured to communicate with a global positioning system, and;wherein each reference feature is a fixed object having a known global position.
- The system of claim 6 wherein the computing apparatus is configured to:superimpose a 2D grid on each 2D image, wherein the local position and global position of each reference feature in the 2D image is defined in the 2D grid,determine a local position of one or more other objects in the scene relative to the reference features in each 2D image based on the distance from reference feature defined in the 2D grid,determine a global position of the one or more other objects using the relative local position of the one or more other objects to the reference features, and the global position of each reference feature.
- The system of claim 7 wherein the computing apparatus is configured to:detect movement of the camera and a direction of travel of the camera,determine if one or more new reference features is required for a received 3D perspective image, wherein one or more new reference features are required if one or more previously identified reference features are not visible in a new image,if a new reference feature is required, locate one or more new reference features at an end of the received 3D image in the direction of the travel of the camera, wherein the one or more new reference features are identified by applying an object recognition process or an object recognition model to the received image,apply a perspective transform to the received 3D image with the one or more new reference features identified therein and transforming the received 3D image to a 2D image, wherein the perspective transform comprises a transform matrix defined as:

$$\begin{bmatrix} x'' \\ y'' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
wherein, x” defines the transformed x coordinate and y” defines the transformed y coordinate, determine a local position of the one or more new reference features in the 2D image, and determine a global position of the one or more new reference features in the 2D image.
- The system of claim 8 wherein the movement and direction of travel of the camera is determined by the computing apparatus by comparing the change in the local position of one or more reference features in consecutive 2D transformed images, wherein the change in the local position of the reference features is indicative of movement and direction of travel of the camera.
- The system of claim 9 wherein the computing apparatus is configured to:identify background features in each received image by applying a background identification model or by determining one or more attributes that identify a background feature within the image, wherein background features are objects that are stationary,identify one or more moving objects within the received images, wherein the moving objects are identified by applying an object recognition process to a plurality of consecutive images or by utilizing edge or color detection in consecutive images, wherein the one or more moving objects are defined as foreground features,identify global positions of the background features in one or more transformed 2D image,identify the global positions of the foreground features in one or more transformed 2D image, wherein the changes in the global positions are indicative of the movement of the foreground features.
- A computer-implemented method for position detection of one or more objects in images captured by a moving camera, comprising the steps of:receiving one or more images from a camera, wherein each image is an image of a scene including one or more objects,identifying one or more reference features in the one or more images,detecting camera movement based on a change in a local position of the one or more reference features between consecutive images,calculating a global position of the one or more reference features,calculating a position of other objects in the one or more images based on the global position of the one or more reference features.
- The method of claim 11 comprising the additional steps of:identifying a local position of the one or more reference features in a previous image to the current image,calculating a local position of the one or more reference features in a current image by applying a movement transform to the local position of the one or more reference features, wherein the movement transform defines camera movement,wherein the calculated local position of the one or more reference features in the current image corresponds to the change in the local position of the one or more reference features between consecutive images.
- The method of claim 12 wherein the method comprises the steps of:identifying a timestamp of each received image,arranging the images in a consecutive temporal order, and;wherein each received image is a 3D perspective image of a scene including one or more objects.
- The method of claim 12 wherein the method comprises the steps of:applying a perspective transform to each received image to convert the received image into a 2D image,determining a corresponding global position of the one or more reference features within each 2D image,applying an inverse perspective transform to each 2D image to convert the 2D image into a 3D perspective image with the appropriate global position of each reference feature on the 3D perspective image.
- The method of claim 14 wherein the method comprises:determining a global position of each reference feature within the current 2D image and previous 2D image.
- The method of claim 15 wherein the local position of each reference feature relates to a pixel coordinate of each reference feature within a 2D grid that is superimposed on each 2D image,the global position is a geo location defined as coordinates, wherein the global position is identified from a global positioning system, and;wherein each reference feature is a fixed object having a known global position.
- The method of claim 16 comprising the steps of:superimposing a 2D grid on each 2D image, wherein the local position and global position of each reference feature in the 2D image is defined in the 2D grid,determining a local position of one or more other objects in the scene relative to the reference features in each 2D image based on the distance from reference feature defined in the 2D grid,determining a global position of the one or more other objects using the relative local position of the one or more other objects to the reference features, and the global position of each reference feature.
- The method of claim 17 comprising the steps of:detecting movement of the camera and a direction of travel of the camera,determining if one or more new reference features is required for a received 3D perspective image, wherein one or more new reference features are required if one or more previously identified reference features are not visible in a new image,locating one or more new reference features at an end of the received 3D image in the direction of the travel of the camera, wherein the one or more new reference features are identified by applying an object recognition process or an object recognition model to the received image,applying a perspective transform to the received 3D image with the one or more new reference features identified therein and transforming the received 3D image to a 2D image,determining a local position of the one or more new reference features in the 2D image,determining a global position of the one or more new reference features in the 2D image.
- The method of claim 18 wherein the movement and direction of travel of the camera is determined by comparing the change in the local position of one or more reference features in consecutive 2D transformed images, wherein the change in the local position of the reference features is indicative of movement and direction of travel of the camera.
- The method of claim 19 comprising the steps of:identifying background features in each received image by applying a background identification model or by determining one or more attributes that identify a background feature within the image, wherein background features are objects that are stationary,identifying one or more moving objects within the received images, wherein the moving objects are identified by applying an object recognition process to a plurality of consecutive images or by utilizing edge or color detection in consecutive images, wherein the one or more moving objects are defined as foreground features,identifying global positions of the background features in one or more transformed 2D image,identifying the global positions of the foreground features in one or more transformed 2D image, wherein the changes in the global positions are indicative of the movement of the foreground features.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| HK32023084209.0 | 2023-12-15 | ||
| HK32023084209.0A HK30101307A2 (en) | 2023-12-15 | 2023-12-15 | A system and method for position detection of one or more objects by a moving camera |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025123929A1 true WO2025123929A1 (en) | 2025-06-19 |
Family
ID=96056435
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/126175 Pending WO2025123929A1 (en) | 2023-12-15 | 2024-10-21 | A system and method for position detection of one or more objects by a moving camera |
Country Status (2)
| Country | Link |
|---|---|
| HK (1) | HK30101307A2 (en) |
| WO (1) | WO2025123929A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| HK30101307A2 (en) * | 2023-12-15 | 2024-05-10 | 华飞思科技有限公司 | A system and method for position detection of one or more objects by a moving camera |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2005091298A (en) * | 2003-09-19 | 2005-04-07 | Komatsu Engineering Corp | Global coordinate acquisition device using image processing |
| CN102359780A (en) * | 2011-10-26 | 2012-02-22 | 中国科学技术大学 | Ground target positioning method applied into video monitoring system |
| US20190019031A1 (en) * | 2017-07-12 | 2019-01-17 | Electronics And Telecommunications Research Institute | System and method for detecting dynamic object |
| CN110243360A (en) * | 2018-03-08 | 2019-09-17 | 深圳市优必选科技有限公司 | Map construction and positioning method of robot in motion area |
| CN113344769A (en) * | 2021-04-20 | 2021-09-03 | 梅卡曼德(北京)机器人科技有限公司 | Method, device and medium for acquiring 3D image information of article based on machine vision |
| AU2022219113A1 (en) * | 2021-02-09 | 2023-09-21 | Beijing Ccri-Tianma Automation Technology Co. Ltd. | Positioning method, positioning assemblies and positioning system |
| HK30101307A2 (en) * | 2023-12-15 | 2024-05-10 | 华飞思科技有限公司 | A system and method for position detection of one or more objects by a moving camera |
Also Published As
| Publication number | Publication date |
|---|---|
| HK30101307A2 (en) | 2024-05-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11393173B2 (en) | Mobile augmented reality system | |
| JP7480823B2 (en) | Information processing device, information processing method, and program | |
| US20200401617A1 (en) | Visual positioning system | |
| US8666657B2 (en) | Methods for and apparatus for generating a continuum of three-dimensional image data | |
| CN111081199B (en) | Selecting a temporally distributed panoramic image for display | |
| US9501831B2 (en) | Identification of relative distance of objects in images | |
| US20120075342A1 (en) | Augmenting image data based on related 3d point cloud data | |
| EP3593324B1 (en) | Target detection and mapping | |
| EP3934232A1 (en) | Imaging system | |
| US11741631B2 (en) | Real-time alignment of multiple point clouds to video capture | |
| US20210019910A1 (en) | Systems and methods for a real-time intelligent inspection assistant | |
| WO2025123929A1 (en) | A system and method for position detection of one or more objects by a moving camera | |
| US20250095102A1 (en) | Methods and systems for generating super resolution images | |
| CN112348887A (en) | Terminal pose determining method and related device | |
| US10509819B2 (en) | Comparative geolocation system | |
| Sambolek et al. | Person Detection and Geolocation Estimation in Drone Images | |
| US8903137B1 (en) | System and method for sub-pixel alignment of digital geographic imagery | |
| US12174024B2 (en) | Estimating camera motion through visual tracking in low contrast high motion single camera systems | |
| Jurevičius et al. | A data set of aerial imagery from robotics simulator for map-based localization systems benchmark | |
| TW202530742A (en) | Location measurement error detection | |
| Ergun et al. | Real-time relative mobile target positioning using GPS-assisted stereo videogrammetry | |
| CN118654668A (en) | Multi-device collaborative visual-inertial joint mapping method, device and system | |
| Se et al. | Automated motion imagery exploitation for surveillance and reconnaissance | |
| Morris et al. | Map interface for geo-registering and monitoring distributed events |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24902355; Country of ref document: EP; Kind code of ref document: A1 |