WO2018061010A1

WO2018061010A1 - Point cloud transforming in large-scale urban modelling

Info

Publication number: WO2018061010A1
Application number: PCT/IL2017/051100
Authority: WO
Inventors: Iovav Cohen; Dror Ouzana
Original assignee: Pixtier Maps Ltd
Current assignee: Pixtier Maps Ltd
Priority date: 2016-09-28
Filing date: 2017-09-28
Publication date: 2018-04-05
Anticipated expiration: 2019-03-28

Abstract

A computerized method comprising using at least one hardware processor for adjusting a digital urban three-dimensional model by receiving a plurality of digital image files, and for each computing initial camera parameters, segmenting each to one of a plurality of map tiles that overlap each other, and for each map tile computing a structure from motion analysis (SFM). The method further comprises, for at least one map tile, computing an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles. The method further comprises, for at least one map tile, generating a transformed SFM data values based on the alignment transformation. The method further comprises, for at least one map tile, computing a three- dimensional (3D) urban model of that map tile based on the transformed SFM values.

Description

POINT CLOUD TRANSFORMING IN LARGE-SCALE URBAN MODELLING

BACKGROUND

[0001] The invention relates to the three-dimensional (3D) mapping, and more specifically to generating 3D urban models from images.

[0002] Three dimensional (3D) urban models are digital models of urban areas showing terrain surfaces, buildings, roads, and the like. Components of the model may be encoded in vector format and stored in a database, optionally with texturing images for presentation of the models on a display of a user interface, semantic metadata for names of objects, and/or the like.

[0003] A 3D urban model may comprise different levels of details (LOD) to allow different levels of abstraction and resolution. Spatio-semantic coherence, resolution of the texture, and the like may be considered a part of the LOD. For example, CityGML defines five LODs for building models: LOD 0: 2.5D footprints; LOD 1 : Buildings represented by block models (usually extruded footprints); LOD 2: Building models with standard roof structures; LOD 3: Detailed (architectural) building models; and, LOD 4: LOD 3 building models supplemented with interior features.

[0004] A 3D urban model may comprise geographical information system (GIS) data of base information, such as by digital terrain models, road networks, land use maps, and related geo-referenced data. GIS data may also include cadastral data that may be converted into simple 3D models as, for example, in the case of building footprints. Core components of 3D urban models form digital terrain models (DTM) represented, for example, by TINs or grids.

[0005] A 3D urban model may comprise computer-aided drafting (CAD) data, such as models of buildings, sites, and infrastructure elements. CAD data may provide a high level of detail, possible not required by 3D city model applications, but may be incorporated either by exporting their geometry or as encapsulated objects.

[0006] Building information models (BIM) data may represent another category of geo- spatial data that may be integrated into a 3D urban model providing the highest level of detail for building components. [0007] A building model construction may comprise extruding the footprint polygons of buildings, e.g., taken from the cadaster, by pre-computing average building heights. In practice, 3D models of buildings of urban regions may be generated by capturing and analyzing 3D point clouds (e.g., sampled by terrestrial or aerial laser scanning) or by photogrammetric approaches. To achieve a high percentage of geometrically and topologically correct 3D building models, digital terrain surfaces and 2D footprint polygons may be required by automated building reconstruction tools such as BREC. Statistical approaches are common for roof reconstruction based on airborne laser scanning point clouds.

[0008] Fully automated processes may exist to generate LOD1 and LOD2 building models for large regions. For example, the Bavarian Office for Surveying and Spatial Information is responsible for about 8 million building models at LOD1 and LOD2.

[0009] Structure from Motion (SFM) techniques may estimate the parameters of a set of cameras, such as position, orientation, focal length, distortion, and/or the like, and may estimate the 3D position of objects observed by them. This may be done by computing the relationships between images and estimating initial cameras poses, camera positions, camera poses, camera intrinsic parameters, and/or the like.

[0010] Camera pose computation may be performed incrementally or globally. Incremental camera pose computation allows unsolved cameras to be introduced iteratively and their initial pose is estimated using the already solved cameras. Global camera pose computation may be performed on the entire set of image files simultaneously. Each of these techniques may use bundle adjustment optimization to decrease the re-projection error.

[0011] The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures. SUMMARY

[0012] The following embodiments and aspects thereof are described and illustrated in conjunction with systems, tools and methods which are meant to be exemplary and illustrative, not limiting in scope.

[0013] There is provided, in accordance with some embodiments, a computerized method comprising using one or more hardware processors for receiving two or more digital image files, each comprising a digital image depicting a geographical location. For each digital image file, computing a camera location, camera intrinsic parameter values, and/or a camera pose that acquired the digital image and two or more structural feature locations depicted in the digital image. The computerized method further comprises an action of segmenting the digital image files according to respective camera pose and geographical location to one of two or more map tiles, wherein each map tile is associated with a subset of the digital image files, and wherein the map tiles overlap each other. The computerized method further comprises an action of computing, for each map tile, a structure from motion analysis (SFM) on the subset to produce a point cloud for that map tile, wherein the SFM additionally produces (a) a refined camera pose and (b) refined camera intrinsic parameter values, for each digital image file. The computerized method further comprises an action of, for one or more map tile, computing an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles, wherein the iteratively computing results in a decrease of the discrepancy score. The computerized method further comprises an action of, for one or more map tile, generating a transformed SFM values based on the alignment transformation. The computerized method further comprises an action of, for one or more map tile, computing a three-dimensional (3D) urban model of that map tile based on the transformed SFM values.

[0014] According to some embodiments, the computerized method further comprises bundle adjusting the SFM values of the map tiles and/or the overlap.

[0015] According to some embodiments, the computerized method further comprises using the one or more hardware processor for model aligning at least some of the 3D urban models, each associated with the respective map tile, to produce a large-scale 3D urban model.

[0016] According to some embodiments, the computerized method further comprises separating at least some of the 3D urban models into two or more building models and two or more terrain models, and wherein the model aligning is performed separately for the terrain models.

[0017] According to some embodiments, the computerized method further comprises separating at least some of the 3D urban models into two or more building models and two or more terrain models, and wherein the model aligning of the building models is performed using bundle adjustment of the transformed SFM value subset associated with the building models.

[0018] According to some embodiments, wherein one or more of the map tiles is sized and shaped to match one or more of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the structural feature locations.

[0019] There is provided, in accordance with some embodiments, a computerized system comprising one or more hardware processor and a non-transitory computer readable storage medium, having program code stored thereon. The program code is configured, when executed on the one or more hardware processor, to receive two or more digital image files, each comprising a digital image depicting a geographical location. For each digital image file, the program code is configured to compute a camera location, camera intrinsic parameter values, and/or a camera pose that acquired the digital image and two or more structural feature locations depicted in the digital image. The program code is configured to segment the digital image files according to respective camera pose and geographical location to one of two or more map tiles, wherein each map tile is associated with a subset of the digital image files, and wherein the map tiles overlap each other. The program code is configured to compute, for each map tile, a structure from motion analysis (SFM) on the subset to produce a point cloud for that map tile, wherein the SFM additionally produces (a) a refined camera pose and (b) refined camera intrinsic parameter values, for each digital image file. The program code is configured to, for one or more map tile, compute an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles, wherein the iteratively computing results in a decrease of the discrepancy score. The program code is configured to, for one or more map tile, generate a transformed SFM values based on the alignment transformation. The program code is configured to, for one or more map tile, compute a three-dimensional (3D) urban model of that map tile based on the transformed SFM values.

[0020] According to some embodiments, the computerized system further comprises an action of bundle adjusting the SFM values of the map tiles and/or the overlap.

[0021] According to some embodiments, the computerized system further comprises program code configured to model align at least some of the 3D urban models, each associated with the respective map tile, to produce a large-scale 3D urban model.

[0022] According to some embodiments, the computerized system further comprises program code configured to separate at least some of the 3D urban models into two or more building models and two or more terrain models, and wherein the model aligning is performed separately for the terrain models.

[0023] According to some embodiments, the computerized system further comprises program code configured to separating at least some of the 3D urban models into two or more building models and two or more terrain models, and wherein the model aligning of the building models is performed using bundle adjustment of the transformed SFM value subset associated with the building models.

[0024] There is provided, in accordance with some embodiments, a computer program product, the computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith. The program code executable by one or more hardware processor to receive two or more digital image files, each comprising a digital image depicting a geographical location. The program code executable by one or more hardware processor to, for each digital image file, compute one or more of a camera location, camera intrinsic parameter values, and a camera pose that acquired the digital image and two or more structural feature locations depicted in the digital image. The program code executable by one or more hardware processor to segment the digital image files according to respective camera pose and geographical location to one of two or more map tiles, wherein each map tile is associated with a subset of the digital image files, and wherein the map tiles overlap each other. The program code executable by one or more hardware processor to compute, for each map tile, a structure from motion analysis (SFM) on the subset to produce a point cloud for that map tile, wherein the SFM additionally produces (a) a refined camera pose and (b) refined camera intrinsic parameter values, for each digital image file. The program code executable by one or more hardware processor to, for one or more map tile, compute an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles, wherein the iteratively computing results in a decrease of the discrepancy score. The program code executable by one or more hardware processor to, for one or more map tile, generate a transformed SFM values based on the alignment transformation. The program code executable by one or more hardware processor to, for one or more map tile, compute a three-dimensional (3D) urban model of that map tile based on the transformed SFM values.

[0025] According to some embodiments, the alignment transformation is computed according to two or more sub-transformations, each respective sub-transformation computed from one of two or more discrepancy scores.

[0026] According to some embodiments, the computer program product further comprises program code configured to bundle adjust the SFM values of the map tiles and/or the overlap.

[0027] According to some embodiments, the computer program product further comprises program code configured to model align at least some of the 3D urban models, each associated with the respective map tile, to produce a large-scale 3D urban model.

[0028] According to some embodiments, the computer program product further comprises program code configured to separate at least some of the 3D urban models into two or more building models and two or more terrain models, and wherein the model aligning is performed separately for the terrain models.

[0029] According to some embodiments, the computer program product further comprises program code configured to separating at least some of the 3D urban models into two or more building models and two or more terrain models, and wherein the model aligning of the building models is performed using bundle adjustment of the transformed SFM value subset associated with the building models.

[0030] According to some embodiments, the discrepancy score is based on the point cloud, the camera poses, and/or the camera intrinsic parameter values associated with the map tile overlap.

[0031] According to some embodiments, the overlap is between 1% and 20% of a linear dimension distance substantially orthogonal to an edge of the map tile at that overlap location.

[0032] According to some embodiments, the overlap is sized and shaped such that between 1% and 20% of the digital image files of that tile depict the map tile overlap.

[0033] According to some embodiments, the overlap is sized and shaped to match one or more of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the structural feature locations.

[0034] According to some embodiments, each map tile is sized and shaped to match one or more of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the structural feature locations.

[0035] According to some embodiments, one or more of the map tiles is shaped at least in part as one or more of a square, a rectangle, a circle, a triangle, a trapezoid, a parallelepiped, a polygon, and a blob, and wherein the shape is selected according to the structural feature locations.

[0036] The above method may be implemented as a computerized method, incorporated into a computerized system, as a computer program product, as a software-as-a-service, and/or the like.

[0037] In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the figures and by study of the following detailed description. BRIEF DESCRIPTION OF THE FIGURES

[0038] Exemplary embodiments are illustrated in referenced figures. Dimensions of components and features shown in the figures are generally chosen for convenience and clarity of presentation and are not necessarily shown to scale. The figures are listed below.

[0039] FIG. 1 is a schematic illustration of a system for tiling and point cloud transformations in 3D urban modeling;

[0040] FIG. 2A is a flowchart of a method for gridding and point cloud transformations in 3D urban modeling;

[0041] FIG. 2B is a flowchart of a second method for gridding and point cloud transformations in 3D urban modeling;

[0042] FIG. 3A is a flowchart of a method for map tile bundle adjustment using map tile overlap discrepancy scores;

[0043] FIG. 3B is a flowchart of a method for point cloud classification;

[0044] FIG. 4 is a schematic illustration of a vertical view of a map tile overlap;

[0045] FIG. 5 is a schematic illustration of a top view of a map tile overlap;

[0046] FIG. 6 is a schematic illustration of a map tile overlap and camera pose discrepancy;

[0047] FIG. 7 is a schematic illustration of a global coordinate system; and

[0048] FIG. 8 is a schematic illustration of map tile overlap weighting values.

DETAILED DESCRIPTION

[0049] Disclosed herein are methods, systems, and computer program products for generating a three dimensional (3D) urban model using tiling and point cloud transformations. Images of urban areas are received from one or more cameras, such as ground-based cameras, air-borne cameras, car mounted cameras, and/or the like. The images may also comprise a global positioning system (GPS) tag indicating the geographical location of the image. The images may be analyzed using structure from motion (SFM) techniques to calculate camera poses of each image, image coordinates of structure features in each image, camera intrinsic parameters, and/or the like. The images are arranged according to geographical location in overlapping map tiles, where a 3D model is computed for each tile separately and the 3D models of all tiles are combined later. As used herein, the term tile means a geographical map tile. By transforming the camera poses and/or the point cloud data according to one or more discrepancy scores, the quality of the 3D models in each tile as well as the combining of the 3D model overlaps across tiles is improved significantly relative to a standard SFM techniques.

[0050] The size and shape of the map tiles and the amount of overlap may be determined using a uniform map tile and overlap size, using features of the image, using the number of images available in each tile, and//or the like. For example, the overlap is between 1% and 20% of a linear dimension distance substantially orthogonal to an edge of the map tile at that overlap location, such that any linear dimension has a minimum amount of overlap. For example, the overlap is sized and shaped such that between 1% and 20% of the digital image files of that tile are depicted on the map tile overlap, such as the map tile has 1000 images and 20% of those images are in the overlap region with neighboring tiles. In some examples, the percentage of overlap (according to any of the criteria) may be 5%, 10%, 15%, 20%, 25%, 30%, or the like, depending on the images available, features depicted, and/or the like. For example, the overlap and/or the map tile are sized and shaped to match features of the 3D urban model, images features, the number of points in the point cloud, and/or the like. For example, at least part of each map tile is shaped as a square, a rectangle, a circle, a triangle, a trapezoid, a parallelepiped, a polygon, a blob, and/or the like.

[0051 ] The transformed point clouds for each tile may be segmented into building points and terrain points. The building points undergo bundle adjustment and 3D modelling separately from the 3D modelling of the terrain points. The two 3D models, terrain and builds, may be combined for the final 3D urban model.

[0052] The discrepancy scores may be cost functions that minimize the discrepancy of the point clouds of each tile at the tiles overlaps. The discrepancy score cost function is minimized to determine a transformation of the camera poses and/or point cloud data. Optionally, a coherence score benefit function is maximized to determine a transformation of the camera poses and/or point cloud data. The discrepancy scores may be a function of two adjacent tile overlaps, a corner overlap of four tiles, a 4-side overlap of a tile (5 tiles total), and the like. Optionally, the overlap is a sensitive geographical area that needs a higher accuracy 3D model and more tiles participate in the discrepancy score function.

[0053] The technique further allows incremental addition of new images to a 3D urban model without completely re-computing the 3D model. The map tile(s) of the new images to be added are located, and only the specific map tiles may be recomputed, thereby allowing a crowdsourcing-like approach to addition of new images from multiple cameras, vehicles, image qualities, weather conditions, lighting conditions, and/or the like. These improvements over conventional techniques allow the efficient computation of high-quality 3D models on a large scale, such as a city-scale, country-scale, region-scale, global-scale, and/or the like.

[0054] Reference is now made to FIG. 1, which is a schematic illustration of a system 100 for tiling and point cloud transformations in 3D urban modeling. System 100 comprises one or more hardware processors 101 for executing processor instructions stored in modules on a storage medium 102, such as a non-transitory computer-readable storage medium. A Structure from Motion (SFM) Analyzer 102A receives digital image files from one or more camera systems (130) through a network interface 110, such as a group of files from each camera system, and processes the files to determine for each one a camera pose, image features, camera intrinsic parameters, and/or the like. For example, a camera system 130 is a video system aboard a drone that flies over an urban area to create a 3D urban model. For example, a camera system 130 is a vehicle mounted video system. For example, a camera system 130 is a series of end user photographs, such as a crowdsourced camera system. For example, each digital image file comprises a digital image and metadata related to the camera intrinsic parameters, camera position, camera location, camera pose, and/or the like. SFM Analyzer 102A may produce a refined camera pose, a refined camera intrinsic parameter values, a refined point cloud, and/or the like.

[0055] A map tiler 102B separates files into geographical map tiles, where each map tile may be processed separately, such as in parallel by hardware processors 101. The map tiles of adjacent tiles overlap each other. The size of the map tiles and the size of the overlap may be determined by the number of image files, the existence of a previous 3D urban model, the quality of the existing model and image files, the desired accuracy of the model, and/or the like. Point clouds may be calculated from the image feature coordinates and camera poses, such as by triangulation.

[0056] A discrepancy transformer 102C may transform the camera pose coordinates and point cloud coordinates to new coordinates based on a discrepancy function computed for the overlap between tiles. For example, the transformation using the discrepancy function smooths the transition of the 3D urban model across tiles. A 3D model stitcher 102D may classify some of the points of the point cloud as belonging to a building or terrain, and separately process the points from the building and the terrain. For example, the point cloud of buildings is bundle adjusted separately, and an adjusted point cloud is generated from the buildings. The adjusted point cloud may be used to generate a separate building model or combined with the processed or unprocessed terrain model for further processing and/or modelling. For example, the point cloud of terrain is modelled and stitched across map tiles separately from the buildings.

[0057] Computerized system 100 may include a user interface 111 to control the processing, display the results, monitor the system, and the like. Optionally, modules and/or models may be stored on a network attached storage 120.

[0058] Reference is now made to FIG. 2A, which is a flowchart of a method 200 for gridding and point cloud transformations in 3D urban modeling. Hardware processor(s) 101 receive 201 digital image files from camera system(s) 120, such as images in groups, each group associated with a capture session, such as a video clip, a series of drone captured images from a flight, a series of images from car mounted camera mapping vehicles, a series of crowdsourced photos, and/or the like. Each image may comprise one or more GPS coordinate, one or more two-dimensional (2D) image data, metadata tags, and the like. As used herein, the term image file means the digital image file, and the term image means the 2D image data, such as a digital still, a frame of digital video, and/or the like. The groups of images may undergo feature extraction 202 to determine camera poses, image features, point cloud, and the like, each associate with one of the digital image files. [0059] To efficiently and practically convert large amounts of images to a large-scale 3D urban model, the images may be split 203 into sub-groups, each associated with a geographical map tile. The map tile may comprise an overlap with adjacent tiles and the selection of the size of the tiles and the area of the overlaps will be explained in detail hereinbelow. Optionally, when an image is outside the currently computed tile but contains feature of objects within the current tile, it is included in the processing of the current tile, such as when an image was computed previously for a different tile. The images may be solved for camera positions, such as using by using SFM analysis 204. The discrepancies across tiles may be computed 205, and the camera poses may be transformed 206 to minimize the discrepancy within the tile overlap, such as computed 205 with a discrepancy score, discrepancy function, cost function, benefit function, coherence function, and the like. A point cloud may then be created 207 from the transformed camera poses, and the point cloud may be transformed 208 using a new or existing computation 205 of the cross tile overlap discrepancies. The transformed point cloud may be used to create 209 a 3D urban model.

[0060] Reference is now made to FIG. 2B, which is a flowchart of a second method 210 for gridding and point cloud transformations in 3D urban modeling. As in FIG. 2A, the action of receiving 201 images, extracting 202 features, and splitting 203 images into tiles may be performed by hardware processor(s) 101. Camera poses may be solved, such as using by using SFM analysis 214, creating a new point cloud of structural features.

[0061] Each point of the cloud may be classified 215 into belonging to terrain or buildings. Building point cloud may be bundle adjusted 216 including adjusted camera poses, optionally including points of the same building from nearby tiles. Optionally, bundle adjustment 216 is performed on a subset of the point cloud, such as sparse point cloud of the extracted 202 features. A dense building point cloud may be created 217 based on this bundle adjustment 216. The terrain point cloud may be used to model the 3D terrain and stitch 218 the 3D terrain model between map tiles. A new set of terrain point cloud from the model may be combined 219 with the adjusted building point cloud, and used to create 220 a 3D urban model. Optionally, the adjusted dense building point cloud is used to create building models, so that a building does not need alignment of building parts across tiles and the building models may be combined with the 3D terrain model to create a 3D urban model. Optionally, each of the identified building point clouds is re-computed using the images in which the building is observed.

[0062] Reference is now made to FIG. 3 A, which is a flowchart of a method 310 for map tile bundle adjustment using tile overlap discrepancy scores. For example, different discrepancy scores may be used multiple times during the bundle adjusting 313, camera pose transforming, point cloud transforming, and/or the like. For example, a first discrepancy score is minimized 311 to determine a bundle adjust transform. For example, a second discrepancy score is minimized 312 to determine a bundle adjust transform, and combined with the first transform. An entire tile is bundle adjusted 313, and the first discrepancy score is again minimized 314 to transform the bundle adjusted build point cloud. Each discrepancy score may use a cost/benefit function computed on the point cloud overlap between tiles to transform the point cloud within a tile and thus achieve a smoother tile to tile transition of the 3D models with benefits of shorter computation time, less memory usage, easier integration of new images to an existing 3D model, and/or the like.

[0063] Reference is now made to FIG. 3B, which is a flowchart of a method 320 for point cloud classification. Point cloud may be resampled 321 on a regular grid, and each point cloud identified 322 based on the 3D traits of the point, such as coordinates, colors, adjacent neighbors, and/or the like. The points may be projected 323 onto the images, such as converting the 3D coordinates to 2D image coordinates. Within the image space the point may be classified 324 based on radiometric traits, such as view, adjacency, color, computer vision techniques, and/or the like. The classification may then be used to locate 325 the object boundaries on the original point cloud.

[0064] Following are details on the formulas used to create and manage map tiles for 3D urban models.

[0065] When handling large amount of image file data, incremental SFM's bundle adjustment optimization, carried out at each iteration step, may become a computational bottle neck because of processing time, memory resources needed, and/or the like. Global SFM techniques, although potentially efficient, may be sensitive to noisy data. Furthermore, incremental SFM may introduce drifting errors as the data increases in size. Once cameras are solved and transformed, a dense point cloud may be transformed on which further processing may be done to achieve a textured, low noise, accurate, and smooth 3D model.

[0066] Furthermore, a world-scale 3D model may handle unconventional problems, such as not all images are provided to the algorithm at once, image groups cover different areas, such as areas far from each other, changes in the scene over time may render the previous gathered data and/or model at least partially obsolete.

[0067] The proposed methods may overcome such problems and issues using computed map tiles for determining subgroups of image files for processing together, such as a during bundle adjustment, modelling, and/or the like. The techniques disclosed herein provide technical solutions for stitching data between adjacent tiles, such as ground point cloud data, classification data, model data, and the like, without the use of control points. Optionally, ground control points are additionally used to generate and combine the 3D models.

[0068] Map tiles may be geographically bounded areas, such as a bounding box, with an infinite height, containing images and all related data, such as properties of the camera sensor(s) used for the acquisition of the images, images analysis products, the generated model, and the like. The generated model within the tile's bounding box is hereafter referred to as the tile's Area of Interest (AOI). The spatial boundaries of the computation tile may be defined using the center location of the tile and its width, referred to as easting, and height, referred to as northing. The location coordinates of the map tile may be expressed in geodetic coordinate system (i.e. Lat. Lon.), with World Geodetic System (WGS84) data, and the like.

[0069] Although the computation map tiles may not be required to be of the same size, it may be simpler to maintain and process the entire dataset when they are. When determining the size of the tile, the number of images it contains and the area it covers may be considered. Too few images may not converge well to a correct solution and too many images may have a negative impact on performance. Similarly, too small an area may prove inefficient in later steps of the reconstruction pipeline, and too large areas may show major inconsistency between neighboring tiles, due to the spherical nature of the surface of the earth. Since images may not have the same size (width & height), the same shape, cover the same area, such as in aerial images vs. terrestrial images, and/or the like, robust criteria pertaining to the amount of image data in a map tile may be considered. For example, the sum of pixels, the number of certain features extracted from the images, and/or the like.

[0070] One way to estimate the area of a tile may be to use some a-priori knowledge of the images. For example, when images are expected to be acquired by commercial drones and using consumer grade cameras. In this example, images are typically taken at altitudes of from 50 to 100 meters above ground, with a camera diagonal field of view of up to 90 degrees. The measured footprint of an image taken at nadir roughly ranges from 80 by 60 meters (m) up to 170 by 130 meters. When -60% of the map tile area is part of the overlap with adjacent tiles, an image should be available at least every 80 * 0.4 = 32 m by 60 * 0.4 = 24 m. When 500 images are found to be an optimal amount of data per tile for these cameras, the area of the tile is at least 32 * 24 * 500 = 384000 m2. Therefore, a tile size of, for example, the square root of 384000 is approximately 620 m in width and height. Overlap ranges may be from very small where the features are space and map tile stitching is minimal, and up to very large overlaps

[0071 ] For each tile a model may be created, and the map tile models stitched together to create a uniform 3D urban model. Images affecting the tile's AOI may be outside the tile's boundaries, and the bounding box, hereafter referred to as expanded boundaries, may include images with GPS tags outside the map tile boundaries. When the adjacent tile is already computed then additional cameras may be added from that tile to the computed tile, even when they are outside the cameras bounding box, as they may contribute to the tile's AOI. These additional cameras may also be used for the process of aligning the tile's AOI to its surrounding.

[0072] The embodiments described herein may be scalable since the map tile computations, possibly being of an unlimited number, may be easily deployed in a multicomputer environment, such as public or private cloud infrastructures, parallel computers, and/or the like.

[0073] Reference is now made to FIG. 4, which is a schematic illustration of a vertical view of a map tile overlap. The figure shows a camera (clear dot marked A) within the tile boundaries contributing to the AOI, a camera (red dot marked B) within the tile boundaries, not contributing to the AOI and dropped from tile, and a camera (blue dot marked C) in solved adjacent tile contributing to the AOI.

[0074] Following is a description of a diagram outlines the reconstruction pipeline, from images to a dense point cloud, using computation tiles. Let T = { T\...T_m} denote a set of computation tiles. Let P = {p\...p_a} denote a group of images. For each T_q 6 T, 0 < q < m, let ExRect(T_q) denote the coordinates of the expanded bounding rectangle of the tile, and P(T_q)Q P denote the set of images in T_q for which the GPS coordinates of all images in P(T_q) may be contained within ExRect(T_q). Let Rect(T_q) denote the coordinates of the bounding rectangle of the tile, containing the reconstructed scene (AOI).

[0075] The number of images in T_q may therefore be defined as Size{q) = \\P(T_q)\\ > 0. It may be possible to organize the tiles in a searchable data structure, such as a quad-tree. When a new set of images Pnew is introduced to the system, for each new image key points may be detected and feature vectors may be computed. Using the image file GPS tag, each image may be associated with the appropriate tile. An image i 6 T_q when the GPS tag is within the tile's expanded boundaries.

[0076] GPS tags may be expressed in geodetic coordinate system having latitude and longitude using WGS84 data. However, for the purpose of geometric reconstruction of the cameras, scene, 3D model, and the like, coordinates may be converted to a Cartesian system, such as ECEF (Earth Centered, Earth Fixed), to the appropriate local UTM (Universal Traversal Mercator) zone, and the like.

[0077] Overlap may exist between tiles, an image may be associated with more than one tile, such as ILP(r_q) Π (7 )||≥ 0, for some q≠ r. Each tile may be processed independently as detailed in the following. For each pair of images (existing and new), pairs of matching key points, such as feature points, object points, and the like, may be found by comparing feature vectors computed for those points. These pair-wise matches may have incorrect matches (such as outliers), and thus a filtering step may be carried out. For example, for each pair of image features, a fundamental or an essential matrix may be computed, using Random Sample Consensus (RANSAC), and used to filter out key points not complying with the transformation, under a given threshold. [0078] N-view correspondences may be inferred using the pair- wise matches, for example when key point kn in image Pi, matches key point fej in image P₂, and fej matches key point feh in image P3, then the 3-view correspondence would be the set of pairs: {< Pi , kn >, < Pi, ki] >, < P3, feh > }. These correspondences may be referred to as Tracks.

[0079] Using the tracks, camera poses may be solved for the entire set, using incremental or global structure from motion techniques. The bundle adjustment steps may assure that the solution is (at worse) locally optimized, such that it may be with substantially minimal re-projection error with respect to the images key points. Denote the solved camera parameters with respect to some image i 6 P(T_q) as C(Pi, T_q) = < Intrinsic<i,_q>, Pose< _q>>. Camera intrinsic parameters may be focal length, principal point, skew, lens' distortion (radial and tangential), and the like. Pose is the camera translation and rotation denoted as t and R respectively.

[0080] Given the noisy nature of the image data and possibly local convergence of bundle adjustment, neighboring tiles may not align perfectly. This means that for any image i, such as i 6 P(T_q) for some q and i 6 P(Tr) for q≠ r, the solved camera parameters C(Pi, T_q)≠ C(Pi, Tr). This inequality may be referred to as cross-tiles cameras discrepancies. For example, 3D models computed in 2 neighboring tiles may converged to different solutions in each of the tiles.

[0081 ] Discrepancy may be measured in multiple ways, as well as the optimal transformation which results from a minimization of the discrepancies. In order to align adjacent tiles, a discrepancy score function may be used, such as a distance function. Let Tiles(Pi) = { ti . . .

where ¾ 6 Tiles(Pi) iff Pi E Ρ(7¾), where Tiles(Pi) comprises indices of all tiles in which Pi is present.

[0082] Let Overlap(T_q) = { i I Pi EP(T_q), I I 77 Pi)ll > 1 } denote a list of indices of all images in tile T_q that may be also associated to other tiles, i.e. images in overlapping regions of the tile. Reference is now made to FIG. 5, which is a schematic illustration of a top view of a map tile overlap, such as from different cameras.

[0083] For example, the cross-tiles cameras discrepancies may be minimized to find a rigid transformation and a scaling factor, such as one that minimizes the following score function: _{r L}

27 November 2017

Discrepancy ScoreT(s, R, t)

where C(Pi, T_q) = <Intrinsic<i,q>, Pose<i,ak» denote the camera intrinsic parameters

when i is solved for tile T_q.

[0084] The transformation function may be defined as: s_x

s_y

Ft(cam, s, R,t) =

s_z

where s, R and t denote scale, rotation and translation, respectively, applied on the camera

pose as denoted:

cam.pos■

The distance function may be the L2 distance, dist{caml, cam2) = Wcaml.pos - cam2.pos\\.

The minimal discrepancy transform may be computed using the minimal mean squared error

transform applied to the cameras in the tile, for example, by using Umeyama's technique.

A non-linear optimization, such as Levenberg-Marquardt (LM) algorithm may be applied.

[0085] Reference is now made to FIG. 6, which is a schematic illustration of a map tile

overlap and camera pose discrepancy. Another example of a technique to minimize the

cameras cross-tiles discrepancies is by minimizing the focal distances cross-tiles variance.

The discrepancy score function may be denoted as:

DiscrepancyScoreF(s) = Var(

i<≡Overlap(T_q ) nE ile(P^ ,n≠q

This function may be minimized by searching for the ratio s with the transformation function

denoted: Ft(^cam,s) = cam. focal * s _an(j ^₆ distance function dist may be denoted as the

focal lengths ratio: dist(caml, cam2) = (caml.focal I caml.focal) [0086] After computing the ratio and applying the transformation to all of the tile's cameras, the rest of the cameras' parameters may be adjusted in accordance. This may be done for example by applying a bundle adjustment step in which the focal length is kept fixed. Discrepancy score may be minimized by combining different optimization steps. For example, first minimizing Discrepancy ScoreF, and then minimizing DiscrepancyScoreT.

[0087] Based on the cameras poses solved for the tile, now a dense point cloud may be computed. Various techniques may be applied, such as Patch-Based Multi-View Stereopsis. Suppose Vq, point cloud computed on T_q, and V_q,j£ V_q, 0<j<| V_q I , a vertex. For each vertex computed, we may mark:

[0088] Pixel(q,

rowi, coli>....<i_m, row_m, col_m>}, where z^'i,..z^' _m may refer to images Pii,.. Pim E P(T_q) where the vertex Vj is projected at < rowi, coh>... < row_m, col_m > respectively, for example, a look up table referencing a vertex v_?,j to all images observing it. Also denote Vertex(q, i, row, col) = j, the reverse lookup table, referencing pixel<row, col> of i 6 T_q to the computed vertex v_?j E Vq.

[0089] After the dense point cloud has been computed, another cross-tile discrepancy minimization step may be taken, this time operating on overlapped vertices across tiles. Using the above notations, for each v_?j E V_q for which any referenced image i in Pixel(q, j) also exists in Overlap(T_q), the computed vertex in Vertex(r, i, row, col) for each tile T_r E Tiles(Pi) may be determined. Vertices that were compute in different tiles may refer to the same pixels by images in the tile's overlap. The optimal transform for the tile may be computed, minimizing distances between these corresponding vertices, using such techniques as Iterative Closest Point (ICP) etc.

[0090] Optionally, the discrepancy score is computed iteratively, where each iteration may use a different discrepancy score variation. For example, a first iteration computes a discrepancy score that is near optimal, and no further iterations may be needed. For example, a first iteration computes a first discrepancy score for the overlap with a first adjacent map tile, a second iteration computes a second discrepancy score for the overlap with a second adjacent map tile, and/or the like. When the discrepancy score decreases with each iteration, the iterations may continue until the discrepancy score decrease is below a threshold value (i.e. the discrepancy score did not decrease enough or increased). [0091 ] Optionally, the transformation of the SFM values is computed according to sub- transformations, where each sub-transformations uses a different discrepancy score.

[0092] Before smoothing the tile's seams, for alignment within the seam neighborhood, the point cloud may be uniformly sampled. Manmade structures, such as buildings, bridges, and the like may be extracted. These 2 steps may be interchangeable, depending on the techniques used for classification and segmentation. Such manmade objects split at the tiles boundaries may result in visible defects in the surface of the object, such as geometry misalignment, wrong texture projection and other deformations.

[0093] Many techniques for building segmentation and classification may rely on the 3D properties of points/surfaces and/or on radiometric properties in an image. For example, primary point clustering may be done on the point cloud using 3D traits such as height above ground, point normal, etc. All clusters suspicious of being buildings may be classified on images on which they are projected. Then a final segmentation refinement is carried out on the classified vertices to extract exact building boundaries.

[0094] These objects may be classified by querying open or paid databases, such as Open Street Map, however they may not guaranteed to be update nor accurate. Dense point cloud reconstruction may be performed on an irregular point cloud. For stitching neighboring tiles, point clouds may be resampled on a regular grid. Evenly spaced points may simplify the process of smoothing the tiles borders when stitching to neighboring tiles, reduces noise due to outliers, enable the use of more compact and fast-access data structures, thus optimizing performance for other point cloud and mesh processing algorithms. Regardless of the sampling technique, sample rate may be determined. Key factor for determining the correct sampling rate may be the density of the point cloud. Denote a local density in M³:

radiusSeaich(q, i, r)

LocalDensity3(q, i, r)

where radiusSearch(q,i,r) denotes the number of vertices in V_q, that are within radius r of vertex v_q,i, usually r = 1. Since points may not be evenly distributed across the tile, a threshold density may be used, on which the sample rate is defined as:

[0095] A threshold may be found by computing the cumulative histogram of local densities across all vertices in the point cloud. Assuming normal distribution of densities, the selection of the threshold may be based on the bin ~ 0.85* I VI. This may ensure that most of the image data, up to 1 σ of the mean density is accurately represented.

[0096] Alternatively, in cases of bimodal distributions of densities, the density threshold may be determined using Otzu's technique, such as based on the histogram of local densities across all vertices in the point cloud.

[0097] Following similar embodiments, local density may also be defined in M²:

radius Seaich2(q, i, r)

LocalDensity2(q, i, r) where radiusSearch2(q, i, r) is the number of XY projected vertices in V_q, that are within radius r of XY projected vertex v_q,i. Projection may be done by moving from ECEF coordinate system to the appropriate UTM zone, or by projecting to a local plane, tangent to the earth at the center of the tile (see Fig. 7). The rate is therefore refined as:

[0098] Reference is now made to FIG. 7, which is a schematic illustration of a global coordinate system. Projecting ECEF coordinate system to a tile's local tangent plane is shown.

[0099] The grid sample rate (resolution) is denoted Rate( _q). After the sample rate has been determined, the tile may be uniformly resampled to produce a grid on the XY plane, where each cell contains its height. To prevent extrapolating the heights at the exact boundaries of the tile, additional rows and columns may be added at the left, right, top and bottom of the tile.

[00100] When introducing a new computed tile to the system, the new computed tile may be smoothed along the seams and adjacent tiles. The smoothing process may ensure that the grid generated from adjacent tiles will align along the joint borders of these tiles, and that the alignment is gradually smoothed from the border towards the tile center.

[00101] A height grid may be constructed from the sampled tiles around the newly computed tile and the tile itself. The tiles may be re-sampled based on the highest available rate among all participating tiles. A smoothing operator may be applied on the seam-lines. Reference is now made to FIG. 8, which is a schematic illustration of map tile overlap weighting values. The smoothing operator may be combined with some gradient of weights such as in FIG. 8, to determine which seams are most smoothed, with a decreasing weight as the operator moves away from the seam. The shaded cells are the center tile and the darker shaded cells are the tile boundaries.

[00102] The smoothing operator may be a Gaussian filter, for example. Combined with a weight map: Grid(i )=Gauss(i )* f(weight(i ))+Grid(i )*(l-f (weighs where

/ : [0..1]— » [0..1] , for example ( ) = ² and Gauss{i,j) denoting the value at <ij> of the resulting grid with Gaussian filter applied on the re-sampled grid.

[00103] For each classified building, images in tiles that show that building may be identified, using the key points matchings. These images may be re-computed with respect to camera poses, and a new dense point cloud is generated on a per building basis. The buildings may be re-inserted into relevant tiles as points, or kept as a separate layer. Further processing may include surface estimation, plane alignment, simplification, heuristics, and/or the like, relying on the a-priori knowledge that the point cloud represents only buildings.

[00104] Since the tile may have undergone alignment transformations for dealing with cross-tiles discrepancies, buildings may be re-aligned, before being combined with the terrain. This may be achieved by analyzing a building's principal components and aligning the up axis with the tiles' up axis.

[00105] For some applications, the tile may undergo other processing, such as, surface estimation and texture assignment in order to complete a 3D textured mesh. Other applications may require the generation of the same model in several Level of Details (LOD), for better view or faster analysis.

[00106] The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. [00107] Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

[00108] Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicate number and a second indicate number and "ranging/ranges from" a first indicate number "to" a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.

[00109] In the description and claims of the application, each of the words "comprise" "include" and "have", and forms thereof, are not necessarily limited to members in a list with which the words may be associated. In addition, where there are inconsistencies between this application and any document incorporated by reference, it is hereby intended that the present application controls.

[00110] The computer readable storage medium may be a tangible device that may retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non- exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire. Rather, the computer readable storage medium is a non-transient (i.e., not-volatile) medium.

[00111] Computer readable program instructions described herein may be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

[00112] Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

[00113] Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, may be implemented by computer readable program instructions.

[00114] These computer readable program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that may direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

[00115] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

[00116] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, may be implemented by special purpose hardware -based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

[00117] The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

CLAIMS What is claimed is:

1. A computerized method comprising using at least one hardware processor for: receiving a plurality of digital image files, each comprising a digital image depicting a geographical location;

for each digital image file, computing at least one of a camera location, camera intrinsic parameter values, and a camera pose that acquired the digital image and a plurality of structural feature locations depicted in the digital image;

segmenting the plurality of digital image files according to respective camera pose and geographical location to one of a plurality of map tiles, wherein each map tile is associated with a subset of the plurality of digital image files, and wherein the plurality of map tiles overlap each other;

computing, for each map tile, a structure from motion analysis (SFM) on the subset to produce a point cloud for that map tile, wherein the SFM additionally produces (a) a refined camera pose and (b) refined camera intrinsic parameter values, for each digital image file; and

for at least one map tile:

(i) computing an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles, wherein the iteratively computing results in a decrease of the discrepancy score,

(ii) generating a transformed SFM values based on the alignment transformation, and

(iii) computing a three-dimensional (3D) urban model of that map tile based on the transformed SFM values.

2. The computerized method according to claim 1, wherein the alignment transformation is computed according to a plurality of sub-transformations, each respective sub-transformation computed from one of a plurality of discrepancy scores.

3. The computerized method according to claim 2, further comprising bundle adjusting the SFM values of at least one of the plurality of map tiles and the overlap.

4. The computerized method according to claim 1 , further comprising using the at least one hardware processor for model aligning at least some of the plurality of 3D urban models, each associated with the respective map tile, to produce a large-scale 3D urban model.

5. The computerized method according to claim 4, further comprising separating at least some of the 3D urban models into a plurality of building models and a plurality of terrain models, and wherein the model aligning is performed separately for the plurality of terrain models.

6. The computerized method according to claim 4, further comprising separating at least some of the 3D urban models into a plurality of building models and a plurality of terrain models, and wherein the model aligning of the plurality of building models is performed using bundle adjustment of the transformed SFM value subset associated with the plurality of building models.

7. The computerized method according to claim 1, wherein the discrepancy score is based on at least one of the point cloud, the camera poses, and the camera intrinsic parameter values associated with the map tile overlap.

8. The computerized method according to claim 1, wherein the overlap is between 1% and 20% of a linear dimension distance substantially orthogonal to an edge of the map tile at that overlap location.

9. The computerized method according to claim 1, wherein the overlap is sized and shaped such that between 1 % and 20% of the digital image files of that tile depict the map tile overlap.

10. The computerized method according to claim 1, wherein the overlap is sized and shaped to match at least one from the group consisting of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the plurality of structural feature locations.

11. The computerized method according to claim 1 , wherein at least one of the plurality of map tiles is sized and shaped to match at least one from the group consisting of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the plurality of structural feature locations.

12. The computerized method according to claim 1, wherein at least one of the plurality of map tiles is shaped at least in part as at least one from the group consisting of a square, a rectangle, a circle, a triangle, a trapezoid, a parallelepiped, a polygon, and a blob, and wherein the shape is selected according to the plurality of structural feature locations.

13. A computerized system comprising:

at least one hardware processor; and

a non-transitory computer readable storage medium, having program code stored thereon, wherein the program code is configured, when executed on the at least one hardware processor, to:

(i) receive a plurality of digital image files, each comprising a digital image depicting a geographical location;

(ii) for each digital image file, compute at least one of a camera location, camera intrinsic parameter values, and a camera pose that acquired the digital image and a plurality of structural feature locations depicted in the digital image;

(iii) segment the plurality of digital image files according to respective camera pose and geographical location to one of a plurality of map tiles, wherein each map tile is associated with a subset of the plurality of digital image files, and wherein the plurality of map tiles overlap each other;

(iv) compute, for each map tile, a structure from motion analysis (SFM) on the subset to produce a point cloud for that map tile, wherein the SFM additionally produces (a) a refined camera pose and (b) refined camera intrinsic parameter values, for each digital image file; and

(v) for at least one map tile:

(a) compute an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles, wherein the iteratively computing results in a decrease of the discrepancy score,

(b) generate a transformed SFM values based on the alignment transformation, and

(c) compute a three-dimensional (3D) urban model of that map tile based on the transformed SFM values.

14. The computerized system according to claim 13, wherein the alignment transformation is computed according to a plurality of sub-transformations, each respective sub-transformation computed from one of a plurality of discrepancy scores.

15. The computerized system according to claim 13, further comprising program code configured to bundle adjust the SFM values of at least one of the plurality of map tiles and the overlap.

16. The computerized system according to claim 13, further comprising program code configured to model align at least some of the plurality of 3D urban models, each associated with the respective map tile, to produce a large-scale 3D urban model.

17. The computerized system according to claim 16, further comprising program code configured to separate at least some of the 3D urban models into a plurality of building models and a plurality of terrain models, and wherein the model aligning is performed separately for the plurality of terrain models.

18. The computerized system according to claim 16, further comprising program code configured to separating at least some of the 3D urban models into a plurality of building models and a plurality of terrain models, and wherein the model aligning of the plurality of building models is performed using bundle adjustment of the transformed SFM value subset associated with the plurality of building models.

19. The computerized system according to claim 13, wherein the discrepancy score is based on at least one of the point cloud, the camera poses, and the camera intrinsic parameter values associated with the map tile overlap.

20. The computerized system according to claim 13, wherein the overlap is between 1 % and 20% of a linear dimension distance substantially orthogonal to an edge of the map tile at that overlap location.

21. The computerized system according to claim 13, wherein the overlap is sized and shaped such that between 1 % and 20% of the digital image files of that tile depict the map tile overlap.

22. The computerized system according to claim 13, wherein the overlap is sized and shaped to match at least one from the group consisting of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the plurality of structural feature locations.

23. The computerized system according to claim 13, wherein at least one of the plurality of map tiles is sized and shaped to match at least one from the group consisting of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the plurality of structural feature locations.

24. The computerized system according to claim 13, wherein at least one of the plurality of map tiles is shaped at least in part as at least one from the group consisting of a square, a rectangle, a circle, a triangle, a trapezoid, a parallelepiped, a polygon, and a blob, and wherein the shape is selected according to the plurality of structural feature locations.

25. A computer program product, the computer program product comprising a non- transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to:

receive a plurality of digital image files, each comprising a digital image depicting a geographical location;

for each digital image file, compute at least one of a camera location, camera intrinsic parameter values, and a camera pose that acquired the digital image and a plurality of structural feature locations depicted in the digital image;

segment the plurality of digital image files according to respective camera pose and geographical location to one of a plurality of map tiles, wherein each map tile is associated with a subset of the plurality of digital image files, and wherein the plurality of map tiles overlap each other;

compute, for each map tile, a structure from motion analysis (SFM) on the subset to produce a point cloud for that map tile, wherein the SFM additionally produces (a) a refined camera pose and (b) refined camera intrinsic parameter values, for each digital image file; and

for at least one map tile:

(d) compute an alignment transformation based on iteratively computing a discrepancy score of the overlaps between that map tile and the SFM values from surrounding map tiles, wherein the iteratively computing results in a decrease of the discrepancy score,

(e) generate a transformed SFM values based on the alignment transformation, and

(f) compute a three-dimensional (3D) urban model of that map tile based on the transformed SFM values.

26. The computer program product according to claim 25, wherein the alignment transformation is computed according to a plurality of sub-transformations, each respective sub-transformation computed from one of a plurality of discrepancy scores.

27. The computerized system according to claim 25, further comprising program code configured to bundle adjust the SFM values of at least one of the plurality of map tiles and the overlap.

28. The computer program product according to claim 25, further comprising program code configured to model align at least some of the plurality of 3D urban models, each associated with the respective map tile, to produce a large-scale 3D urban model.

29. The computer program product according to claim 28, further comprising program code configured to separate at least some of the 3D urban models into a plurality of building models and a plurality of terrain models, and wherein the model aligning is performed separately for the plurality of terrain models.

30. The computer program product according to claim 28, further comprising program code configured to separating at least some of the 3D urban models into a plurality of building models and a plurality of terrain models, and wherein the model aligning of the plurality of building models is performed using bundle adjustment of the transformed SFM value subset associated with the plurality of building models.

31. The computer program product according to claim 25, wherein the discrepancy score is based on at least one of the point cloud, the camera poses, and the camera intrinsic parameter values associated with the map tile overlap.

32. The computer program product according to claim 25, wherein the overlap is between 1 % and 20% of a linear dimension distance substantially orthogonal to an edge of the map tile at that overlap location.

33. The computer program product according to claim 25, wherein the overlap is sized and shaped such that between 1 % and 20% of the digital image files of that tile depict the map tile overlap.

34. The computer program product according to claim 25, wherein the overlap is sized and shaped to match at least one from the group consisting of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the plurality of structural feature locations.

35. The computer program product according to claim 25, wherein at least one of the plurality of map tiles is sized and shaped to match at least one from the group consisting of: (i) specific features at the borders of that map tile, and (ii) the number of points in the point cloud at the plurality of structural feature locations.

36. The computer program product according to claim 25, wherein at least one of the plurality of map tiles is shaped at least in part as at least one from the group consisting of a square, a rectangle, a circle, a triangle, a trapezoid, a parallelepiped, a polygon, and a blob, and wherein the shape is selected according to the plurality of structural feature locations.