
EP4377913A1 - Training method for training a change detection system, associated training set generating method, and change detection system - Google Patents

Training method for training a change detection system, associated training set generating method, and change detection system

Info

Publication number
EP4377913A1
Authority
EP
European Patent Office
Prior art keywords
change
data block
training
information data
registered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22760770.2A
Other languages
German (de)
English (en)
Inventor
Balázs NAGY
Lóránt KOVÁCS
Csaba BENEDEK
Tamás SZIRÁNYI
Örkény H. ZOVÁTHI
László TIZEDES
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Szamitastechnikai Es Automatizalasi Kutatointezet
Original Assignee
Szamitastechnikai Es Automatizalasi Kutatointezet
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Szamitastechnikai Es Automatizalasi Kutatointezet filed Critical Szamitastechnikai Es Automatizalasi Kutatointezet
Publication of EP4377913A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • the invention relates to a training method for training a change detection system, a training set generating method therefor, and a change detection system.
  • point clouds, as opposed to traditional 2D photos and multispectral images, can be considered a proportionate 3D model of the environment, in which the relative position and size of each object can be determined on a scale identical to that of the world coordinate system.
  • their disadvantage is that the set of points is only a representation of an otherwise continuous surface observable in the world, obtained by a discrete sampling.
  • the sampling characteristics (point density, sampling curves) of different sensors (and sensor settings) might be significantly different.
  • this task can be formulated as a change detection (CD) problem.
  • In video surveillance applications (see C. Benedek, B. Galai, B. Nagy, and Z. Janko, “Lidar-based gait analysis and activity recognition in a 4d surveillance system,” IEEE Trans. Circuits Syst. Video Techn., vol. 28, no. 1, pp. 101-113, 2018, and F. Oberti, L. Marcenaro, and C. S. Regazzoni, “Real-time change detection methods for video-surveillance systems with mobile camera,” in European Signal Processing Conference, 2002, pp. 1-4.), change detection is a standard approach for scene understanding, by estimating the background regions and by comparing the incoming frames to this background model.
  • Mobile and terrestrial Lidar sensors can obtain point cloud streams providing accurate 3D geometric information in the observed area.
  • Lidar is used in autonomous driving applications supporting the scene understanding process, and it can also be part of the sensor arrays in ADAS (advanced driver assistance) systems of recent high-end cars. Since the number of vehicles equipped with Lidar sensors is rapidly increasing on the roads, one can utilize the tremendous amount of collected 3D data for scene analysis and complex street-level change detection. Besides, change detection between the recorded point clouds can improve virtual city reconstruction or Simultaneous Localization and Mapping (SLAM) algorithms (see C.-C. Wang and C. Thorpe, “Simultaneous localization and mapping with detection and tracking of moving objects,” in Int. Conf. on Robotics and Automation (ICRA), vol. 3, 2002, pp. 2918-2924.).
  • Processing street-level point cloud streams is often a significantly more complex task than performing change detection in airborne images or Lidar scans. From a street-level point of view, one must expect a larger variety of object shapes and appearances, and more occlusion artifacts between the different objects due to smaller sensor-object distances.
  • the lack of accurate registration between the compared 3D terrestrial measurements may mean a crucial bottleneck for the whole process, for two different reasons: First, in a dense urban environment, GPS/GNSS-based accurate self-localization of the measurement platform is often not possible (see B. Nagy and C. Benedek, “Real-time point cloud alignment for vehicle localization in a high resolution 3d map,” in ECCV 2018 Workshops, LNCS, 2019, pp. 226-239.). Second, the differences in viewpoints and density characteristics between the data samples captured from the considered scene segments may make automated point cloud registration algorithms less accurate (see B. Nagy and C.
  • the primary object of the invention is to provide a training method for training a change detection system, a training set generating method therefor, and a change detection system which are free of the disadvantages of prior art approaches to the greatest possible extent. Furthermore, an object of the invention is to provide a solution for these methods and system applicable for a coarsely registered pair of 3D information data blocks (data blocks may e.g. be point clouds, see herebelow for some specific features). More specifically, the object of the invention (i.e. of our proposed solution) is to extract changes between two coarsely registered sparse Lidar point clouds.
  • the object of the method is to provide a machine learning based solution to compare only coarsely (approximately) registered 3D point clouds made from a given 3D environment and to determine the changed environmental regions without attempting to specify the registration (more generally, without performing registration).
  • the objects of the invention can be achieved by the training method for training a change detection system according to claim 1, the change detection system according to claim 6, and the training set generating method according to claim 7.
  • Preferred embodiments of the invention are defined in the dependent claims.
  • the task of change detection is solved by the invention e.g. for real Lidar point cloud- (generally, for 3D information data block) based change detection problems.
  • the later registration step is critical for real-world 3D perception problems, since the recorded 3D point clouds often have strongly inhomogeneous density, and the blobs of the scanned street-level objects are sparse and incomplete due to occlusions and the availability of particular scanning directions only. Under such challenging circumstances, conventional point-to-point, patch-to-patch, or point-to-patch correspondence-based registration strategies often fail (see R. Qin, J. Tian, and P. Reinartz, “3D change detection - Approaches and applications,” ISPRS J. Photogramm. Remote Sens., vol. 122, no. Cd, pp. 41-56, 2016.).
  • this description is the first approach to solve the change detection problem among sparse, coarsely registered terrestrial point clouds, without needing an explicit fine registration step.
  • Our proposed - preferably deep learning-based - method can extract and combine various low-level and high-level features throughout the convolutional layers, and it can learn semantic similarities between the point clouds, leading to its capability of detecting changes without prior registration.
  • a deep neural network-based change detection approach is proposed, which can robustly extract changes between sparse point clouds obtained in a complex street-level environment, i.e. the invention is preferably a deep (learning) network for change detection (alternatively, for detecting changes) in coarsely registered point clouds.
  • the proposed method does not require precise registration of the point cloud pairs. Based on our experiments, it can efficiently handle up to 1m translation and 10° rotation misalignment between the corresponding 3D point cloud frames.
  • the method according to the invention preferably following human perception, provides a machine-learning-based method for detecting and marking changes in discrete, only coarsely (approximately) registered point clouds.
  • point clouds represent the environment to scale (proportionately) and have a scale factor equal to that of the world coordinate system (for example, the distance between two characteristic points is the same as in the environment, e.g. expressed in centimetres). Furthermore, the point clouds were created (collected, generated by measurement) in the same area, i.e. from almost the same reference point and with a similar orientation; however, their exact reference positions (typically corresponding to the place of data collection) and orientations relative to each other are unknown.
  • Figs. 1A-1B illustrate an exemplary input image pair
  • Figs. 1C-1D illustrate target change images for the inputs of Figs. 1A-1B
  • Fig. 2 illustrates the internal structure of the change detection generator module in an embodiment, showing inputs and outputs
  • Fig. 3A shows an embodiment of the training method according to the invention
  • Fig. 3B shows a flowchart of the training set generating method according to the invention
  • Figs. 4A-4E are illustrations for the invention in a scene
  • Fig. 5A is a fused target change image for Figs. 1C-1D
  • Fig. 5B is a fused output change image for Figs. 1A-1B,
  • Figs. 5C-5D are results for the input of Figs. 1A-1B obtained by prior art approaches
  • Figs. 6A-6H are illustrations for the invention in an exemplary scene
  • Figs. 6I-6J are shadow bars for Figs. 6A-6H.
  • Fig. 7 is a diagram showing the comparison of the results obtained by an embodiment of the change detection system according to the invention and prior art techniques.
  • Some embodiments of the invention relate to a training method for training a change detection system for detecting change for a coarsely registered pair of (or alternatively, having) a first 3D information data block and a second 3D information data block (a 3D information data block may be e.g. a range image or a point cloud, see below for details), wherein
  • the change detection system comprises a change detection generator module (see in Fig. 2 as well as in Fig. 3A its embodiments) based on machine learning (e.g. neural network in Fig. 3A) and adapted for generating a change data block (e.g. a change image, but changes assigned to points of a point cloud are also conceivable) for a coarsely registered pair of a first 3D information data block and a second 3D information data block (the change detection generator module is naturally - like any machine learning module - adapted for generating a change data block at the very beginning of the training method, since the untrained module starts from a status, where preliminary change data blocks can be generated, however, these change data blocks getting better and better - i.e.
  • a discriminator module based on machine learning is applied (according to the invention, this is necessary to train the change detection generator module; it has naturally a role during the training method, but no role after the training; see Fig. 3A of an embodiment of discriminator module labelled as discriminator network).
  • the training set preferably comprises range images as 3D information data block and target change images as target change data blocks.
  • the training set generating method preferably starts from a point cloud as a base 3D information data block of a registered base pair (see below), and preferably ends in a training set utilizable for the training method, i.e. range images and corresponding target change images.
  • the training method according to the invention can be interpreted based on Fig. 3A (see far below even more details about Fig. 3A): the figure illustrates the first 3D information data block and the second 3D information data block labelled by inputs.
  • the training method is for training a change detection system for detecting change for a coarsely registered pair; for training this way we naturally need a plurality of coarsely registered pairs and, according to the training strategy, ground truth images corresponding thereto, which are target change data blocks (e.g. target change images) in the framework of the invention; the plurality of coarsely registered pairs and the corresponding plurality of target change data blocks constitute the training set.
  • a target change data block or target change data block pair - see below - naturally corresponds to a specific coarsely registered pair: these are of nearly the same scene/location, thus having correlated content (see also below).
  • - change data blocks are generated by means of the change detection generator module for the plurality of coarsely registered pairs of the training set (see operational step S180 in Fig. 3A in an embodiment),
  • a discriminator loss contribution is generated by applying the discriminator module on a plurality of coarsely registered pairs of the training set, as well as corresponding target change data blocks of the training set and corresponding change data blocks (see operational step S190 in Fig. 3A in an embodiment), and
  • the change detection generator module is trained by a combined loss obtained from a summation of the generator loss contribution and the discriminator loss contribution, wherein in the summation at least one of the generator loss contribution and the discriminator loss contribution is multiplied by a respective loss multiplicator (see operational step S195 corresponding to training in Fig. 3A in an embodiment, where in a combined loss 235 the generator loss contribution 225 is multiplied by a λ loss multiplicator 227 and a discriminator loss contribution 230 is simply added; the loss multiplicator has a predetermined/predefined multiplicator value; in other words, training is performed in operational step S195, accordingly, it is a training step).
  • the above steps of the training cycle are performed one after the other in the above order. Furthermore, after the last step (training step), the steps are again started from the first (change data block generation), if a next iteration cycle is started (see below for the termination of the training method).
  • change data block is generated in the first step and used in the second step as a possible input of the discriminator module.
  • the generator and discriminator loss contributions are generated in the first and second step, respectively, and play the role of an input in the third step.
  • the training method is for (in particular, suitable for) training a change detection system.
  • the system is trained for detecting change, i.e. detection of any change which can be revealed between its inputs.
  • the change is detected for a coarsely registered pair of 3D information data blocks.
  • a change data block is generated.
  • a pair of change data blocks are generated as detailed in some examples, but in some applications (e.g. the content of any of change data blocks is not relevant), it is enough to generate only one change data block.
  • this single change data block is processed by the discriminator module, it may be accompanied by a single target change data block.
  • the training cycle is formulated for a plurality of coarsely registered pairs and a plurality of respective target change data blocks of a training set (the latter is many times called an epoch; in a training cycle a batch of training data - being a part of the epoch - is utilized, after that the machine learning module is trained by the calculated loss, e.g. weights of a neural network are updated; see also below), and this approach is followed in the steps in the above description of the invention, but it is worth showing some details of the steps for a single input and output, i.e. to illustrate the method for a single processing.
  • one (or two, i.e. a pair) change data block is generated for a coarsely registered pair, and by the help of this change data block and the corresponding target change data block (being also part of the ground truth) a member of the generator loss contribution can be calculated (e.g. according to the definition of the L1 loss as given in an example below; the whole loss contribution - here and below for the discriminator - can be generated based on the plurality of coarsely registered pairs); in the generator loss contribution, many times a difference is taken into account for a corresponding combination (the word ‘combination’ is brought only for showing which are the corresponding entities) of the change data block and the target change data block, thus it may be formulated also on this basis;
  • generation of the discriminator loss contribution is formalized for the plurality of coarsely registered pairs of the training set, for a single processing - as illustrated in Fig. 3A - the discriminator is applied on corresponding sets of a coarsely registered pair, a target change data block and a change data block (see below for details);
  • training is also performed based on the plurality of coarsely registered pairs and the plurality of respective target change data blocks, since these are all taken into account for determining the generator and discriminator loss contributions (in other words, training is performed after a batch is processed in the above steps of a training cycle); furthermore, it is noted that for the summation, the multiplicator may be applied to any of the contributions or to both.
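  • For illustration only, the following is a minimal sketch of how such a combined loss could be computed, assuming a PyTorch-style implementation; the function name, the use of sigmoid/binary cross entropy for the adversarial part and the default multiplicator value are our assumptions, not a definitive implementation of the claimed method.

```python
import torch
import torch.nn.functional as F

def combined_generator_loss(generated_mask, target_mask, disc_logits_on_fake, lam=100.0):
    """Sketch of the combined loss: generator (L1) contribution multiplied by the
    loss multiplicator lambda, plus the discriminator (adversarial) contribution.
    All names and the default lam value are illustrative assumptions."""
    # generator loss contribution: pixel-wise L1 distance to the target change data block
    l1_contribution = F.l1_loss(generated_mask, target_mask)

    # discriminator loss contribution: the generator is rewarded when the
    # discriminator labels the generated change data block as "real" (label 1)
    gan_contribution = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))

    # summation with the multiplicator applied to the generator contribution
    return lam * l1_contribution + gan_contribution
```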
  • an optimizer will determine, based on the loss, at which point (after which training cycle) in the training method it is advantageous to stop (terminate) the training.
  • the λ parameter (loss multiplicator) will handle the loss contributions (it can no longer be seen afterwards which loss contributes to the combined loss).
  • the appropriate value of the loss multiplicator is preferably predetermined in a way that the training is performed for several values of the loss multiplicator, and the appropriate value is selected based on the results.
  • The termination is preferably decided based on the value of the global loss (the global loss function is preferably a differentiable function). For the global loss, we may require that its value falls below a threshold, but preferably we look for a small change from the previous value or values, as oscillations may occur in the evolution of the loss value.
  • the value of the loss multiplicator is a function of many aspects.
  • At the beginning of the training, the competing networks are essentially random networks. The value of the loss multiplicator can be determined by tests on the training method, so as to check whether the parameter value is able to reach an appropriate balance between the generator and discriminator loss contributions.
  • the loss contributions are generated based on the outputs of the generator and discriminator modules. Separate modules may be dedicated for generating loss (loss generating modules), but this task may be considered also as a task of the generator and discriminator, themselves.
  • the generator word may be skipped, i.e. it may be simply called change detection module.
  • the generator and discriminator modules could be called first and second competing modules beside that their functions are defined (what output(s) are generated based on the inputs).
  • the aforementioned devices (tools) for surveying the 3D environment (Lidar laser scanners, infrared scanners, stereo and multi-view camera systems) and other such devices provide 3D information (i.e. a 3D information data block) of the environment, and any representation (point cloud, range image, etc.) can be extracted from it.
  • the respective module is realized by the required element(s) of a machine learning approach, preferably by neural networks as shown in many examples in the description. All of the machine learning based modules applied in the invention can be realized by the help of neural networks, but the use of alternative machine learning approaches is also conceivable.
  • - ‘coarsely registered’ relates to a data block pair (e.g. a range image pair or a point cloud pair) for which the registration information is not utilized, i.e. the translation and/or rotation which is necessary to bring the images of the image pair into alignment (i.e. their common/overlapping part, where naturally there can be changes) is not determined or not utilized;
  • - ‘registered’ relates to a data block pair (e.g. a range image pair or a point cloud pair) which are registered, i.e. the translation and/or rotation which is necessary to bring the images of the image pair into alignment is determined and utilized during the processing of the image pair (registration is another approach based explicitly on using the registration information for processing).
  • the change detection system becomes - by means of the training method - adapted for detecting changes for a coarsely registered pair, for which the translation and/or rotation by which these could be aligned is not determined, but if this translation and/or rotation were determined, the values thereof would be restricted.
  • the data blocks (images) of a coarsely registered image pair are thus unprocessed data blocks (images) from this point of view, i.e. the complicated step of registering is not performed on them (or the registration information is not used, this latter can also be meant under the meaning of non-registered).
  • An alternative term for "coarsely registered” may be "data block (e.g. point cloud or range image) pair with restricted relative translation and/or rotation".
  • relative is clearly meant on the data blocks of the data block pair, i.e. the translation and/or rotation is interpreted between the data blocks of the pair.
  • for the restricted translation and/or rotation, see in an example that it is preferably up to ±1 m translation and/or up to ±10° rotation (in the real coordinate system corresponding to them).
  • a registered point set means that it is spatially (or temporally) aligned, i.e. matched. If the exact registration is known, our method is not needed.
  • the inputs to be processed may also include an image pair that is registered, i.e. has the data available to register it (e.g. translation and/or rotation for alignment), in which case these data are not used, as the input to the change detection system is only the coarsely registered pair itself.
  • some embodiments of the invention relate to a change detection system adapted (as a consequence of being trained, see below) for detecting change for a coarsely registered pair of a first 3D information data block and a second 3D information data block, wherein the system
  • the change detection system may be also called change detection architecture. It can be equated with its main (only) module in its trained state, the change detection generator module (in the trained state the system does not comprise the discriminator module), since these - i.e. the change detection system and the change detection generator module - have the same inputs (a coarsely registered pair) and the same output (change data block(s)).
  • a 3D information data block may be any image, such as a range image or depth image, which bears 3D information/3D location information of the parts of the environment (these are 3D information images, but they could also be called simply 3D images), or another data structure bearing 3D information, such as a point cloud.
  • the coarsely registered pair of the first 3D information data block and the second 3D information data block is constituted by a coarsely registered range image pair of a first range image and a second range image (so, the 3D information data block itself may be a range image), or
  • a coarsely registered range image pair of a first range image and a second range image is generated from the coarsely registered pair of the first 3D information data block and the second 3D information data block before the application of the at least one training cycle (accordingly, as an option the range image may also be generated from another 3D information data block, which was e.g. originally a point cloud).
  • Various 3D information data blocks may be applied in the invention (it is applicable e.g. also on point clouds); applying range images as 3D information data blocks may be preferred since the preferably applied convolutions in the machine learning based modules handle such a format more easily (but the convolution is also applicable on a point cloud). Since it is preferred to use range images, many details are introduced in the following illustrated on range images. On the most general level these are termed 3D information data blocks, but all features introduced on a less general level but being compatible with the most general level are considered to be utilizable on the most general level.
  • a change mask image having a plurality of first pixels is generated by the change detection generator module as the change data block and the target change data block is constituted by a target change mask image having a plurality of second pixels, wherein to each of the plurality of first pixels and a plurality of second pixels presence of a change or absence of a change is assigned (accordingly, we have preferably change mask image and target change mask image as a manifestation of the change data block and the target change data block; for target or ground truth change mask, see Figs. 1 C and 1 D; a fused change mask is illustrated in Fig. 5B; change mask image and target change mask images are preferably 2D images).
  • the change data block is constituted by a change mask image, and target change mask images are used as target change data blocks.
  • the presence or absence of a change is denoted in every pixel.
  • such a mask image may be applied in which '1' denotes a change and '0' denotes the other regions where there is no change (e.g. where a small number of changes are identified in an image, there is a small number of '1' pixels on the image and '0' in the other regions).
  • range images and mask images are utilized.
  • all of the data “circulated” during the training process is represented by 2D images: the generator module receives 2D range images from which it generates 2D mask images.
  • the generator loss contribution can be calculated based on the mask images.
  • the discriminator module also handles 2D images in this case (also the target is a mask image). This choice is further detailed below and it can be utilized advantageously in a highly effective way.
  • a pair of change images are applied (i.e. for both “directions”, see below), and, accordingly, also a pair of target change images is utilized for them in course of the training method.
  • the utilization of change data block pairs (i.e. that we have pairs) can be applied independently from the range image and mask image approach.
  • Lidar devices such as the Rotating multi-beam (RMB) sensors manufactured by Velodyne and Ouster
  • both input point clouds may contain various dynamic or static objects, which are not present in the other measurement sample.
  • range image representation is preferably applied in the framework of the invention, see the following.
  • Our proposed solution preferably extracts changes between two coarsely registered Lidar point clouds in the range image domain.
  • creating a range image from a rotating multi-beam (RMB) Lidar sensor's point stream is straightforward (see C. Benedek, “3d people surveillance on range data sequences of a rotating lidar,” Pattern Recognition Letters, vol. 50, pp. 149- 158, 2014, depth Image Analysis.) as its laser emitter and receiver sensors are vertically aligned, thus every measured point has a predefined vertical position in the image, while consecutive firings of the laser beams define their horizontal positions.
  • this mapping is equivalent to transforming the representation of the point cloud from the 3D Cartesian to a spherical polar coordinate system, where the polar direction and azimuth angles correspond to the horizontal and vertical pixel coordinates, and the distance is encoded in the corresponding pixel's 'intensity' value.
  • range image mapping can also be implemented for other (non-RMB) Lidar technologies, such as for example Lidar sensor manufactured by Livox.
  • with a suitable image resolution, the conversion of the point clouds to 2D range images is reversible, without causing information loss.
  • using the range images makes it also possible to adopt 2D convolution operations by the used neural network architectures.
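  • As a hedged illustration of the range image mapping described above, the sketch below converts a point cloud to a 2D range image with a generic spherical/height binning (azimuth selects the column, height above ground selects the row, the normalized distance is stored as the pixel value); the function name, the 1024x128 resolution and the 40 m / 5 m limits are example values taken from this description, while a real RMB sensor would rather use the laser emitter index for the vertical position.

```python
import numpy as np

def point_cloud_to_range_image(points, width=1024, height=128,
                               max_range=40.0, max_height=5.0):
    """Project an (N, 3) point cloud to a (height x width) range image.
    Illustrative sketch: generic azimuth/height binning, closest return kept."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    dist = np.sqrt(x**2 + y**2 + z**2)

    # horizontal pixel position: azimuth angle mapped to [0, width)
    azimuth = np.arctan2(y, x)                                    # (-pi, pi]
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width

    # vertical pixel position: height above ground mapped to [0, height)
    row = (np.clip(z, 0.0, max_height) / max_height * (height - 1)).astype(int)

    image = np.zeros((height, width), dtype=np.float32)
    valid = dist <= max_range
    # keep the closest return per pixel; distances normalized to [0, 1]
    for r, c, d in zip(row[valid], col[valid], dist[valid] / max_range):
        if image[r, c] == 0 or d < image[r, c]:
            image[r, c] = d
    return image
```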
  • the proposed deep learning approach in an embodiment takes as input two coarsely registered 3D point clouds P1 and P2 represented by range images I1 and I2, respectively (shown in Figs. 1A and 1B) to identify changes.
  • Our architecture assumes that the images I1 and I2 are defined over the same pixel lattice S, and have the same spatial height (h), width (w) dimensions (i.e. the two input range images have the same resolution, the two images have the same number of horizontal and vertical pixels).
  • Figs. 1A-1D illustrate the input data representation for the training method according to the invention.
  • Figs. 1A-1B show exemplary range images I1, I2 (being the realisation of first and second 3D information data blocks 100a and 100b in this embodiment) from a pair of coarsely registered point clouds P1 and P2.
  • Figs. 1C-1D show binary ground truth change masks Λ1, Λ2 (being the realisation of first and second target change data blocks 120a and 120b in this embodiment) for the range images I1 and I2, respectively (see also below).
  • a rectangle 109 marks the region also displayed in Figs. 6A-6H (see also Figs. 5A-5D).
  • Figs. 1A and 1B show range images for an exemplary scene. These range images are obtained from point clouds by projection.
  • the range images represent range (i.e. depth) information: in the images of Figs. 1A-1B the darker shades correspond to farther parts of the illustrated scene, lighter shades correspond to closer parts thereof, and those parts from which no information has been obtained are represented by black. Herebelow, some details are given about the content shown in Figs. 1A-1B for deeper understanding.
  • a bus 101a is marked in Fig. 1A which becomes the bus 101b in Fig. 1B (the bus has moved to another position and possibly turns, therefore it is seen shorter in Fig. 1B);
  • a car 108b can be seen in another position, which can be another car or the same car as the car 102a;
  • a car 103a can be seen in Fig. 1A which does not move (i.e. it is the same as car 103b in Fig. 1B), but a car 104b appears on its right in Fig. 1B;
  • Figs. 1C and 1D show target change data blocks 120a and 120b (i.e. target change images) corresponding to data blocks 100a and 100b, thus the content of Figs. 1C-1D can be interpreted based on the above content information.
  • Fig. 1C is coloured (grey, not black; black is the background in Figs. 1C-1D) at those parts
  • the target change data block 120a shows the back of the bus 101a, i.e. a part 111a (this was contained in Fig. 1A but not in Fig. 1B); however, the target change data block 120b also marks this place as a part 111b, since - because of the movement of the bus 101a - a car becomes visible (this was occluded by the bus 101a in Fig. 1A) and the background (e.g. a wall) behind the car;
  • a part 112a corresponding to the car 102a of Fig. 1A is shown in the target change data block 120a, as well as a background-like part 112b can be seen in target change data block 120b in the place from which the car 102a has moved (a coloured/grey part can also be seen in Fig. 1D where the car 108b is visible in Fig. 1B);
  • a background-like part 114a can be seen in the target change data block 120a since the car 104b appears in Fig. 1B, as well as a car-like part 117a can be seen in the target change data block 120a but a background-like part 117b can be seen in the target change data block 120b.
  • our change detection task can be reformulated in the following way: our network extracts similar features from the range images I1 and I2, then it searches for the high correlation between the features, and finally, it maps the correlated features to two binary change mask channels Λ1 and Λ2, having the same size as the input range images.
  • We propose a new generative adversarial neural network (in particular a generative adversarial neural network-like - abbreviated, GAN-like - architecture, more specifically a discriminative method, with an additional adversarial discriminator as a regularizer), called ChangeGAN, whose architecture (structure) is shown in Fig. 2 in an embodiment.
  • Fig. 2 thus shows a proposed ChangeGAN architecture, wherein the notations of components: SB1, SB2 - Siamese branches, DS - downsampling, STN - spatial transformation/transformer network, Conv2D - 2D convolution; Conv2DT - transposed 2D convolution.
  • By GAN-like we also mean the general characteristic of a GAN that its two competing networks (generator, discriminator) learn simultaneously during training, but at the same time, the generator can be considered as a result of the training procedure, since the training aims at creating a generator with appropriate characteristics, which is capable of generating an output with the desired characteristics.
  • first and second 3D information data blocks 150a and 150b are for example images with 128x1024x1 dimensions and forming a coarsely registered pair 150
  • Siamese style: see J. Bromley, J. Bentz, L. Bottou, I. Guyon, Y. Lecun, C. Moore, E. Sackinger, and R. Shah, “Signature verification using a ”Siamese” time delay neural network,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 7, p. 25, 1993.
  • the Siamese architecture is designed to share the weight parameters across multiple branches allowing us to extract similar features from the inputs and to decrease the memory usage and training time.
  • each branch 162a and 162b of the Siamese network comprises (in an example consists of) fully convolutional down-sampling (DS) blocks (i.e. DS1-DSn downsampling subunits 164a and 164b; the same machine learning units - i.e. branches 162a and 162b - are applied on the two inputs).
  • the two branches 162a and 162b constitute a downsampling unit together.
  • the first layer of the DS block is preferably a 2D convolutional layer with a stride of 2 which has a 2-factor down-sampling effect along the spatial dimensions.
  • the processing illustrated in Fig. 2 is a U-net-like processing, where the two “vertical lines” of the letter ‘U’ are the downsampling (it is a “double” vertical line because of the two branches) and the upsampling (upsampling unit 170, see also below). Furthermore, the “horizontal line” of the letter ‘U’ - connecting the two “vertical lines” thereof as the letter is drawn - is illustrated by the conv2D unit 168. It is optional to have anything (any modules) in the “horizontal line”, but there may also be more convolution units (i.e. if any, one or more convolution units are arranged after merging but before upsampling).
  • the second part (see upsampling unit 170) of the proposed model contains a series of transposed convolutional layers (see Conv2DT1-Conv2DTn upsampling subunits 172) to up-sample the signal from the lower-dimensional feature space to the original size of the 2D input images.
  • Connections 167 interconnect the respective levels of downsampling and upsampling having the same resolution. Accordingly, in this U-net-like construction the Conv2DT1-Conv2DTn upsampling subunits 172 for upsampling on the one hand receive input from a lower level (first from the conv2D unit 168) and - as another input - the output of the respective downsampling level. Thus, the output of the Conv2DT1-Conv2DTn upsampling subunits 172 is obtained using these two inputs.
  • first and second change data block 175a and 175b are for example images having 128x1024x1 dimensions.
  • the change maps are in general considered to be the output of the upsampling unit 170, in which the upsampling is performed.
  • a change detection generator module 160 is denoted in Fig. 2.
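  • The following PyTorch-style sketch illustrates the kind of generator structure described for Fig. 2 (shared-weight Siamese down-sampling branches, merge and 2D convolution at the bottom, transposed-convolution up-sampling fed by skip connections); the class names, channel counts, layer depth and two-channel output are our placeholder assumptions, and the optional STN modules discussed below are omitted here.

```python
import torch
import torch.nn as nn

class DSBlock(nn.Module):
    """Fully convolutional down-sampling block: 2D convolution with stride 2."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True))

    def forward(self, x):
        return self.net(x)

class ChangeGeneratorSketch(nn.Module):
    """Illustrative U-net-like generator with Siamese (weight-shared) branches;
    not the patented configuration, only a sketch of the described structure."""
    def __init__(self):
        super().__init__()
        self.ds1, self.ds2 = DSBlock(1, 16), DSBlock(16, 32)
        self.ds3, self.ds4 = DSBlock(32, 64), DSBlock(64, 128)
        self.bottom = nn.Conv2d(2 * 128, 256, kernel_size=3, padding=1)   # merge + Conv2D
        self.up1 = nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(64 + 2 * 64, 32, 4, stride=2, padding=1)
        self.up3 = nn.ConvTranspose2d(32 + 2 * 32, 16, 4, stride=2, padding=1)
        self.up4 = nn.ConvTranspose2d(16 + 2 * 16, 2, 4, stride=2, padding=1)

    def encode(self, x):
        f1 = self.ds1(x); f2 = self.ds2(f1); f3 = self.ds3(f2); f4 = self.ds4(f3)
        return f1, f2, f3, f4

    def forward(self, img1, img2):
        # the same (shared-weight) branch is applied to both range images
        a1, a2, a3, a4 = self.encode(img1)
        b1, b2, b3, b4 = self.encode(img2)
        x = torch.relu(self.bottom(torch.cat([a4, b4], dim=1)))
        x = torch.relu(self.up1(x))
        x = torch.relu(self.up2(torch.cat([x, a3, b3], dim=1)))   # skip connections
        x = torch.relu(self.up3(torch.cat([x, a2, b2], dim=1)))
        # two output channels: change masks for the first and the second input
        return torch.sigmoid(self.up4(torch.cat([x, a1, b1], dim=1)))
```

  For a pair of range image tensors of shape (1, 1, 128, 1024), this sketch returns a (1, 2, 128, 1024) tensor, i.e. one change mask channel per input.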
  • STN Spatial Transformation Network
  • it may be preferred to apply an STN module. Accordingly, in an embodiment of the training method a spatial transformer module is applied (see spatial transformer modules 165a, 165b in the embodiment of Fig. 2, arranged in both branches of the change detection generator module illustrated there).
  • the STN module (configured according to the description of the STN article cited above) is built in between the DS modules (at the same level in the two branches, of course), because it can work well on the downsampled images, helping to handle possible transformations in the inputs. So, in upscaling there is no STN module arranged, but it learns end-to-end to give good change data blocks (change images) for the inputs relative to the corresponding targets.
  • the STN module is thus preferably a part of the change detection generator module, i.e. it is trained in the framework of the end-to-end training. Accordingly, the upsampling unit and the interconnections between the upsampling and downsampling units helps to integrate the STN module.
  • the STN module as disclosed above, is therefore designed to help handle the translations and rotations that may be present in the coarsely registered pair. It is also important to emphasize that it does this while learning, together with the other modules, to produce the right change data blocks (showing such possible translation/rotation) for the right inputs. By this we mean helping to process the translation and/or rotation (and meanwhile the whole generator module does not eliminate them, but this registration error is also present in the change data blocks).
  • the STN module is of course also part (inseparable part) of the trained (completed) generator.
  • the STN module that is included in the generator as above only helps the operation of this generator, i.e. it is preferably more efficiently learned if included, but arranging of the STN module is not necessarily required.
  • a downsampling unit (see downsampling unit 162 in Fig. 2; in the embodiment of Fig. 2 the downsampling unit has (directly) the inputs, i.e. the coarsely registered pair) having a first row of downsampling subunits (see downsampling subunits 164a, 164b) and interconnected with an upsampling unit (see upsampling unit 170; in the embodiment of Fig. 2 a merge unit 166 - because of the two branches 162a, 162b - and a conv2D unit 168 are inserted into this interconnection; furthermore, in the embodiment of Fig. 2 the change data blocks are obtained directly as outputs of the upsampling unit) having a second row of upsampling subunits (see upsampling subunits 172) and corresponding to the downsampling unit, is applied, wherein the downsampling unit and the upsampling unit are comprised in the change detection generator module and the spatial transformer module is arranged in the downsampling unit within the first row of downsampling subunits (i.e.
  • STN module is optionally used for helping the change detection generator module in its operation, but it can operate (perform its task) also without arranging STN module.
  • STN preferably works with 2D images. So does preferably the invention, since 3D point clouds can be preferably represented as 2D range images (however, the application of both for 3D inputs can be straightforwardly solved).
  • the position of the STN module within the feature extraction branch is preferably "in the middle" (among downsampling subunits) because it is already preferably looking for transformations among more abstract features, not on raw high-resolution data.
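  • As a hedged sketch of how such a spatial transformer block could be inserted between down-sampling subunits, the code below uses the standard affine STN formulation (localization net, affine_grid, grid_sample); the class name, the localization net layout and the pooled size are placeholder assumptions, not the patented configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNSketch(nn.Module):
    """Minimal spatial transformer block operating on an intermediate feature map."""
    def __init__(self, in_ch):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((4, 8)),
            nn.Flatten(),
            nn.Linear(16 * 4 * 8, 6))
        # start from the identity transform so that training begins with "no warp"
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, feat):
        theta = self.loc(feat).view(-1, 2, 3)                    # predicted 2D affine params
        grid = F.affine_grid(theta, feat.size(), align_corners=False)
        return F.grid_sample(feat, grid, align_corners=False)   # warped feature map
```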
  • a clear difference between the proposed change detection method (i.e. the training method and the corresponding change detection system) and the state-of-the-art is the adversarial training strategy which has a regularization effect, especially on limited data.
  • the other main difference is the preferably built-in spatial transformer network (in the training method) yielding the proposed model to be able to learn and handle coarse registration errors.
  • the building blocks of the training method and thus the corresponding change detection system are combined so that a highly advantageous synergetic effect is achieved, e.g. by the adversarial training strategy itself, and even more with the application of STN module.
  • the model can automatically handle errors of coarse registration (i.e. any translation and/or rotation in the coarsely registered pair).
  • the generator network is responsible for learning and predicting the changes between the range image pairs.
  • the generator model is trained on a batch of data.
  • the number of epochs is a hyperparameter that defines the number of times that the training (learning) algorithm will work through the entire training dataset.
  • the actual state of the generator is used to predict validation data which is fed to the discriminator model.
  • the discriminator network is preferably a fully convolutional network that classifies the output of the generator network.
  • the discriminator model preferably divides the image into patches and decides for each patch whether the predicted change region is real or fake. During training, the discriminator network forces the generator model to create better and better change predictions, until the discriminator cannot decide about the genuineness of the prediction.
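  • A minimal sketch of such a fully convolutional, patch-wise discriminator is given below; the class name, channel counts and the assumption that the inputs (range image pair and change mask pair) are concatenated along the channel dimension are ours.

```python
import torch.nn as nn

class PatchDiscriminatorSketch(nn.Module):
    """Illustrative patch discriminator: outputs one real/fake logit per image patch."""
    def __init__(self, in_ch=4):   # 2 range images + 2 change mask channels (assumed)
        super().__init__()
        def block(i, o):
            return nn.Sequential(
                nn.Conv2d(i, o, kernel_size=4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True))
        self.net = nn.Sequential(
            block(in_ch, 32),
            block(32, 64),
            block(64, 128),
            nn.Conv2d(128, 1, kernel_size=3, padding=1))   # patch-wise logits

    def forward(self, x):
        return self.net(x)   # (B, 1, H/8, W/8) map of per-patch decisions
```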
  • Fig. 3A demonstrates the proposed adversarial training strategy in an embodiment, i.e. of the ChangeGAN architecture.
  • LL1 - L1 Loss; more generally, generator loss contribution 225 in Fig. 3A.
  • LGAN - GAN Loss; it may also be called adversarial loss, more generally discriminator loss contribution 230 in Fig. 3A.
  • Fig. 3A furthermore illustrates, in the schematic flowchart, the following as a part of an embodiment of the training method.
  • a first 3D information data block 200a and a second 3D information data block 200b are denoted by respective blocks labelled by Input1 and Input2.
  • the first 3D information data block 200a and the second 3D information data block 200b are the inputs of a change detection generator module 210 (labelled by “Generator network”, since it is realized by a neural network module in this embodiment). Accordingly, at the output of the generator it is checked how well the change was generated compared to the target.
  • the change detection generator module 210 has a change data block 215 (labelled by “Generated img (image)”; this is preferably an illustration of a change data block pair) as an output.
  • the change data block 215 as well as the target change data block 205 is processed to obtain the L1 loss 225.
  • the change data block 215 and the target change data block 205 are given also to the discriminator module 220 (labelled by “Discriminator network”, since it is realized by a neural network module in this embodiment), just like the first 3D information data block 200a and the second 3D information data block 200b. All of these inputs are processed by the discriminator module 220 (see also below) so as to obtain the combined loss 235. As it is also illustrated in Fig. 3A the combined loss 235 is fed back to the change detection generator module 210 preferably as a gradient update.
  • the discriminator loss contribution is generated based on coarsely registered pairs, as well as corresponding target change data blocks and change data blocks.
  • the change data block just generated by the generator module has a special role, since the other inputs of the discriminator module are parts of the ground truth. According to its general role, the discriminator makes a “real or fake” decision on the change/target change data blocks (images).
  • the generator and the discriminator “compete” with each other.
  • the goal of the generator is to generate better and better results for which the discriminator can be “persuaded” that it is not a generated result (“fake”) but a “real” one.
  • the discriminator learns to recognize the generated images better and better during the learning process.
  • generator generates high- quality change data blocks (images).
  • the training (learning) process of the discriminator is performed in the present case as follows. From the point of view of the discriminator a target change data block is “real” and a change data block (generated by the generator) is “fake”.
  • the coarsely registered pair is e.g. the input range images, but this can also be made operable if the inputs are point clouds: the modules can also be constructed so as to handle this type of input.
  • the discriminator preferably has separated inputs for a target change image and a coarsely registered pair, and for a (generated) change image and the coarsely registered pair (the same as given along with the target image; the target and generated change images correspond also to each other; these corresponding inputs constitute an input set for the discriminator).
  • the discriminator will judge about both inputs. Accordingly, the discriminator decides about the target or generated change image having knowledge about the content of the 3D information image pair.
  • For these inputs, the discriminator generates outputs, preferably on a pixel-to-pixel basis, indicating whether, according to the judgement of the discriminator, an image part is real or fake. For the whole image, the discriminator will thus give a map illustrating the distribution of its decision.
  • a discriminator loss is generated based on the outputs (corresponding to the two inputs mentioned above) of the discriminator, which is used to train the discriminator itself as well as - a part of the combined loss mentioned above - also the generator.
  • both the generator and the discriminator are trained continuously, but after training only the generator will be used (as mentioned above, for generating change images for an input pair) and the trained discriminator is not utilized.
  • the discriminator loss is preferably calculated as follows. For obtaining the contribution of an output pair for the discriminator loss, it is known what type of input has been given to the discriminator.
  • the good result is when the image is decided to be “fake”, this is e.g. denoted by a '0' (and a “real” pixel is denoted by a '1'; the discriminator preferably makes a binary decision for each pixel, this is a kind of classification task) in the pixels of the respective output of the discriminator.
  • the discriminator is preferably able to judge on a pixel-to-pixel basis, i.e. it can give a result for each pixel.
  • the discriminator preferably gives a number from the range [0,1] in which case it can give also a probability for the judgement for each pixel (the output in the [0, 1 ] range can be cut at 0.5, and thus binary results can be reached, sorting every result under 0.5 to 0 and every result equal to or larger than 0.5 to 1).
  • the case is typically not ideal during the learning process, therefore, the result coming from the discriminator will be diversified.
  • a correlation-like function e.g. sigmoid cross entropy
  • the discriminator loss thus guides the generator to generate images closer to the ground truth, because it will only accept them as 'real' if it gets the ground truth itself, so if an incoming generated change image resembles the ground truth in as many details as possible, we are doing well.
  • Based on the spatial structure presented by the inputs bearing 3D information, the discriminator examines the imperfect result of the generator and can judge, not only at the mask level but also considering the 3D information of the input pair, what level of learning the generator is at, and can give an answer at the pixel level (relevance plays a big role here): where it is good and where it is not (it can accept roughly good results on irrelevant parts, but expects more on a relevant part to judge that part positively).
  • the discriminator loss is built up from such loss contributions (generated using the target change data block as well as the (generated) change data block, i.e. the discriminator loss receives two contributions according to the two outputs of the discriminator for each input set thereof) and after an epoch, the discriminator will be trained by this discriminator loss, as well as it is taken to the combined loss to train also the generator.
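  • A minimal sketch of such a per-pixel sigmoid cross entropy based discriminator loss is given below; the function name and the 0/1 labelling convention follow the description above, the rest is an assumption.

```python
import torch
import torch.nn.functional as F

def discriminator_loss_sketch(logits_on_target, logits_on_generated):
    """The discriminator is trained to label the target change data block 'real' (1)
    and the generated change data block 'fake' (0); both contributions are summed."""
    real_loss = F.binary_cross_entropy_with_logits(
        logits_on_target, torch.ones_like(logits_on_target))
    fake_loss = F.binary_cross_entropy_with_logits(
        logits_on_generated, torch.zeros_like(logits_on_generated))
    return real_loss + fake_loss
```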
  • a discriminator loss contribution is preferably generated by applying the discriminator module on a plurality of a corresponding
  • both the generator and the discriminator part of the GAN architecture were optimized by the Adam optimizer (Diederik P. Kingma, Jimmy Ba: Adam: A Method for Stochastic Optimization, arXiv:1412.6980v9) and the learning rate was set to 10⁻⁵ (learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function; Murphy, Kevin P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge: MIT Press, p. 247).
  • In this test we have trained the model for 300 epochs, which takes almost two days. At each training epoch, we have updated the weights of both the generator and the discriminator.
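  • The alternating update can be sketched as follows (data loading, device handling and validation are omitted); the loop reuses the illustrative classes and loss functions sketched earlier in this description, and training_loader is an assumed DataLoader yielding range image pairs with their target change masks.

```python
import torch

generator = ChangeGeneratorSketch()
discriminator = PatchDiscriminatorSketch()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-5)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-5)

for epoch in range(300):
    for img1, img2, target_masks in training_loader:   # assumed DataLoader
        generated = generator(img1, img2)

        # discriminator update: target masks are "real", generated masks are "fake"
        d_real = discriminator(torch.cat([img1, img2, target_masks], dim=1))
        d_fake = discriminator(torch.cat([img1, img2, generated.detach()], dim=1))
        loss_d = discriminator_loss_sketch(d_real, d_fake)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # generator update: combined L1 + adversarial loss
        d_fake_for_g = discriminator(torch.cat([img1, img2, generated], dim=1))
        loss_g = combined_generator_loss(generated, target_masks, d_fake_for_g)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```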
  • the change detection dataset is described (training set generated by the training set generating method according to the invention). This is in connection with the training set generating method to which some of the embodiments of the invention relate.
  • Although the training set generating method is on the same generality level as the training method (it also relates to coarsely registered pairs, and the training set generating method is for generating a plurality of coarsely registered pairs of a first 3D information data block and a second 3D information data block and a plurality of respective target change data blocks), it is mainly described in the following illustrated on the example of point clouds as 3D information data blocks (it is summarized also on the most general level).
  • the annotation should accurately mark the point cloud regions of objects or scene segments that appear only in the first frame, only in the second frame, or which are unchanged and thus observable in both frames (see Figs. 4A-4E and 6A-6H).
  • GT ground truth
  • a high-resolution 3D voxel map was built on a given pair of point clouds.
  • the voxel size defines the resolution of the change annotation.
  • the length of the change annotation cube (voxel) was set to 0.1 m in all three dimensions. All voxels were marked as changed if 90% of the 3D points in the given voxel belonged to only one of the point clouds. Thereafter minor observable errors were manually eliminated by a user-friendly point cloud annotation tool.
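  • A hedged NumPy sketch of this semi-automatic voxel-based labelling (0.1 m voxels, 90% single-source threshold) is shown below; the function name and the exact per-point label assignment are assumptions, and the final manual clean-up step is not modelled.

```python
import numpy as np

def voxel_change_labels(cloud_a, cloud_b, voxel=0.1, ratio=0.9):
    """Mark a voxel as 'changed' when at least `ratio` of its points come from only
    one of the two registered clouds; return per-point labels for cloud_a."""
    pts = np.vstack([cloud_a, cloud_b])
    src = np.concatenate([np.zeros(len(cloud_a)), np.ones(len(cloud_b))])
    keys = np.floor(pts / voxel).astype(np.int64)          # 0.1 m voxel indices

    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    counts_b = np.bincount(inverse, weights=src, minlength=counts.size)
    frac_a = (counts - counts_b) / counts                  # fraction of points from cloud_a
    changed_voxel = (frac_a >= ratio) | (frac_a <= 1.0 - ratio)

    return changed_voxel[inverse[:len(cloud_a)]]           # label of each cloud_a point
```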
  • Range image creation and change map projection: the transformed 3D point clouds were projected to 2D range images I1 and I2 as described in connection with range image representation above (see Figs. 1A-1B).
  • the Lidar's horizontal 360° field of view was mapped to 1024 pixels and the 5m vertical height of the cropped point cloud was mapped to 128 pixels, yielding that the size of the produced range image is 1024 x 128.
  • our measurements were recorded at 20 Hz, where the angular resolution is around 0.3456°, which means we get 1042 points per channel per revolution.
  • it is preferred that the dimension of the training data is a power of two, so we removed 18 points with equal step size from each channel. Since the removed points are in fixed positions, we know the exact mapping between the 2D and 3D domains.
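  • The column arithmetic can be checked with a few lines (illustrative only; the actual indices of the removed points are not specified here):

```python
import numpy as np

# roughly 360 / 0.3456 ≈ 1042 firings per revolution; dropping 18 evenly spaced
# columns gives the power-of-two width 1024 used for the range images
n_raw, n_target = 1042, 1024
drop = np.linspace(0, n_raw - 1, n_raw - n_target).astype(int)   # 18 fixed positions
keep = np.setdiff1d(np.arange(n_raw), drop)
assert keep.size == n_target
```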
  • the Lidar sensor used in this experiment has 64 emitters yielding that the height of the original range images should be 64.
  • the 2D convolutional layers with a stride of 2 have a 2-factor down-sampling effect.
  • the horizons of the range images are at similar positions in the two inputs due to the cropped height of the input point clouds.
  • the ground truth labels of the points p were also projected to the Λ1GT and Λ2GT change masks, used for reference during training and evaluation of the proposed network.
  • the change labelling is performed for registered point cloud pairs (generally, registered 3D information data block pairs, which are called registered base pairs) captured from the same sensor position and orientation at different times (e.g. captured from a car standing at the same place, which is why the point clouds will be registered). Since production of the GT starts from registered point cloud pairs (in general, 3D information data block pairs), it is naturally known where the changes are in the images.
  • the change annotation is performed on registered point clouds.
  • the frame pairs are taken in the same global coordinate system, they can be considered as registered.
  • their ground truth (GT) change annotation can be efficiently created in a semi-automatic way:
  • a high-resolution 3D voxel map is built on a given pair of point clouds (e.g. voxels with 10 cm edges for a general scene, and with these preferably cubic, matching voxels, we cover the scene so that all points of the point cloud are contained within a voxel and a plurality of points may be contained in a voxel).
  • the reference positions and orientations of e.g. the second frames are randomly transformed (any of the frames may be transformed or both of them) yielding a large set of accurately labelled coarsely registered point cloud pairs (the pairs were registered so far, the transformation is performed so as to achieve coarsely registered pairs).
  • a random transformation has been applied to the second frame (P2) of each point cloud pair both in the training and test datasets.
  • the GT labels remained attached to the points p ∈ P2 and were transformed together with them (i.e. the annotation advantageously remains valid also after applying the transformation).
  • cloud crop and normalization steps are also performed: a. in the next step, all 3D points whose horizontal distance from the sensor was larger than 40 m, or whose elevation was greater than 5 m above the ground level, were removed from the point clouds; b. this step made it possible to normalize the point distances from the sensor between 0 and 1 (such values are preferred for e.g. neural networks).
  • range image creation and change map projection is performed: a. the transformed 3D point clouds were projected to 2D range images I1 and I2.
  • the generated GT set has preferably been divided into disjoint training and test sets, which could be used to train and quantitatively evaluate the proposed method.
  • the remaining parts of the collected data, including originally unregistered point cloud pairs, have been used for qualitative analysis through visual validation. These are based on real measurements taken e.g. in a city. We paired them afterwards based on nearby GPS position measurements, so, unlike the frames recorded as a pair, they cannot be considered registered pairs. We could only check these results visually, and the trained system performed very well on them.
  • some embodiments of the invention relate to a training set generating method (the training set is also mentioned as a dataset and may also be called a training database) for generating a plurality of coarsely registered pairs of a first 3D information data block and a second 3D information data block and a plurality of respective target change data blocks (in this case preferably point clouds) of a training set, for applying in any embodiment of the training method according to the invention.
  • Fig. 3B shows a flowchart of the main steps of an embodiment of the training set generating method according to the invention.
  • a plurality of registered base pairs (see registered base pairs 300 in Fig. 3B), each having (alternatively, consisting of) a first base 3D information data block and a second base 3D information data block, is generated or provided (they may be readily available e.g. from a database, but they can also be generated as in the example detailed above; in either case, the method starts with a plurality of registered base pairs),
  • change annotation is performed in a change annotation step (see operational step S310 in Fig. 3B) on the first base 3D information data block and the second base 3D information data block of each of the plurality of registered base pairs,
  • by transforming, in a transformation step (see operational step S320 in Fig. 3B), at least one of the first base 3D information data block and the second base 3D information data block (as a first option, only one of them is transformed, but both of them may also be transformed; it is irrelevant which one is transformed) of each of the plurality of registered base pairs, a first resultant 3D information data block and a second resultant 3D information data block are generated for each of the plurality of registered base pairs (the resultant 3D information data blocks may have another name, e.g. intermediate 3D information data blocks), and
  • in a training data generation step (see operational step S330 in Fig. 3B) the plurality of coarsely registered pairs and the plurality of respective target change data blocks of the training set are generated based on the respective first resultant 3D information data block and second resultant 3D information data block.
  • one or more artificial changes are applied before the change annotation step, by addition or deletion, to any of the first base 3D information data block and the second base 3D information data block of each of the plurality of registered base pairs.
  • Artificial changes are applied on the base 3D information data blocks (e.g. point clouds), and these modified data blocks are then forwarded to the change annotation step (a minimal sketch of such artificial addition and deletion is given below).
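As an illustration only, the sketch below simulates such artificial changes on a point cloud stored as an N x 3 NumPy array. The box-based deletion, the helper names and the idea of appending the points of a previously recorded object are assumptions made for this example and are not the procedure fixed by the invention.

```python
import numpy as np

def delete_points_in_box(points, box_min, box_max):
    """Simulate an artificial 'deletion' change: drop every point of the
    point cloud (N x 3 array) that lies inside the axis-aligned box given
    by box_min and box_max (3-element arrays, metres)."""
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    return points[~inside]

def add_object_points(points, object_points):
    """Simulate an artificial 'addition' change by appending the points of
    an extra object (e.g. a separately recorded vehicle) to the cloud."""
    return np.vstack([points, object_points])
```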
  • the plurality of registered base pairs of a first base 3D information data block and a second base 3D information data block is constituted by a respective plurality of registered base point cloud pairs of a first base point cloud having a plurality of first points and a second base point cloud having a plurality of second points (i.e. in this embodiment point clouds are utilized), and in the change annotation step
  • a 3D voxel grid having a plurality of voxels is applied on the first base point cloud and the second base point cloud (i.e. the 3D voxel grid is applied to the union of the two point clouds, more specifically to the part of space in which the first and second point clouds are situated),
  • a first target change data block and a second target change data block are generated based on the assigned change for the first points and the second points, respectively (since a change label is assigned to all points where it is judged that there is a change, a target change data block can easily be produced from the change assignment information).
  • the predetermined first ratio limit and the predetermined second ratio limit are both 0.9 (both of them are set to this value).
  • a common voxel grid (voxel map) is built (assigned, generated) for the two point clouds.
  • in the change assignment step it is specified how it is judged that all points of a voxel get a change label (it is not specified when they receive a non-change label, since the points of a voxel get the non-change label whenever the change label has not been assigned to them).
  • the points of those voxels get the change label in which there are too many points that have no correspondent in the respective voxel of the other point cloud (since this is analysed before the transformation is done, the corresponding points in the point cloud pair can easily be counted, and no correspondence is found at those positions into which a change has been introduced by dynamic changes or by hand, see points 1-3 of the list above).
  • a scene will be partitioned correctly, i.e. when moving from a non-change area towards a change area, the method will not indicate a change too early, because the points contributed to the voxel by the two point clouds will largely match.
  • when it arrives at a change area, it derives the change areas with a resolution comparable to the voxel size; instead of deciding point by point, it determines much more efficiently (with a precision well within the accuracy required for the result, given that a voxel size of e.g. 0.1 m is preferably chosen for a street scene) which volume parts contain change and which do not.
  • the threshold is preferably set to 90% (this corresponds to 0.9 for the ratio limit). This way, the voxels will have the right resolution at the transition to the change areas, and the volume parts affected by the real change can be well identified.
  • if the ratio is above the limit, all points in the voxel are classified as belonging to change, while if it is below the limit, all points in the voxel are classified as not belonging to change (a minimal sketch of this voxel-based labelling is given below).
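A minimal sketch of this voxel-based labelling rule follows, assuming both registered clouds are given as N x 3 NumPy arrays in a common coordinate system; the function name and the array layout are illustrative, and the manual clean-up of minor errors mentioned above is not reproduced.

```python
import numpy as np

def voxel_change_labels(pc1, pc2, voxel=0.1, ratio_limit=0.9):
    """Label the points of two registered clouds as changed / unchanged with
    a common voxel grid: all points of a voxel receive the change label when
    at least `ratio_limit` of the points falling into that voxel come from
    only one of the two clouds."""
    pts = np.vstack([pc1, pc2])
    source = np.concatenate([np.zeros(len(pc1), int), np.ones(len(pc2), int)])

    # Index every point by its voxel cell in a grid common to both clouds.
    origin = pts.min(axis=0)
    cells = np.floor((pts - origin) / voxel).astype(np.int64)
    cell_id = np.unique(cells, axis=0, return_inverse=True)[1].ravel()

    # Count the contribution of each cloud per voxel.
    counts = np.zeros((cell_id.max() + 1, 2), dtype=np.int64)
    np.add.at(counts, (cell_id, source), 1)

    dominant_share = counts.max(axis=1) / counts.sum(axis=1)
    changed_cell = dominant_share >= ratio_limit

    changed = changed_cell[cell_id]              # propagate the label to the points
    return changed[source == 0], changed[source == 1]
```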
  • the applied random transformation preferably comprises an up to ±1 m translation, preferably in a plane perpendicular to the z-axis, and
  • an up to ±10° rotation, preferably around the z-axis (see also below in connection with the z-axis); a minimal sketch of such a random coarse misalignment is given below.
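The random coarse misalignment described in the two items above could, for example, be generated as in the following sketch. The uniform sampling of the translation and rotation and the function name coarse_misalign are assumptions of this example; only the ±1 m / ±10° bounds and the z-axis convention come from the text.

```python
import numpy as np

def coarse_misalign(points, max_shift=1.0, max_rot_deg=10.0, rng=None):
    """Apply a random rigid transform to a registered point cloud (N x 3):
    a translation of up to +/- max_shift metres in the horizontal x-y plane
    and a rotation of up to +/- max_rot_deg degrees around the vertical
    z-axis, producing a 'coarsely registered' counterpart."""
    rng = np.random.default_rng() if rng is None else rng
    angle = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    c, s = np.cos(angle), np.sin(angle)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    shift = np.array([rng.uniform(-max_shift, max_shift),
                      rng.uniform(-max_shift, max_shift),
                      0.0])
    return points @ rot_z.T + shift
```

Since a rigid transform moves the points together with their labels, applying it to P2 leaves the GT annotation valid, as noted above.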
  • Figs. 4A-4E show the changes predicted by the proposed ChangeGAN model, i.e. results obtained by (an embodiment of) the trained change detection system according to the invention (there are also such results among Figs. 5A-5D and Figs. 6A-6H, which were obtained by this).
  • Figs. 5A-5D show (predicted) change masks by the different methods on the input data shown in Figs. 1A-1B. More specifically, Fig. 5A shows the ground truth fused change map A^GT (this one is not a predicted mask), Fig. 5B shows the ChangeGAN output's fused change map L, Fig. 5C shows the ChangeNet output, and Fig. 5D shows the MRF output. Rectangles 109 correspond to the region shown in Figs. 6A-6H.
  • the fused change maps are also referred to as fused change images.
  • Figs. 1A and 1B show relative masks, i.e. the change with respect to the other image. How their content is determined is disclosed in connection with them.
  • the fused change maps show every change, i.e. not only the relative changes: if there is a change in any of the images of the image pair, there is a change shown in the fused image.
  • changes can be represented by masks.
  • in the first change image of a pair, a change can be denoted by '1' in the respective pixel; if there is no change, there is '0' in the pixel.
  • in the second change image, a change is denoted by '2' in the respective pixel (no change remains '0').
  • in the fused change map, change is denoted in all those pixels which contain '1' in the first change image or '2' in the second change image (a minimal sketch of this fusion is given below).
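Under the '0'/'1'/'2' coding described above, the fusion can be written in a few lines; the function name and the boolean output are assumptions of this sketch.

```python
import numpy as np

def fuse_change_maps(change1, change2):
    """Combine two relative change images into a fused change map: change1
    holds '1' where the first image changed, change2 holds '2' where the
    second image changed, and '0' means no change.  The fused map marks a
    change wherever either image indicates one."""
    return ((change1 == 1) | (change2 == 2)).astype(np.uint8)
```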
  • MRF method i.e., MRF-based reference approach, see B. Galai and C. Benedek, “Change detection in urban streets by a real time Lidar scanner and MLS reference data,” in Int. Conf. Image Analysis and Recognition, LNCS, 2017, pp. 210-220.
  • ChangeNet method: A. Varghese, J. Gubbi, A. Ramaswamy, and P. Balamuralidhar, “Changenet: A deep learning architecture for visual change detection,” in ECCV 2018 Workshops, LNCS, 2019, pp. 129-145.
  • Table 1 is a performance comparison of these methods.
  • the ChangeGAN method outperforms both reference methods in terms of these performance factors, including the F1-score and IoU values (a minimal sketch of computing these metrics is given below).
  • the MRF method (B. Galai and C. Benedek, “Change detection in urban streets by a real time Lidar scanner and MLS reference data,” in Int. Conf. Image Analysis and Recognition, LNCS, 2017, pp. 210-220.) is largely confused if the registration errors between the compared point clouds are significantly greater than the used voxel size. Such situations result in large numbers of falsely detected change pixels, which yields a very low average precision (0.44), although, due to several accidental matches, the recall rate might be relatively high (0.88) (see the definitions of precision and recall above).
  • the low measured computational cost is a second strength of the proposed ChangeGAN approach, especially versus the MRF model, whose execution time is longer by one order of magnitude.
  • although ChangeNet is even faster than ChangeGAN, its performance is significantly weaker compared to the other two methods.
  • the adversarial training strategy has a regularization effect (see P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic segmentation using adversarial networks,” in NIPS 2016 Workshop on Adversarial Training, Dec 2016, Barcelona, Spain), and the STN layer can handle coarse registration errors.
  • the proposed ChangeGAN model can achieve better generalization ability, and it outperforms the reference models on the independent test set. Note that in each case of Table 1 the running speed was measured in seconds on a PC with an i7-8700K CPU @ 3.7 GHz x12, 32 GB RAM, and a GeForce GTX 1080Ti.
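For reference, the pixel-wise metrics quoted in Table 1 can be computed from a predicted binary change mask and its ground truth counterpart roughly as in the sketch below; this is a generic implementation of the standard definitions, not the evaluation code used for the reported experiments.

```python
import numpy as np

def change_metrics(pred_mask, gt_mask):
    """Pixel-wise precision, recall, F1-score and IoU of a predicted binary
    change mask against the ground truth mask (both boolean arrays)."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.sum(pred & gt)                      # correctly detected change pixels
    fp = np.sum(pred & ~gt)                     # falsely detected change pixels
    fn = np.sum(~pred & gt)                     # missed change pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou
```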
  • in Figs. 4A-4E, changes detected by ChangeGAN for a coarsely registered point cloud pair are illustrated.
  • Figs. 4A and 4B show the two input point clouds (generally, the first and second 3D information data blocks 240a and 240b), while Fig. 4C displays the coarsely registered input point clouds in a common coordinate system as a 3D information data block 240.
  • Fig. 4C simply illustrates the two inputs in a common figure.
  • Fig. 4A is the darker image and Fig. 4B is the lighter image in this representation.
  • Figs. 4A-4B are unified in Fig. 4C, where the darker and lighter points of Figs. 4A-4B are observable.
  • Figs. 4D and 4E present the change detection results (i.e. a common change data block 245): originally blue and green coloured points (in greyscale: darker and lighter) represent the objects marked as changes in the first and second point clouds, respectively.
  • the above mentioned cars on the left and the bus on the right of Fig. 4A are shown with a darker colour in the change data block 245, as well as the tram and the car of Fig. 4B are shown with a lighter colour therein.
  • Fig. 4E shows the change data block 245 from above, wherein ellipse 246 draws attention to the global alignment difference between the two coarsely registered point clouds.
  • Figs. 4A-4E contain a busy road scenario, where different moving vehicles appear in the two point clouds.
  • moving objects both from the first frame (originally blue colour, i.e. darker colour of the vehicles in the change data block 245) and the second frame (originally green, i.e. lighter colour of the vehicles), i.e. Figs. 4A-4B respectively, are accurately detected despite the large global registration errors between the point clouds (highlighted by the ellipse 246 in Fig. 4E).
  • a change caused by a moving object in a given frame also implies a changed area in the other frame in its shadow region, which does not contain reflections due to occlusion (cf. the shadow regions in the figures).
  • in Fig. 4C the two inputs (Figs. 4A-4B) are simply superimposed; it is not identified what is the same and what is a change in them. It is noted that the most important function of the change data block (map) is to mark the changes.
  • Figs. 4D-4E contain the registration error, which can be observed more clearly in the top view of Fig. 4E. Although we illustrate these in one figure, we otherwise prefer to treat the two changes separately.
  • Figs. 6A-6H show comparative results of the ground truth and the changes predicted by ChangeGAN and the reference techniques for the region marked by rectangle 109 in Figs. 1A-1D and Figs. 5A-5D (Figs. 4A-4E illustrate a different scene).
  • Figs. 6A-6B show originally coloured and greyscale versions of the same content (this statement holds true also for the pairs of Figs. 6C-6D, 6E-6F, and 6G-6H), respectively, i.e. the ground truth change mask.
  • Figs. 6C-6D show the ChangeGAN predicted change
  • Figs. 6E-6F show the ChangeNet predicted change
  • Figs. 6G-6H show the MRF predicted change.
  • Figs. 6I and 6J (provided for Figs. 6A-6D and Figs. 6E-6H, respectively) show shade bars indicating the correspondence between the colours of the originally coloured and the greyscale versions; with the help of these correspondences, the colours of both versions can be interpreted even when all of the figures are shown in greyscale or black and white.
  • on the left side of Figs. 6I and 6J, from the top to the bottom, black, grey, blue and green colours were originally shown.
  • the corresponding shades of grey can be seen next to them.
  • in Figs. 6A, 6C, 6E and 6G, originally green and blue points mark the detected changes (see the shade bars of Figs. 6I and 6J for interpreting the correspondence with the same-content greyscale figures).
  • in Figs. 6A, 6C, 6E and 6G, black shows the points of the first point cloud and grey shows the points of the second point cloud.
  • a first ellipse 300 and a second ellipse 302 throughout Figs. 6A-6H mark the detected front and back part of a bus travelling in the upper lane while being occluded by other cars (cf. Figs. 1A-1B, where the movement of the bus can be observed and thus the change visualized in Figs. 6A-6H can be interpreted).
  • a first square 304 shows a building facade segment, which was occluded in P1 (cf. Fig. 1A).
  • the boxes 306 highlight false positive changes of the reference methods in Figs. 6E-6G confused by inaccurate registration (ChangeGAN in Figs. 6C-6D behaves very well in this region).
  • Figs. 6A-6H display another traffic situation (different from Figs. 4A-4E), where the output of the proposed ChangeGAN technique can be compared to the manually verified Ground Truth (Figs. 6A-6B) and to the two reference methods (Figs. 6E-6H) in the 3D point cloud domain.
  • both reference methods detected false changes in the bottom left corner of the image (in box 306: there are almost no changes in the ground truth and in the results of ChangeGAN in Figs. 6A-6D, but many changes, illustrated by originally blue and green points in Figs. 6E and 6G and by many shades of grey in this region in Figs. 6F and 6H, are shown in this region; in greyscale it is advantageous to show the content of box 306 with two types of colouring, since in box 306 Figs. 6F and 6H show much more variability than Figs. 6E and 6G, see this in view of Fig. 6B below), which were caused by the inaccurate registration (please find more details above: these are false positive results of the reference methods).
  • Fig. 7 displays with hollow marks the average F1-scores as a function of various ti values.
  • the training method is specialized so that the changes in the coarsely registered 3D information data block pair can be effectively detected, i.e. so as to achieve a change detection system of high efficiency.
  • our generative adversarial network (GAN) architecture preferably combines Siamese-style feature extraction, U-net-like use of multiscale features, and STN blocks for optimal transformation estimation.
  • the input point clouds - as typical inputs - are preferably represented by range images, which advantageously enables the use of 2D convolutional neural networks.
  • the result is preferably a pair of binary masks showing the change regions on each input range image, which can be backprojected to the input point clouds (i.e. to the inputs of the change detection generator module) without loss of information; a schematic sketch of the combined adversarial loss used to train this generator is given after this list.
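As a schematic illustration of the combined adversarial training summarized in the Abstract, a PyTorch-style sketch of the generator-side loss is given below. The use of binary cross-entropy for both terms, the default value of the loss multiplier and the tensor names are assumptions of this example, not specifics of the claimed method.

```python
import torch
import torch.nn.functional as F

def combined_generator_loss(pred_masks, target_masks, disc_fake_scores, lam=0.01):
    """Schematic combined loss for the change detection generator module:
    a supervised generator loss comparing the predicted change masks with
    the target (ground truth) change masks, plus lambda times an adversarial
    term rewarding predictions that the discriminator accepts as 'real'."""
    # Supervised term: per-pixel binary cross-entropy on the change masks.
    gen_loss = F.binary_cross_entropy_with_logits(pred_masks, target_masks)
    # Adversarial term: push the discriminator scores of the predictions towards 1.
    adv_loss = F.binary_cross_entropy_with_logits(
        disc_fake_scores, torch.ones_like(disc_fake_scores))
    return gen_loss + lam * adv_loss
```

In a training cycle, such a combined loss would be backpropagated through the change detection generator module, while the discriminator module would be trained separately on the coarsely registered pairs together with the target and the generated change data blocks.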

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a training method for training a change detection system to detect a change for a coarsely registered pair (200) of a first and a second 3D information data block (200a, 200b), wherein, in a training cycle, - change data blocks (215) are generated (S180) by a change detection generator module (210) for the coarsely registered pairs (200) and a generator loss (225) is generated (S185) based on change data blocks (215) and target change data blocks (205), - a discriminator loss (230) is generated (S190) by a discriminator module (220) on coarsely registered pairs (200), target change data blocks (205) and change data blocks (215), and - the change detection generator module (210) is trained (S195) by a combined loss (235) summing the generator and discriminator losses (225, 230), multiplying either of these by a loss multiplier (λ). The invention further relates to a change detection system and a training set generating method for the training method.
EP22760770.2A 2021-07-27 2022-07-08 Procédé d'entraînement pour l'entraînement d'un système de détection de changement, procédé de génération d'ensemble d'entraînement associé et système de détection de changement Pending EP4377913A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
HUP2100280 2021-07-27
PCT/HU2022/050058 WO2023007198A1 (fr) 2021-07-27 2022-07-08 Procédé d'entraînement pour l'entraînement d'un système de détection de changement, procédé de génération d'ensemble d'entraînement associé et système de détection de changement

Publications (1)

Publication Number Publication Date
EP4377913A1 true EP4377913A1 (fr) 2024-06-05

Family

ID=89662436

Family Applications (1)

Application Number Title Priority Date Filing Date
EP22760770.2A Pending EP4377913A1 (fr) 2021-07-27 2022-07-08 Procédé d'entraînement pour l'entraînement d'un système de détection de changement, procédé de génération d'ensemble d'entraînement associé et système de détection de changement

Country Status (2)

Country Link
EP (1) EP4377913A1 (fr)
WO (1) WO2023007198A1 (fr)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049012B (zh) * 2023-02-24 2025-09-02 广西玉柴机器股份有限公司 基于adasis协议的高精地图重构软件测试方法及系统
CN116311482B (zh) * 2023-05-23 2023-08-29 中国科学技术大学 人脸伪造检测方法、系统、设备及存储介质
CN116452983B (zh) * 2023-06-12 2023-10-10 合肥工业大学 一种基于无人机航拍影像的国土地貌变化快速发现方法
CN116740652B (zh) * 2023-08-14 2023-12-15 金钱猫科技股份有限公司 一种基于神经网络模型的锈斑面积扩大的监测方法与系统
CN116740669B (zh) * 2023-08-16 2023-11-14 之江实验室 多目图像检测方法、装置、计算机设备和存储介质
CN117574259B (zh) * 2023-10-12 2024-05-07 南京工业大学 适用于高端装备的注意力孪生智能迁移可解释性诊断方法
CN117671437B (zh) * 2023-10-19 2024-06-18 中国矿业大学(北京) 基于多任务卷积神经网络的露天采场识别与变化检测方法
CN117152622B (zh) * 2023-10-30 2024-02-23 中国科学院空天信息创新研究院 边界优化模型训练、边界优化方法、装置、设备及介质
CN117456349B (zh) * 2023-12-03 2025-02-11 西北工业大学 一种基于伪样本学习的无监督sar与光学图像变化检测方法
CN118037793B (zh) * 2024-03-13 2024-10-01 首都医科大学附属北京安贞医院 一种术中x线和ct图像的配准方法和装置
CN118865100A (zh) * 2024-06-24 2024-10-29 南京林业大学 一种基于街景图像的行道树快速普查方法
CN118865120B (zh) * 2024-07-05 2025-03-21 中北大学 一种面向遥感变化解释召回率可调的多尺度特征融合深度网络
CN119399624B (zh) * 2024-10-12 2025-08-26 自然资源部第一航测遥感院(陕西省第五测绘工程院) 一种耕地非农化变化检测方法、系统、设备及存储介质
CN119131701B (zh) * 2024-11-13 2025-03-11 浙江省测绘科学技术研究院 一种建筑物变化区域的检测方法、系统、装置及介质
CN119596356B (zh) * 2024-11-15 2025-09-30 哈尔滨工业大学 一种基于超宽幅旋扫遥感图像的高价值目标快速检测与定位方法及其系统
CN119672964A (zh) * 2025-02-21 2025-03-21 贵州道坦坦科技股份有限公司 基于深度学习的高速车流量检测方法及系统
CN120047450B (zh) * 2025-04-27 2025-07-08 南京信息工程大学 基于双流时相特征适配器的遥感图像变化检测方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9619691B2 (en) 2014-03-07 2017-04-11 University Of Southern California Multi-view 3D object recognition from a point cloud and change detection
US10970518B1 (en) * 2017-11-14 2021-04-06 Apple Inc. Voxel-based feature learning network

Also Published As

Publication number Publication date
WO2023007198A1 (fr) 2023-02-02

Similar Documents

Publication Publication Date Title
EP4377913A1 (fr) Procédé d'entraînement pour l'entraînement d'un système de détection de changement, procédé de génération d'ensemble d'entraînement associé et système de détection de changement
Alcantarilla et al. Street-view change detection with deconvolutional networks
CN116229408A (zh) 一种图像信息与激光雷达点云信息融合的目标识别方法
Zhao et al. Road network extraction from airborne LiDAR data using scene context
US10043097B2 (en) Image abstraction system
GB2554481A (en) Autonomous route determination
Lin et al. Planar-based adaptive down-sampling of point clouds
Nagy et al. ChangeGAN: A deep network for change detection in coarsely registered point clouds
Nagy et al. 3D CNN-based semantic labeling approach for mobile laser scanning data
Pan et al. Automatic road markings extraction, classification and vectorization from mobile laser scanning data
Li et al. 3D map system for tree monitoring in hong kong using google street view imagery and deep learning
CN119625279A (zh) 多模态的目标检测方法、装置和多模态识别系统
Huang et al. Overview of LiDAR point cloud target detection methods based on deep learning
Gigli et al. Road segmentation on low resolution lidar point clouds for autonomous vehicles
Parmehr et al. Automatic registration of optical imagery with 3d lidar data using local combined mutual information
Song et al. Automatic detection and classification of road, car, and pedestrian using binocular cameras in traffic scenes with a common framework
Li et al. Fusion strategy of multi-sensor based object detection for self-driving vehicles
Fehr et al. Reshaping our model of the world over time
KR102249380B1 (ko) 기준 영상 정보를 이용한 cctv 장치의 공간 정보 생성 시스템
Pandey et al. Toward mutual information based place recognition
Zhu et al. Precise spatial transformation mechanism for small-size-aware roadside 3D object detection on traffic surveillance cameras
Iwaszczuk et al. Detection of windows in IR building textures using masked correlation
KR102801241B1 (ko) 깊이 맵을 생성하기 위한 전자 장치 및 이의 동작 방법
Babu et al. Enhanced Point Cloud Object Classification using Convolutional Neural Networks and Bearing Angle Image
de Paz Mouriño et al. Multiview rasterization of street cross-sections acquired with mobile laser scanning for semantic segmentation with convolutional neural networks

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20240220

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)