WO2023284972A1

WO2023284972A1 - Privacy compliant monitoring of objects

Info

Publication number: WO2023284972A1
Application number: PCT/EP2021/069915
Authority: WO
Inventors: Laurent SMADJA; Nathan PIASCO; Dzmitry Tsishkou
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2021-07-16
Filing date: 2021-07-16
Publication date: 2023-01-19
Anticipated expiration: 2024-01-16
Also published as: CN117751391A; EP4285336A1

Abstract

It is provided a system for monitoring objects wherein data recording complies with data privacy policies by anonymizing sensible personal data comprised in images. The system comprises a LIDAR device configured for obtaining a temporal sequence of 3D point cloud data sets for at least one of the monitored objects wherein the temporal sequence of 3D point cloud data sets comprises a 3D point cloud data set and a camera device configured for capturing a temporal sequence of images of the at least one of the objects wherein the temporal sequence of images comprises a first image. Further, the system comprises a processing unit configured for determining a bounding box based on the first 3D point cloud data set, projecting the determined bounding box on a region of interest comprised in the first image and anonymizing the region of interest of the first image based on the projected bounding box.

Description

Privacy Compliant Monitoring of Objects

TECHNICAL FIELD

The present disclosure relates to the monitoring of objects by means of both a Light Detection and Ranging device and a camera device wherein data recording complies with data privacy policies.

BACKGROUND

Monitoring of objects in public areas is of growing importance, for example, in the context of connected vehicles, particularly, autonomously driving vehicles, and traffic surveillance and control, in general. For example, US 9235988 teaches tracking and characterizing a plurality of vehicles simultaneously in a traffic control environment by means of a 3D optical emitter oriented to allow illumination of a 3D detection zone in the environment and a 3D optical receiver oriented to have a wide and deep field of view within the 3D detection zone. The 3D optical emitter is operated to emit short light pulses toward the detection zone and the 3D optical receiver is operated to receive a reflection/backscatter of the emitted light on the vehicles in the 3D detection zone thereby acquiring an individual digital full-waveform Light Detection and Ranging (LIDAR) trace for each detection channel of the 3D optical receiver. Based on the individual digital full-waveform LIDAR trace and the emitted light waveform the presence of a plurality of vehicles in the 3D detection zone is detected and a position of at least part of each of the vehicles in the 3D detection zone is determined.

Data privacy is a crucial issue in view of a variety of data privacy policies that are in force in most countries. For example, in the case of general surveillance of traffic it must be guaranteed according to the data privacy policies of several countries that no identification of individual vehicles and persons is possible from the recorded data. In the field of autonomous vehicles, lots of data are needed in order to improve the involved algorithms and dedicated systems were designed to collect such data in public areas. Since many algorithms rely on camera data, special care must be taken to process the recorded images such that no personal information could be extracted. In the art, processing of the recorded images for detecting both faces of individuals and license plates of vehicles in order to delete or cover them in the images relies on Deep Learning techniques, implying the application of powerful computer resources, for example, Graphical Processor Unit (GPU) resources and databases that can usually be only provided by dedicated data processing centers. Therefore, that data to be processed for anonymization purposes has to be transmitted from the data collection site to the processing site which results in a high risk of data leakage during the data transmission process.

SUMMARY

In view of the above, it is an objective underlying the present application to provide techniques for monitoring objects that allow for fast and reliable anonymization of recorded sensible personal data, for example, without the need for highly powerful computational resources as GPUs.

The foregoing and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect a system for monitoring objects is provided comprising a Light Detection and Ranging, LIDAR, device configured for obtaining a temporal sequence of 3D point cloud data sets for at least one of the objects and a camera device configured for capturing a temporal sequence of images of the at least one of the objects. The temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set and the temporal sequence of images comprises a first image. The first 3D point cloud data set and the first image correspond to each other in time, i.e. , both are obtained at the same time or almost the same time (due to different recording sampling rates of the LIDAR device and the camera device; see also detailed description below). Further, the system comprises a processing unit configured for determining a bounding box based on the first 3D point cloud data set, projecting the determined bounding box on a region of interest, ROI, comprised in the first image and anonymizing the ROI of the first image based on the projected bounding box. Here and in the following, projecting the bounding box on the ROI may comprise projecting some or all of the LIDAR 3D points comprised in the bounding box on the ROI.

A ROI of an image obtained by a camera is anonymized using data provided by a LIDAR device. The anonymization can be performed in real time. For performing this procedure, no high-power computer resources (as, for example, GPUs) are necessary but rather the processing unit may be or comprise an ordinary CPU, for example. In principle, the procedure of monitoring objects and anonymizing ROIs can be implemented without employing any Deep Learning techniques or training any data sets (though it is not excluded that Deep Learning techniques can be combined with or involved in the procedure according to particular implementations). The system can reliably operate day and night and is robust to harsh weather and light conditions. It is noted that no additional sensor devices are needed and no limitation to particular parts of the image is to be observed during the analysis of the first 3D point cloud data set and the corresponding first image. Further, it is noted that the term processing unit is to be understood more in a logical sense than a physical one, i.e., the processing unit may comprise physically distributed components of any kind of hardware architecture. The system may be part of some crowdsource system that provides recorded data to the cloud. Advantageously, the recorded data can be anonymized before transmission to the cloud or any other data processing site.

The monitored objects may be moving or moveable objects. The process of anonymizing the ROI may be performed for a moving object or an object at rest. Examples for the objects include persons and vehicles (for example, trucks, automobiles or motor bikes). Examples for the ROI, accordingly, include faces of persons and license plates of vehicles. The ROI may contain any other personal data that is to be anonymized.

More than one ROI may be present in the first image. Thus, according to an implementation, the processing unit may be configured for determining a first bounding box and a second bounding box different from the first bounding box based on the first 3D point cloud data set and projecting the determined first bounding box on a first ROI comprised in the first image and the determined second bounding box on a second ROI different from the first ROI comprised in the first image. Thus, during one processing step a number of bounding boxes corresponding to a number of ROIs can be determined and projected on the respective ROIs. Two or more of the bounding boxes may overlap each other. In the context of traffic surveillance the first ROI may represent a license plate of a vehicle and the second ROI may represent a passenger or a face of a passenger of the vehicle.

According to an implementation, the system for monitoring objects may be configured to be installed on a static platform (for example, a building) or on a moving/moveable platform (for example, a vehicle). Since the computational load involved in the process of anonymizing the ROI is relatively low, it can be implemented in a moving platform, i.e., the above-described system may represent an embedded system with limited computational resources. Other constraints required by embedded systems as, for example, limited cooling facilities, can be readily satisfied by the system. Thus, real time anonymization of image data (particularly, before transmission of the same to some data center) is possible for a moving/moveable system for monitoring objects.

Different techniques for anonymizing the ROI may be employed. According to an implementation, anonymizing may be achieved by deleting the ROI or colorizing the ROI or the projected bounding box. The colorizing may be performed by overwriting all pixels of the ROI with a single constant color value. Alternatively, anonymizing may be achieved by image inpainting the ROI or the projected bounding box or blurring the ROI. By these techniques the ROI can be anonymized reliably with low processing loads. Another option is to control the camera device to not acquire pixels of the ROI in the first place (which implies that the location of the ROI of the image has to be estimated from the location of the determined bounding box). According to this option, there is no need for post-processing of the image.

The anonymization of the ROI requires the determination of the bounding box based on the first 3D point cloud data set. According to an implementation, the bounding box is determined by selecting 3D points of the first 3D point cloud data set, clustering at least some of the selected 3D points to obtain at least one 3D points cluster and computing for one or more of the obtained 3D points clusters a convex hull (defining a convex bounding volume), respectively. The bounding box may be or comprise the convex hull. Such a cluster-based approach may be advantageous with respect to reliably and speedily identifying features of interest in the 3D point cloud data set under consideration.

According to an implementation, selection of the 3D points may be based on intensity values. For example, only 3D points having intensity values above some pre-determined threshold may be selected. Thereby, the subsequently performed clustering process may be accelerated, since not all of the 3D points of the first 3D point cloud data set have to be taken into account. The selection of the 3D points by intensity-thresholding may be particularly advantageous when the ROI represents a license plate of a vehicle, since license plates are usually characterized by highly reflective coatings.

According to an implementation, after clustering geometric filtering means may be used in order to filter out (discard) one or more of the obtained clusters that are of no relevance with respect to the ROI, i.e., they do not correspond to the ROI that is to be anonymized. Thus, according to an implementation, the processing unit may be further configured for determining the bounding box by filtering out at least one of the obtained 3D points clusters by geometric filtering means when more than one 3D points cluster are obtained by the clustering. As a result of the filtering process one or more relevant bounding boxes corresponding to one or more ROIs can be achieved. This does not exclude that in some cases and depending on the filter criteria no 3D points clusters are filtered out at all.

According to an implementation, the geometric filtering means may be configured for filtering out the at least one of the obtained 3D points clusters when the number of 3D points of the at least one of the obtained 3D points clusters is less than a pre-determined threshold. Thereby, spurious detections can be filtered out. Alternatively or additionally, according to an implementation, a 3D points cluster may be filtered out when a geometric dimension of the at least one of the obtained 3D points clusters exceeds a pre-determined threshold or does not exceed another (maybe the same) pre-determined threshold and/or an aspect ratio of two geometric dimensions of the at least one of the obtained 3D points clusters exceeds a pre determined threshold or does not exceed another (maybe the same) pre-determined threshold. Knowledge on dimensions of features represented by the ROI may, thereby, be taken into account when determining or more relevant bounding boxes.

Alternatively or additionally, according to an implementation, the filtering means may be configured to determine a point cloud dimensionality of the at least one of the obtained 3D points clusters and may filter out a 3D points clusters when the determined point cloud dimensionality exceeds or does not exceed a pre-determined threshold. The point cloud dimensionality corresponds to the number of non-zero (within numerical accuracy) eigenvalues of the stacked 3D coordinates. When all points of a cluster are aligned, the dimensionality of the point cloud is 1, when all points of a cluster are coplanar to each other, the dimensionality of the point cloud is 2, and else the dimensionality of the point cloud is 3. If one is interested in license plates detection, for example, clusters exhibiting a point cloud dimensionality of 1 or 3 can be discarded, since license plates are two-dimensional.

When the monitored objects are or comprise persons the selection of the 3D points may be based on some particular filtering template that is suitable for detecting faces or bodies of persons. According to an implementation, only such 3D points are selected for the subsequently performed clustering process that match with a 3D filtering template obtained based on an Omega shape template (see also detailed description below). In the case of anonymizing an ROI representing a person or a face of a person, the determination of the bounding box based on the clustering process may be improved in terms of reliability and time needed when such an Omega shape template that is designed for discriminating between persons and environments/backgrounds is employed.

In principle, the processing unit may be configured to determine a plurality of bounding boxes, for example, for a plurality of 3D points clusters. One or more of the determined bounding boxes may be of no relevance with respect to the ROI(s) comprised in the first image. Thus, according to an implementation, the processing unit may be configured to determine the bounding box by determining a plurality of bounding boxes and filtering out at least one bounding box of the determined plurality of bounding boxes according to at least one of a plurality of geometric criteria. One or more bounding boxes are maintained after the filtering process and projected on one or more ROIs. The geometric criteria can be chosen such that only bounding boxes are maintained that correspond to ROIs that have to be anonymized. In particular, the geometric criteria can be applied to bounding boxes projected on the first image. According to an implementation the geometric criteria comprise when a geometric dimension of the at least one bounding box exceeds a pre-determined threshold or does not exceed another (maybe the same) pre-determined threshold the least one bounding box may be filtered out. Another example for the geometric criteria refers to the aspect ratio of two geometric dimensions of the at least bounding box. It might be implemented that when this aspect ratio exceeds a pre-determined threshold or does not exceed another (maybe the same) pre-determined threshold the least one bounding box is filtered out. Moreover, the location of a determined bounding box, in particular, when projected on the image may decide on filtering it out or not.

In general, image analysis can be performed in order to more reliably anonymize the ROI. According to an implementation, the processing unit is further configured for determining the bounding box by determining at least one candidate bounding box based on the first 3D point cloud data set and verifying at least one of the determined candidate bounding boxes based on an image analysis of the first image. It is noted that non-verified candidate bounding boxes may be projected on the image but they are not projected on a ROI that is to be anonymized. Verification may include some scoring of individual candidates and determining the candidate with the highest score as the verified one. Such a verification process can be rapidly performed and may provide very reliable results. Even though image analysis increases the overall computational load it might be performed in order to guarantee correct anonymization in the case of sensitive personal data.

The image analysis, according to an implementation, comprises at least one of comparing one or more dimensions of the projected bounding box with one or more dimensions of one or more picture elements shown in the first image, determining the location of the projected bounding box with respect to one or more picture elements shown in the first image, recognizing letters or numbers shown in the first image and performing or face detection. Location and dimensions of the projected bounding box should be similar to a picture element that is to be anonymized (i.e. a ROI). In the context of anonymization of license plates, a projected bounding box should cover a region of the image where letters and numbers are present in order to anonymize the ROI. Detection or even recognition of a face shown in the image may efficiently help to verify a candidate bounding box.

Objects that are moving can be tracked by the system for monitoring objects. The temporal sequence of 3D point cloud data sets may comprise one or more second 3D point cloud data sets obtained after the first 3D point cloud data set and the temporal sequence of images may, accordingly, comprise one or more second images captured after the first image. In this case, the processing unit, according to an implementation, may be further configured for determining additional bounding boxes based on the one or more second 3D point cloud data sets corresponding to the bounding box determined based on the first 3D point cloud data set, projecting the determined additional bounding boxes on respective ROIs comprised in the one or more second images and corresponding to the ROI of the first image and anonymizing the respective ROIs of the one or more second images. The anonymizing process may, thus, be performed for a temporal sequence of images captured by the camera device. The above- described examples of determining the bounding box can also be applied to the determination of the additional bounding boxes.

Since relatively high frame rates may be available for both the LIDAR and the camera device (for example, 20 to 30 Hz), identification of ROIs and determination of bounding boxes to be projected on the ROIs can be facilitated by comparing locations of ROIs in different subsequently captured images with each other or comparing locations of the projected additional bounding boxes with the location of the bounding box projected on the ROI of the first image with each other. Thus, the processing unit may be further configured for at least one of comparing the locations of the projected additional bounding boxes with the location of the bounding box projected on the ROI comprised in the first image and verifying the additional bounding boxes based on the comparison and comparing the locations of the ROIs comprised in the one or more second images with the location of the ROI comprised in the first image and verifying the ROIs (i.e. , confirming the identification of the ROIs) comprised in the one or more second images based on the comparison. By verifying the additional bounding boxes it is meant that the determination of the additional bounding boxes is confirmed, i.e., they are used for the actual anonymization process. For example, they are chosen from additional candidate bounding boxes when the locations of their projections on the one or more second images correspond to the location of the projection of the bounding box on the first image.

The system for monitoring objects as described above may be, particularly, suitable for monitoring vehicles. In this context, according to a second aspect, it is also provided a system for vehicle surveillance comprising a Light Detection and Ranging, LIDAR, device configured for obtaining a temporal sequence of 3D point cloud data sets for at least one vehicle, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set and a camera device configured for capturing a temporal sequence of images of the at least one vehicle, wherein the temporal sequence of images comprises a first image. Further, the system for vehicle surveillance comprises a processing unit configured for determining a first bounding box based on the first 3D point cloud data set, determining a second bounding box based on the first 3D point cloud data set, projecting the determined first bounding box on a first region of interest, ROI comprised in the first image, wherein the first ROI represents a license plate of the vehicle and projecting the determined second bounding box on a second ROI comprised in the first image, wherein the second ROI represents a passenger or a face of a passenger of the vehicle. The processing unit is further configured for anonymizing the first and second ROIs based on the projected first and second bounding boxes, respectively, after completion of the projection procedures. The above-described implementations can also be realized with the same advantages in this system for vehicle surveillance. Surveillance may comprise at least one of detection of the objects, communication with the objects, logging recorded data of the objects and transmitting recorded and processed data of the objects.

The above-mentioned object is also addressed by providing a method of monitoring moving or non-moving objects. Thus, according to a third aspect, it is provided a method of monitoring moving or non-moving objects (for example, persons or vehicles), comprising the steps of: obtaining a temporal sequence of 3D point cloud data sets for at least one of the objects by a Light Detection and Ranging, LIDAR, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set, capturing a temporal sequence of images of the at least one of the objects by a camera device wherein the temporal sequence of images comprises a first image and performing by a processing unit: determining a bounding box based on the first 3D point cloud data set, projecting the determined bounding box on a region of interest, ROI comprised in the first image and anonymizing the ROI of the first image based on the projected bounding box.

According to an implementation, the anonymizing of the ROI comprises one of deleting the ROI, colorizing the ROI or the projected bounding box, image inpainting the ROI or the projected bounding box, blurring the ROI and controlling the camera device to not acquire pixels of the ROI.

According to an implementation, the ROI contains personal data that may represent a face of a person or a license plate of a vehicle.

According to an implementation, both the LIDAR device and the camera device are installed on a static platform or a moving platform, for example, a vehicle.

According to an implementation, the determining of the bounding box comprises selecting 3D points of the first 3D point cloud data set, clustering at least some of the selected 3D points to obtain at least one 3D points cluster, and computing for one or more of the obtained 3D points clusters a convex hull, respectively.

According to an implementation, the selected 3D points only have intensity values above some pre-determined threshold. According to an implementation, the determining of the bounding box comprises filtering out at least one of the obtained 3D points clusters by geometric filtering means when more than one 3D points cluster are obtained by the clustering.

According to an implementation, the obtained at least one 3D points clusters may be filtered out based on at least one of the following criteria: the number of 3D points of the at least one of the obtained 3D points clusters is less than a pre-determined threshold, a geometric dimension of the at least one of the obtained 3D points clusters exceeds a pre-determined threshold or does not exceed another pre-determined threshold, an aspect ratio of two geometric dimensions of the at least one of the obtained 3D points clusters exceeds a pre determined threshold or does not exceed another pre-determined threshold and a point cloud dimensionality of the at least one of the obtained 3D points clusters exceeds or does not exceed a pre-determined threshold.

According to an implementation, when the objects are or comprise persons the selected 3D points only match with a 3D filtering template obtained based on an Omega shape template.

According to an implementation, the process of determining the bounding box comprises determining a plurality of bounding boxes and filtering out at least one bounding box of the determined plurality of bounding boxes according to at least one of a plurality of geometric criteria.

According to an implementation, the geometric criteria comprise at least one of: a geometric dimension of the at least one bounding box exceeds a pre-determined threshold or does not exceed another pre-determined threshold, an aspect ratio of two geometric dimensions of the at least one bounding box exceeds a pre-determined threshold or does not exceed another pre-determined threshold and a location of the at least one bounding box.

According to another implementation of the method for monitoring objects the process of determining the bounding box comprises determining at least one candidate bounding box based on the first 3D point cloud data set and verifying at least one of the determined candidate bounding boxes based on an image analysis of the first image.

According to an implementation, the image analysis comprises at least one of comparing one or more dimensions of the projected bounding box with one or more dimensions of one or more picture elements shown in the first image, determining the location of the projected bounding box with respect to one or more picture elements shown in the first image, recognizing letters or numbers shown in the first image and performing face detection. According to an implementation, the method comprises determining a first bounding box and a second bounding box different from the first bounding box based on the first 3D point cloud data set and projecting the determined first bounding box on a first ROI comprised in the first image and the determined second bounding box on a second ROI different from the first ROI comprised in the first image.

According to an implementation, the temporal sequence of 3D point cloud data sets comprises one or more second 3D point cloud data sets obtained after the first 3D point cloud data set and the temporal sequence of images comprises one or more second images captured after the first image and the method comprises determining additional bounding boxes based on the one or more second 3D point cloud data sets corresponding to the bounding box determined based on the first 3D point cloud data set, projecting the determined additional bounding boxes on respective ROIs comprised in the one or more second images and corresponding to the ROI of the first image and anonymizing the respective ROIs of the one or more second images.

In this case, according to an implementation, the method further comprises at least one of comparing the locations of the projected additional bounding boxes with the location of the bounding box projected on the ROI comprised in the first image and verifying the additional bounding boxes based on the comparison and comparing the locations of the ROIs comprised in the one or more second images with the location of the ROI comprised in the first image and verifying the ROIs comprised in the one or more second images based on the comparison.

Furthermore, according to a fourth aspect, it is provided a method of surveilling vehicles, comprising obtaining a temporal sequence of 3D point cloud data sets for at least one vehicle by a Light Detection and Ranging, LIDAR, device, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set, capturing a temporal sequence of images of the at least one vehicle by a camera device, wherein the temporal sequence of images comprises a first image and performing by a processing unit: determining a first bounding box based on the first 3D point cloud data set, determining a second bounding box based on the first 3D point cloud data set, projecting the determined first bounding box on a first region of interest, ROI comprised in the first image, wherein the first ROI represents a license plate of the vehicle, projecting the determined second bounding box on a second ROI comprised in the first image, wherein the second ROI represents a passenger or a face of a passenger of the vehicle and anonymizing the first and second ROIs based on the projected first and second bounding boxes, respectively.

All above-described implementations of the method provide the same advantages as the above-described implementations of the device and may be implemented in the above- described implementations of the device and the above-described implementations of the device may be configured to perform the above-described implementations of the method.

Furthermore, it is provided a computer program product comprising computer readable instructions for, when run on a computer, executing control of the LIDAR device and camera device and processing unit described above in order to perform the steps of the above- described implementations of the method.

Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:

Figure 1 illustrates a system for monitoring objects according to an embodiment.

Figure 2 illustrates a method of monitoring objects according to an embodiment.

Figure 3 illustrates a system for monitoring objects installed in an automobile according to an embodiment.

Figure 4 illustrates projection of a processed LIDAR frame on a corresponding image.

Figure 5 illustrates different stages of anonymizing a license plate of a vehicle captured in an image.

Figure 6 illustrates different stages of anonymizing faces of persons captured in an image.

Figure 7 illustrates anonymization techniques suitable for anonymizing a license plate of a vehicle.

Figure 8 illustrates a process flow for monitoring objects according to an embodiment.

DETAILED DESCRIPTION OF THE EMBODIMENTS

A system and a method for monitoring objects (for example, persons or vehicles) in compliance with privacy policies are provided herein. Anonymization of regions of images representing personal data can be achieved, particularly, in real time, in embedded or non-embedded systems, without the need for powerful computational resources (as, for example, GPUs). An embodiment of a system 10 for monitoring objects is illustrated in Figure 1. The monitoring is performed by means of a LIDAR device 11 and a camera device 12 comprised in the system 10. Intrinsic parameters of the camera device 12, for example, the focal length, the principal point and recording frame rate, are known and kept fixed during the monitoring process. The LIDAR device 11 may comprise an optical emitter, a scanning means (for example, comprising a rotatable mirror) and an optical receiver for detecting reflected laser light and it may also comprise an active sensor operating as an illumination source. It is noted that the horizontal and vertical resolution of the LIDAR device 11 impacts the resolution of 3D point clouds obtained by the LIDAR device 11 and, therefore, has to be readily be adjusted to practical applications.

The LIDAR device 11 provides 3D point cloud data sets comprising 3D points with intensity values and the camera device 12 provides corresponding images of objects comprising pixels with color or gray scale values. The spatial arrangement of the LIDAR device 11 and the camera device 12 with respect to each other is fixed in order to have temporally corresponding frames provided by both devices to be processed for anonymization.

One or more ROIs are comprised in each of the images. Each ROI represents a region that has to be anonymized in accordance with particular data privacy policies. Each ROI may contain sensible private data and, for example, may represent a face of a person or a license plate of a vehicle. Recorded image data may be anonymized or anonymization may be performed during the recording process.

The process of anonymization of the ROI(s) of the images is based on the 3D point cloud data provided by the LIDAR device 11 and it is carried out by a processing unit 13 comprised in the system 10 and being in data communication with the LIDAR device 11 and the camera device 12. The processing unit 13 may comprise some conventional CPU and it may comprise physically distributed components some of which may be comprised in the LIDAR device 11 and/or the camera device 12. Particularly, the system 10 illustrated in Figure 1 may not comprise any GPU. Moreover, the system 10 and/or the processing unit 13 may comprise some storage means for temporarily storing data provided by the LIDAR device 11 and the camera device 12.

According to an embodiment, the LIDAR device 11 is configured for obtaining a temporal sequence of 3D point cloud data sets for at least one of the monitored objects, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set. According to this embodiment, the camera device 12 is configured for capturing a temporal sequence of images of the at least one of the objects, wherein the temporal sequence of images comprises a first image. The first 3D point cloud data set and the first image correspond to each other in time, i.e., both are obtained at the same time or almost the same time. According to this embodiment, the processing unit 13 is configured for determining a bounding box based on the first 3D point cloud data set, projecting the determined bounding box on a region of interest, ROI, comprised in the first image and anonymizing the ROI of the first image based on the projected bounding box. Anonymization of ROIs of a number of second images captured by the camera device 12 after that first image has been captured using second 3D point cloud data sets obtained by the LIDAR device 13 after the first 3D point cloud data set can be performed accordingly. Based on the known and fixed spatial relationship between the LIDAR device 11 and the camera device 12 the determined bounding box can be readily projected on the first image using matrix multiplication that can be efficiently be carried out by the processing unit 13 without any requirement for high-power computational capabilities.

An embodiment of a method of monitoring objects (for example, persons or vehicles) is illustrated in Figure 2. The method according to this embodiment comprises the step of obtaining 21 a temporal sequence of 3D point cloud data sets for at least one of the monitored objects by a LIDAR device and capturing a temporal sequence of images of the at least one of the objects by a camera device. The temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set. , The temporal sequence of images comprises a first image. Moreover, the method according to this embodiment comprises performing by a processing unit: determining 22 a bounding box based on the first 3D point cloud data set, projecting 23 the determined bounding box on a region of interest, ROI comprised in the first image and anonymizing 24 the ROI of the first image based on the projected bounding box. The process can be repeated for a number of second images captured after the first image using second 3D point cloud data sets obtained by the LIDAR device 11 subsequently to the first 3D point cloud data set.

In the above-described embodiments, the bounding box may be determined by selecting 3D points of the first 3D point cloud data set, clustering at least some of the selected 3D points to obtain at least one 3D points cluster and computing for one or more of the obtained 3D points clusters a convex hull (defining a convex bounding volume), respectively. A cluster-based approach allows for reliably and speedily identifying features of interest in the 3D point cloud data set. The bounding box may be or comprise the convex hull. The clustering algorithm may be based on distances between the points of the first 3D point cloud data set. Since the LIDAR device usually comprises a radial sensor providing data for a constant angle resolution, it might not be appropriate to pre-determine a distance threshold and, therefore, a parameter free clustering algorithm may be employed. The system 10 shown in Figure 1 may be installed on a static platform (for example, a building) or moving/movable platform. For example, it may be installed in a vehicle. Figure 3 shows a moving/movable system 30 comprising an automobile 31. A LIDAR device 32 and a camera 33 are installed in the automobile 31. The system further comprises a processing unit (not shown in Figure 3) for data processing, for example, for performing the steps 22 to 24 of the method illustrated in Figure 2. The system 30 shown in Figure 3 can be used in connected cars and autonomous driving applications and it may wirelessly (for example, via Internet, 5G networks, etc.) communicate with similar systems installed in other vehicles as well as with some data/processing center. Examples for objects to be monitored by the system 30 include vehicles.

Figure 3 also exemplarily illustrates the above-mentioned spatial correlation between the position of the LIDAR device 32 and the position of the camera device 33 (i.e., the external calibration/registration of these devices with respect to each other) in the system 30. A rigid spatial transformation (with rotational component R and translational component t) between the coordinate systems centered on the LIDAR device 32 and the camera device 33, respectively, is known beforehand for the further processing of the data provided by both devices and remains constant during the monitoring process. Transformation of the coordinates from one of the coordinate systems to the other can be achieved by matrix multiplication as illustrated by the matrix shown in Figure 3.

Using, for example, a calibrated pinhole camera model that is commonly used in computer vision, pixel coordinates (u, v) of the projection of a 3d point, with its 3D coordinates expressed in its own coordinate system, are obtained by multiplying the 3D coordinates of this point expressed in camera coordinate system (lower index K) by the so called camera intrinsic matrix K (where f_x, f_y correspond to the focal length of the camera in pixel units, wherein f_x = f_y holds for square pixels cameras, and uo, vo denote the projection of the optical center of the camera on the image plane):

where the lower index L denotes the LIDAR coordinate system and R and t denote the rotation and translation from the LIDAR coordinate system to the camera coordinate system. Figure 4 illustrates the effect of projecting a processed LIDAR frame on an image according to such a transformation. A recorded image (upper pictures in Figure 4) and a temporally corresponding LIDAR frame (lower pictures in Figure 4) are provided and after processing of the LIDAR frame (for example, by determining bounding boxes that are considered relevant with respect to anonymization of personal data as described above) combined (see + sign in Figure 4) in order to obtain (see arrow in Figure 4) an image with LIDAR data projection (for example, 3D points within one or more bounding boxes).

The system 30 shown in Figure 3 allows for real time anonymization of one or more ROIs comprised in image data when the LIDAR device 32 and the camera device 33 have comparable recording frame rates, for example, frame rates differing from each other by less than 70 % or less than 60 %. Consider, for example, a recording frame rate of the LIDAR device 32 of 20 Hz and a recording frame rate of the camera device 33 of 30 Hz. For a relative velocity between the vehicle 31 and an object to be monitored of 25 m/s the difference in frame rates results in a maximum displacement of temporally neighbored recording frames of the LIDAR device 32 and the camera device 33 of 20 cm, leading to small pixel offsets (depending on the distance of the object to the camera and the focal length). Such pixel offsets can be accepted in terms of matching a bounding box to a corresponding ROI comprised in the image.

According to particular examples, in images captured by the camera device 11 of the system 10 illustrated in Figure 1 or the camera device 33 of the system 30 illustrated in Figure 3 license plates of vehicles and passengers or faces of passengers of the vehicles represent the ROIs that are to be anonymized based on the LIDAR data. Particularly, according to an embodiment, the system 10 illustrated in Figure 1 or the system 30 illustrated in Figure 3 can be a system for vehicle surveillance. The LIDAR device 11, 32 of this system for vehicle surveillance is configured for obtaining a temporal sequence of 3D point cloud data sets for at least one vehicle, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set and the camera device 12, 33 is configured for capturing a temporal sequence of images of the at least one vehicle, wherein the temporal sequence of images comprises a first image. The processing unit 13 is configured for determining a first bounding box based on the first 3D point cloud data set, determining a second bounding box based on the first 3D point cloud data set, projecting the determined first bounding box on a first region of interest, ROI comprised in the first image, wherein the first ROI represents a license plate of the vehicle, projecting the determined second bounding box on a second ROI comprised in the first image, wherein the second ROI represents a passenger or a face of a passenger of the vehicle. The processing unit 13 is, furthermore, configured for anonymizing the first and second ROIs based on the projected first and second bounding boxes, respectively. Figure 5 illustrates some basic stages of the process of anonymizing a license plate of a vehicle that is present in an image captured by a camera device, for example, the camera device 11 of the system 10 illustrated in Figure 1 or the camera device 33 of the system 30 illustrated in Figure 3 in accordance with an embodiment of the present invention. A license plate of a vehicle is usually coated with some highly reflective (prismatic) material. LIDAR points obtained from a LIDAR beam reflected by such a material show high intensity values. As shown under 1) in Figure 5 LIDAR point cloud data is provided wherein intensity values (reflectivity) are encoded in grey scale, for example, and as shown under 2), according to this embodiment, only points having intensities above a pre-determined threshold (for example, 60 % to 80 % of the maximum value present the data set) are retained whereas other points are discarded from the point cloud. The retained points are clustered 3) according to well-known clustering criteria and bounding boxes are determined 4) for the resulting clusters. Bounding boxes or clusters of point clouds that are determined to be irrelevant with respect to ROIs to be anonymized in an image frame that temporally corresponds to the LIDAR frame under consideration are filtered out 5). Possible filtering processes and criteria for the filtering processes are described below. The remaining relevant bounding boxes are projected 5) on the corresponding ROIs comprised in the corresponding image and can be used for the anonymization of the ROIs 6).

Figure 6 illustrates some basic stages of the process of anonymizing persons or faces of persons or persons captured in an image by camera device, for example, the camera device 11 of the system 10 illustrated in Figure 1 or the camera device 33 of the system 30 illustrated in Figure 3 in accordance with an embodiment of the present invention. A LIDAR frame comprising 3D LIDAR points having intensity values corresponding to the reflectivity properties of the scanned objects is provided 1). The LIDAR frame results from a scanning of an environment with persons being present. A 3D filtering based on an Omega shape template is applied 2) to the 3D point cloud of the LIDAR frame in order to identify regions of the point cloud that correspond to the persons in the environment. The Omega shape template used for the 3D filtering is a 3D deformable template that represents a 3D extension of a 2D Omega shape template used in the art for 2D detection of persons (cf. Mukherjee, Subra & Das, Karen, “Omega Model for Human Detection and Counting for application in Smart Surveillance System", International Journal of Advanced Computer Science and Applications, vol 4. No. 2, 2013, pages 167-172).

All other points are filtered out and the retained points are clustered 3). Based on the clustering bounding boxes can be determined that subsequently are used for anonymizing the persons that are captured in an image corresponding to the LIDAR frame. It is noted that additional image processing techniques for face detection (for example, by some cascaded filters used for detecting Haar-like features) may be used to increase the reliability of the proper anonymization of faces of persons if desired.

In all of the above-described embodiments, irrelevant LIDAR point cloud clusters and/or irrelevant bounding boxes, particularly, after projection on an image, can be filtered out and/or projected bounding boxes may be determined to be relevant or irrelevant (and, therefore, are filtered out) based on some image analysis of the image. Reliability and efficiency of the determination of relevant bounding boxes used for the anonymization can be increased by such filtering processes that can be performed without the need for complex computations, for example, mainly based on geometric criteria.

For example, clusters with too less points as compared to a pre-determined threshold may be filtered out, since they probably represent spurious detections. Clusters with too large or too small dimensions or aspect ratios of dimensions as compared to appropriately set thresholds may be filtered out. For example, in the case of license plates represented by ROIs characteristic heights and widths are well-known and the thresholds can be determined accordingly. Similar criteria may be applied to bounding boxes determined for the point clusters. Further, filter criteria may include the point cloud dimensionality that corresponds to the number of non-zero eigenvalues of the stacked 3D coordinates. When all points of a cluster are aligned, the dimensionality of the point cloud is 1 , when all points of a cluster are coplanar to each other, the dimensionality of the point cloud is 2, and else the dimensionality of the point cloud is 3. If one is interested in license plates detection, for example, clusters exhibiting a point cloud dimensionality of 1 or 3 can be discarded.

Irrelevant bounding boxes projected on the images may also be filtered out according to geometric criteria related to dimensions and aspect ratios. Further, the position of a projected bounding box in the image can be used to decide on retaining or discarding the bounding box. For example, in the context of anonymizing license plates, the ROIs may be expected to be positioned below some horizontal threshold for common camera orientations of vehicle surveillance systems and a bounding box may be discarded as being irrelevant if the projection of that bounding box on the image lies above the horizontal threshold. For example, road signs that also show a high reflectivity similar to license plates can be excluded by this kind of filtering process.

Image analysis may provide information on letters and numbers present in an image. In the context of anonymizing license plates the ROIs should include letters and numbers and, therefore, a bounding box may be discarded if it’s projection on the image lies on a region of the image where no letters and numbers are present. Furthermore, tracking of an object over several subsequently recorded frames may be used in order to confirm a projected bounding box to be relevant for the anonymization of an ROI. If a particular projected bounding box is comparable in location in the image to bounding boxes projected on previously or subsequently recorded images the bounding box may be confirmed to be relevant for the anonymization.

As already stated, the systems and methods disclosed herein do not need any employment of Deep Learning techniques. Nevertheless, a combination with such techniques is not excluded, in principle. For example, a Deep Learning image detector may be applied to an image that outputs contours around objects comprised in an image that are members of a pre-learned class of objects, for example, trucks, cars, motor bikes, etc. Projected bounding boxes may be confirmed, if they lie in such contours.

In all of the above-described embodiments, anonymization of ROIs can be performed by a variety of anonymization techniques. These anonymization techniques can be applied in real time. Examples for these anonymization techniques include the following. The ROI or bounding box may be colored, for example, all pixels of the ROI of an image may be overwritten by some predefined constant color value. Thereby, a fast and reliable hiding of the ROI can be achieved. Another effective way to hide the ROI is Gaussian blurring. A more advanced anonymization technique is inpainting which results in a higher computational load as compared to colorizing or blurring but can still be applied in real time, since only a limited ROI rather than the entire image has to be processed. Further, pre-masking of the image during capturing can be performed by suppressing acquiring pixels by a camera in an ROI as estimated based on the LIDAR data. The thus anonymized images (i.e., images comprising one or more anonymized ROIs) can be stored and transmitted to data centers for further processing.

Figure 7 shows examples of the anonymization of an ROI representing a license plate of a vehicle. The picture in the upper row of Figure 7 shows an image with a (confirmed) ROI. The lower row in Figure 7 shows the effects of different anonymization techniques: Option a) shows the effect of coloring by a white color, option b) shows the effect of blurring, option c) shows the effect of inpainting and option d) shows the effect of pre-masking of the image during capturing by the camera device. In each case, anonymization can be reliably achieved in real time.

Figure 8 illustrates a process flow 80 for monitoring objects carried out by a system comprising a LIDAR device, a camera device and a processing unit according to an embodiment. The process flow 80 may be implemented in the system 10 shown in Figure 1 or the system 30 shown in Figure 3, for example. The process flow 80 comprises a LIDAR data workflow 81, a camera data workflow 82 and a mixed data workflow 83 that is based on both LIDAR data and camera data.

The LIDAR device records a temporal sequence of frames and the camera device records a temporal sequence of images corresponding to the frames recorded by the LIDAR device. In the LIDAR data workflow 81 a last frame comprising a 3D point cloud data set is kept in a memory of the processing unit and in the camera data workflow 82 a last image is kept in a memory of the processing unit. Selected points of the 3D point cloud data set are clustered in the LIDAR data workflow 81. Depending on the actual application LIDAR points having intensity values below some pre-determined threshold may be discarded or only such LIDAR points are selected that match with an Omega shape filter. In the LIDAR data workflow 81 clusters are discarded or maintained according to filtering criteria, in particular, geometric criteria, as described above. Bounding boxes/convex hulls are determined for the retained clusters.

In the mixed data workflow 83 bounding boxes together with the LIDAR points comprised therein are projected on the image held in the memory. One or more of the projected bounding boxes are discarded according to filtering criteria as described above. Optionally, the camera data workflow 82 comprises employment of a Deep Learning image detector that outputs contours around objects comprised in an image that are members of a pre-learned class of objects, for example, trucks, cars, motor bikes, pedestrians, etc. Projected bounding boxes may be confirmed if they lie in such contours and discarded otherwise.

After projection/confirmation of the relevant bounding box(es), ROIs comprised in the image held in the memory are anonymized in the mixed data workflow 83 based on the retained bounding boxes by image processing techniques (cf. description above).

All previously discussed embodiments are not intended as limitations but serve as examples illustrating features and advantages of the invention. It is to be understood that some or all of the above described features can also be combined in different ways.

Claims

1. System (10, 30) for monitoring objects, comprising a Light Detection and Ranging, LIDAR, device (11, 32) configured for obtaining a temporal sequence of 3D point cloud data sets for at least one of the objects, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set; a camera device (12, 33) configured for capturing a temporal sequence of images of the at least one of the objects, wherein the temporal sequence of images comprises a first image; and a processing unit (13) configured for determining a bounding box based on the first 3D point cloud data set; projecting the determined bounding box on a region of interest, ROI, comprised in the first image; and anonymizing the ROI of the first image based on the projected bounding box.

2. The system (10, 30) according to claim 1 , wherein the processing unit (13) is configured for anonymizing the ROI by one of a) deleting the ROI; b) colorizing the ROI or the projected bounding box; c) image inpainting the ROI or the projected bounding box; d) blurring the ROI; and e) controlling the camera device (12, 33) to not acquire pixels of the ROI.

3. The system (10, 30) according to claim 1 or 2, wherein the objects are persons or vehicles.

4. The system (10, 30) according to one of the preceding claims, wherein the ROI contains personal data, in particular, representing a face of a person or a license plate of a vehicle.

5. The system (10, 30) according to one of the preceding claims, wherein the system (10, 30) is installed on a static or moving platform.

6. The system (10, 30) according to one of the preceding claims, wherein the processing unit (13) is further configured for determining the bounding box by a) selecting 3D points of the first 3D point cloud data set; b) clustering at least some of the selected 3D points to obtain at least one 3D points cluster; and c) computing for one or more of the obtained 3D points clusters a convex hull, respectively.

7. The system (10, 30) according to claim 6, wherein the selected 3D points only have intensity values above some pre-determined threshold.

8. The system (10, 30) according to claims 6 and 7, wherein the processing unit (13) is further configured for determining the bounding box by filtering out at least one of the obtained 3D points clusters by geometric filtering means when more than one 3D points cluster are obtained by the clustering.

9. The system (10, 30) according to claim 8, wherein the geometric filtering means are configured for filtering out the at least one of the obtained 3D points clusters based on at least one of the following criteria: a) the number of 3D points of the at least one of the obtained 3D points clusters is less than a pre-determined threshold; b) a geometric dimension of the at least one of the obtained 3D points clusters exceeds a pre-determined threshold or does not exceed another pre-determined threshold; c) an aspect ratio of two geometric dimensions of the at least one of the obtained 3D points clusters exceeds a pre-determined threshold or does not exceed another pre determined threshold; and d) a point cloud dimensionality of the at least one of the obtained 3D points clusters exceeds or does not exceed a pre-determined threshold.

10. The system (10, 30) according to claims 6 to 9, wherein the objects are or comprise persons and the selected 3D points only match with a 3D filtering template obtained based on an Omega shape template.

11. The system (10, 30) according to one of the preceding claims, wherein the processing unit (13) is further configured for determining the bounding box by determining a plurality of bounding boxes and filtering out at least one bounding box of the determined plurality of bounding boxes according to at least one of a plurality of geometric criteria.

12. The system (10, 30) according to claim 11, wherein the geometric criteria comprise at least one of: a) a geometric dimension of the at least one bounding box exceeds a pre-determined threshold or does not exceed another pre-determined threshold; b) an aspect ratio of two geometric dimensions of the at least bounding box exceeds a pre-determined threshold or does not exceed another pre-determined threshold; and c) a location of the at least one bounding box.

13. The system (10, 30) according to one of the preceding claims, wherein the processing unit (13) is further configured for determining the bounding box by determining at least one candidate bounding box based on the first 3D point cloud data set and verifying at least one of the determined candidate bounding boxes based on an image analysis of the first image.

14. The system according to claim 13, wherein the image analysis comprises at least one of a) comparing one or more dimensions of the projected bounding box with one or more dimensions of one or more picture elements shown in the first image; b) determining the location of the projected bounding box with respect to one or more picture elements shown in the first image; c) recognizing letters or numbers shown in the first image; and d) performing face detection.

15. The system (10, 30) according to one of the preceding claims, wherein the processing unit (13) is configured for determining a first bounding box and a second bounding box different from the first bounding box based on the first 3D point cloud data set and projecting the determined first bounding box on a first ROI comprised in the first image and the determined second bounding box on a second ROI different from the first ROI comprised in the first image.

16. The system (10, 30) according to one of the preceding claims, wherein the temporal sequence of 3D point cloud data sets comprises one or more second 3D point cloud data sets obtained after the first 3D point cloud data set and the temporal sequence of images comprises one or more second images captured after the first image and wherein the processing unit (13) is further configured for determining additional bounding boxes based on the one or more second 3D point cloud data sets corresponding to the bounding box determined based on the first 3D point cloud data set; projecting the determined additional bounding boxes on respective ROIs comprised in the one or more second images and corresponding to the ROI of the first image; and anonymizing the respective ROIs of the one or more second images.

17. The system (10, 30) according to claim 16, wherein the processing unit (13) is further configured for at least one of comparing the locations of the projected additional bounding boxes with the location of the bounding box projected on the ROI comprised in the first image and verifying the additional bounding boxes based on the comparison; and comparing the locations of the ROIs comprised in the one or more second images with the location of the ROI comprised in the first image and verifying the ROIs comprised in the one or more second images based on the comparison.

18. System (10, 30) for vehicle surveillance, comprising a Light Detection and Ranging, LIDAR, device (11, 32) configured for obtaining a temporal sequence of 3D point cloud data sets for at least one vehicle, wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set; a camera device (12, 33) configured for capturing a temporal sequence of images of the at least one vehicle, wherein the temporal sequence of images comprises a first image; and a processing unit (13) configured for determining a first bounding box based on the first 3D point cloud data set; determining a second bounding box based on the first 3D point cloud data set; projecting the determined first bounding box on a first region of interest, ROI comprised in the first image, wherein the first ROI represents a license plate of the vehicle; projecting the determined second bounding box on a second ROI comprised in the first image, wherein the second ROI represents a passenger or a face of a passenger of the vehicle; and anonymizing the first and second ROIs based on the projected first and second bounding boxes, respectively.

19. Method of monitoring objects, comprising the steps of: obtaining (21) a temporal sequence of 3D point cloud data sets for at least one of the objects by a Light Detection and Ranging, LIDAR, device (11 , 32), wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set; capturing (21) a temporal sequence of images of the at least one of the objects by a camera device (12, 33), wherein the temporal sequence of images comprises a first image; and performing by a processing unit (13) determining (22) a bounding box based on the first 3D point cloud data set; projecting (23) the determined bounding box on a region of interest, ROI comprised in the first image; and anonymizing (24) the ROI of the first image based on the projected bounding box.

20. The method according to claim 19, wherein the anonymizing (24) of the ROI comprises at least one of a) deleting the ROI; b) colorizing the ROI or the projected bounding box; c) image inpainting the ROI or the projected bounding box; d) blurring the ROI; and e) controlling the camera device (12, 33) to not acquire pixels of the ROI.

21. The method according to claim 19 or 20, wherein the objects are persons or vehicles.

22. The method according to one of the claims 19 to 21, wherein the ROI contains personal data, in particular, representing a face of a person or a license plate of a vehicle.

23. The method according to one of the claims 19 to 22, wherein both the LIDAR device (11 , 32) and the camera device (12, 33) are installed on a static or moving platform.

24. The method according to one of the claims 19 to 23, wherein the determining (22) of the bounding box comprises a) selecting 3D points of the first 3D point cloud data set; b) clustering at least some of the selected 3D points to obtain at least one 3D points cluster; and c) computing for one or more of the obtained 3D points clusters a convex hull, respectively.

25. The method according to claim 24, wherein the selected 3D points only have intensity values above some pre-determined threshold.

26. The method according to claim 24 and 25, wherein the determining (22) of the bounding box comprises filtering out at least one of the obtained 3D points clusters by geometric filtering means when more than one 3D points cluster are obtained by the clustering.

27. The method according to claim 26, wherein the at least one of the obtained 3D points clusters is filtered out based on at least one of the following criteria: a) the number of 3D points of the at least one of the obtained 3D points clusters is less than a pre-determined threshold; b) a geometric dimension of the at least one of the obtained 3D points clusters exceeds a pre-determined threshold or does not exceed another pre-determined threshold; c) an aspect ratio of two geometric dimensions of the at least one of the obtained 3D points clusters exceeds a pre-determined threshold or does not exceed another pre determined threshold; and d) a point cloud dimensionality of the at least one of the obtained 3D points clusters exceeds or does not exceed a pre-determined threshold.

28. The method according to claims 24 to 27, wherein the objects are or comprise persons and the selected 3D points only match with a 3D filtering template obtained based on an Omega shape template.

29. The method according to one of the claims 19 to 28, wherein the determining (22) of the bounding box comprises determining a plurality of bounding boxes and filtering out at least one bounding box of the determined plurality of bounding boxes according to at least one of a plurality of geometric criteria.

30. The method according to claim 29, wherein the geometric criteria comprise at least one of: a) a geometric dimension of the at least one bounding box exceeds a pre-determined threshold or does not exceed another pre-determined threshold; b) an aspect ratio of two geometric dimensions of the at least one bounding box exceeds a pre-determined threshold or does not exceed another pre-determined threshold; and c) a location of the at least one bounding box.

31. The method according to one of the claims 19 to 30, wherein the determining (22) of the bounding box comprises determining at least one candidate bounding box based on the first 3D point cloud data set and verifying at least one of the determined candidate bounding boxes based on an image analysis of the first image.

32. The method according to claim 31 , wherein the image analysis comprises at least one of a) comparing one or more dimensions of the projected bounding box with one or more dimensions of one or more picture elements shown in the first image; b) determining the location of the projected bounding box with respect to one or more picture elements shown in the first image; c) recognizing letters or numbers shown in the first image; and d) performing face detection.

33. The method according to one of the claims 19 to 32, comprising determining a first bounding box and a second bounding box different from the first bounding box based on the first 3D point cloud data set and projecting the determined first bounding box on a first ROI comprised in the first image and the determined second bounding box on a second ROI different from the first ROI comprised in the first image.

34. The method according to one of the claims 19 to 33, wherein the temporal sequence of 3D point cloud data sets comprises one or more second 3D point cloud data sets obtained after the first 3D point cloud data set and the temporal sequence of images comprises one or more second images captured after the first image and wherein the method comprises determining additional bounding boxes based on the one or more second 3D point cloud data sets corresponding to the bounding box determined based on the first 3D point cloud data set; projecting the determined additional bounding boxes on respective ROIs comprised in the one or more second images and corresponding to the ROI of the first image; and anonymizing the respective ROIs of the second one or more images.

35. The method according to claim 34, further comprising at least one of comparing the locations of the projected additional bounding boxes with the location of the bounding box projected on the ROI comprised in the first image and verifying the additional bounding boxes based on the comparison; and comparing the locations of the ROIs comprised in the one or more second images with the location of the ROI comprised in the first image and verifying the ROIs comprised in the one or more second images based on the comparison.

36. Method of surveilling vehicles, comprising obtaining a temporal sequence of 3D point cloud data sets for at least one vehicle by a Light Detection and Ranging, LIDAR, device (11, 32), wherein the temporal sequence of 3D point cloud data sets comprises a first 3D point cloud data set; capturing a temporal sequence of images of the at least one vehicle by a camera device (12, 33), wherein the temporal sequence of images comprises a first image; and performing by a processing unit (13) determining a first bounding box based on the first 3D point cloud data set; determining a second bounding box based on the first 3D point cloud data set; projecting the determined first bounding box on a first region of interest, ROI comprised in the first image, wherein the first ROI represents a license plate of the vehicle; projecting the determined second bounding box on a second ROI comprised in the first image, wherein the second ROI represents a passenger or a face of a passenger of the vehicle; and anonymizing the first and second ROIs based on the projected first and second bounding boxes, respectively.

37. A computer program product comprising computer readable instructions for, when run on a computer, performing the steps of the method according to one of the claims 19 to 36.