CN111401133A - Target data augmentation method, device, electronic device and readable storage medium - Google Patents
- Publication number
- CN111401133A (application number CN202010101994.1A)
- Authority
- CN
- China
- Prior art keywords
- target
- data
- scene
- point cloud
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Abstract
The application discloses a target data augmentation method and apparatus, an electronic device and a computer-readable storage medium. The method comprises the following steps: extracting a plurality of target data items that meet preset conditions from a target data set, each item comprising a target image and a target point cloud corresponding to the target image; acquiring scene data comprising a scene image and a scene point cloud corresponding to the scene image; and fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data containing the target. The scheme augments the point cloud and the image data jointly, guarantees consistency between the image and the point cloud data, is low in cost, and the resulting augmented data can substantially improve the prediction performance of a neural network model.
Description
Technical Field
The application relates to the technical field of deep learning, and in particular to a target data augmentation method and apparatus, an electronic device and a readable storage medium.
Background
Unmanned (autonomous) driving is a complex system involving technologies such as localization, perception, decision-making and control. The perception module identifies and localizes the various obstacles in the environment, such as pedestrians and vehicles. Deep learning is currently the mainstream technology for accurate 3D obstacle detection; it requires training a convolutional neural network model on a large amount of road data, and the amount of data largely determines model performance. How to expand the data volume is therefore a problem that anyone using deep learning must face.
As the technology has iterated, unmanned vehicles are generally equipped with both a camera and a lidar, so that the strengths of several sensors can be combined for more accurate obstacle detection. The deep learning network then takes both the image and the point cloud as input, and data augmentation has to be applied to the image and the point cloud synchronously, which poses a considerable challenge. Prior-art augmentation schemes usually collect augmentation targets manually, consuming substantial manpower and money; moreover, because augmented objects are placed on the background image at random, they may end up in implausible positions.
Summary of the Application
In view of the above, the present application provides a target data augmentation method, apparatus, electronic device, and readable storage medium that overcome, or at least partially solve, the above problems.
In accordance with one aspect of the present application, there is provided a target data augmentation method, the method comprising:
extracting a plurality of target data meeting preset conditions from a target data set, wherein the target data comprise a target image and a target point cloud corresponding to the target image;
acquiring scene data, wherein the scene data comprises a scene image and a scene point cloud corresponding to the scene image;
and fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data containing a target.
Optionally, the target data set is obtained by:
selecting or collecting point cloud data and image data which have corresponding relations;
judging whether each point cloud point in the point cloud data is in a 3D frame of a target according to the labeling information of the point cloud data, so as to obtain the target point cloud of the target;
and acquiring a 2D frame of the target image according to the labeling information of the image data, and acquiring instance segmentation of the target image according to an instance segmentation network model.
Optionally, the labeling information of the point cloud data includes any one or more of the following: target position, target size, orientation angle, and whether the target is occluded; the labeling information of the image data comprises the position of the 2D frame of the target image;
the obtaining of the instance segmentation of the target image according to the instance segmentation network model comprises:
and generating foreground instance segmentation of the target image from the image data by adopting a pre-trained recognition network model.
Optionally, the fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data including the target includes:
according to the ground information labeled in the scene point cloud, determining the theoretical height of the target above the ground from the target's top-view position, and determining the height offset of the target from the difference between the theoretical height and the current height;
and translating the target point cloud by the offset so that the target point cloud rests on the ground of the scene point cloud.
Optionally, the fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data including the target further includes:
determining the intersection ratio (IoU) of the target point cloud and the other target point clouds in the scene point cloud and judging whether it equals 0; if the intersection ratio equals 0, the target is selected; if the intersection ratio is greater than 0, another target already occupies that position and the target is abandoned.
Optionally, the fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data including the target further includes:
generating a 2D frame of the target image in the scene image according to the position of the target point cloud in the scene point cloud and the calibration relation from the scene point cloud to the scene image;
performing 2D collision detection and/or front occlusion detection on the target image according to the 2D frame;
adjusting the size of the target image according to the calculated 2D frame size;
and covering the corresponding pixels of the scene image with the pixel points inside the instance segmentation of the generated target image.
Optionally, the performing 2D collision detection on the target image includes:
determining the intersection ratio of the 2D frame of the target image and the 2D frames of the other targets in the scene image and judging whether it equals 0; if the intersection ratio equals 0, no other target occupies that position and the target is selected; if the intersection ratio is greater than 0, another target occupies that position and the target must be abandoned;
the front occlusion detection of the target image comprises:
acquiring all point cloud points in the scene point cloud that project into the 2D frame of the target image and whose distance does not exceed the distance between the target and the sensor; if such point cloud points exist, a front occluding object is present; if they do not exist, there is no front occluding object.
In accordance with another aspect of the present application, there is provided a target data augmentation apparatus, the apparatus including:
the target data extraction unit is suitable for extracting a plurality of target data meeting preset conditions from a target data set, and the target data comprises a target image and a target point cloud corresponding to the target image;
the scene data acquisition unit is suitable for acquiring scene data, and the scene data comprises a scene image and a scene point cloud corresponding to the scene image;
and the data augmentation realizing unit is suitable for fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data containing the target.
In accordance with yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.
According to a further aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.
As can be seen from the above, the technical solution for target data augmentation disclosed in the present application can achieve the following technical effects:
compared with augmenting image data alone or point cloud data alone, the linked augmentation scheme disclosed in the application guarantees consistency between the image and the point cloud data, achieves augmentation of the fused data, can be used to train multi-sensor-fusion deep learning models, and improves detection performance;
the augmentation data are obtained directly from the original data set, and no additional data collection is needed, saving manpower and financial expenditure;
by using the ground information labeled in the original data set, the augmented target is guaranteed to appear on the ground plane, avoiding unrealistic scenes.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of a method of target data augmentation according to one embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a target data augmentation apparatus according to one embodiment of the present application;
FIG. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application;
FIG. 5 illustrates a schematic flow diagram for extracting a target point cloud from point cloud data according to one embodiment of the present application;
FIG. 6 illustrates a flow diagram for extracting a target image 2D box and target instance segmentation from image data according to one embodiment of the present application;
FIG. 7 illustrates a flow diagram of a data augmentation ensemble according to an embodiment of the present application;
FIG. 8 illustrates an exemplary graph of a comparison before and after augmentation of target data according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The key technical ideas of the data augmentation scheme disclosed in the embodiments of the present application are as follows: a target data set is acquired and prepared in advance, containing the point clouds and images of a plurality of targets, for example the point clouds and images of vehicles; this preset target data set serves as the source of augmentation targets. Scene data to be augmented are then acquired; scene data are generally collected from an actual scene by a camera and a lidar, and the scene data and the target data set may come from the same source. Finally, the extracted target data are fused into the scene data to obtain augmented scene data. The aim is to add new targets to the scene data so as to obtain more training samples; depending on the purpose of model training, a target may be a vehicle or another obstacle such as a pedestrian or an animal.
FIG. 1 shows a schematic flow diagram of a method of target data augmentation according to one embodiment of the present application; the method comprises the following steps:
step S110, extracting a plurality of target data meeting preset conditions from a target data set, wherein the target data comprises a target image and a target point cloud corresponding to the target image.
This step selects a plurality of targets to be fused from a preset target data set, in preparation for data augmentation. The target data set may consist of images and point clouds of vehicles, pedestrians and other obstacles acquired by the fused sensors used in unmanned driving, or it may be any other target data set used for training a recognition model, not limited to the field of unmanned driving.
The target data set of this step may be stored in a database, and each target extracted or selected from it may include several pieces of data, for example a target image, a target point cloud and their coordinate positions. Each target point cloud corresponds to a target image, indicating that they depict the same target. The correspondence specifically includes a time-alignment relationship between the point cloud and the image data; for example, they may be acquired at the same moment by the camera and the lidar on the same unmanned vehicle. Preferably, to reduce labor and financial cost, the target data in this embodiment (vehicles, pedestrians, animals and other obstacles) are extracted directly from the data set using its existing labeling information to form the target data set, unlike schemes in which target data are manually re-collected.
In addition, to improve the quality of the target data, targets may be filtered by preset conditions, which include: whether the target is occluded or truncated, whether the number of points in its point cloud is too small, whether a foreground segmentation mask exists, and whether the number of pixels contained in the mask is too small; whether a count is too small can be judged against a threshold. If a target does not meet the preset conditions, it is discarded and another target is extracted or selected from the target data set.
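As an illustration of such filtering, the following is a minimal sketch in Python; the field names (occluded, truncated, points, mask) and the threshold values are assumptions made for the example and are not prescribed by the application.

```python
# Illustrative filter over candidate targets; thresholds are assumed values.
MIN_CLOUD_POINTS = 50     # discard targets whose point cloud is too sparse
MIN_MASK_PIXELS = 200     # discard targets whose foreground mask is too small

def passes_preset_conditions(target):
    """Return True if a candidate target is usable for augmentation."""
    if target.get("occluded") or target.get("truncated"):
        return False                           # occluded or truncated target
    if len(target["points"]) < MIN_CLOUD_POINTS:
        return False                           # too few point cloud points
    mask = target.get("mask")                  # assumed boolean foreground mask
    if mask is None or int(mask.sum()) < MIN_MASK_PIXELS:
        return False                           # missing or too-small segmentation mask
    return True

def extract_targets(target_dataset, num_targets, rng):
    """Randomly draw targets that satisfy the preset conditions."""
    selected = []
    candidates = list(target_dataset)
    rng.shuffle(candidates)
    for target in candidates:
        if passes_preset_conditions(target):
            selected.append(target)
        if len(selected) == num_targets:
            break
    return selected
```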
Step S120, scene data is obtained, and the scene data comprises a scene image and a scene point cloud corresponding to the scene image.
The scene data form the base data set; the augmented data set is built on this base. To save cost, the scene data may be selected from the same data set that serves as the target data source, or from another data set. According to the requirements of neural network model training, the scene data likewise include scene image data and scene point cloud data with a time-alignment relationship, used respectively for fusing the image data and the point cloud data of the target.
Step S130, fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain the augmentation data containing the target.
The labeling information of the scene image and/or the scene point cloud in the scene data, such as ground information, may appear only in the scene image or only in the scene point cloud; in that case the target is placed at the same position in the other modality according to the correspondence between them. Alternatively, both the scene point cloud and the scene image may carry labeling information, in which case positioning and verification can be performed against both.
The labeling information is of the same kind as the labeling information of the data set serving as the target data source and can be obtained by manual annotation; it includes information such as the ground position and orientation angle of the target 2D frame and target 3D frame.
Generally, targets are inserted into the scene one by one, and repeating the operation for different scenes yields more augmented scenes. In practice, the target point cloud data and the target image data may be fused into the scene point cloud and the scene image separately; or the target point cloud is fused into the scene point cloud first, and the target 3D frame in the scene point cloud is then mapped to a target 2D frame in the scene image according to the calibration relationship between the scene point cloud and the scene image; or the reverse mapping is used, i.e. the 2D frame of the target image is generated in the scene image first and the target 3D frame in the scene point cloud is then obtained from it.
In a specific implementation, the target data set and the scene data set come from the same source: they may be collected by the same fused sensors, or brought into the same coordinate system through a coordinate transformation. Therefore, when fusing target data into the scene data, the target is first placed at the position in the scene data given by its coordinates in the target data, and then adjusted slightly.
Through the above steps, a random number of new targets can be added to each scene in the data set. To guarantee the realism of the generated targets, a target is inserted at a reasonable position according to the labeling information; meanwhile, to keep the scene point cloud and image data consistent, a newly added target appears not only in the point cloud data but also at the corresponding position in the image.
In one embodiment, the target data set is obtained by: selecting or collecting point cloud data and image data which have corresponding relations; and judging whether each point cloud point in the point cloud data is in a 3D frame of a target or not according to the labeling information of the point cloud data, thereby obtaining the target point cloud of the target.
Firstly, sample data of target data to be extracted needs to be determined, for example, a part of data samples can be selected from a data set acquired by unmanned equipment.
Referring to FIG. 5, which shows the process of extracting target point clouds from point cloud data: some point cloud data and the corresponding label files in the data set are obtained, and according to the labeling information in the label files, the point cloud belonging to each target is extracted by computing whether each point cloud point lies inside the 3D frame of that target. As shown in FIG. 5, when the point cloud images before and after extraction are compared, the point cloud points belonging to the target are rendered in gray.
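A minimal sketch of this point-in-frame test follows; it assumes an upright (z-up) 3D frame parameterized by its centroid, size (l, w, h) and yaw about the z axis, which is an assumption about the label format rather than a detail fixed by the application.

```python
import numpy as np

def points_in_3d_box(points, center, size, yaw):
    """Boolean mask of the points lying inside one labeled 3D frame.

    points : (N, 3) array in the lidar frame
    center : (3,) box centroid, size : (l, w, h), yaw : rotation about the z axis
    """
    c, s = np.cos(yaw), np.sin(yaw)
    # rotate the shifted points into the box coordinate system (inverse yaw rotation)
    shifted = points - np.asarray(center)
    local_x = c * shifted[:, 0] + s * shifted[:, 1]
    local_y = -s * shifted[:, 0] + c * shifted[:, 1]
    local_z = shifted[:, 2]
    l, w, h = size
    return (
        (np.abs(local_x) <= l / 2)
        & (np.abs(local_y) <= w / 2)
        & (np.abs(local_z) <= h / 2)
    )

def extract_target_cloud(scene_points, box_label):
    """Cut the target point cloud out of a frame using its labeled 3D frame."""
    mask = points_in_3d_box(scene_points, box_label["center"],
                            box_label["size"], box_label["yaw"])
    return scene_points[mask]
```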
And acquiring a 2D frame of the target image according to the labeling information of the image data, and acquiring instance segmentation of the target image according to an instance segmentation network model. The example segmentation here refers to target detection and pixel level segmentation of vehicles, pedestrians and other obstacles, that is, pixel points of a target image are identified and separated from an overall image.
Referring to FIG. 6, the image data corresponding to a point cloud frame in the data set and the associated label file are first obtained; the label file contains at least the position of the 2D frame of each target in the image. To prevent pixel points that do not belong to a target from being added to the scene image during data augmentation, a foreground segmentation mask must be obtained for each target; preferably, the mask is produced by a Mask R-CNN instance segmentation network model. In FIG. 6 the image contains four vehicle targets; the corresponding target images are marked with 2D frames and the corresponding segmentation results are displayed as masks.
Through the above operations, all targets in a given frame of the data set are obtained, and each target carries three kinds of information: the target point cloud (with its 3D frame), the target image patch (with its 2D frame), and the target instance segmentation mask. All targets are collected together to form the target database.
In one embodiment, the labeling information of the point cloud data includes any one or more of the following information: target position, target size, orientation angle, and whether the target is occluded; the annotation information of the image data includes the position of the 2D frame of the target image.
In order to obtain accurate model detection performance and accurate target data, corresponding labeling can be performed as required when manual labeling is performed on point cloud data and image data. For example, the following information may be included in the annotation information of the point cloud data: the position of the target, the size of the target, the orientation angle of the target, whether the target is occluded by other objects, and the like. The annotation file of the image data may further include information such as a 2D frame position of the target, a size of the target image, and the like.
The obtaining of the instance segmentation of the target image according to the instance segmentation network model comprises: and generating foreground instance segmentation of the target image from the image data by adopting a pre-trained recognition network model.
When experimenting with a public data set, for example the KITTI data set, which itself provides no instance segmentation ground truth, a recognition network model can be used to generate foreground instance segmentation of the target image from the image data, so that the target can be cut out of its 2D frame without including extraneous pixel points; preferably, the recognition network model is Mask R-CNN. Mask R-CNN is a general instance segmentation architecture that extends the Faster R-CNN framework by semantically segmenting each candidate box (proposal box), with the segmentation task carried out in parallel with the localization and classification tasks. In this embodiment Mask R-CNN is used to identify and accurately segment the target image.
As can be seen from the small image at the lower right of FIG. 6, the target vehicle has been covered by the mask, which provides the precondition for accurately generating the target image in the subsequent steps.
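As an illustration of this step, the following sketch obtains foreground masks with the pretrained Mask R-CNN model shipped with torchvision; the choice of framework, the COCO weights and the thresholds are assumptions for the example (in practice the model would typically be fine-tuned for road imagery), not details specified by the application.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained COCO Mask R-CNN; older torchvision versions use pretrained=True
# instead of the weights argument.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def foreground_masks(image_path, score_thresh=0.7, mask_thresh=0.5):
    """Return 2D boxes and binary foreground masks for the instances in one image."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    keep = output["scores"] >= score_thresh
    # (K, 1, H, W) soft masks -> (K, H, W) binary foreground masks
    masks = output["masks"][keep, 0] >= mask_thresh
    return output["boxes"][keep], masks
```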
In one embodiment, the step S130 includes: according to the ground information labeled in the scene point cloud, determining the theoretical height of the target above the ground from the target's top-view position, and determining the height offset of the target from the difference between the theoretical height and the current height; and translating the target point cloud by the offset so that the target point cloud rests on the ground of the scene point cloud.
Specifically, although the scene in which the extracted target originally appeared differs considerably from the scene in the scene data, the two use the same coordinate system, particularly when the two data sets are homologous. When the target point cloud is placed into the scene point cloud, it can therefore first be placed at the coordinate position it had in the original data and then fine-tuned. For example, to put the target point cloud on the ground of the scene point cloud, its height is adjusted according to the ground labeling information of the scene point cloud so that it sits on the ground, while the other coordinates remain unchanged. The required adjustment is computed and the target point cloud is translated vertically by that amount, so that it appears realistically in the scene point cloud. This finally produces, in the scene data, a target point cloud consistent with the current scene together with the position of its 3D frame in the scene point cloud.
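A minimal sketch of this vertical alignment follows; the z-up convention, the box layout and the ground_height_at helper (which looks up the labeled ground height at a top-view position) are assumptions made for the example.

```python
import numpy as np

def place_on_ground(target_points, target_box, ground_height_at):
    """Translate a target point cloud vertically so it rests on the scene ground.

    ground_height_at(x, y) is assumed to return the labeled ground height (z)
    of the scene point cloud at the target's top-view position.
    """
    x, y, z = target_box["center"]
    # theoretical height of the box centroid when the box sits on the ground
    theoretical_z = ground_height_at(x, y) + target_box["size"][2] / 2.0
    offset = theoretical_z - z                 # required vertical offset
    moved_points = target_points.copy()
    moved_points[:, 2] += offset               # translate only in height
    moved_box = dict(target_box)
    moved_box["center"] = (x, y, z + offset)
    return moved_points, moved_box
```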
In one embodiment, the step S130 further includes: determining the intersection ratio of the target point cloud and the other target point clouds in the scene point cloud and judging whether it equals 0; if the intersection ratio equals 0, the target is selected; if the intersection ratio is greater than 0, another target already occupies that position and the target is abandoned.
In this embodiment, whether the target point cloud collides with other targets in the scene point cloud is detected: the intersection ratio between the target point cloud placed at a candidate position and the target point clouds already present near that position is used to judge whether they collide. If the target point cloud overlaps another target point cloud, a collision is declared; placing the target there would make the scene unrealistic, so the target must be abandoned. If there is no overlap, the target can be placed at that position.
Generally, the intersection ratio IoU (Intersection over Union) denotes the overlap rate between the target frame produced by a detection model and the original labeled frame. It can be understood simply as the ratio of the intersection to the union of the detection result (DetectionResult) of the target frame and the ground truth (GroundTruth) of the labeled frame, i.e. the detection accuracy IoU:
IoU = area(DetectionResult ∩ GroundTruth) / area(DetectionResult ∪ GroundTruth)
The more the target frame and the labeled frame overlap, the higher the IoU value; perfect overlap is the ideal case, i.e. an IoU value of 1.
In the present application, the intersection ratio is used to judge whether a 2D or 3D target frame to be selected or extracted intersects the other target frames already in the scene data once it is placed there, i.e. whether they share any point cloud points or image pixel points. If they share none, the intersection between the candidate target frame and the other target frames in the scene data is 0, so the quotient of the intersection and the union is 0, which means the target can be selected for use in the scene data. If the intersection ratio is not 0, the two target frames intersect, which would cause a collision, and the target frame to be extracted or selected should be abandoned.
In this embodiment the target frame is the 3D frame formed by the target point cloud; what is essentially judged is whether the 3D frame formed by the target point cloud and the 3D frames formed by the other target point clouds contain the same point cloud points. If they do not, the intersection ratio is 0 and the target can be selected; if the intersection ratio is greater than 0, the target is abandoned.
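The shared-points reading of this test can be sketched as follows, reusing points_in_3d_box from the earlier sketch; the data layout is again an assumption for illustration.

```python
import numpy as np
# points_in_3d_box is the helper defined in the earlier sketch.

def can_place_target(new_box, existing_boxes, scene_points):
    """Accept the target only if its 3D frame shares no point with existing targets.

    The intersection ratio is 0 exactly when the candidate frame and the frames
    already in the scene contain no common point cloud points.
    """
    new_mask = points_in_3d_box(scene_points, new_box["center"],
                                new_box["size"], new_box["yaw"])
    for box in existing_boxes:
        other_mask = points_in_3d_box(scene_points, box["center"],
                                      box["size"], box["yaw"])
        if np.any(new_mask & other_mask):   # shared points: intersection ratio > 0
            return False                    # collision, abandon the target
    return True                             # intersection ratio == 0, target usable
```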
In one embodiment, the step S130 further includes: generating a 2D frame of the target image in the scene image according to the position of the target point cloud in the scene point cloud and the calibration relationship from the scene point cloud to the scene image; performing 2D collision detection and/or front occlusion detection on the target image according to the 2D frame; adjusting the size of the target image according to the calculated 2D frame size; and then covering the corresponding pixels of the scene image with the pixel points inside the instance segmentation of the generated target image.
This embodiment determines the 2D frame position of the target image in the scene image from the position of the target 3D frame and of the target point cloud generated in the scene point cloud, and finally adds the target image to the scene image through the steps of 2D collision detection, front occlusion detection, target image resizing and the like, thereby fusing the target image with the scene image.
Of course, the fusion of the target image and the fusion of the target point cloud can also be performed separately, or the target image can be fused into the scene image first and the target point cloud then fused into the scene point cloud data through the mapping relationship.
To map the 3D frame position in the scene point cloud to the position of the corresponding 2D frame of the target image in the scene image, the camera coordinates and the lidar coordinates must be calibrated using the camera intrinsics and extrinsics, and the 2D frame position in the scene image is then obtained from the calibration result and the 3D frame position. It is worth noting that, because the position of the 3D frame has changed, the size of the generated 2D frame may differ from the target's original size; to make the target image more realistic, the original target image can be resized to the calibrated 2D frame size so that it is closer to the real situation.
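A minimal sketch of this projection follows; the corner layout, the 4x4 lidar-to-camera extrinsic matrix and the 3x3 intrinsic matrix are the usual pinhole-calibration assumptions and stand in for whatever calibration the data set actually provides. It assumes the target lies in front of the camera.

```python
import numpy as np

def box_corners_3d(center, size, yaw):
    """Return the 8 corners of a 3D frame (z-up, yaw about z; assumed convention)."""
    l, w, h = size
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2
    y = np.array([ w,  w, -w, -w,  w,  w, -w, -w]) / 2
    z = np.array([ h, -h,  h, -h,  h, -h,  h, -h]) / 2
    c, s = np.cos(yaw), np.sin(yaw)
    corners = np.stack([c * x - s * y, s * x + c * y, z], axis=1)
    return corners + np.asarray(center)

def project_to_2d_box(center, size, yaw, lidar_to_cam, intrinsics, image_shape):
    """Project a 3D frame in the scene point cloud into a 2D frame in the scene image."""
    corners = box_corners_3d(center, size, yaw)                 # (8, 3) lidar frame
    homog = np.hstack([corners, np.ones((8, 1))])               # (8, 4) homogeneous
    cam = (lidar_to_cam @ homog.T)[:3]                          # (3, 8) camera frame
    pix = intrinsics @ cam
    u, v = pix[0] / pix[2], pix[1] / pix[2]                     # perspective divide
    h_img, w_img = image_shape[:2]
    x1, y1 = np.clip(u.min(), 0, w_img - 1), np.clip(v.min(), 0, h_img - 1)
    x2, y2 = np.clip(u.max(), 0, w_img - 1), np.clip(v.max(), 0, h_img - 1)
    return x1, y1, x2, y2
```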
In one embodiment, the 2D collision detection of the target image comprises: determining the intersection ratio of the 2D frame of the target image and the 2D frames of the other targets in the scene image and judging whether it equals 0; if the intersection ratio equals 0, no other target occupies that position and the target is selected; if the intersection ratio is greater than 0, another target occupies that position and the target must be abandoned.
Following the description of the intersection ratio above, this embodiment judges whether the 2D frame of the target image and the 2D frames of the other targets contain the same pixel points. If they do not, the intersection ratio is 0 and the target can be selected; if the intersection ratio is greater than 0, the target is abandoned.
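A sketch of this 2D check might look as follows; the (x1, y1, x2, y2) pixel-coordinate representation of the 2D frames is an assumed convention.

```python
def boxes_2d_disjoint(box_a, box_b):
    """True when two 2D frames (x1, y1, x2, y2) share no pixels (intersection ratio 0)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return ix2 <= ix1 or iy2 <= iy1            # empty intersection

def can_place_target_2d(new_box, existing_boxes):
    """Select the target only if its 2D frame overlaps no existing 2D frame."""
    return all(boxes_2d_disjoint(new_box, box) for box in existing_boxes)
```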
The front occlusion detection of the target image comprises: acquiring all point cloud points in the scene point cloud that project into the 2D frame of the target image and whose distance does not exceed the distance between the target and the sensor; if such point cloud points exist, a front occluding object is present; if they do not exist, there is no front occluding object.
The above steps implement, respectively, collision detection and front occlusion detection for the 2D frame; the collision detection principle and algorithm for the 2D frame are the same as for the 3D frame and are not repeated here.
Front occlusion detection mainly follows the principle by which the lidar forms point clouds to detect whether an occluding object lies in front of the target in the lidar coordinate system. That is, whether a front occluder exists is judged by checking whether point cloud points lie between the target and the sensor. If a front occluder exists, the target may have been placed on or among buildings or other obstacles, and the fusion of this target may need to be abandoned, further avoiding unrealistic results.
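A sketch of this occlusion test follows, reusing the calibration matrices from the projection sketch; measuring the distance in the lidar frame and the array layout are assumptions for the example.

```python
import numpy as np

def has_front_occlusion(scene_points, box_2d, target_distance,
                        lidar_to_cam, intrinsics):
    """Report whether any scene point projects into the target's 2D frame while
    being closer to the sensor than the target itself (an occluder in front)."""
    pts = scene_points[:, :3]
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    cam = (lidar_to_cam @ homog.T)[:3]                     # points in the camera frame
    in_front = cam[2] > 0                                  # ignore points behind the camera
    pix = intrinsics @ cam
    depth = np.maximum(pix[2], 1e-6)                       # guard the perspective divide
    u, v = pix[0] / depth, pix[1] / depth
    x1, y1, x2, y2 = box_2d
    inside = in_front & (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    dist = np.linalg.norm(pts, axis=1)                     # distance to the lidar sensor
    return bool(np.any(inside & (dist < target_distance)))
```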
FIG. 2 illustrates a schematic diagram of a target data augmentation apparatus according to one embodiment of the present application; the device comprises:
the target data extracting unit 210 is adapted to extract a plurality of target data meeting a preset condition from a target data set, where the target data includes a target image and a target point cloud corresponding to the target image.
The scene data acquiring unit 220 is adapted to acquire scene data, where the scene data includes a scene image and a scene point cloud corresponding to the scene image.
The data augmentation implementing unit 230 is adapted to fuse the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data, so as to obtain augmented data including the target.
By the execution of each unit in the device, a random number of new targets can be added into each scene in the data set. In order to ensure the authenticity of the generated target, the target can be inserted into a reasonable position according to the labeling information, and meanwhile, in order to keep the consistency of scene point cloud data and image data, the newly added target not only appears in the point cloud data, but also appears in a corresponding position in the image.
In one embodiment, the target data set is obtained by: selecting or collecting point cloud data and image data which have corresponding relations; and judging whether each point cloud point in the point cloud data is in a 3D frame of a target or not according to the labeling information of the point cloud data, thereby obtaining the target point cloud of the target.
Firstly, sample data of target data to be extracted needs to be determined, for example, a part of data samples can be selected from a data set acquired by unmanned equipment.
Referring to fig. 5, a process of extracting a target point cloud from point cloud data is shown, some point cloud data and a label file in a data set are obtained, and according to label information in the label file, a point cloud included in each target can be extracted by calculating whether each point cloud point is in a 3D frame of the target. As shown in fig. 5, the point cloud points corresponding to the target are grayed by comparing the front and rear point cloud images.
And acquiring a 2D frame of the target image according to the labeling information of the image data, and acquiring instance segmentation of the target image according to an instance segmentation network model.
Referring to fig. 6, first, image data corresponding to a point cloud data frame in a data set and a markup file are obtained, where the markup file at least includes position information of a 2D frame of each target in the image. In order to prevent pixel points which do not belong to the targets from being added into the scene image during data augmentation, a foreground segmentation MASK of each target needs to be obtained, and the segmentation MASK is realized through a MASK-RCNN instance segmentation network model. According to fig. 6, the image includes four vehicle objects, the corresponding object images are marked by 2D frames, and the corresponding segmentation results are displayed by a mask.
Through the above operation, all the targets in a certain frame in the data set can be obtained, and each target contains three kinds of information: the system comprises a target point cloud, a target image section and a target instance segmentation mask. And summarizing all targets to obtain a target database.
In one embodiment, the labeling information of the point cloud data includes any one or more of the following information: target position, target size, orientation angle, and whether the target is occluded; the annotation information of the image data includes the position of the 2D frame of the target image.
In order to obtain accurate model detection performance and accurate target data, corresponding labeling can be performed as required when manual labeling is performed on point cloud data and image data. For example, the following information may be included in the annotation information of the point cloud data: the position of the target, the size of the target, the orientation angle of the target, whether the target is occluded by other objects, and the like. The annotation file of the image data may further include information such as a 2D frame position of the target, a size of the target image, and the like.
The obtaining of the instance segmentation of the target image according to the instance segmentation network model comprises: and generating foreground instance segmentation of the target image from the image data by adopting a pre-trained recognition network model.
To accurately cut the target image out of its 2D frame without including other pixel points, a mask can be generated from the image data using a Mask R-CNN recognition network model; as can be seen from the small image at the lower right of FIG. 6, the target vehicle has been covered by the mask.
In one embodiment, the data augmentation unit 230 is adapted to: according to the ground information marked in the scene point cloud, determining the theoretical height of the target from the ground through the top view position of the target, and determining the offset of the target in height according to the difference between the theoretical height and the current height; and translating the target point cloud according to the offset, so that the target point cloud is on the ground of the scene point cloud.
In one embodiment, the data augmentation implementation unit 230 is further adapted to: determining the intersection ratio of the target point cloud and the other target point clouds in the scene point cloud and judging whether it equals 0; if the intersection ratio equals 0, the target is selected; if the intersection ratio is greater than 0, another target already occupies that position and the target is abandoned.
In this embodiment, it is detected whether the target point cloud collides with other targets in the scene point cloud, that is, collision detection is performed by using intersection ratio calculation.
In one embodiment, the data augmentation implementation unit 230 is further adapted to:
generating a 2D frame of the target image in the scene image according to the position of the target point cloud in the scene point cloud and the calibration relation from the scene point cloud to the scene image;
performing 2D collision detection and/or front occlusion detection on the target image according to the 2D frame;
adjusting the size of the target image according to the calculated and determined 2D frame size;
and covering the pixels of the scene image with the pixel points in the instance segmentation in the generated target image.
The embodiment describes that the 2D frame position of the target image in the scene image is determined according to the target 3D frame position and the target point cloud generated in the scene point cloud, and the target image is finally added to the scene image through the steps of 2D collision detection, front occlusion detection, target image size adjustment and the like, so as to realize the fusion of the target image and the scene image.
In one embodiment, the 2D collision detection of the target image comprises: determining the intersection ratio of the 2D frame of the target image and the 2D frames of the other targets in the scene image and judging whether it equals 0; if the intersection ratio equals 0, no other target occupies that position and the target is selected; if the intersection ratio is greater than 0, another target occupies that position and the target must be abandoned.
The front occlusion detection of the target image comprises: acquiring all point cloud points in the scene point cloud that project into the 2D frame of the target image and whose distance does not exceed the distance between the target and the sensor; if such point cloud points exist, a front occluding object is present; if they do not exist, there is no front occluding object.
In summary, referring to the target data augmentation flow shown in FIG. 7, the data augmentation scheme disclosed in the present application includes: filtering a target data set to extract a plurality of target data items that meet preset conditions, each comprising a target point cloud and a target image with a corresponding relationship; acquiring scene data, comprising a scene point cloud and a scene image with a corresponding relationship, on which the target augmentation data are generated; and fusing the target data into the scene data according to the labeling information of the scene data, for example generating the target point cloud and its 3D frame in the scene point cloud, generating the target 2D frame and the target segmentation in the scene image according to the correspondence between the 3D frame and the 2D frame, and processing the result to obtain augmented data containing the target. The scheme augments the point cloud and the image data jointly, guarantees consistency between the image and the point cloud data, is low in cost, and the resulting augmented data can substantially improve the prediction performance of a neural network model.
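Tying the pieces together, the overall flow of FIG. 7 might be sketched as below; it reuses the illustrative helpers from the earlier sketches (including a hypothetical paste_masked_target for the pixel-covering step) and is a sketch of the described flow under those assumptions, not the patented implementation.

```python
import numpy as np
# extract_targets, place_on_ground, can_place_target, project_to_2d_box,
# can_place_target_2d and has_front_occlusion are the helpers sketched above.

def augment_scene(scene, target_dataset, num_targets, rng):
    """End-to-end sketch: insert filtered targets into one scene, keeping point
    cloud and image consistent."""
    for t in extract_targets(target_dataset, num_targets, rng):
        points, box = place_on_ground(t["points"], t["box"], scene["ground_height_at"])
        if not can_place_target(box, scene["boxes_3d"], scene["points"]):
            continue                                    # 3D collision: abandon target
        box_2d = project_to_2d_box(box["center"], box["size"], box["yaw"],
                                   scene["lidar_to_cam"], scene["intrinsics"],
                                   scene["image"].shape)
        if not can_place_target_2d(box_2d, scene["boxes_2d"]):
            continue                                    # 2D collision: abandon target
        dist = float(np.linalg.norm(np.asarray(box["center"])))
        if has_front_occlusion(scene["points"], box_2d, dist,
                               scene["lidar_to_cam"], scene["intrinsics"]):
            continue                                    # occluder in front: abandon target
        # accept: merge the point cloud, paste the masked target pixels, record labels
        scene["points"] = np.vstack([scene["points"], points])
        scene["boxes_3d"].append(box)
        scene["boxes_2d"].append(box_2d)
        paste_masked_target(scene["image"], t["image"], t["mask"], box_2d)  # hypothetical helper
    return scene
```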
Fig. 8 shows a specific example of augmenting scene data, wherein the middle image is an original scene, and the upper image and the lower image are respectively augmented scene point cloud and scene image.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a targeted data augmentation apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, FIG. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 300 comprises a processor 310 and a memory 320 arranged to store computer-executable instructions (computer-readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer-readable program code 331 for performing any of the method steps described above. For example, the storage space 330 may comprise respective pieces of computer-readable program code 331 for implementing the various steps of the above method. The computer-readable program code 331 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card or a floppy disk; such a computer program product is typically the computer-readable storage medium described with reference to FIG. 4. FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 400 stores computer-readable program code 331 for performing the steps of the method according to the application, which can be read by the processor 310 of the electronic device 300; when executed by the electronic device 300, the program code causes the electronic device 300 to perform the steps of the method described above. In particular, the computer-readable program code 331 stored on the computer-readable storage medium may perform the method shown in any of the embodiments described above. The computer-readable program code 331 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
Claims (10)
1. A method of target data augmentation, the method comprising:
extracting a plurality of target data meeting preset conditions from a target data set, wherein the target data comprise a target image and a target point cloud corresponding to the target image;
acquiring scene data, wherein the scene data comprises a scene image and a scene point cloud corresponding to the scene image;
and fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data containing a target.
2. The target data augmentation method of claim 1, wherein the target data set is obtained by:
selecting or collecting point cloud data and image data which have corresponding relations;
judging whether each point cloud point in the point cloud data is in a 3D frame of a target according to the labeling information of the point cloud data, so as to obtain the target point cloud of the target;
and acquiring a 2D frame of the target image according to the labeling information of the image data, and acquiring instance segmentation of the target image according to an instance segmentation network model.
3. The method for augmenting target data according to claim 2, wherein the annotation information of the point cloud data comprises any one or more of the following information: target position, target size, orientation angle, and whether the target is occluded; the annotation information of the image data comprises the position of the 2D frame of the target image;
the obtaining of the instance segmentation of the target image according to the instance segmentation network model comprises:
and generating foreground instance segmentation of the target image from the image data by adopting a pre-trained recognition network model.
4. The method as claimed in claim 2 or 3, wherein the fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain the augmented data including the target comprises:
according to the ground information marked in the scene point cloud, determining the theoretical height of the target from the ground through the top view position of the target, and determining the offset of the target in height according to the difference between the theoretical height and the current height;
and translating the target point cloud according to the offset, so that the target point cloud is on the ground of the scene point cloud.
5. The method as claimed in claim 4, wherein the fusing the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain the augmented data including the target further comprises:
determining the intersection ratio of the target point cloud and the other target point clouds in the scene point cloud; if the intersection ratio is equal to 0, selecting the target; and if the intersection ratio is greater than 0, indicating that another target already exists at the position of the target, abandoning the target.
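For illustration only: the keep-or-discard rule reduces to an overlap test between the inserted target's box and every existing target box, keeping the target only when the intersection ratio is exactly zero. A simplified sketch using axis-aligned bird's-eye-view boxes (the claim does not prescribe how the intersection ratio is computed):

```python
import numpy as np

def bev_iou(box_a, box_b):
    """IoU of two axis-aligned bird's-eye-view boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def position_is_free(new_box, existing_boxes):
    """Select the target only if it overlaps no existing target (intersection ratio == 0)."""
    return all(bev_iou(new_box, b) == 0.0 for b in existing_boxes)
```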
6. The target data augmentation method of claim 4, wherein the fusing of the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain the augmented data containing the target further comprises:
generating a 2D frame of the target image in the scene image according to the position of the target point cloud in the scene point cloud and the calibration relation from the scene point cloud to the scene image;
performing 2D collision detection and/or front occlusion detection on the target image according to the 2D frame;
resizing the target image according to the calculated 2D frame size;
and covering the corresponding pixels of the scene image with the pixels belonging to the instance segmentation of the generated target image.
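For illustration only: one way to realize these steps is to project the target point cloud into the scene image with the lidar-to-camera calibration, derive the 2D frame from the projected points, resize the target crop and its instance mask to that frame, and write the masked pixels over the scene image. The sketch below assumes a 3x4 projection matrix `P` and omits boundary clipping:

```python
import numpy as np
import cv2  # assumed available for resizing

def project_points(points_xyz, P):
    """Project Nx3 lidar points to pixel coordinates with a 3x4 projection matrix."""
    homo = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def paste_target(scene_image, target_image, target_mask, target_points, P):
    """Derive the target's 2D frame from its projected point cloud, resize the
    target crop and mask to that frame, and overlay the masked pixels.
    Assumes the frame lies fully inside the image; clipping and occlusion
    handling are omitted for brevity."""
    uv = project_points(target_points[:, :3], P)
    u_min, v_min = np.floor(uv.min(axis=0)).astype(int)
    u_max, v_max = np.ceil(uv.max(axis=0)).astype(int)
    w, h = int(max(u_max - u_min, 1)), int(max(v_max - v_min, 1))
    patch = cv2.resize(target_image, (w, h))
    mask = cv2.resize(target_mask.astype(np.uint8), (w, h)).astype(bool)
    out = scene_image.copy()
    roi = out[v_min:v_min + h, u_min:u_min + w]
    roi[mask] = patch[mask]   # writes through the view into `out`
    return out
```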
7. The target data augmentation method of claim 6, wherein the 2D collision detection of the target image comprises:
determining the intersection ratio of the 2D frame of the target image and the 2D frames of other targets in the scene image; if the intersection ratio is equal to 0, indicating that no other target exists at the position of the target, selecting the target; and if the intersection ratio is greater than 0, indicating that another target exists at the position of the target, abandoning the target;
the front occlusion detection of the target image comprises:
and acquiring all points in the scene point cloud that can be projected into the 2D frame of the target image and whose distance does not exceed the distance between the target and the sensor; if such points exist, it is indicated that a front occluder exists, and if no such points exist, it is indicated that no front occluder exists.
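For illustration only: both checks reduce to simple geometry, a rectangle-overlap test between the new 2D frame and existing frames, and a depth comparison for scene points that project into the new frame. In the sketch below, `scene_uv` is assumed to hold the projected pixel coordinates of the scene points (e.g. from the projection helper sketched above) and `scene_dist` their distances to the sensor:

```python
import numpy as np

def rect_iou(a, b):
    """IoU of two 2D frames given as (u_min, v_min, u_max, v_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    if inter == 0.0:
        return 0.0
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def accept_target(frame, existing_frames, scene_uv, scene_dist, target_dist):
    """Keep the target only if (a) its 2D frame overlaps no existing frame and
    (b) no scene point projects into the frame at a distance not exceeding the
    target-to-sensor distance (i.e. no front occluder)."""
    if any(rect_iou(frame, f) > 0.0 for f in existing_frames):
        return False                                   # 2D collision
    in_frame = ((scene_uv[:, 0] >= frame[0]) & (scene_uv[:, 0] <= frame[2]) &
                (scene_uv[:, 1] >= frame[1]) & (scene_uv[:, 1] <= frame[3]))
    occluded = np.any(in_frame & (scene_dist <= target_dist))
    return not bool(occluded)
```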
8. A target data augmentation apparatus, comprising:
a target data extraction unit, adapted to extract a plurality of target data meeting preset conditions from a target data set, wherein the target data comprise a target image and a target point cloud corresponding to the target image;
a scene data acquisition unit, adapted to acquire scene data, wherein the scene data comprises a scene image and a scene point cloud corresponding to the scene image;
and a data augmentation realizing unit, adapted to fuse the target data into the scene data according to the labeling information of the scene image and/or the scene point cloud in the scene data to obtain augmented data containing the target.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the target data augmentation method of any one of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the target data augmentation method of any one of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010101994.1A CN111401133A (en) | 2020-02-19 | 2020-02-19 | Target data augmentation method, device, electronic device and readable storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010101994.1A CN111401133A (en) | 2020-02-19 | 2020-02-19 | Target data augmentation method, device, electronic device and readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111401133A (en) | 2020-07-10 |
Family
ID=71432648
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010101994.1A Pending CN111401133A (en) | 2020-02-19 | 2020-02-19 | Target data augmentation method, device, electronic device and readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111401133A (en) |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111881029A (en) * | 2020-07-23 | 2020-11-03 | 深圳慕智科技有限公司 | A multi-scene automatic point cloud augmentation method for autonomous driving |
| CN112001287A (en) * | 2020-08-17 | 2020-11-27 | 禾多科技(北京)有限公司 | Method and device for generating point cloud information of obstacle, electronic device and medium |
| CN112287860A (en) * | 2020-11-03 | 2021-01-29 | 北京京东乾石科技有限公司 | Object recognition model training method and device, object recognition method and system |
| CN112329547A (en) * | 2020-10-15 | 2021-02-05 | 北京三快在线科技有限公司 | Data processing method and device |
| CN112462348A (en) * | 2021-02-01 | 2021-03-09 | 知行汽车科技(苏州)有限公司 | Method and device for amplifying laser point cloud data and storage medium |
| CN112598741A (en) * | 2020-12-29 | 2021-04-02 | 广州极飞科技有限公司 | Point cloud data labeling method, device, equipment and storage medium |
| CN112700455A (en) * | 2020-12-28 | 2021-04-23 | 北京超星未来科技有限公司 | Laser point cloud data generation method, device, equipment and medium |
| CN113435273A (en) * | 2021-06-15 | 2021-09-24 | 北京的卢深视科技有限公司 | Data augmentation method, data augmentation device, electronic device, and storage medium |
| CN113763307A (en) * | 2020-08-11 | 2021-12-07 | 北京京东乾石科技有限公司 | Method and device for acquiring sample data |
| CN114240734A (en) * | 2021-11-05 | 2022-03-25 | 深圳云天励飞技术股份有限公司 | Image data augmentation method, image data augmentation device, electronic apparatus, and storage medium |
| CN114265074A (en) * | 2021-11-30 | 2022-04-01 | 南京大学 | Metamorphic algorithm-based three-dimensional laser radar point cloud data amplification method |
| CN114356931A (en) * | 2021-12-31 | 2022-04-15 | 中国第一汽车股份有限公司 | Data processing method, device, storage medium, processor and electronic device |
| WO2022193604A1 (en) * | 2021-03-16 | 2022-09-22 | Huawei Technologies Co., Ltd. | Devices, systems, methods, and media for point cloud data augmentation using model injection |
| CN115187934A (en) * | 2022-06-23 | 2022-10-14 | 苏州艾氪英诺机器人科技有限公司 | Point cloud data processing method and device and electronic equipment |
| CN115731140A (en) * | 2021-08-25 | 2023-03-03 | 上海滴滴沃芽科技有限公司 | Image synthesis method, image synthesis device, electronic apparatus, storage medium, and program product |
| CN116071730A (en) * | 2023-01-19 | 2023-05-05 | 北京百度网讯科技有限公司 | Background object detection method, apparatus, device and self-driving vehicle |
| CN117292227A (en) * | 2023-09-18 | 2023-12-26 | 重庆长安汽车股份有限公司 | Radar point cloud data enhancement method and device, intelligent automobile and storage medium |
| CN119169191A (en) * | 2024-09-05 | 2024-12-20 | 北京市园林古建设计研究院有限公司 | A method, device and electronic equipment for constructing three-dimensional visual landscape |
Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106778548A (en) * | 2016-11-30 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for detecting barrier |
| CN107463629A (en) * | 2017-07-14 | 2017-12-12 | 青岛海尔智能技术研发有限公司 | Image data base method for building up and system based on 3D technology |
| CN108229366A (en) * | 2017-12-28 | 2018-06-29 | 北京航空航天大学 | Deep learning vehicle-installed obstacle detection method based on radar and fusing image data |
| CN108765584A (en) * | 2018-05-31 | 2018-11-06 | 深圳市易成自动驾驶技术有限公司 | Laser point cloud data collection augmentation method, apparatus and readable storage medium storing program for executing |
| US20180365503A1 (en) * | 2017-06-16 | 2018-12-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and Apparatus of Obtaining Obstacle Information, Device and Computer Storage Medium |
| CN109345510A (en) * | 2018-09-07 | 2019-02-15 | 百度在线网络技术(北京)有限公司 | Object detecting method, device, equipment, storage medium and vehicle |
| US20190065637A1 (en) * | 2017-08-31 | 2019-02-28 | Ford Global Technologies, Llc | Augmenting Real Sensor Recordings With Simulated Sensor Data |
| US10235601B1 (en) * | 2017-09-07 | 2019-03-19 | 7D Labs, Inc. | Method for image analysis |
| CN110096059A (en) * | 2019-04-25 | 2019-08-06 | 杭州飞步科技有限公司 | Automatic Pilot method, apparatus, equipment and storage medium |
| US20190251397A1 (en) * | 2018-02-14 | 2019-08-15 | Nvidia Corporation | Generation of Synthetic Images For Training a Neural Network Model |
| CN110428388A (en) * | 2019-07-11 | 2019-11-08 | 阿里巴巴集团控股有限公司 | A kind of image-data generating method and device |
| CN110490960A (en) * | 2019-07-11 | 2019-11-22 | 阿里巴巴集团控股有限公司 | A kind of composograph generation method and device |
| CN110598743A (en) * | 2019-08-12 | 2019-12-20 | 北京三快在线科技有限公司 | Target object labeling method and device |
Patent Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106778548A (en) * | 2016-11-30 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for detecting barrier |
| US20180365503A1 (en) * | 2017-06-16 | 2018-12-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and Apparatus of Obtaining Obstacle Information, Device and Computer Storage Medium |
| CN107463629A (en) * | 2017-07-14 | 2017-12-12 | 青岛海尔智能技术研发有限公司 | Image data base method for building up and system based on 3D technology |
| US20190065637A1 (en) * | 2017-08-31 | 2019-02-28 | Ford Global Technologies, Llc | Augmenting Real Sensor Recordings With Simulated Sensor Data |
| US10235601B1 (en) * | 2017-09-07 | 2019-03-19 | 7D Labs, Inc. | Method for image analysis |
| CN108229366A (en) * | 2017-12-28 | 2018-06-29 | 北京航空航天大学 | Deep learning vehicle-installed obstacle detection method based on radar and fusing image data |
| US20190251397A1 (en) * | 2018-02-14 | 2019-08-15 | Nvidia Corporation | Generation of Synthetic Images For Training a Neural Network Model |
| CN108765584A (en) * | 2018-05-31 | 2018-11-06 | 深圳市易成自动驾驶技术有限公司 | Laser point cloud data collection augmentation method, apparatus and readable storage medium storing program for executing |
| CN109345510A (en) * | 2018-09-07 | 2019-02-15 | 百度在线网络技术(北京)有限公司 | Object detecting method, device, equipment, storage medium and vehicle |
| CN110096059A (en) * | 2019-04-25 | 2019-08-06 | 杭州飞步科技有限公司 | Automatic Pilot method, apparatus, equipment and storage medium |
| CN110428388A (en) * | 2019-07-11 | 2019-11-08 | 阿里巴巴集团控股有限公司 | A kind of image-data generating method and device |
| CN110490960A (en) * | 2019-07-11 | 2019-11-22 | 阿里巴巴集团控股有限公司 | A kind of composograph generation method and device |
| CN110598743A (en) * | 2019-08-12 | 2019-12-20 | 北京三快在线科技有限公司 | Target object labeling method and device |
Cited By (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111881029A (en) * | 2020-07-23 | 2020-11-03 | 深圳慕智科技有限公司 | A multi-scene automatic point cloud augmentation method for autonomous driving |
| CN113763307A (en) * | 2020-08-11 | 2021-12-07 | 北京京东乾石科技有限公司 | Method and device for acquiring sample data |
| CN113763307B (en) * | 2020-08-11 | 2024-06-18 | 北京京东乾石科技有限公司 | Method and device for obtaining sample data |
| CN112001287B (en) * | 2020-08-17 | 2023-09-12 | 禾多科技(北京)有限公司 | Point cloud information generation method, device, electronic equipment and medium for obstacles |
| CN112001287A (en) * | 2020-08-17 | 2020-11-27 | 禾多科技(北京)有限公司 | Method and device for generating point cloud information of obstacle, electronic device and medium |
| CN112329547A (en) * | 2020-10-15 | 2021-02-05 | 北京三快在线科技有限公司 | Data processing method and device |
| CN112329547B (en) * | 2020-10-15 | 2024-11-26 | 北京三快在线科技有限公司 | A data processing method and device |
| CN112287860A (en) * | 2020-11-03 | 2021-01-29 | 北京京东乾石科技有限公司 | Object recognition model training method and device, object recognition method and system |
| CN112700455A (en) * | 2020-12-28 | 2021-04-23 | 北京超星未来科技有限公司 | Laser point cloud data generation method, device, equipment and medium |
| CN112598741B (en) * | 2020-12-29 | 2025-01-17 | 广州极飞科技股份有限公司 | Point cloud data labeling method, device, equipment and storage medium |
| CN112598741A (en) * | 2020-12-29 | 2021-04-02 | 广州极飞科技有限公司 | Point cloud data labeling method, device, equipment and storage medium |
| CN112462348A (en) * | 2021-02-01 | 2021-03-09 | 知行汽车科技(苏州)有限公司 | Method and device for amplifying laser point cloud data and storage medium |
| WO2022193604A1 (en) * | 2021-03-16 | 2022-09-22 | Huawei Technologies Co., Ltd. | Devices, systems, methods, and media for point cloud data augmentation using model injection |
| CN113435273B (en) * | 2021-06-15 | 2022-03-25 | 北京的卢深视科技有限公司 | Data augmentation method, data augmentation device, electronic device, and storage medium |
| CN113435273A (en) * | 2021-06-15 | 2021-09-24 | 北京的卢深视科技有限公司 | Data augmentation method, data augmentation device, electronic device, and storage medium |
| CN115731140A (en) * | 2021-08-25 | 2023-03-03 | 上海滴滴沃芽科技有限公司 | Image synthesis method, image synthesis device, electronic apparatus, storage medium, and program product |
| CN114240734A (en) * | 2021-11-05 | 2022-03-25 | 深圳云天励飞技术股份有限公司 | Image data augmentation method, image data augmentation device, electronic apparatus, and storage medium |
| CN114265074A (en) * | 2021-11-30 | 2022-04-01 | 南京大学 | Metamorphic algorithm-based three-dimensional laser radar point cloud data amplification method |
| CN114356931A (en) * | 2021-12-31 | 2022-04-15 | 中国第一汽车股份有限公司 | Data processing method, device, storage medium, processor and electronic device |
| CN114356931B (en) * | 2021-12-31 | 2025-09-09 | 中国第一汽车股份有限公司 | Data processing method, data processing device, storage medium, processor and electronic device |
| CN115187934A (en) * | 2022-06-23 | 2022-10-14 | 苏州艾氪英诺机器人科技有限公司 | Point cloud data processing method and device and electronic equipment |
| CN116071730A (en) * | 2023-01-19 | 2023-05-05 | 北京百度网讯科技有限公司 | Background object detection method, apparatus, device and self-driving vehicle |
| CN117292227A (en) * | 2023-09-18 | 2023-12-26 | 重庆长安汽车股份有限公司 | Radar point cloud data enhancement method and device, intelligent automobile and storage medium |
| CN117292227B (en) * | 2023-09-18 | 2025-09-19 | 重庆长安汽车股份有限公司 | Radar point cloud data enhancement method and device, intelligent automobile and storage medium |
| CN119169191A (en) * | 2024-09-05 | 2024-12-20 | 北京市园林古建设计研究院有限公司 | A method, device and electronic equipment for constructing three-dimensional visual landscape |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111401133A (en) | Target data augmentation method, device, electronic device and readable storage medium | |
| CN110197148B (en) | Target object labeling method and device, electronic equipment and storage medium | |
| CN110210280B (en) | Beyond-visual-range sensing method, beyond-visual-range sensing system, terminal and storage medium | |
| CN110598743A (en) | Target object labeling method and device | |
| CN111123927A (en) | Trajectory planning method and device, automatic driving equipment and storage medium | |
| US12175770B2 (en) | Lane extraction method using projection transformation of three-dimensional point cloud map | |
| CN114022830A (en) | Target determination method and target determination device | |
| CN115273062B (en) | 3D target detection method integrating three-dimensional laser radar and monocular camera | |
| CN112257605B (en) | 3D target detection method, system and device based on self-labeled training samples | |
| CN111274927A (en) | Training data generation method and device, electronic equipment and storage medium | |
| CN111144315A (en) | Target detection method and device, electronic equipment and readable storage medium | |
| CN115147328A (en) | Three-dimensional target detection method and device | |
| CN112507862A (en) | Vehicle orientation detection method and system based on multitask convolutional neural network | |
| CN117576652B (en) | Road object identification method and device, storage medium and electronic equipment | |
| CN110969592B (en) | Image fusion method, automatic driving control method, device and equipment | |
| CN111241969A (en) | Target detection method and device and corresponding model training method and device | |
| CN105608417A (en) | Traffic signal lamp detection method and device | |
| CN113255444A (en) | Training method of image recognition model, image recognition method and device | |
| CN116311131B (en) | A method, system and device for intelligent driving training enhancement based on multi-viewing | |
| CN118968470B (en) | Method, apparatus, device, medium and program product for detecting object recognition model | |
| CN114463713A (en) | Information detection method and device of vehicle in 3D space and electronic equipment | |
| CN116917936A (en) | Binocular camera external parameter calibration methods and devices | |
| Hospach et al. | Simulation of falling rain for robustness testing of video-based surround sensing systems | |
| CN115187941A (en) | Target detection positioning method, system, equipment and storage medium | |
| CN117292355A (en) | A target fusion sensing method, device, computer equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200710 |