
CN119399578A - A point cloud data sparse convolution training method and system - Google Patents

A point cloud data sparse convolution training method and system

Info

Publication number
CN119399578A
CN119399578A
Authority
CN
China
Prior art keywords
target
convolution
point cloud
cloud data
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411428100.4A
Other languages
Chinese (zh)
Other versions
CN119399578B (en)
Inventor
林军
吴凯
陈弘炜
王佳新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202411428100.4A
Publication of CN119399578A
Application granted
Publication of CN119399578B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a point cloud data sparse convolution training method and system. The method comprises: obtaining point cloud data to be trained; preprocessing the point cloud data to obtain a target convolution layer; disassembling the target convolution layer into a convolution layer group, the group comprising a plurality of convolution layers whose sizes are smaller than that of the target convolution layer; extracting features of the convolution layer group; normalizing the features to obtain values and a threshold corresponding to the features; screening the values with the threshold to obtain target values; and obtaining target features based on the target values. This solves the problems that, in current feature extraction on point cloud data, the sparsity of the data is lost, the computation becomes enormous, the training duration and model size increase greatly, and the risk of overfitting rises.

Description

Point cloud data sparse convolution training method and system
Technical Field
The application relates to the technical field of point cloud data, in particular to a sparse convolution training method and system for point cloud data.
Background
Point cloud data are now widely used in many leading research fields such as autonomous driving and robotics, and research on such data has become a hot research direction. Deep learning technology has also played a dominant role in many research fields, particularly in solving two-dimensional vision problems.
Current point cloud target detection includes pillar-based methods, which process the 3D point cloud by converting it into a 2D pseudo image and introducing two-dimensional convolution, thereby making real-time detection achievable.
Pillar-based point cloud target detection generally adopts two-dimensional convolution for feature extraction: the point cloud is converted into a 2D pseudo image and then fed into several two-dimensional convolution layers. This strategy can directly reuse existing two-dimensional image processing, but after a single layer of convolution most of the feature sparsity of the point cloud data is lost and the data is computed as a dense map instead. When the number of points is huge, the computation is enormous; in large-kernel convolution layers it increases further, and the parameter count grows quadratically with kernel size, greatly increasing the training duration and model size while raising the risk of overfitting.
Disclosure of Invention
The application provides a sparse convolution training method and system for point cloud data, to solve the technical problems that the sparsity of point cloud data is lost during current feature extraction, so that the computation becomes enormous, the training duration and model size increase greatly, and the risk of overfitting rises.
The first aspect of the application provides a sparse convolution training method for point cloud data, which comprises the following steps:
acquiring point cloud data to be trained;
preprocessing the point cloud data to obtain a target convolution layer;
disassembling the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
extracting features of the convolution layer group;
normalizing the features to obtain values and a threshold corresponding to the features;
screening the values with the threshold to obtain target values;
and obtaining target features based on the target values.
In some embodiments, the step of preprocessing the point cloud data to obtain a target convolution layer includes:
converting the point cloud data from 3D data to 2D data by using a PCA algorithm;
and obtaining a target convolution layer based on the 2D data.
In some embodiments, the step of disassembling the target convolution layer into a convolution layer group comprises:
acquiring the size of the target convolution layer;
obtaining, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, wherein the sizes of the first convolution layer and the second convolution layer are smaller than the size of the target convolution layer;
and disassembling the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
In some embodiments, the step of normalizing the feature to obtain a value and a threshold corresponding to the feature includes:
Normalizing the features to obtain values corresponding to the features;
Based on the values, obtaining training values related to the threshold;
and training the training values using a loss function to obtain the threshold.
In some embodiments, the step of screening the value to obtain a target value using the threshold value includes:
if the value is smaller than the threshold value, deleting the value;
if the value is greater than the threshold, retaining the value;
and obtaining a target value based on the retained value.
In some embodiments, after the step of obtaining the target value, the method comprises:
adding a nonlinear element to the target values.
In some embodiments, the step of obtaining the target feature based on the target value includes:
performing a deconvolution operation on the target values to obtain target feature maps;
and stitching the target feature maps to obtain target features.
In some embodiments, the step of deconvoluting the target value to obtain a target feature map includes:
obtaining a deconvolution interpolation according to the target values;
and performing the deconvolution operation on the target values using the deconvolution interpolation to obtain target feature maps, wherein the target feature maps have the same size.
The second aspect of the present application provides a sparse convolution training system for point cloud data, comprising:
an acquisition module, configured to acquire point cloud data to be trained;
a preprocessing module, configured to preprocess the point cloud data to obtain a target convolution layer;
a disassembly module, configured to disassemble the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
an extraction module, configured to extract features of the convolution layer group;
a normalization module, configured to normalize the features to obtain values and a threshold corresponding to the features;
a screening module, configured to screen the values with the threshold to obtain target values;
and a generation module, configured to obtain target features based on the target values.
In some embodiments, the disassembly module comprises:
a first acquisition unit, configured to acquire the size of the target convolution layer;
a second acquisition unit, configured to acquire, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which it is disassembled, the sizes of the first convolution layer and the second convolution layer being smaller than the size of the target convolution layer;
and a disassembly unit, configured to disassemble the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
The application provides a point cloud data sparse convolution training method and system. The method comprises: obtaining point cloud data to be trained; preprocessing the point cloud data to obtain a target convolution layer; disassembling the target convolution layer into a convolution layer group, the group comprising a plurality of convolution layers whose sizes are smaller than that of the target convolution layer; extracting features of the convolution layer group; normalizing the features to obtain values and a threshold corresponding to the features; screening the values with the threshold to obtain target values; and obtaining target features based on the target values. In this way the sparsity of the point cloud data decreases only slightly during feature extraction, which improves the computational efficiency, reduces the training duration and model size, and lowers the probability of overfitting.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a sparse convolution training method for point cloud data in the application;
FIG. 2 is a schematic diagram of a pruned sparse convolution layer in accordance with the present application;
FIG. 3 is a schematic diagram of the disassembly of a target convolution layer into smaller convolution layers in accordance with the present application;
FIG. 4 is a schematic diagram of a training process of point cloud data according to the present application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In some technologies, when feature extraction is performed on point cloud data, the sparsity of the data disappears, so the computation becomes enormous, the training duration and model size increase greatly, and the risk of overfitting rises. To solve these technical problems, the application provides a point cloud data sparse convolution training method and system, explained as follows:
With the widespread development of 3D acquisition and 3D sensors, point cloud data can be obtained more widely and more easily. It provides more information than 2D data but also puts more stress on computation and storage. Because of special properties such as its large volume, three-dimensionality, sparsity and asymmetry, processing 3D point clouds requires greater computational complexity and memory consumption; existing 2D data processing methods cannot be transplanted directly, and the original deep learning methods must be re-optimized to suit 3D point cloud processing.
Point cloud data is now widely used in many leading research fields, such as autonomous driving and robotics, and research on such data has become a hot research direction. Deep learning techniques have also played a dominant role in many areas of research, with great success in solving two-dimensional vision problems in particular. Current point cloud target detection can generally be classified into point-based, voxel-based, pillar-based, and Transformer-based methods. The application mainly focuses on the pillar-based method, chiefly because of its advantages: by converting 3D point cloud data into 2D pseudo images it can introduce two-dimensional convolution, making it faster and more likely to achieve real-time detection.
Pillar-based point cloud target detection adopts two-dimensional convolution for feature extraction: the point cloud data is converted into 2D pseudo images, which are then fed into several two-dimensional convolution layers. This strategy can directly reuse existing two-dimensional image processing, but after one layer of convolution most of the feature sparsity of the point cloud data is lost and the data is computed as a dense map instead. When the quantity of point cloud data is huge, the computation is enormous; in large-kernel convolution layers it increases further, and the parameter count also grows quadratically, greatly increasing the training duration and model size while raising the risk of overfitting.
As shown in FIG. 1, to solve the above problems, the first aspect of the present application provides a sparse convolution training method for point cloud data, which comprises the following steps:
S100, acquiring point cloud data to be trained. The input is preprocessed point cloud data, and a point cloud target detection model is obtained after training; the accuracy of the input point cloud data is verified through the actual effect of the model and the drawing of detection boxes on the corresponding software training platform.
S200, preprocessing the point cloud data to obtain a target convolution layer. A pillar-based method converts the 3D point cloud data into a 2D pseudo image, i.e. the target convolution layer. Specifically, in the feature encoding stage the points are projected onto the top-view plane so that the point cloud becomes a pseudo 2D image: the projection plane is divided into H x W grids, each grid being a pillar. The stacked pillar tensor of shape (D, P, N) is then up-scaled by a linear layer to (C, P, N) and max-pooled to (C, P), yielding the two-dimensional pseudo image, i.e. the target convolution layer.
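As an illustration of this pillar encoding, the following is a minimal PyTorch sketch. The channel sizes, the tensor layout (pillars stored as (P, N, D)), and the omission of the usual normalization and activation after the linear layer are simplifying assumptions; only the linear up-scaling, max pooling, and scatter onto the H x W grid follow the text.

```python
import torch
import torch.nn as nn

class PillarEncoder(nn.Module):
    """Sketch of the pillar feature encoding: (P, N, D) -> linear -> (P, N, C)
    -> max pool over points -> (P, C) -> scatter to a (C, H, W) pseudo-image."""
    def __init__(self, d_in: int = 9, c_out: int = 64):
        super().__init__()
        self.linear = nn.Linear(d_in, c_out)  # per-point linear up-scaling (D -> C)

    def forward(self, pillars: torch.Tensor, coords: torch.Tensor,
                h: int, w: int) -> torch.Tensor:
        # pillars: (P, N, D) stacked pillar tensor
        # coords:  (P, 2) long tensor of (row, col) grid indices for each pillar
        x = self.linear(pillars)                 # (P, N, C)
        x = x.max(dim=1).values                  # max pooling over the N points -> (P, C)
        canvas = x.new_zeros(x.shape[1], h * w)  # (C, H*W) empty top-view canvas
        canvas[:, coords[:, 0] * w + coords[:, 1]] = x.t()  # scatter pillar features
        return canvas.view(1, -1, h, w)          # (1, C, H, W) 2D pseudo-image
```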
The step of preprocessing the point cloud data to obtain a target convolution layer comprises the following substeps:
S210, converting the point cloud data from 3D data to 2D data using a PCA algorithm. Specifically, the 3D sparse data is first standardized so that each feature has the same scale, which helps improve the effect of the PCA dimension reduction; the PCA algorithm is then applied to reduce the 3D data to 2D data. In this process, the two most dominant principal components may be retained to construct the 2D data.
S220, obtaining a target convolution layer based on the 2D data: the 2D data is the target convolution layer.
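A minimal sketch of S210-S220, under the assumption that the input is an (M, 3) array of point coordinates; the use of scikit-learn for the standardization and PCA steps is an implementation assumption:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pointcloud_3d_to_2d(points: np.ndarray) -> np.ndarray:
    """Standardize each feature to the same scale, then keep the two most
    dominant principal components to construct the 2D data."""
    scaled = StandardScaler().fit_transform(points)   # same scale per feature
    return PCA(n_components=2).fit_transform(scaled)  # (M, 3) -> (M, 2)
```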
S300, disassembling the target convolution layer into a convolution layer group, the group comprising a plurality of convolution layers whose sizes are smaller than the target convolution layer's. This decomposition strategy lets the network keep a large receptive field while reducing computational complexity, improving model performance, lowering the risk of overfitting and the difficulty of optimization, and improving friendliness to hardware platforms.
The step of disassembling the target convolution layer into a convolution layer group comprises the following substeps:
S310, acquiring the size of the target convolution layer. S320, obtaining, based on that size, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, wherein the sizes of the first and second convolution layers are smaller than the size of the target convolution layer, and the first convolution layer is smaller than the second. S330, disassembling the target convolution layer into a convolution layer group based on those numbers. As shown in FIG. 3, the size of the target convolution layer, e.g. N x N, is first obtained. If the target convolution layer is to be disassembled into first and second convolution layers of sizes 3 x 3 and 2 x 2, the numbers of each are determined so that the target convolution layer is converted completely, and it is then converted into several 3 x 3 and 2 x 2 convolution layers accordingly. This reduces the increased computational complexity, over-smoothing, overfitting risk, and low hardware efficiency caused by large-kernel convolution. Note that the sizes of the convolution layers in the group are not limited to 3 x 3 and 2 x 2; 4 x 4 or 1 x 1 convolution layers, or any combination of the above, may also be used, so long as the computational-complexity problem of large-kernel convolution is reduced.
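A minimal sketch of this disassembly, assuming unpadded convolutions and a greedy split rule (as many 3 x 3 layers as possible, then one 2 x 2 if needed); the rule itself is an assumption, since the text only requires that the target layer be converted completely into smaller kernels. Each unpadded 3 x 3 layer shrinks the feature map by 2 and each 2 x 2 layer by 1, so the stack reproduces the output size and receptive field of one unpadded N x N convolution:

```python
import torch.nn as nn

def decompose_conv(channels: int, kernel_size: int) -> nn.Sequential:
    """Replace one unpadded kernel_size x kernel_size convolution with a stack
    of unpadded 3x3 and 2x2 convolutions covering the same receptive field."""
    remaining = kernel_size - 1          # receptive-field budget still to cover
    layers = []
    while remaining >= 2:                # each 3x3 layer covers 2
        layers.append(nn.Conv2d(channels, channels, kernel_size=3))
        remaining -= 2
    if remaining == 1:                   # finish with one 2x2 layer if needed
        layers.append(nn.Conv2d(channels, channels, kernel_size=2))
    return nn.Sequential(*layers)

# e.g. a 7x7 kernel becomes three 3x3 layers; a 6x6 becomes two 3x3 plus one 2x2
group = decompose_conv(channels=64, kernel_size=7)
```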
S400, extracting features of the convolution layer group. Disassembling the large-kernel convolution layer into several small-kernel convolution layers reduces the increased computational complexity, over-smoothing, overfitting risk, and low hardware efficiency caused by large-kernel convolution, greatly improving computational efficiency.
S500, normalizing the features to obtain values and a threshold corresponding to the features. The features are normalized, i.e. scaled to the interval from 0 to 1; the purpose of feature normalization is to bring feature values of different scales onto the same scale, which speeds up training and improves model performance. The threshold-related values are trained with a loss function: during training the values are repeatedly differentiated and iterated to finally produce the threshold, preparing for the subsequent screening of values while preventing loss of feature accuracy as sparsity is improved. The trained threshold judges which values can be deleted and which must be retained, striking a balance between data precision and sparsity: the data is kept sparse while its accuracy is guaranteed.
The step of normalizing the features to obtain values and a threshold corresponding to the features comprises the following substeps:
S510, normalizing the features to obtain values corresponding to the features. Normalization scales the features to the interval from 0 to 1; its purpose is to bring feature values of different scales onto the same scale, speeding up training and improving model performance.
S520, obtaining, based on the values, training values related to the threshold; S530, training the training values with a loss function to obtain the threshold. The threshold-related values are trained with the loss function, being repeatedly differentiated and iterated until the threshold is finally produced, preparing for the subsequent screening of values while preventing loss of accuracy as sparsity is improved. The trained threshold can judge which values may be deleted and which must be retained, balancing data precision against sparsity: the data is kept sparse while its accuracy is guaranteed.
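The following is a minimal PyTorch sketch of S510-S530. The min-max normalization to the interval [0, 1], a learnable threshold, and a threshold-related term added to the loss follow the text; the soft sigmoid mask (used to keep the threshold differentiable), its sharpness, the initial threshold, and the penalty weight are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class TrainablePruning(nn.Module):
    """Normalize features to [0, 1] and suppress values below a learnable
    threshold, using a soft mask so the threshold receives gradients."""
    def __init__(self, init_threshold: float = 0.05, sharpness: float = 50.0):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(init_threshold))
        self.sharpness = sharpness

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        fmin, fmax = features.min(), features.max()
        normed = (features - fmin) / (fmax - fmin + 1e-8)  # scale to [0, 1]
        mask = torch.sigmoid(self.sharpness * (normed - self.threshold))
        return normed * mask                # values below the threshold -> ~0

def pruning_loss_term(model: nn.Module, weight: float = 1e-3) -> torch.Tensor:
    """Illustrative threshold-related loss term: rewarding a larger threshold
    encourages sparsity, while the task loss counteracts over-pruning."""
    terms = [-m.threshold for m in model.modules()
             if isinstance(m, TrainablePruning)]
    return weight * torch.stack(terms).sum() if terms else torch.tensor(0.0)
```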
S600, screening the values with the threshold to obtain target values. A pruning-threshold term is added to the loss function, so the optimal pruning condition, i.e. the threshold, is continuously updated during training, and the final parameters are recorded for network inference. With this algorithm the sparsity after each layer's computation remains stable rather than trending dense, which greatly reduces computational complexity. Experiments show that the proportion of values retained after threshold screening stays at roughly 20% of each original convolution layer, so the convolution layers obtained from the target values save a large amount of computation, with an accuracy loss of only 0.21%.
The step of screening the values with the threshold to obtain target values comprises the following substeps:
S610, if a value is smaller than the threshold, deleting the value; S620, if a value is greater than the threshold, retaining the value; S630, obtaining the target values based on the retained values. With the threshold obtained from loss-function training, values below it are deleted and values above it are retained, and the target values are obtained from the retained values. The convolution layers converted from the target values keep the data sparse while guaranteeing its accuracy, saving a large amount of computation.
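At inference time, once the trained threshold is fixed, the screening reduces to a hard cut; a minimal sketch follows (zeroing out deleted values, rather than physically removing them, is an implementation assumption):

```python
import torch

def screen_values(normed: torch.Tensor, threshold: float) -> torch.Tensor:
    """Delete (zero out) values below the trained threshold and retain the
    rest; the surviving entries are the target values."""
    return torch.where(normed >= threshold, normed, torch.zeros_like(normed))
```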
After the step of obtaining the target values, the method comprises the following substep:
S640, adding a nonlinear element to the target values. For sparse data, nonlinear elements can help the model capture the critical information in the data better. Adding a nonlinear element to the point cloud data allows its feature information to be extracted more effectively, preparing for the subsequent acquisition of the target features.
S700, obtaining target features based on the target values. As shown in FIG. 4, to generate the final detection result, the target features are sent to the convolution layers in the SSD detection head to further extract high-level features, which are then passed to a classifier and a regressor to predict the class and location of the target.
The step of obtaining target features based on the target values comprises the following substeps:
S710, performing a deconvolution operation on the target values to obtain a target feature map. Specifically, a deconvolution layer is first defined in the deep learning framework, with parameters such as the deconvolution kernel size, stride and padding; the target values (a low-resolution feature map) are then fed into this layer as input; finally, forward propagation through the deconvolution layer generates the high-resolution target feature map.
The step of performing the deconvolution operation on the target values to obtain a target feature map comprises the following substeps:
S711, obtaining the deconvolution interpolation according to the target values; S712, performing the deconvolution operation on the target values using the deconvolution interpolation to obtain target feature maps of the same size. To obtain target feature maps of the same size, the deconvolution layer's parameters, such as the stride and padding that directly determine the output size, must be set appropriately; with these parameters the target feature maps come out equally sized, ready for the subsequent stitching of the target features.
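A minimal sketch of such a deconvolution layer in PyTorch; the channel counts, kernel size, and the choice of a 2x upscale are illustrative assumptions, with stride and padding set so the output size is fully determined by the input size:

```python
import torch
import torch.nn as nn

# Transposed convolution (deconvolution): kernel 4, stride 2, padding 1
# maps (H, W) to exactly (2H, 2W), so equally sized inputs give equally
# sized target feature maps.
deconv = nn.ConvTranspose2d(in_channels=64, out_channels=128,
                            kernel_size=4, stride=2, padding=1)

low_res = torch.randn(1, 64, 62, 54)   # target values as a low-resolution map
target_map = deconv(low_res)           # (1, 128, 124, 108) high-resolution map
```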
S720, stitching the target feature maps to obtain the target features. Channel stitching merges the channels of different feature maps together: for example, if two feature maps have the same size (i.e. the same height and width) but different channel counts, they can be concatenated along the channel dimension to form a new feature map with more channels. The feature map obtained by the stitching operation is taken as the target feature; it contains information from different sources or layers, providing an information basis for subsequent classification, regression or other tasks.
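A one-step sketch of the channel stitching, with illustrative shapes:

```python
import torch

# Two maps with equal height and width but different channel counts are
# concatenated along the channel dimension (dim=1 in NCHW layout).
a = torch.randn(1, 128, 124, 108)
b = torch.randn(1, 64, 124, 108)
target_feature = torch.cat([a, b], dim=1)   # (1, 192, 124, 108)
```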
The pruned sparse convolution layer consists mainly of four layers. The first is a normal sparse convolution layer, which extracts the features of the target convolution layer. The second is a regularizing normalization layer, which scales the features to the interval from 0 to 1 to obtain the values corresponding to the features. The third is the core pruning layer: training values related to the threshold are obtained from the values, the loss function trains them to produce the threshold, and the threshold screens the values into target values; experiments show the sparsity corresponding to the target values is basically maintained at about 20%. The fourth is the nonlinear layer found in common neural networks, which adds a nonlinear element to the network. The pruned sparse convolution layer can replace any convolution layer in the current network, so no dedicated hardware equipment needs to be arranged.
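A minimal sketch of this four-layer block, reusing the TrainablePruning sketch from S500 above. A dense Conv2d and BatchNorm2d stand in for the sparse convolution and the regularizing normalization, and ReLU for the nonlinear layer; these stand-ins, like swapping in a dedicated sparse-convolution library, are assumptions:

```python
import torch.nn as nn

class PrunedSparseConvBlock(nn.Module):
    """Four sub-layers: convolution -> normalization -> core pruning (with its
    trainable threshold) -> nonlinearity. Drop-in replacement for a conv layer."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(c_out)
        self.prune = TrainablePruning()   # defined in the sketch under S500
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.prune(self.norm(self.conv(x))))
```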
The application provides a pillar-based sparse convolution training method for point cloud data, featuring an efficient trainable pruning layer and large-kernel convolution splitting. A pruning layer is added after each layer's feature extraction, with a trainable pruning threshold included in the loss function, so the output of each layer is pruned and the convolved result does not rapidly expand into a dense map; this reduces the computation and allows sparse convolution to remain in use, greatly improving computational efficiency. When a large convolution kernel is encountered, it is decomposed into an ordinary stack of 1x1, 2x2 and 3x3 convolution layers, reducing the increased computational complexity, over-smoothing, overfitting risk, and low hardware efficiency caused by large-kernel convolution. The method has low requirements on the training platform, is compatible with common point cloud preprocessing means, and can perform model training and testing efficiently. This sparse convolution training method can obtain a software model with considerable accuracy, sparsity and training speed, and is friendly to the corresponding hardware acceleration platforms.
The second aspect of the present application provides a point cloud data sparse convolution training system, comprising an acquisition module, a preprocessing module, a disassembly module, an extraction module, a normalization module, a screening module and a generation module. The acquisition module acquires the point cloud data to be trained; the preprocessing module preprocesses the point cloud data to obtain a target convolution layer; the disassembly module disassembles the target convolution layer into a convolution layer group comprising a plurality of convolution layers whose sizes are smaller than that of the target convolution layer; the extraction module extracts the features of the convolution layer group; the normalization module normalizes the features to obtain the corresponding values and a threshold; the screening module screens the values with the threshold to obtain target values; and the generation module obtains target features based on the target values. For the effects of this system embodiment in operation, refer to the method embodiment above; they are not repeated here.
In this embodiment, the disassembly module comprises a first acquisition unit configured to acquire the size of the target convolution layer; a second acquisition unit configured to acquire, based on that size, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, the first and second convolution layers being smaller than the target convolution layer and the first smaller than the second; and a disassembly unit configured to disassemble the target convolution layer into a convolution layer group based on these numbers. For the effects of this system embodiment in operation, refer to the method embodiment above; they are not repeated here.
The foregoing detailed description of the embodiments of the present application further illustrates the purposes, technical solutions and advantageous effects of the embodiments of the present application, and it should be understood that the foregoing is merely a specific implementation of the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (10)

1. A sparse convolution training method for point cloud data, characterized by comprising the following steps:
acquiring point cloud data to be trained;
preprocessing the point cloud data to obtain a target convolution layer;
disassembling the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
extracting features of the convolution layer group;
normalizing the features to obtain values and a threshold corresponding to the features;
screening the values with the threshold to obtain target values;
and obtaining target features based on the target values.
2. The method for sparse convolution training of point cloud data according to claim 1, wherein the step of preprocessing the point cloud data to obtain a target convolution layer comprises:
converting the point cloud data from 3D data to 2D data by using a PCA algorithm;
and obtaining a target convolution layer based on the 2D data.
3. The method for sparse convolution training of point cloud data according to claim 1, wherein the step of disassembling the target convolution layer into a convolution layer group comprises:
acquiring the size of the target convolution layer;
obtaining, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, wherein the sizes of the first convolution layer and the second convolution layer are smaller than the size of the target convolution layer;
and disassembling the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
4. The method for training sparse convolution of point cloud data according to claim 1, wherein the step of normalizing the features to obtain values and thresholds corresponding to the features comprises:
normalizing the features to obtain values corresponding to the features;
obtaining, based on the values, training values related to the threshold;
and training the training values using a loss function to obtain the threshold.
5. The method for training sparse convolution of point cloud data according to claim 1, wherein said step of screening said values to obtain target values using said threshold value comprises:
if the value is smaller than the threshold, deleting the value;
if the value is greater than the threshold, retaining the value;
and obtaining a target value based on the retained values.
6. The method for sparse convolution training of point cloud data according to claim 1, wherein after the step of obtaining the target value, the method comprises:
adding a nonlinear element to the target value.
7. The method for training sparse convolution of point cloud data according to claim 1, wherein the step of obtaining the target feature based on the target value comprises:
performing a deconvolution operation on the target value to obtain a target feature map;
and stitching the target feature maps to obtain target features.
8. The method for sparse convolution training of point cloud data according to claim 7, wherein said step of deconvoluting said target values to obtain a target feature map comprises:
obtaining a deconvolution interpolation according to the target value;
and performing the deconvolution operation on the target value using the deconvolution interpolation to obtain target feature maps, wherein the target feature maps have the same size.
9. A point cloud data sparse convolution training system, comprising:
an acquisition module, configured to acquire point cloud data to be trained;
a preprocessing module, configured to preprocess the point cloud data to obtain a target convolution layer;
a disassembly module, configured to disassemble the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
an extraction module, configured to extract features of the convolution layer group;
a normalization module, configured to normalize the features to obtain values and a threshold corresponding to the features;
a screening module, configured to screen the values with the threshold to obtain target values;
and a generation module, configured to obtain target features based on the target values.
10. The point cloud data sparse convolution training system of claim 9, wherein said disassembly module comprises:
a first acquisition unit, configured to acquire the size of the target convolution layer;
a second acquisition unit, configured to acquire, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which it is disassembled, the sizes of the first convolution layer and the second convolution layer being smaller than the size of the target convolution layer;
and a disassembly unit, configured to disassemble the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
CN202411428100.4A 2024-10-12 2024-10-12 Point cloud data sparse convolution training method and system Active CN119399578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411428100.4A CN119399578B (en) 2024-10-12 2024-10-12 Point cloud data sparse convolution training method and system


Publications (2)

Publication Number Publication Date
CN119399578A 2025-02-07
CN119399578B CN119399578B (en) 2025-11-11

Family

ID=94421441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411428100.4A Active CN119399578B (en) 2024-10-12 2024-10-12 Point cloud data sparse convolution training method and system

Country Status (1)

Country Link
CN (1) CN119399578B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147335A1 (en) * 2017-11-15 2019-05-16 Uber Technologies, Inc. Continuous Convolution and Fusion in Neural Networks
CN115862000A (en) * 2022-12-22 2023-03-28 重庆长安汽车股份有限公司 Target detection method, device, vehicle and storage medium
CN117408325A (en) * 2023-10-31 2024-01-16 慧之安信息技术股份有限公司 A model compression method and system based on BN layer sparse pruning algorithm
CN117910535A (en) * 2024-01-24 2024-04-19 江苏大学 Asymmetric knowledge distillation target detection system for vehicle-road collaboration
CN118314567A (en) * 2024-04-03 2024-07-09 常州大学 3D object detection method and model based on cascade feature fusion
CN118429655A (en) * 2024-04-29 2024-08-02 燕山大学 3D object detection method based on column sequence attention and dilated convolution
CN118736288A (en) * 2024-06-18 2024-10-01 西安电子科技大学 A single-stage point cloud density-aware focused convolutional 3D object detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINJAE L. et al.: "PillarAcc: Sparse PointPillars Accelerator for Real-Time Point Cloud 3D Object Detection on Edge Devices", ARXIV, 15 May 2023 (2023-05-15) *
CHEN Ben et al.: "Hybrid attention point cloud feature learning network based on dynamic graph convolution" (基于动态图卷积的混合注意力点云特征学习网络), Computer Technology and Development (计算机技术与发展), 10 October 2023 (2023-10-10) *

Also Published As

Publication number Publication date
CN119399578B (en) 2025-11-11

Similar Documents

Publication Publication Date Title
CN116994140B (en) Cultivated land extraction method, device, equipment and medium based on remote sensing image
CN110322453B (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN114092833A (en) Remote sensing image classification method and device, computer equipment and storage medium
EP4471737A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN114581789B (en) A hyperspectral image classification method and system
CN115880529A (en) Bird fine-grained classification method and system based on attention and decoupling knowledge distillation
CN112819832B (en) Fine-grained boundary extraction method for semantic segmentation of urban scenes based on laser point cloud
CN117079098A (en) A spatial small target detection method based on position coding
CN113657214B (en) Building damage assessment method based on Mask RCNN
CN118644666B (en) An image processing method and system for remote sensing target detection scenes
CN118781320A (en) SAR target detection method based on multi-scale perception and Transformer-assisted attention generation
CN117475080A (en) Battlefield target three-dimensional reconstruction and damage evaluation method based on multi-source information fusion
CN112766123A (en) Crowd counting method and system based on criss-cross attention network
CN119784692A (en) A wind turbine blade crack defect detection method based on feature recombination network
CN115457379B (en) Remote sensing image road extraction method and system combining semantic segmentation and angle prediction
CN116805415A (en) Cage broiler health status identification method based on lightweight improved YOLOv5
CN120318499B (en) UAV target detection method and electronic equipment based on cross-spatial frequency domain
CN112084912B (en) Face feature point positioning method and system based on self-adaptive information enhancement
CN119399578B (en) Point cloud data sparse convolution training method and system
CN113610015A (en) Attitude estimation method, device and medium based on end-to-end fast ladder network
Lü et al. Tree Detection Algorithm Based on Embedded YOLO Lightweight Network
CN119295823A (en) PointNet++ 3D point cloud object classification method integrating contextual anchor attention mechanism
CN119048740A (en) Coal gangue intelligent identification method based on selective state space equation
Lin et al. Attention EdgeConv for 3D point cloud classification
CN118552717A (en) Three-dimensional target detection method, system, equipment and medium based on multi-modal feature fusion convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant