
CN119399578A - A point cloud data sparse convolution training method and system - Google Patents

A point cloud data sparse convolution training method and system

Info

Publication number
CN119399578A
CN119399578A
Authority
CN
China
Prior art keywords
target
convolution
point cloud
cloud data
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411428100.4A
Other languages
Chinese (zh)
Other versions
CN119399578B (en)
Inventor
林军
吴凯
陈弘炜
王佳新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202411428100.4A
Publication of CN119399578A
Application granted
Publication of CN119399578B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/16 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a point cloud data sparse convolution training method and system. The method comprises: obtaining point cloud data to be trained; preprocessing the point cloud data to obtain a target convolution layer; disassembling the target convolution layer into a convolution layer group, the group comprising a plurality of convolution layers whose sizes are smaller than that of the target convolution layer; extracting features of the convolution layer group; normalizing the features to obtain values and a threshold corresponding to the features; screening the values with the threshold to obtain target values; and obtaining target features based on the target values. This solves the problems that, in current feature extraction on point cloud data, the sparsity of the data is lost, the computation becomes enormous, the training duration and model size increase greatly, and the risk of overfitting rises.

Description

Point cloud data sparse convolution training method and system
Technical Field
The application relates to the technical field of point cloud data, in particular to a sparse convolution training method and system for point cloud data.
Background
Point cloud data are now widely used in many leading research fields such as autonomous driving and robotics, and research on such data has become a hot research direction. Deep learning technology has also played a dominant role in many research fields, particularly in solving two-dimensional vision problems.
Current point cloud target detection includes pillar-based methods, which process the 3D point cloud by converting it into a 2D pseudo image and introducing two-dimensional convolution, thereby making real-time detection achievable.
Pillar-based point cloud target detection generally adopts two-dimensional convolution for feature extraction: the point cloud is converted into a 2D pseudo image and then fed into several two-dimensional convolution layers. This strategy can directly reuse existing two-dimensional image processing, but after a single layer of convolution most of the feature sparsity of the point cloud data is lost and the data is computed as a dense map instead. When the number of points is huge, the computation is enormous; in large-kernel convolution layers it increases further, and the parameter count grows quadratically with kernel size, greatly increasing the training duration and model size while raising the risk of overfitting.
Disclosure of Invention
The application provides a sparse convolution training method and system for point cloud data, to solve the technical problems that the sparsity of point cloud data is lost during current feature extraction, so that the computation becomes enormous, the training duration and model size increase greatly, and the risk of overfitting rises.
The first aspect of the application provides a sparse convolution training method for point cloud data, which comprises the following steps:
acquiring point cloud data to be trained;
preprocessing the point cloud data to obtain a target convolution layer;
disassembling the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
extracting features of the convolution layer group;
normalizing the features to obtain values and a threshold corresponding to the features;
screening the values with the threshold to obtain target values;
and obtaining target features based on the target values.
In some embodiments, the step of preprocessing the point cloud data to obtain a target convolution layer includes:
converting the point cloud data from 3D data to 2D data by using a PCA algorithm;
and obtaining a target convolution layer based on the 2D data.
In some embodiments, the step of disassembling the target convolution layer into a convolution layer group comprises:
acquiring the size of the target convolution layer;
obtaining, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, wherein the sizes of the first convolution layer and the second convolution layer are smaller than the size of the target convolution layer;
and disassembling the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
In some embodiments, the step of normalizing the feature to obtain a value and a threshold corresponding to the feature includes:
Normalizing the features to obtain values corresponding to the features;
Based on the values, obtaining training values related to the threshold;
and training the training values using a loss function to obtain the threshold.
In some embodiments, the step of screening the value to obtain a target value using the threshold value includes:
if the value is smaller than the threshold value, deleting the value;
if the value is greater than the threshold, retaining the value;
and obtaining a target value based on the retained value.
In some embodiments, after the step of obtaining the target value, the method comprises:
adding a nonlinear element to the target values.
In some embodiments, the step of obtaining the target feature based on the target value includes:
performing a deconvolution operation on the target values to obtain target feature maps;
and stitching the target feature maps to obtain target features.
In some embodiments, the step of deconvoluting the target value to obtain a target feature map includes:
obtaining a deconvolution interpolation according to the target values;
and performing the deconvolution operation on the target values using the deconvolution interpolation to obtain target feature maps, wherein the target feature maps have the same size.
The second aspect of the present application provides a sparse convolution training system for point cloud data, comprising:
an acquisition module, configured to acquire point cloud data to be trained;
a preprocessing module, configured to preprocess the point cloud data to obtain a target convolution layer;
a disassembly module, configured to disassemble the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
an extraction module, configured to extract features of the convolution layer group;
a normalization module, configured to normalize the features to obtain values and a threshold corresponding to the features;
a screening module, configured to screen the values with the threshold to obtain target values;
and a generation module, configured to obtain target features based on the target values.
In some embodiments, the disassembly module comprises:
a first acquisition unit, configured to acquire the size of the target convolution layer;
a second acquisition unit, configured to acquire, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which it is disassembled, the sizes of the first convolution layer and the second convolution layer being smaller than the size of the target convolution layer;
and a disassembly unit, configured to disassemble the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
The application provides a point cloud data sparse convolution training method and system. The method comprises: obtaining point cloud data to be trained; preprocessing the point cloud data to obtain a target convolution layer; disassembling the target convolution layer into a convolution layer group, the group comprising a plurality of convolution layers whose sizes are smaller than that of the target convolution layer; extracting features of the convolution layer group; normalizing the features to obtain values and a threshold corresponding to the features; screening the values with the threshold to obtain target values; and obtaining target features based on the target values. In this way the sparsity of the point cloud data decreases only slightly during feature extraction, which improves the computational efficiency, reduces the training duration and model size, and lowers the probability of overfitting.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a sparse convolution training method for point cloud data in the application;
FIG. 2 is a schematic diagram of a pruned sparse convolution layer in accordance with the present application;
FIG. 3 is a schematic diagram of the disassembly of a target convolution layer into smaller convolution layers in accordance with the present application;
FIG. 4 is a schematic diagram of a training process of point cloud data according to the present application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In some technologies, when feature extraction is performed on point cloud data, the sparsity of the data disappears, so the computation becomes enormous, the training duration and model size increase greatly, and the risk of overfitting rises. To solve these technical problems, the application provides a point cloud data sparse convolution training method and system, explained as follows:
With the widespread development of 3D acquisition and 3D sensors, point cloud data can be obtained more widely and more easily. It provides more information than 2D data but also puts more stress on computation and storage. Because of special properties such as its large volume, three-dimensionality, sparsity and asymmetry, processing 3D point clouds requires greater computational complexity and memory consumption; existing 2D data processing methods cannot be transplanted directly, and the original deep learning methods must be re-optimized to suit 3D point cloud processing.
Point cloud data is now widely used in many leading research fields, such as autonomous driving and robotics, and research on such data has become a hot research direction. Deep learning techniques have also played a dominant role in many areas of research, with great success in solving two-dimensional vision problems in particular. Current point cloud target detection can generally be classified into point-based, voxel-based, pillar-based, and Transformer-based methods. The application mainly focuses on the pillar-based method, chiefly because of its advantages: by converting 3D point cloud data into 2D pseudo images it can introduce two-dimensional convolution, making it faster and more likely to achieve real-time detection.
Pillar-based point cloud target detection adopts two-dimensional convolution for feature extraction: the point cloud data is converted into 2D pseudo images, which are then fed into several two-dimensional convolution layers. This strategy can directly reuse existing two-dimensional image processing, but after one layer of convolution most of the feature sparsity of the point cloud data is lost and the data is computed as a dense map instead. When the quantity of point cloud data is huge, the computation is enormous; in large-kernel convolution layers it increases further, and the parameter count also grows quadratically, greatly increasing the training duration and model size while raising the risk of overfitting.
As shown in FIG. 1, to solve the above problems, the first aspect of the present application provides a sparse convolution training method for point cloud data, which comprises the following steps:
S100, acquiring point cloud data to be trained. The input is preprocessed point cloud data, and a point cloud target detection model is obtained after training; the accuracy of the input point cloud data is verified through the actual effect of the model and the drawing of detection boxes on the corresponding software training platform.
S200, preprocessing the point cloud data to obtain a target convolution layer. A pillar-based method converts the 3D point cloud data into a 2D pseudo image, i.e. the target convolution layer. Specifically, in the feature encoding stage the points are projected onto the top-view plane so that the point cloud becomes a pseudo 2D image: the projection plane is divided into H x W grids, each grid being a pillar. The stacked pillar tensor of shape (D, P, N) is then up-scaled by a linear layer to (C, P, N) and max-pooled to (C, P), yielding the two-dimensional pseudo image, i.e. the target convolution layer.
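As an illustration of this pillar encoding, the following is a minimal PyTorch sketch. The channel sizes, the tensor layout (pillars stored as (P, N, D)), and the omission of the usual normalization and activation after the linear layer are simplifying assumptions; only the linear up-scaling, max pooling, and scatter onto the H x W grid follow the text.

```python
import torch
import torch.nn as nn

class PillarEncoder(nn.Module):
    """Sketch of the pillar feature encoding: (P, N, D) -> linear -> (P, N, C)
    -> max pool over points -> (P, C) -> scatter to a (C, H, W) pseudo-image."""
    def __init__(self, d_in: int = 9, c_out: int = 64):
        super().__init__()
        self.linear = nn.Linear(d_in, c_out)  # per-point linear up-scaling (D -> C)

    def forward(self, pillars: torch.Tensor, coords: torch.Tensor,
                h: int, w: int) -> torch.Tensor:
        # pillars: (P, N, D) stacked pillar tensor
        # coords:  (P, 2) long tensor of (row, col) grid indices for each pillar
        x = self.linear(pillars)                 # (P, N, C)
        x = x.max(dim=1).values                  # max pooling over the N points -> (P, C)
        canvas = x.new_zeros(x.shape[1], h * w)  # (C, H*W) empty top-view canvas
        canvas[:, coords[:, 0] * w + coords[:, 1]] = x.t()  # scatter pillar features
        return canvas.view(1, -1, h, w)          # (1, C, H, W) 2D pseudo-image
```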
The step of preprocessing the point cloud data to obtain a target convolution layer comprises the following substeps:
S210, converting the point cloud data from 3D data to 2D data using a PCA algorithm. Specifically, the 3D sparse data is first standardized so that each feature has the same scale, which helps improve the effect of the PCA dimension reduction; the PCA algorithm is then applied to reduce the 3D data to 2D data. In this process, the two most dominant principal components may be retained to construct the 2D data.
S220, obtaining a target convolution layer based on the 2D data: the 2D data is the target convolution layer.
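A minimal sketch of S210-S220, under the assumption that the input is an (M, 3) array of point coordinates; the use of scikit-learn for the standardization and PCA steps is an implementation assumption:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pointcloud_3d_to_2d(points: np.ndarray) -> np.ndarray:
    """Standardize each feature to the same scale, then keep the two most
    dominant principal components to construct the 2D data."""
    scaled = StandardScaler().fit_transform(points)   # same scale per feature
    return PCA(n_components=2).fit_transform(scaled)  # (M, 3) -> (M, 2)
```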
S300, disassembling the target convolution layer into a convolution layer group, the group comprising a plurality of convolution layers whose sizes are smaller than the target convolution layer's. This decomposition strategy lets the network keep a large receptive field while reducing computational complexity, improving model performance, lowering the risk of overfitting and the difficulty of optimization, and improving friendliness to hardware platforms.
The step of disassembling the target convolution layer into a convolution layer group comprises the following substeps:
S310, acquiring the size of the target convolution layer. S320, obtaining, based on that size, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, wherein the sizes of the first and second convolution layers are smaller than the size of the target convolution layer, and the first convolution layer is smaller than the second. S330, disassembling the target convolution layer into a convolution layer group based on those numbers. As shown in FIG. 3, the size of the target convolution layer, e.g. N x N, is first obtained. If the target convolution layer is to be disassembled into first and second convolution layers of sizes 3 x 3 and 2 x 2, the numbers of each are determined so that the target convolution layer is converted completely, and it is then converted into several 3 x 3 and 2 x 2 convolution layers accordingly. This reduces the increased computational complexity, over-smoothing, overfitting risk, and low hardware efficiency caused by large-kernel convolution. Note that the sizes of the convolution layers in the group are not limited to 3 x 3 and 2 x 2; 4 x 4 or 1 x 1 convolution layers, or any combination of the above, may also be used, so long as the computational-complexity problem of large-kernel convolution is reduced.
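A minimal sketch of this disassembly, assuming unpadded convolutions and a greedy split rule (as many 3 x 3 layers as possible, then one 2 x 2 if needed); the rule itself is an assumption, since the text only requires that the target layer be converted completely into smaller kernels. Each unpadded 3 x 3 layer shrinks the feature map by 2 and each 2 x 2 layer by 1, so the stack reproduces the output size and receptive field of one unpadded N x N convolution:

```python
import torch.nn as nn

def decompose_conv(channels: int, kernel_size: int) -> nn.Sequential:
    """Replace one unpadded kernel_size x kernel_size convolution with a stack
    of unpadded 3x3 and 2x2 convolutions covering the same receptive field."""
    remaining = kernel_size - 1          # receptive-field budget still to cover
    layers = []
    while remaining >= 2:                # each 3x3 layer covers 2
        layers.append(nn.Conv2d(channels, channels, kernel_size=3))
        remaining -= 2
    if remaining == 1:                   # finish with one 2x2 layer if needed
        layers.append(nn.Conv2d(channels, channels, kernel_size=2))
    return nn.Sequential(*layers)

# e.g. a 7x7 kernel becomes three 3x3 layers; a 6x6 becomes two 3x3 plus one 2x2
group = decompose_conv(channels=64, kernel_size=7)
```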
S400, extracting features of the convolution layer group. Disassembling the large-kernel convolution layer into several small-kernel convolution layers reduces the increased computational complexity, over-smoothing, overfitting risk, and low hardware efficiency caused by large-kernel convolution, greatly improving computational efficiency.
S500, normalizing the features to obtain values and a threshold corresponding to the features. The features are normalized, i.e. scaled to the interval from 0 to 1; the purpose of feature normalization is to bring feature values of different scales onto the same scale, which speeds up training and improves model performance. The threshold-related values are trained with a loss function: during training the values are repeatedly differentiated and iterated to finally produce the threshold, preparing for the subsequent screening of values while preventing loss of feature accuracy as sparsity is improved. The trained threshold judges which values can be deleted and which must be retained, striking a balance between data precision and sparsity: the data is kept sparse while its accuracy is guaranteed.
The step of normalizing the features to obtain values and a threshold corresponding to the features comprises the following substeps:
S510, normalizing the features to obtain values corresponding to the features. Normalization scales the features to the interval from 0 to 1; its purpose is to bring feature values of different scales onto the same scale, speeding up training and improving model performance.
S520, obtaining, based on the values, training values related to the threshold; S530, training the training values with a loss function to obtain the threshold. The threshold-related values are trained with the loss function, being repeatedly differentiated and iterated until the threshold is finally produced, preparing for the subsequent screening of values while preventing loss of accuracy as sparsity is improved. The trained threshold can judge which values may be deleted and which must be retained, balancing data precision against sparsity: the data is kept sparse while its accuracy is guaranteed.
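The following is a minimal PyTorch sketch of S510-S530. The min-max normalization to the interval [0, 1], a learnable threshold, and a threshold-related term added to the loss follow the text; the soft sigmoid mask (used to keep the threshold differentiable), its sharpness, the initial threshold, and the penalty weight are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class TrainablePruning(nn.Module):
    """Normalize features to [0, 1] and suppress values below a learnable
    threshold, using a soft mask so the threshold receives gradients."""
    def __init__(self, init_threshold: float = 0.05, sharpness: float = 50.0):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(init_threshold))
        self.sharpness = sharpness

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        fmin, fmax = features.min(), features.max()
        normed = (features - fmin) / (fmax - fmin + 1e-8)  # scale to [0, 1]
        mask = torch.sigmoid(self.sharpness * (normed - self.threshold))
        return normed * mask                # values below the threshold -> ~0

def pruning_loss_term(model: nn.Module, weight: float = 1e-3) -> torch.Tensor:
    """Illustrative threshold-related loss term: rewarding a larger threshold
    encourages sparsity, while the task loss counteracts over-pruning."""
    terms = [-m.threshold for m in model.modules()
             if isinstance(m, TrainablePruning)]
    return weight * torch.stack(terms).sum() if terms else torch.tensor(0.0)
```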
S600, screening the values with the threshold to obtain target values. A pruning-threshold term is added to the loss function, so the optimal pruning condition, i.e. the threshold, is continuously updated during training, and the final parameters are recorded for network inference. With this algorithm the sparsity after each layer's computation remains stable rather than trending dense, which greatly reduces computational complexity. Experiments show that the proportion of values retained after threshold screening stays at roughly 20% of each original convolution layer, so the convolution layers obtained from the target values save a large amount of computation, with an accuracy loss of only 0.21%.
The step of screening the values with the threshold to obtain target values comprises the following substeps:
S610, if a value is smaller than the threshold, deleting the value; S620, if a value is greater than the threshold, retaining the value; S630, obtaining the target values based on the retained values. With the threshold obtained from loss-function training, values below it are deleted and values above it are retained, and the target values are obtained from the retained values. The convolution layers converted from the target values keep the data sparse while guaranteeing its accuracy, saving a large amount of computation.
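At inference time, once the trained threshold is fixed, the screening reduces to a hard cut; a minimal sketch follows (zeroing out deleted values, rather than physically removing them, is an implementation assumption):

```python
import torch

def screen_values(normed: torch.Tensor, threshold: float) -> torch.Tensor:
    """Delete (zero out) values below the trained threshold and retain the
    rest; the surviving entries are the target values."""
    return torch.where(normed >= threshold, normed, torch.zeros_like(normed))
```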
After the step of obtaining the target values, the method comprises the following substep:
S640, adding a nonlinear element to the target values. For sparse data, nonlinear elements can help the model capture the critical information in the data better. Adding a nonlinear element to the point cloud data allows its feature information to be extracted more effectively, preparing for the subsequent acquisition of the target features.
S700, obtaining target features based on the target values. As shown in FIG. 4, to generate the final detection result, the target features are sent to the convolution layers in the SSD detection head to further extract high-level features, which are then passed to a classifier and a regressor to predict the class and location of the target.
The step of obtaining target features based on the target values comprises the following substeps:
S710, performing a deconvolution operation on the target values to obtain a target feature map. Specifically, a deconvolution layer is first defined in the deep learning framework, with parameters such as the deconvolution kernel size, stride and padding; the target values (a low-resolution feature map) are then fed into this layer as input; finally, forward propagation through the deconvolution layer generates the high-resolution target feature map.
The step of performing the deconvolution operation on the target values to obtain a target feature map comprises the following substeps:
S711, obtaining the deconvolution interpolation according to the target values; S712, performing the deconvolution operation on the target values using the deconvolution interpolation to obtain target feature maps of the same size. To obtain target feature maps of the same size, the deconvolution layer's parameters, such as the stride and padding that directly determine the output size, must be set appropriately; with these parameters the target feature maps come out equally sized, ready for the subsequent stitching of the target features.
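A minimal sketch of such a deconvolution layer in PyTorch; the channel counts, kernel size, and the choice of a 2x upscale are illustrative assumptions, with stride and padding set so the output size is fully determined by the input size:

```python
import torch
import torch.nn as nn

# Transposed convolution (deconvolution): kernel 4, stride 2, padding 1
# maps (H, W) to exactly (2H, 2W), so equally sized inputs give equally
# sized target feature maps.
deconv = nn.ConvTranspose2d(in_channels=64, out_channels=128,
                            kernel_size=4, stride=2, padding=1)

low_res = torch.randn(1, 64, 62, 54)   # target values as a low-resolution map
target_map = deconv(low_res)           # (1, 128, 124, 108) high-resolution map
```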
S720, stitching the target feature maps to obtain the target features. Channel stitching merges the channels of different feature maps together: for example, if two feature maps have the same size (i.e. the same height and width) but different channel counts, they can be concatenated along the channel dimension to form a new feature map with more channels. The feature map obtained by the stitching operation is taken as the target feature; it contains information from different sources or layers, providing an information basis for subsequent classification, regression or other tasks.
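A one-step sketch of the channel stitching, with illustrative shapes:

```python
import torch

# Two maps with equal height and width but different channel counts are
# concatenated along the channel dimension (dim=1 in NCHW layout).
a = torch.randn(1, 128, 124, 108)
b = torch.randn(1, 64, 124, 108)
target_feature = torch.cat([a, b], dim=1)   # (1, 192, 124, 108)
```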
The pruned sparse convolution layer consists mainly of four layers. The first is a normal sparse convolution layer, which extracts the features of the target convolution layer. The second is a regularizing normalization layer, which scales the features to the interval from 0 to 1 to obtain the values corresponding to the features. The third is the core pruning layer: training values related to the threshold are obtained from the values, the loss function trains them to produce the threshold, and the threshold screens the values into target values; experiments show the sparsity corresponding to the target values is basically maintained at about 20%. The fourth is the nonlinear layer found in common neural networks, which adds a nonlinear element to the network. The pruned sparse convolution layer can replace any convolution layer in the current network, so no dedicated hardware equipment needs to be arranged.
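A minimal sketch of this four-layer block, reusing the TrainablePruning sketch from S500 above. A dense Conv2d and BatchNorm2d stand in for the sparse convolution and the regularizing normalization, and ReLU for the nonlinear layer; these stand-ins, like swapping in a dedicated sparse-convolution library, are assumptions:

```python
import torch.nn as nn

class PrunedSparseConvBlock(nn.Module):
    """Four sub-layers: convolution -> normalization -> core pruning (with its
    trainable threshold) -> nonlinearity. Drop-in replacement for a conv layer."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(c_out)
        self.prune = TrainablePruning()   # defined in the sketch under S500
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.prune(self.norm(self.conv(x))))
```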
The application provides a pillar-based sparse convolution training method for point cloud data, featuring an efficient trainable pruning layer and large-kernel convolution splitting. A pruning layer is added after each layer's feature extraction, with a trainable pruning threshold included in the loss function, so the output of each layer is pruned and the convolved result does not rapidly expand into a dense map; this reduces the computation and allows sparse convolution to remain in use, greatly improving computational efficiency. When a large convolution kernel is encountered, it is decomposed into an ordinary stack of 1x1, 2x2 and 3x3 convolution layers, reducing the increased computational complexity, over-smoothing, overfitting risk, and low hardware efficiency caused by large-kernel convolution. The method has low requirements on the training platform, is compatible with common point cloud preprocessing means, and can perform model training and testing efficiently. This sparse convolution training method can obtain a software model with considerable accuracy, sparsity and training speed, and is friendly to the corresponding hardware acceleration platforms.
The second aspect of the present application provides a point cloud data sparse convolution training system, comprising an acquisition module, a preprocessing module, a disassembly module, an extraction module, a normalization module, a screening module and a generation module. The acquisition module acquires the point cloud data to be trained; the preprocessing module preprocesses the point cloud data to obtain a target convolution layer; the disassembly module disassembles the target convolution layer into a convolution layer group comprising a plurality of convolution layers whose sizes are smaller than that of the target convolution layer; the extraction module extracts the features of the convolution layer group; the normalization module normalizes the features to obtain the corresponding values and a threshold; the screening module screens the values with the threshold to obtain target values; and the generation module obtains target features based on the target values. For the effects of this system embodiment in operation, refer to the method embodiment above; they are not repeated here.
In this embodiment, the disassembly module comprises a first acquisition unit configured to acquire the size of the target convolution layer; a second acquisition unit configured to acquire, based on that size, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, the first and second convolution layers being smaller than the target convolution layer and the first smaller than the second; and a disassembly unit configured to disassemble the target convolution layer into a convolution layer group based on these numbers. For the effects of this system embodiment in operation, refer to the method embodiment above; they are not repeated here.
The foregoing detailed description of the embodiments of the present application further illustrates the purposes, technical solutions and advantageous effects of the embodiments of the present application, and it should be understood that the foregoing is merely a specific implementation of the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (10)

1. A sparse convolution training method for point cloud data, characterized by comprising the following steps:
acquiring point cloud data to be trained;
preprocessing the point cloud data to obtain a target convolution layer;
disassembling the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
extracting features of the convolution layer group;
normalizing the features to obtain values and a threshold corresponding to the features;
screening the values with the threshold to obtain target values;
and obtaining target features based on the target values.
2. The method for sparse convolution training of point cloud data according to claim 1, wherein the step of preprocessing the point cloud data to obtain a target convolution layer comprises:
converting the point cloud data from 3D data to 2D data by using a PCA algorithm;
and obtaining a target convolution layer based on the 2D data.
3. The method for sparse convolution training of point cloud data according to claim 1, wherein the step of disassembling the target convolution layer into a convolution layer group comprises:
acquiring the size of the target convolution layer;
obtaining, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which the target convolution layer is disassembled, wherein the sizes of the first convolution layer and the second convolution layer are smaller than the size of the target convolution layer;
and disassembling the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
4. The method for training sparse convolution of point cloud data according to claim 1, wherein the step of normalizing the features to obtain values and thresholds corresponding to the features comprises:
normalizing the features to obtain values corresponding to the features;
obtaining, based on the values, training values related to the threshold;
and training the training values using a loss function to obtain the threshold.
5. The method for training sparse convolution of point cloud data according to claim 1, wherein said step of screening said values to obtain target values using said threshold value comprises:
if the value is smaller than the threshold, deleting the value;
if the value is greater than the threshold, retaining the value;
and obtaining a target value based on the retained values.
6. The method for sparse convolution training of point cloud data according to claim 1, wherein after the step of obtaining the target value, the method comprises:
adding a nonlinear element to the target value.
7. The method for training sparse convolution of point cloud data according to claim 1, wherein the step of obtaining the target feature based on the target value comprises:
performing a deconvolution operation on the target value to obtain a target feature map;
and stitching the target feature maps to obtain target features.
8. The method for sparse convolution training of point cloud data according to claim 7, wherein said step of deconvoluting said target values to obtain a target feature map comprises:
obtaining a deconvolution interpolation according to the target value;
and performing the deconvolution operation on the target value using the deconvolution interpolation to obtain target feature maps, wherein the target feature maps have the same size.
9. A point cloud data sparse convolution training system, comprising:
an acquisition module, configured to acquire point cloud data to be trained;
a preprocessing module, configured to preprocess the point cloud data to obtain a target convolution layer;
a disassembly module, configured to disassemble the target convolution layer into a convolution layer group, wherein the convolution layer group comprises a plurality of convolution layers whose sizes are smaller than that of the target convolution layer;
an extraction module, configured to extract features of the convolution layer group;
a normalization module, configured to normalize the features to obtain values and a threshold corresponding to the features;
a screening module, configured to screen the values with the threshold to obtain target values;
and a generation module, configured to obtain target features based on the target values.
10. The point cloud data sparse convolution training system of claim 9, wherein said disassembly module comprises:
a first acquisition unit, configured to acquire the size of the target convolution layer;
a second acquisition unit, configured to acquire, based on the size of the target convolution layer, the number of first convolution layers and the number of second convolution layers into which it is disassembled, the sizes of the first convolution layer and the second convolution layer being smaller than the size of the target convolution layer;
and a disassembly unit, configured to disassemble the target convolution layer into a convolution layer group based on the number of the first convolution layers and the number of the second convolution layers.
CN202411428100.4A 2024-10-12 2024-10-12 Point cloud data sparse convolution training method and system Active CN119399578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411428100.4A CN119399578B (en) 2024-10-12 2024-10-12 Point cloud data sparse convolution training method and system


Publications (2)

Publication Number Publication Date
CN119399578A 2025-02-07
CN119399578B CN119399578B (en) 2025-11-11

Family

ID=94421441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411428100.4A Active CN119399578B (en) 2024-10-12 2024-10-12 Point cloud data sparse convolution training method and system

Country Status (1)

Country Link
CN (1) CN119399578B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147335A1 (en) * 2017-11-15 2019-05-16 Uber Technologies, Inc. Continuous Convolution and Fusion in Neural Networks
CN115862000A (en) * 2022-12-22 2023-03-28 重庆长安汽车股份有限公司 Target detection method, device, vehicle and storage medium
CN117408325A (en) * 2023-10-31 2024-01-16 慧之安信息技术股份有限公司 A model compression method and system based on BN layer sparse pruning algorithm
CN117910535A (en) * 2024-01-24 2024-04-19 江苏大学 Asymmetric knowledge distillation target detection system for vehicle-road collaboration
CN118314567A (en) * 2024-04-03 2024-07-09 常州大学 3D object detection method and model based on cascade feature fusion
CN118429655A (en) * 2024-04-29 2024-08-02 燕山大学 3D object detection method based on column sequence attention and dilated convolution
CN118736288A (en) * 2024-06-18 2024-10-01 西安电子科技大学 A single-stage point cloud density-aware focused convolutional 3D object detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINJAE L. et al.: "PillarAcc: Sparse PointPillars Accelerator for Real-Time Point Cloud 3D Object Detection on Edge Devices", ARXIV, 15 May 2023 (2023-05-15) *
CHEN Ben et al.: "Hybrid attention point cloud feature learning network based on dynamic graph convolution" (基于动态图卷积的混合注意力点云特征学习网络), Computer Technology and Development (计算机技术与发展), 10 October 2023 (2023-10-10) *

Also Published As

Publication number Publication date
CN119399578B (en) 2025-11-11

Similar Documents

Publication Publication Date Title
CN116994140B (en) Cultivated land extraction method, device, equipment and medium based on remote sensing image
CN110322453B (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN114092833A (en) Remote sensing image classification method and device, computer equipment and storage medium
EP4471737A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN114581789B (en) A hyperspectral image classification method and system
CN115880529A (en) Bird fine-grained classification method and system based on attention and decoupling knowledge distillation
CN112819832B (en) Fine-grained boundary extraction method for semantic segmentation of urban scenes based on laser point cloud
CN117079098A (en) A spatial small target detection method based on position coding
CN113657214B (en) Building damage assessment method based on Mask RCNN
CN118644666B (en) An image processing method and system for remote sensing target detection scenes
CN118781320A (en) SAR target detection method based on multi-scale perception and Transformer-assisted attention generation
CN117475080A (en) Battlefield target three-dimensional reconstruction and damage evaluation method based on multi-source information fusion
CN112766123A (en) Crowd counting method and system based on criss-cross attention network
CN119784692A (en) A wind turbine blade crack defect detection method based on feature recombination network
CN115457379B (en) Remote sensing image road extraction method and system combining semantic segmentation and angle prediction
CN116805415A (en) Cage broiler health status identification method based on lightweight improved YOLOv5
CN120318499B (en) UAV target detection method and electronic equipment based on cross-spatial frequency domain
CN112084912B (en) Face feature point positioning method and system based on self-adaptive information enhancement
CN119399578B (en) Point cloud data sparse convolution training method and system
CN113610015A (en) Attitude estimation method, device and medium based on end-to-end fast ladder network
Lü et al. Tree Detection Algorithm Based on Embedded YOLO Lightweight Network
CN119295823A (en) PointNet++ 3D point cloud object classification method integrating contextual anchor attention mechanism
CN119048740A (en) Coal gangue intelligent identification method based on selective state space equation
Lin et al. Attention EdgeConv for 3D point cloud classification
CN118552717A (en) Three-dimensional target detection method, system, equipment and medium based on multi-modal feature fusion convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant