CN114821074B - Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium - Google Patents
Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium
- Publication number
- CN114821074B CN114821074B CN202210764481.8A CN202210764481A CN114821074B CN 114821074 B CN114821074 B CN 114821074B CN 202210764481 A CN202210764481 A CN 202210764481A CN 114821074 B CN114821074 B CN 114821074B
- Authority
- CN
- China
- Prior art keywords
- point cloud
- feature vector
- hyperspectral
- cloud data
- lidar point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a semantic segmentation method for airborne LiDAR point clouds. The method fuses the LiDAR point cloud with a hyperspectral image to obtain a hyperspectral point cloud; performs feature extraction on each group of LiDAR point cloud data to obtain a first feature vector; encodes each group of LiDAR point cloud data and hyperspectral point cloud data to obtain LiDAR point cloud feature vectors and hyperspectral feature vectors for each encoding stage; during decoding, fuses the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage using an A-MLP layer and feeds the result into the next decoding stage, obtaining a second feature vector once decoding is finished; and, after splicing the first and second feature vectors, computes a third feature vector through a fully connected layer and segments and labels each point according to the third feature vector, improving classification accuracy in complex scenes.
Description
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to an airborne LiDAR point cloud semantic segmentation method based on hyperspectral-space enhancement, electronic equipment and a storage medium.
Background
With the wide application of unmanned aerial vehicles, laser radar and spectral technologies, directly acquiring three-dimensional spatial and hyperspectral information with low-altitude remote sensing has gradually become a research hotspot in the surveying, mapping and agriculture fields. Airborne LiDAR technology has been widely used for digital elevation model (DEM) extraction, digital surface model (DSM) generation and forest vegetation statistics. Point cloud (three-dimensional) semantic segmentation is the basis of point cloud processing; high-accuracy segmentation and classification of ground objects in airborne LiDAR point clouds is a key step in generating subsequent surveying and mapping products and plays an important role in airborne LiDAR point cloud processing. Airborne LiDAR point clouds have the following three characteristics:
(1) The airborne LiDAR point cloud contains a large number of ground points and ground-object points such as trees and buildings, and the geometric structure of the point cloud is intricate;
(2) The airborne LiDAR point cloud covers a large area and contains rich and complex types of ground objects, including not only large-scale buildings but also small-scale trees, automobiles, telegraph poles and the like, so its multi-scale characteristics are pronounced;
(3) The elevation of the airborne LiDAR point cloud contains not only the terrain variation of the earth surface but also the elevation features of the ground objects, and the features in the elevation dimension are highly distinctive.
Because airborne LiDAR point cloud targets have intricate spatial geometric structures and pronounced multi-scale variation, current point cloud semantic segmentation algorithms suffer from low accuracy and poor robustness. At present, general semantic segmentation algorithms use only the three-dimensional coordinates and RGB information of the point cloud and lack both local geometric structure information and the ability to perceive fine ground-object categories. Faced with the complex geometric structure of airborne LiDAR point clouds, a model fed with such single-attribute input loses classification accuracy; improving the network's perception of the point cloud geometric structure and of fine ground-object categories is therefore the key to further improving classification accuracy. Meanwhile, the scales of different ground-object categories in airborne LiDAR point clouds differ; a deep learning network that neglects these multi-scale differences yields low classification accuracy for small-scale targets, making the overall accuracy difficult to improve.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an airborne LiDAR point cloud semantic segmentation method, electronic equipment and a storage medium. An airborne LiDAR point cloud and hyperspectral image fusion technique is adopted to enlarge the dimensionality of the airborne LiDAR point cloud and fuse hyperspectral attributes into the point cloud, which greatly enriches the semantic content of the LiDAR point cloud, enhances the amount of auxiliary information in the channel attention module, and improves classification accuracy.
The invention solves the technical problems through the following technical scheme: an airborne LiDAR point cloud semantic segmentation method comprises the following steps:
fusing the acquired LiDAR point cloud and the hyperspectral image to obtain a hyperspectral point cloud aligned with the LiDAR point cloud;
equally dividing the LiDAR point clouds and the hyperspectral point clouds into F groups, where each group of LiDAR point cloud data is (N, b) and each group of hyperspectral point cloud data is (N, q), N being the number of points in each group of point cloud data, b the feature dimension of each point in each group of LiDAR point cloud data, and q the spectral feature dimension of each point in each group of hyperspectral point cloud data;
performing feature extraction on each group of LiDAR point cloud data by using a GAM (geometry-aware module) model to obtain a first feature vector;
respectively encoding each group of LiDAR point cloud data and each group of hyperspectral point cloud data by adopting a PointSIFT network to respectively obtain LiDAR point cloud feature vectors and hyperspectral feature vectors of each encoding stage;
in the decoding process, fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, inputting the fused feature vector into the next decoding stage for decoding, and obtaining a second feature vector after decoding is finished;
and splicing the corresponding first eigenvector and the second eigenvector, then calculating through a full connection layer to obtain a third eigenvector, and segmenting and labeling each point according to the third eigenvector.
Further, the concrete implementation process of fusing the acquired LiDAR point cloud and the hyperspectral image is as follows:
performing orthographic projection on the LiDAR point cloud in the elevation direction to form an orthographic projection image;
matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
correcting the hyperspectral image according to the coordinate transformation parameter to obtain a corrected hyperspectral image;
and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
Further, the specific implementation process of extracting the features of each group of LiDAR point cloud data by using the GAM model is as follows:
for each point p_i in each set of LiDAR point cloud data, searching for its k nearest neighbor points, constructing a covariance matrix from p_i and its k nearest neighbors, and solving the eigenvalues and eigenvectors of the covariance matrix;
constructing geometric features from the eigenvalues of the covariance matrix, constructing elevation features from the elevation values of the point p_i and its k nearest neighbor points, and forming the feature vector of the point p_i from the geometric features and the elevation features;
using MLPs layers to perform high-dimensional mapping on the feature vector of the point p_i, and then performing normalization over the feature dimension with an activation function to obtain the optimized feature of each point p_i, and further obtaining a first feature vector corresponding to the set of LiDAR point cloud data.
Further, the specific implementation process of encoding LiDAR point cloud data or hyperspectral point cloud data by using the PointSIFT network is as follows:
calculating each group of point cloud data by adopting a full-connection layer to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second encoding stage;
down-sampling and feature extracting are carried out on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
and performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
Further, the specific implementation process of the decoding is as follows:
splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector by an MLP layer to obtain a first fusion feature vector;
splicing a feature vector (N/64, 256) of the LiDAR point cloud data in a third encoding stage with a feature vector (N/64, 256) of the hyperspectral point cloud data in the third encoding stage, and then fusing the spliced feature vector with a feature vector obtained by upsampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage;
splicing a feature vector (N/16, 128) of LiDAR point cloud data in a second coding stage with a feature vector (N/16, 128) of hyperspectral point cloud data in the second coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in a second decoding stage;
splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage and the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of a third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector.
Further, the specific implementation process of fusing the LiDAR point cloud feature vector and the hyperspectral feature vector at the current encoding stage and the fused feature vector at the previous decoding stage is as follows:
splicing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage to obtain a spliced feature vector X; upsampling the fused feature vector of the previous decoding stage to obtain a feature vector Y;
redistributing the weight coefficients w_x and w_y of the feature vector X and the feature vector Y to obtain the distributed weight coefficients w_gx and w_gy, where the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
and using the distributed weight coefficients w_gx and w_gy to fuse the spliced feature vector X and the upsampled feature vector Y, obtaining the corresponding fused feature vector Z, where Z is the fused feature vector.
Further, the specific implementation process of solving the weight coefficient of the feature vector X or the feature vector Y with the CAM module is as follows:
the feature vector X or the feature vector Y is processed by an average pooling layer to obtain a feature vector (1, C), where the size of the feature vector X or the feature vector Y is (N, C);
the feature vector (1, C) and the feature vector (N, C) are each processed in sequence by a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer, and are then added to obtain the weight coefficient w_x or w_y.
The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the onboard liDAR point cloud semantic segmentation method when running the computer program.
The present invention also provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the onboard liDAR point cloud semantic segmentation method as described above.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
1. Aiming at the intricate geometric structure of airborne LiDAR point clouds, and considering that apart from the elevation information the current airborne remote sensing LiDAR point cloud carries very limited information for enhanced semantic understanding, an airborne LiDAR point cloud and hyperspectral image fusion technique is adopted: the dimensionality of the airborne LiDAR point cloud is enlarged and hyperspectral attributes are fused into the point cloud, which greatly enriches the semantic content of the LiDAR point cloud, increases the amount of auxiliary information in the channel attention module (CAM module), and greatly strengthens the semantic understanding of the intricate geometric structure of the point cloud.
2. Aiming at complex terrain and landform scenes, and considering the low performance of current conventional PointNet-type networks based on a spatial attention enhancement mechanism when performing airborne remote sensing LiDAR point cloud semantic segmentation, a spectral-attribute channel attention module is added and fused with the LiDAR point cloud for processing; the fusion of laser intensity, echo and spectral attributes is emphasized on the network channels, which greatly improves the adaptability to intricate scenes as well as the segmentation and classification accuracy;
3. Considering the pronounced multi-scale variation of airborne LiDAR point cloud targets in intricate scenes, conventional attention-enhancement-based deep learning networks adopt a classical encoding-decoding structure; although different scale features are used and the scale perception of the network is enhanced within a certain range, the fixed weight ratio ignores the differences between features of different scales. Aiming at the poor convergence when training targets of different scales and the poor adaptability to multi-scale scenes with large differences, the invention assigns different weight coefficients to features of different levels for targets of different scales; through continuous training, the deep learning network assigns higher weight coefficients to the low-level features of small-scale targets and higher weight coefficients to the deep features of large-scale targets, thereby greatly improving the adaptability of the network.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, it is obvious that the drawings in the following description are only one embodiment of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a GAM module in an embodiment of the present invention;
FIG. 2 is a flow chart of encoding and decoding according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an AFF module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a CAM bank structure according to an embodiment of the present invention.
Detailed Description
The technical solutions in the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an airborne LiDAR point cloud semantic segmentation method. First, LiDAR point cloud data and hyperspectral data are acquired by an airborne platform; taking the hyperspectral image as the reference coordinate system, the LiDAR point cloud is projected onto the hyperspectral image (i.e., coordinate alignment), and the hyperspectral data are back-projected onto the LiDAR point cloud through the projection relation so as to expand the feature dimension of the point cloud. Then, the point cloud data and the hyperspectral data are encoded by two separate encoders; the corresponding outputs of the two encoders are spliced, input into the decoder, fused with the output of the corresponding decoder layer using an attention mechanism, and decoding then continues. Meanwhile, the point cloud data passes through a geometry-aware module (GAM module) to obtain a better expression of the local geometric structure of the point cloud. Finally, the final output of the decoder is fused with the output of the GAM module, and the fused features are input into a classifier to obtain the final classification score (i.e., classification probability). The invention improves the accuracy of classification in complex scenes.
The semantic segmentation method specifically comprises the following steps:
step 1: preliminary fusion of airborne LiDAR point cloud and hyperspectral image
The LiDAR point cloud only reflects the three-dimensional positions and laser reflection information of the ground and ground-object points, whereas the hyperspectral image reflects the reflection information of each spectral band of the ground and ground-object points, and this spectral reflection information can distinguish different kinds of materials. Therefore, the airborne LiDAR point cloud and the hyperspectral image are fused so as to assign the hyperspectral image information to the LiDAR point cloud. The specific implementation process is as follows:
step 1.1: performing orthographic projection on the LiDAR point cloud in an elevation direction (namely the Z-axis direction of a coordinate system where the point cloud is located) to form an orthographic projection image;
step 1.2: matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
step 1.3: correcting the hyperspectral image according to the coordinate transformation parameters to obtain a corrected hyperspectral image;
step 1.4: and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
The feature dimension of each point in the original LiDAR point cloud is 5, namely the three-dimensional coordinates, the laser intensity and the echo number; if the hyperspectral image has q bands, the feature dimension of each point in the hyperspectral point cloud is 3 + q, namely the three-dimensional coordinate information and the q-dimensional spectral information.
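To make the coordinate lookup of step 1.4 concrete, the following is a minimal numpy sketch, assuming the hyperspectral image has already been rectified into the point cloud's horizontal coordinate frame (steps 1.2-1.3); the function name, the `origin_xy`/`gsd` parameters and the nearest-pixel lookup are illustrative assumptions rather than the patent's prescribed implementation.

```python
import numpy as np

def fuse_lidar_hyperspectral(points_xyz, corrected_hsi, origin_xy, gsd):
    """Attach spectral reflectance to each LiDAR point by horizontal-coordinate lookup.

    points_xyz    : (P, 3) LiDAR x, y, z coordinates
    corrected_hsi : (H, W, q) hyperspectral image rectified into the point
                    cloud's horizontal coordinate frame (steps 1.2-1.3)
    origin_xy     : map coordinate (x, y) of the image's upper-left pixel (assumed)
    gsd           : ground sampling distance of the corrected image (assumed)
    """
    cols = np.clip(((points_xyz[:, 0] - origin_xy[0]) / gsd).astype(int),
                   0, corrected_hsi.shape[1] - 1)
    rows = np.clip(((origin_xy[1] - points_xyz[:, 1]) / gsd).astype(int),
                   0, corrected_hsi.shape[0] - 1)
    spectra = corrected_hsi[rows, cols, :]                 # (P, q) spectral reflectance per point
    return np.concatenate([points_xyz, spectra], axis=1)   # (P, 3 + q) hyperspectral point cloud
```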
Step 2: Data grouping
The LiDAR point clouds and the hyperspectral point clouds are equally divided into F groups; each group of LiDAR point cloud data is (N, b) and each group of hyperspectral point cloud data is (N, q), where N is the number of points in each group of point cloud data, b is the feature dimension of each point in each group of LiDAR point cloud data, and q is the spectral feature dimension of each point in each group of hyperspectral point cloud data. In this example, each group of LiDAR point cloud data is (N, 5) and each group of hyperspectral point cloud data is (N, q). Each group of LiDAR point cloud data and the corresponding group of hyperspectral point cloud data are input together, and steps 3 to 5 are executed.
Step 3: Feature extraction using geometric perception
Geometric perception is performed on the coordinate information (N, 3) in each group of LiDAR point cloud data (N, 5) using a GAM (geometry-aware module): 18-dimensional geometric features are extracted by a GP (geometric primitive, a spatial geometric description module) to obtain a first feature vector (N, 32), as shown in Fig. 1. The specific implementation process is as follows:
step 3.1: for each point p_i in each group of LiDAR point cloud data, search for its k nearest neighbor points; construct a covariance matrix M from p_i and its k nearest neighbors, and solve the three non-negative eigenvalues λ1, λ2, λ3 of the covariance matrix M together with the corresponding eigenvectors.
step 3.2: construct the geometric features from the eigenvalues of the covariance matrix.
The eigenvalues are used to construct the linearity, planarity (flatness), scattering (dispersion), omnivariance, eigenentropy, anisotropy, sum of eigenvalues and change of curvature; the specific construction formulas are shown in Tables 1 and 2.
TABLE 1 structural formula of geometric characteristics
TABLE 2 structural formula of geometric characteristics
Step 3.3: according to the pointAnd itkConstructing elevation features from the elevation values of the nearest neighbors, and forming points by the geometric features and the elevation featuresThe feature vector of (2).
Using the maximum elevation value in the neighborhoodInter-neighborhood elevation minimumWithin a neighborhood elevation rangeDifference value of elevation minimum value of current point and neighborhood point(Z is the elevation value of the current point), the average value of the elevations in the neighborhoodAnd variance of elevation within neighborhoodTo characterize elevation features.
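The construction formulas of Tables 1 and 2 are not reproduced in this text. The sketch below therefore uses the widely used eigenvalue-based definitions of linearity, planarity, scattering, omnivariance, eigenentropy, anisotropy, eigenvalue sum and change of curvature as an assumption, together with the six neighbourhood elevation statistics described above; it yields 14 values, whereas the text refers to a 15-dimensional vector, so the exact composition of the patent's feature vector may differ slightly.

```python
import numpy as np

def point_features(p, neighbors, eps=1e-8):
    """Geometric + elevation features of point p from its k nearest neighbours.

    p         : (3,) query point
    neighbors : (k, 3) its k nearest neighbour points
    Returns 8 geometric + 6 elevation values; the geometric formulas are the
    standard eigenvalue-based definitions, assumed here because Tables 1 and 2
    are not reproduced in the text.
    """
    pts = np.vstack([p, neighbors])
    lam = np.linalg.eigvalsh(np.cov(pts.T))          # eigenvalues of the covariance matrix
    lam = np.sort(lam)[::-1] + eps                   # lam1 >= lam2 >= lam3 > 0

    geometric = np.array([
        (lam[0] - lam[1]) / lam[0],                  # linearity
        (lam[1] - lam[2]) / lam[0],                  # planarity (flatness)
        lam[2] / lam[0],                             # scattering (dispersion)
        np.cbrt(lam[0] * lam[1] * lam[2]),           # omnivariance
        -np.sum(lam * np.log(lam)),                  # eigenentropy
        (lam[0] - lam[2]) / lam[0],                  # anisotropy
        lam.sum(),                                   # sum of eigenvalues
        lam[2] / lam.sum(),                          # change of curvature
    ])
    z = pts[:, 2]
    elevation = np.array([
        z.max(),                                     # maximum elevation in the neighbourhood
        z.min(),                                     # minimum elevation in the neighbourhood
        z.max() - z.min(),                           # elevation range in the neighbourhood
        p[2] - z.min(),                              # current point minus neighbourhood minimum
        z.mean(),                                    # mean elevation in the neighbourhood
        z.var(),                                     # elevation variance in the neighbourhood
    ])
    return np.concatenate([geometric, elevation])
```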
Combining the geometric features and the elevation features gives the spatial structure description of the point cloud.
For the 15-dimensional feature vector, MLPs layers are used to map the feature vector of the point p_i to a high dimension; normalization is then performed over the feature dimension with a softmax function to obtain the weighting coefficient matrix of the deep features, and the optimized deep features are finally obtained through element-by-element multiplication, which enhances the network's perception of the local geometric structure of the point cloud. Here the GAM mapping is composed of multiple perceptron (MLP) layers, BN layers and ReLU activation functions, the normalization layer is based on the softmax activation function, and the multiplication is performed point by point.
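A minimal PyTorch reading of this weighting scheme is sketched below; layer widths and depth are assumptions, and only the lift-softmax-reweight pattern is taken from the text.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Geometry-aware module sketch: MLP lifting followed by softmax reweighting.

    The handcrafted per-point features are lifted to a higher dimension by shared
    MLP layers (Linear + BatchNorm + ReLU), a softmax over the feature dimension
    gives the weighting coefficient matrix, and the deep features are reweighted
    element by element. Layer widths and depth are assumptions.
    """
    def __init__(self, in_dim=15, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
        )

    def forward(self, feats):                  # feats: (N, in_dim) handcrafted features
        deep = self.mlp(feats)                 # (N, out_dim) deep features
        weights = torch.softmax(deep, dim=1)   # normalisation over the feature dimension
        return deep * weights                  # element-by-element reweighted first feature vector
```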
Step 4: Encoding and decoding with the PointSIFT network
Following the idea of the PointSIFT network, the hyperspectral point cloud and the LiDAR point cloud are encoded separately, and in the decoding process they are spliced and decoded together. The encoding process combines a PointSIFT point-feature-description module with a set abstraction (SA) module in the down-sampling stage, and the decoding process uses a feature propagation (FP) module together with a PointSIFT point-feature-description module. Different from general point cloud semantic segmentation techniques, the hyperspectral point cloud and the LiDAR point cloud information are fused within the encoding-decoding process: the two are encoded separately and then fused for joint decoding, so that the laser reflection information and the hyperspectral information of the same location area are fused; at the same time, compared with a method that fuses the information into a single high-dimensional point cloud, the computational cost is greatly reduced.
The encoding and decoding process consists of an encoding process and a decoding process. As shown in Fig. 2, the PointSIFT network structure is also called an encoding-decoding structure: the left and right sides, flowing from top to bottom, are the encoders/encoding processes for the LiDAR point cloud data and the hyperspectral point cloud data respectively, and the middle bottom-up flow is the decoder/decoding process. The encoder consists of convolutional layers and down-sampling layers stacked multiple times, and its output is characterized by a shrinking spatial dimension and a growing channel dimension (in (N, c), N becomes smaller and smaller while c becomes larger and larger); that is, the original point cloud is encoded into high-dimensional (channel) features. The decoder is the opposite, i.e., it gradually restores the spatial scale.
The specific implementation process of the encoding process of each group of LiDAR point cloud data or hyperspectral point cloud data is as follows:
step 4.1: calculating each group of point cloud data (N, 5) by adopting a full connection layer FC to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
step 4.2: down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
step 4.3: down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second coding stage;
step 4.4: carrying out down-sampling and feature extraction on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
step 4.5: and (5) performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
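The shape flow of one encoder branch can be summarised as in the sketch below; `fc` and the four stage callables stand for the fully connected layer and the PointSIFT + set-abstraction blocks, whose internals are not specified here, so only the tensor shapes follow the text.

```python
def encode(points, fc, stages):
    """Shape flow of one encoder branch (steps 4.1-4.5).

    points : (N, 5) LiDAR group or (N, 3 + q) hyperspectral group
    fc     : fully connected layer producing the (N, 32) feature vector
    stages : four down-sampling + feature-extraction callables (placeholders)
    """
    feat = fc(points)              # (N, 32)
    outputs = []
    for stage in stages:           # four down-sampling + feature-extraction stages
        feat = stage(feat)         # (N/4, 64) -> (N/16, 128) -> (N/64, 256) -> (N/128, 512)
        outputs.append(feat)
    return outputs                 # per-stage feature vectors kept for the decoder
```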
The specific implementation process of decoding is as follows:
step 4.6: splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector by an MLP layer to obtain a first fusion feature vector;
step 4.7: splicing a feature vector (N/64, 256) of LiDAR point cloud data in a third coding stage with a feature vector (N/64, 256) of hyperspectral point cloud data in the third coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage;
step 4.8: splicing the feature vector (N/16, 128) of the LiDAR point cloud data in the second encoding stage with the feature vector (N/16, 128) of the hyperspectral point cloud data in the second encoding stage, and then fusing the spliced feature vector with the feature vector subjected to upsampling on the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in the second decoding stage;
step 4.9: splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage with the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of the third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector (N, 32).
The PointSIFT network describes points with features of different scales, which can expand the multi-scale perception capability of the network, but because it fuses them with a fixed weight ratio it does not consider the differences between multi-scale features. For this reason, the invention employs an AFF (attentional feature fusion) module based on traditional channel attention enhancement, which combines feature fusion with channel attention. The AFF module considers two groups of features of different scale levels, X and Y, of size (N, C): X is the small-scale feature extracted during PointSIFT encoding (i.e., the feature vector of the corresponding encoding stage), Y is the large-scale feature propagated during PointSIFT decoding (i.e., the fused feature vector of the corresponding decoding stage), and C and N denote the number of feature channels and the size of the point cloud respectively. The structure is shown in Fig. 3, where CAM is the channel attention module; the structure of the CAM is shown in Fig. 4, where AvgPool is average pooling computed over the entire spatial scale, i.e., (N, C) becomes (1, C) after this layer; Conv is a convolutional layer; ReLU is the ReLU activation function, a nonlinear function that makes the inference of the whole network nonlinear; BN is a batch norm layer, i.e., a normalization layer used to normalize the data distribution; Reshape (also denoted Repeat) transforms the (1, C) tensor into (N, C) so that the two branches have a consistent structure for addition and other arithmetic operations; and Sigmoid is the sigmoid activation function, which maps the data distribution into (0, 1) so that its values can represent probabilities or weights.
The CAM converts a feature into a weight for each of its elements; on this basis, the AFF module computes the weights of the two features and performs weighted fusion. The specific implementation process of fusing the spliced feature vector and the upsampled feature vector with the CAFF module (CAM module + AFF module) at each decoding stage, shown in Fig. 3, is as follows:
step a: use the CAM module to solve the weight coefficients w_x and w_y of the spliced feature vector X and the upsampled feature vector Y respectively;
step b: redistribute the weight coefficients w_x and w_y to obtain the distributed weight coefficients w_gx and w_gy, where the redistribution formulas are: w_gx = (w_x + 1 - w_y)/2 and w_gy = (w_y + 1 - w_x)/2;
step c: use the distributed weight coefficients w_gx and w_gy to fuse the spliced feature vector X and the upsampled feature vector Y, obtaining the corresponding fused feature vector Z, where Z is the fused feature vector.
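A sketch of this fusion for one decoding stage is given below, assuming a `cam` callable that returns per-element weights in (0, 1); the redistribution rule follows the formulas above, while the final weighted sum is the natural reading of the fusion step rather than a verbatim quotation of the patent's fusion formula.

```python
def aff_fuse(X, Y, cam):
    """Attentional fusion of one decoding stage (Fig. 3 sketch).

    X   : spliced encoder feature vector of the current stage, shape (N, C)
    Y   : upsampled fused feature vector of the previous decoding stage, (N, C)
    cam : channel attention module returning per-element weights in (0, 1)
    """
    w_x, w_y = cam(X), cam(Y)       # weight coefficients from the CAM module
    w_gx = (w_x + 1 - w_y) / 2      # redistributed weight for X
    w_gy = (w_y + 1 - w_x) / 2      # redistributed weight for Y (w_gx + w_gy = 1)
    return w_gx * X + w_gy * Y      # fused feature vector Z (assumed weighted sum)
```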
The difference from the general approach is that the small-scale feature X extracted by the network during encoding and the large-scale feature Y from the feature propagation process are fused after their weight coefficients, obtained from the channel attention enhancement module, have been redistributed. For multi-scale point clouds the AFF module takes context information into account, cross-fuses the small-scale and large-scale weights, and assigns different weight coefficients to features of different levels; through continuous iterative training, the network can thus assign higher low-level feature weight coefficients to small-scale targets and higher deep-level feature weight coefficients to large-scale targets.
As shown in Fig. 4, the specific implementation process of solving the weight coefficients with the CAM module is as follows:
step a1: the feature vector X or the feature vector Y is processed by an average pooling layer to obtain a feature vector (1, C), where the size of the feature vector X or the feature vector Y is (N, C);
step a2: the feature vector (1, C) and the feature vector (N, C) are each processed in sequence by a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer, and are then added to obtain the weight coefficient w_x or w_y.
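A possible PyTorch realisation of the CAM structure of Fig. 4 is sketched below; the batch dimension and the channel-reduction ratio are added for convenience and are assumptions.

```python
import torch
import torch.nn as nn

class CAM(nn.Module):
    """Channel attention module sketch after Fig. 4.

    A global branch acts on the average-pooled descriptor and a local branch on
    the full features; each is Conv-BN-ReLU-Conv-BN, the two are added and passed
    through a sigmoid to give per-element weights in (0, 1).
    """
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 1)
        def branch():
            return nn.Sequential(
                nn.Conv1d(channels, mid, 1), nn.BatchNorm1d(mid), nn.ReLU(),
                nn.Conv1d(mid, channels, 1), nn.BatchNorm1d(channels),
            )
        self.global_branch = branch()   # processes the pooled (B, C, 1) descriptor
        self.local_branch = branch()    # processes the full (B, C, N) features

    def forward(self, feats):                   # feats: (B, N, C); B > 1 needed when training
        x = feats.transpose(1, 2)               # (B, C, N) for Conv1d
        g = x.mean(dim=2, keepdim=True)         # average pooling over all N points -> (B, C, 1)
        w = self.global_branch(g) + self.local_branch(x)   # broadcast addition -> (B, C, N)
        return torch.sigmoid(w).transpose(1, 2)            # (B, N, C) weights in (0, 1)
```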
Step 5: splice the first feature vector (N, 32) and the second feature vector (N, 32), compute a third feature vector (N, 9) through a fully connected layer FC, and segment and label each point according to the third feature vector (N, 9). Of the 9 dimensions of the third feature vector (N, 9), three dimensions are coordinate information and the other 6 dimensions are segmentation classes.
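A minimal sketch of this final step follows, assuming a single fully connected layer and the 3 + 6 split of the output dimensions described above; the layer itself and its width are assumptions.

```python
import torch
import torch.nn as nn

head = nn.Linear(64, 9)   # fully connected layer FC; input width assumed from the (N, 64) splice

def classify(first_feat, second_feat):                      # each (N, 32)
    fused = torch.cat([first_feat, second_feat], dim=1)     # spliced (N, 64) feature vector
    out = head(fused)                                        # third feature vector (N, 9)
    coords, scores = out[:, :3], out[:, 3:]                  # 3 coordinate values + 6 class scores
    return coords, scores.argmax(dim=1)                      # per-point segmentation label
```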
The method is based on point feature description combined with a GAM module and takes into account characteristics such as the laser reflection attributes of the points and the spectral attributes of the ground objects; on the basis of the conventional PointSIFT network it adds a geometry-aware module (GAM), a CAM module and an AFF module (i.e., the A-MLP layer comprises the CAM module and the AFF module) to construct a new network model, CAFF-PointNet.
Note: the F1 score is F-score = 2 × precision × recall / (precision + recall).
The above disclosure is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or modifications within the technical scope of the present invention, and shall be covered by the scope of the present invention.
Claims (7)
1. An airborne LiDAR point cloud semantic segmentation method is characterized by comprising the following steps:
fusing the acquired LiDAR point cloud and the hyperspectral image to obtain a hyperspectral point cloud aligned with the LiDAR point cloud;
equally dividing the LiDAR point clouds and the hyperspectral point clouds into F groups, wherein each group of LiDAR point cloud data is (N, b), each group of hyperspectral point cloud data is (N, q), N is the number of points in each group of point cloud data, b is the characteristic dimension of each point in each group of LiDAR point cloud data, and q is the spectral characteristic dimension of each point in each group of hyperspectral point cloud data;
performing feature extraction on each group of LiDAR point cloud data by using a GAM (geometry-aware module) model to obtain a first feature vector; the specific implementation process of extracting the features of each group of LiDAR point cloud data by adopting the GAM model comprises the following steps:
for each point p_i in each set of LiDAR point cloud data, searching its k nearest neighbors, constructing a covariance matrix from the point p_i and its k nearest neighbors, and solving the eigenvalues and eigenvectors of the covariance matrix;
constructing geometric features from the eigenvalues of the covariance matrix, constructing elevation features from the elevation values of the point p_i and its k nearest neighbor points, and forming the feature vector of the point p_i from the geometric features and the elevation features;
using MLPs layers to perform high-dimensional mapping on the feature vector of the point p_i, and then performing normalization over the feature dimension with an activation function to obtain the optimized feature of each point p_i, and further obtaining the first feature vector corresponding to the set of LiDAR point cloud data;
respectively encoding each group of LiDAR point cloud data and each group of hyperspectral point cloud data by adopting a PointSIFT network to respectively obtain LiDAR point cloud characteristic vectors and hyperspectral characteristic vectors of each encoding stage;
in the decoding process, fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, inputting the result into the next decoding stage for decoding, and obtaining a second feature vector after decoding is completed; the specific implementation process of the fusion is as follows: splicing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage to obtain a spliced feature vector X; up-sampling the fused feature vector of the previous decoding stage to obtain a feature vector Y; solving the weight coefficients w_x and w_y of the feature vector X and the feature vector Y respectively with a CAM module; redistributing the weight coefficients w_x and w_y to obtain the distributed weight coefficients w_gx and w_gy, where the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
and using the distributed weight coefficients w_gx and w_gy to fuse the spliced feature vector X and the upsampled feature vector Y to obtain the corresponding fused feature vector Z, where Z is the fused feature vector;
and splicing the corresponding first feature vector and second feature vector, then calculating a third feature vector through a full connection layer, and segmenting and labeling each point according to the third feature vector.
2. The method according to claim 1, wherein the fusing of the acquired LiDAR point cloud and the hyperspectral image is implemented by:
performing orthographic projection on the LiDAR point cloud in the elevation direction to form an orthographic projection image;
matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
correcting the hyperspectral image according to the coordinate transformation parameter to obtain a corrected hyperspectral image;
and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
3. The method of semantic segmentation of airborne liDAR point clouds of claim 1, wherein the encoding of liDAR point cloud data or hyper-spectral point cloud data using a PointSIFT network is performed by:
calculating each group of point cloud data by adopting a full-connection layer to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second encoding stage;
down-sampling and feature extracting are carried out on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
and performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
4. The method of semantic segmentation of airborne liDAR point cloud according to claim 3, wherein the decoding is performed by:
splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector through an MLP layer to obtain a first fusion feature vector;
splicing a feature vector (N/64, 256) of the LiDAR point cloud data in a third encoding stage with a feature vector (N/64, 256) of the hyperspectral point cloud data in the third encoding stage, and then fusing the spliced feature vector with a feature vector obtained by upsampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage; wherein, the A-MLP layer comprises a CAM module and an AFF module;
splicing a feature vector (N/16, 128) of LiDAR point cloud data in a second coding stage with a feature vector (N/16, 128) of hyperspectral point cloud data in the second coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in a second decoding stage;
splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage and the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of a third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector.
5. The method for semantic segmentation of airborne liDAR point clouds of claim 1, wherein the specific implementation process of solving the weight coefficients of the feature vector X or the feature vector Y by using the CAM module is as follows:
processing the characteristic vector X or the characteristic vector Y through an average pooling layer to obtain a characteristic vector (1, C); wherein the size of the feature vector X or the feature vector Y is (N, C);
the characteristic vectors (1, C) and the characteristic vectors (N, C) are respectively processed by a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer in sequence and then added to obtain a weight coefficient w_x or w_y.
6. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, the processor when executing the computer program performing the steps of the onboard liDAR point cloud semantic segmentation method of any of claims 1-5.
7. A computer-readable storage medium, being a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, performs the steps of the onboard liDAR point cloud semantic segmentation method according to any of claims 1 to 5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210764481.8A CN114821074B (en) | 2022-07-01 | 2022-07-01 | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210764481.8A CN114821074B (en) | 2022-07-01 | 2022-07-01 | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114821074A CN114821074A (en) | 2022-07-29 |
| CN114821074B true CN114821074B (en) | 2022-10-25 |
Family
ID=82523201
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210764481.8A Active CN114821074B (en) | 2022-07-01 | 2022-07-01 | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114821074B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116416586B (en) * | 2022-12-19 | 2024-04-02 | 香港中文大学(深圳) | Map element sensing method, terminal and storage medium based on RGB point cloud |
| CN116597148B (en) * | 2023-06-02 | 2025-09-19 | 哈尔滨工业大学 | Method, storage medium and device for semantic segmentation of point cloud of airborne multispectral laser radar |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108846352B (en) * | 2018-06-08 | 2020-07-14 | 广东电网有限责任公司 | A kind of vegetation classification and identification method |
| CN111784699B (en) * | 2019-04-03 | 2024-06-18 | Tcl科技集团股份有限公司 | Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment |
| CN112287939B (en) * | 2020-10-29 | 2024-05-31 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method, device, equipment and medium |
| CN112749662B (en) * | 2021-01-14 | 2022-08-05 | 东南大学 | A lidar-based method for extracting drivable areas in unstructured environments |
| CN113496495B (en) * | 2021-06-25 | 2022-04-26 | 华中科技大学 | Medical image segmentation model building method capable of realizing missing input and segmentation method |
| CN113936011B (en) * | 2021-10-21 | 2024-08-02 | 上海交通大学 | CT image lung lobe image segmentation system based on attention mechanism |
| CN114120102B (en) * | 2021-11-03 | 2025-11-11 | 中国华能集团清洁能源技术研究院有限公司 | Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium |
| CN114092534B (en) * | 2021-11-19 | 2022-08-30 | 哈尔滨工业大学 | Hyperspectral image and laser radar data registration method and registration system |
| CN114255456B (en) * | 2021-11-23 | 2024-11-05 | 金陵科技学院 | Natural scene text detection method and system based on attention mechanism feature fusion and enhancement |
| CN114241110B (en) * | 2022-02-23 | 2022-06-03 | 北京邮电大学 | Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation |
-
2022
- 2022-07-01 CN CN202210764481.8A patent/CN114821074B/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN114821074A (en) | 2022-07-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113283435B (en) | A Semantic Segmentation Method of Remote Sensing Image Based on Multi-scale Attention Fusion | |
| CN110728192B (en) | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network | |
| CN114187450B (en) | Remote sensing image semantic segmentation method based on deep learning | |
| CN112464745B (en) | Feature identification and classification method and device based on semantic segmentation | |
| CN117314811A (en) | SAR-optical image fusion method based on hybrid model | |
| CN113344103B (en) | Hyperspectral remote sensing image ground object classification method based on hypergraph convolution neural network | |
| CN117576402B (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
| CN119399545B (en) | Complex target recognition method based on deep twin self-attention network | |
| CN116524189A (en) | High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization | |
| CN114821074B (en) | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium | |
| CN107423747A (en) | A kind of conspicuousness object detection method based on depth convolutional network | |
| CN118470333B (en) | Geographic environment semantic segmentation method and system based on remote sensing image | |
| CN116012349A (en) | Hyperspectral Image Unmixing Method Based on Minimum Simplex Volume Constraint and Transformer Structure | |
| CN117576483A (en) | Multi-source data fusion feature classification method based on multi-scale convolutional autoencoder | |
| CN115527062A (en) | Hyperspectral graph classification method based on long and short graph convolution network | |
| CN116469095B (en) | A Weather-Adaptive 3D Target Detection Method Based on Vision-Sensory Fusion | |
| Qian et al. | C3DGS: Compressing 3D Gaussian Model for Surface Reconstruction of Large-Scale Scenes Based on Multi-View UAV Images | |
| CN111798460A (en) | Satellite image segmentation method | |
| CN112017159A (en) | Ground target reality simulation method in remote sensing scene | |
| CN115546638A (en) | Change detection method based on Siamese cascaded differential neural network | |
| CN115937704A (en) | Remote sensing image road segmentation method based on topology perception neural network | |
| CN115909045A (en) | An Intelligent Recognition Method of Two-stage Landslide Atlas Features Based on Contrastive Learning | |
| CN105719323A (en) | Hyperspectral dimension reducing method based on map optimizing theory | |
| CN116503746B (en) | Infrared small target detection method based on multi-layer nested non-full mapping U-shaped network | |
| CN116844070B (en) | Point cloud data processing method for 3D modeling of UAV aerial photography based on multi-dimensional shape adaptive convolution |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |