
CN114821074B - Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium - Google Patents

Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium Download PDF

Info

Publication number
CN114821074B
CN114821074B (application CN202210764481.8A)
Authority
CN
China
Prior art keywords: point cloud, feature vector, hyperspectral, cloud data, lidar point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210764481.8A
Other languages
Chinese (zh)
Other versions
CN114821074A (en)
Inventor
李修庆
赵健康
王怀採
蔡晓程
谢才望
孔令威
赵丽芝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Shengding Technology Development Co ltd
Original Assignee
Hunan Shengding Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Shengding Technology Development Co ltd filed Critical Hunan Shengding Technology Development Co ltd
Priority to CN202210764481.8A priority Critical patent/CN114821074B/en
Publication of CN114821074A publication Critical patent/CN114821074A/en
Application granted granted Critical
Publication of CN114821074B publication Critical patent/CN114821074B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/40 Extraction of image or video features
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a semantic segmentation method for airborne LiDAR point clouds. The LiDAR point cloud is fused with a hyperspectral image to obtain a hyperspectral point cloud; feature extraction is performed on each group of LiDAR point cloud data to obtain a first feature vector; each group of LiDAR point cloud data and hyperspectral point cloud data is encoded to obtain LiDAR point cloud feature vectors and hyperspectral feature vectors for each encoding stage. During decoding, an A-MLP layer fuses the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, the result is fed into the next decoding stage, and a second feature vector is obtained when decoding is complete. The first feature vector and the second feature vector are spliced and passed through a fully connected layer to obtain a third feature vector, according to which each point is segmented and labeled, improving the accuracy of classification in complex scenes.

Description

Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to an airborne LiDAR point cloud semantic segmentation method based on hyperspectral-spatial enhancement, an electronic device and a storage medium.
Background
With the wide application of unmanned aerial vehicles, laser radar and spectral technologies, directly acquiring three-dimensional spatial and hyperspectral information with low-altitude remote sensing has gradually become a research hotspot in the surveying-and-mapping and agricultural fields. Airborne LiDAR technology has been widely used in digital elevation model (DEM) extraction, digital surface model (DSM) generation and forest vegetation statistics. Point cloud (three-dimensional) semantic segmentation is the basis of point cloud processing; high-accuracy segmentation and classification of ground objects in airborne LiDAR point clouds is a key step in generating subsequent surveying and mapping products and plays an important role in the field of airborne LiDAR point cloud processing. Airborne LiDAR point clouds have the following three characteristics:
(1) An airborne LiDAR point cloud contains a large number of ground points and ground objects such as trees and buildings, and the geometric structure of the point cloud is intricate;
(2) An airborne LiDAR point cloud covers a large area and contains rich and complex types of ground objects: not only large-scale buildings but also small-scale trees, automobiles, telegraph poles and the like, so the multi-scale characteristics are obvious;
(3) The elevation of an airborne LiDAR point cloud contains not only the terrain variation of the earth surface but also the elevation features of the ground objects, and the features in the elevation dimension are strongly discriminative.
Because airborne LiDAR point cloud targets have an intricate spatial geometric structure, obvious multi-scale variation and other such characteristics, current point cloud semantic segmentation algorithms suffer from low accuracy and poor robustness. At present, general semantic segmentation algorithms use only the three-dimensional coordinates and RGB information of the point cloud and lack both local geometric structure information and the ability to perceive fine ground-object categories. Faced with the complex geometric structure of airborne LiDAR point clouds, a model with single-attribute input reduces classification accuracy; improving the network's perception of the point cloud geometric structure and of fine ground-object types is therefore the key to further improving classification accuracy. Meanwhile, ground objects of different categories in an airborne LiDAR point cloud have different scales; a deep learning network that neglects the differences between multi-scale features yields low classification accuracy on small-scale targets, making it difficult to improve the overall classification accuracy.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an airborne LiDAR point cloud semantic segmentation method, an electronic device and a storage medium. An airborne LiDAR point cloud and hyperspectral image fusion technique is adopted: the dimensionality of the airborne LiDAR point cloud is enlarged and hyperspectral attributes are fused into the point cloud, which greatly enriches the semantic content of the LiDAR point cloud, strengthens the auxiliary information in the channel attention module, and improves classification accuracy.
The invention solves the technical problems through the following technical scheme: an airborne LiDAR point cloud semantic segmentation method comprises the following steps:
fusing the acquired LiDAR point cloud and the hyperspectral image to obtain a hyperspectral point cloud aligned with the LiDAR point cloud;
equally dividing the LiDAR point cloud and the hyperspectral point cloud into F groups, wherein each group of LiDAR point cloud data is (N, b) and each group of hyperspectral point cloud data is (N, q), N being the number of points in each group of point cloud data, b being the feature dimension of each point in each group of LiDAR point cloud data, and q being the spectral feature dimension of each point in each group of hyperspectral point cloud data;
performing feature extraction on each group of LiDAR point cloud data by using a GAM (geometry-aware module) model to obtain a first feature vector;
respectively encoding each group of LiDAR point cloud data and each group of hyperspectral point cloud data by adopting a PointSIFT network to respectively obtain LiDAR point cloud feature vectors and hyperspectral feature vectors of each encoding stage;
in the decoding process, fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, inputting the fused feature vector into the next decoding stage for decoding, and obtaining a second feature vector after decoding is finished;
and splicing the corresponding first eigenvector and the second eigenvector, then calculating through a full connection layer to obtain a third eigenvector, and segmenting and labeling each point according to the third eigenvector.
Further, the concrete implementation process of fusing the acquired LiDAR point cloud and the hyperspectral image is as follows:
performing orthographic projection on the LiDAR point cloud in the elevation direction to form an orthographic projection image;
matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
correcting the hyperspectral image according to the coordinate transformation parameter to obtain a corrected hyperspectral image;
and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
Further, the specific implementation process of extracting features from each group of LiDAR point cloud data with the GAM model is as follows:
for each point p_i in each set of LiDAR point cloud data, searching for its k nearest neighbor points, constructing a covariance matrix from the point p_i and its k nearest neighbor points, and solving for the eigenvalues and eigenvectors of the covariance matrix;
constructing geometric features from the eigenvalues of the covariance matrix, constructing elevation features from the elevation values of the point p_i and its k nearest neighbor points, and forming the feature vector of the point p_i from the geometric features and the elevation features;
using an MLP layer to perform high-dimensional mapping on the feature vector of the point p_i, and then performing normalization processing on the feature dimension with an activation function to obtain the optimized feature of each point p_i, thereby obtaining a first feature vector corresponding to the set of LiDAR point cloud data.
Further, the specific implementation process of encoding LiDAR point cloud data or hyperspectral point cloud data by using the PointSIFT network is as follows:
calculating each group of point cloud data by adopting a full-connection layer to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second encoding stage;
down-sampling and feature extracting are carried out on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
and performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
Further, the specific implementation process of the decoding is as follows:
splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector by an MLP layer to obtain a first fusion feature vector;
splicing a feature vector (N/64, 256) of the LiDAR point cloud data in a third encoding stage with a feature vector (N/64, 256) of the hyperspectral point cloud data in the third encoding stage, and then fusing the spliced feature vector with a feature vector obtained by upsampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage;
splicing a feature vector (N/16, 128) of LiDAR point cloud data in a second coding stage with a feature vector (N/16, 128) of hyperspectral point cloud data in the second coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in a second decoding stage;
splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage and the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of a third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector.
Further, the specific implementation process of fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage is as follows:
splicing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage to obtain a spliced feature vector X; up-sampling the fused feature vector of the previous decoding stage to obtain a feature vector Y;
respectively solving the weight coefficients w_x and w_y of the feature vector X and the feature vector Y by adopting a CAM module;
redistributing the weight coefficients w_x and w_y to obtain the redistributed weight coefficients w_gx and w_gy, wherein the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
fusing the spliced feature vector X and the up-sampled feature vector Y by using the redistributed weight coefficients w_gx and w_gy to obtain the corresponding fused feature vector, wherein the specific fusion formula is:
Z = w_gx ⊙ X + w_gy ⊙ Y
wherein Z is the fused feature vector and ⊙ denotes element-wise multiplication.
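By way of a purely illustrative numerical check of the redistribution formula (the values are not taken from any embodiment): if the CAM module yields w_x = 0.8 and w_y = 0.3 for a given channel, then w_gx = (0.8 + 1 - 0.3)/2 = 0.75 and w_gy = (0.3 + 1 - 0.8)/2 = 0.25; the redistributed coefficients always sum to 1, so the branch with the stronger channel-attention response dominates the fusion.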
Further, the specific implementation process of solving the weight coefficient of the feature vector X or the feature vector Y by adopting the CAM module is as follows:
processing the feature vector X or the feature vector Y through an average pooling layer to obtain a feature vector (1, C), wherein the size of the feature vector X or the feature vector Y is (N, C);
processing the feature vector (1, C) and the feature vector (N, C) respectively and sequentially through a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer, and then performing an addition operation to obtain the weight coefficient w_x or w_y.
The invention also provides an electronic device comprising a memory and a processor, the memory storing a computer program that can run on the processor, wherein the processor, when running the computer program, executes the steps of the airborne LiDAR point cloud semantic segmentation method described above.
The invention also provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the airborne LiDAR point cloud semantic segmentation method described above.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
1. Aiming at the intricate geometric structure of airborne LiDAR (light detection and ranging) point clouds, and considering that, apart from the elevation feature, the information available to current airborne remote-sensing LiDAR point clouds for enhanced semantic understanding is very limited, the invention adopts a fusion technique for the airborne LiDAR point cloud and the hyperspectral image: the dimensionality of the airborne LiDAR point cloud is enlarged and hyperspectral attributes are fused into the point cloud, which greatly enriches the semantic content of the LiDAR point cloud, strengthens the auxiliary information in the channel attention module (CAM module), and greatly enhances the semantic understanding of the intricate geometric structure of the point cloud.
2. Aiming at complex terrain and landform scenes, and considering the poor performance of current conventional PointNet networks based on a spatial attention enhancement mechanism when performing semantic segmentation of airborne remote-sensing LiDAR point clouds, a spectral-attribute channel attention module is added and fused with the LiDAR point cloud for processing; the fusion of laser intensity, echo and spectral attributes is emphasized on the network channels, greatly improving the adaptability to intricate scenes and the accuracy of segmentation and classification;
3. Considering the pronounced multi-scale variation of airborne LiDAR point cloud targets in intricate scenes, conventional attention-enhanced deep learning networks adopt a classical encoding-decoding structure; although features of different scales are used and the scale-perception capability of the network is enhanced within a certain range, the fixed weight ratio ignores the differences between features of different scales. Aiming at the problems of poor convergence when training targets of different scales and poor adaptability to multi-scale scenes with large differences, the invention assigns different weight coefficients to features at different levels for targets of different scales; through continuous training, the deep learning network assigns higher weight coefficients to the shallow features of small-scale targets and higher weight coefficients to the deep features of large-scale targets, thereby greatly improving the adaptability of the network.
Drawings
To more clearly illustrate the technical solution of the present invention, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description show only one embodiment of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a GAM module in an embodiment of the present invention;
FIG. 2 is a flow chart of encoding and decoding according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an AFF module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a CAM module according to an embodiment of the present invention.
Detailed Description
The technical solutions in the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an airborne LiDAR point cloud semantic segmentation method. First, LiDAR point cloud data and hyperspectral data are acquired by an airborne platform; taking the hyperspectral image as the reference coordinate system, the LiDAR point cloud is projected onto the hyperspectral image (i.e., coordinate alignment), and the hyperspectral data are back-projected onto the LiDAR point cloud through this projection relation, thereby expanding the feature dimension of the point cloud. Then the point cloud data and the hyperspectral data are encoded by two separate encoders; the corresponding output features of the two encoders are spliced and input into the decoder, fused with the output of the corresponding decoder layer using an attention mechanism, and decoding continues. At the same time, the point cloud data pass through a geometry-aware module (GAM module) to obtain a better representation of the local geometric structure of the point cloud. Finally, the final output of the decoder is fused with the output of the GAM module, and the fused features are input into a classifier to obtain the final classification score (i.e., classification probability). The invention improves the accuracy of classification in complex scenes.
The semantic segmentation method specifically comprises the following steps:
step 1: preliminary fusion of airborne LiDAR point cloud and hyperspectral image
LiDAR point cloud information only reflects the three-dimensional positions and laser reflection information of the ground and ground-object points, whereas a hyperspectral image reflects the reflectance of the ground and ground objects in each spectral band, and this spectral reflection information can discriminate different kinds of materials. Therefore, the airborne LiDAR point cloud and the hyperspectral image are fused to assign the hyperspectral image information to the LiDAR point cloud. The specific implementation process is as follows:
step 1.1: performing orthographic projection on the LiDAR point cloud in an elevation direction (namely the Z-axis direction of a coordinate system where the point cloud is located) to form an orthographic projection image;
step 1.2: matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
step 1.3: correcting the hyperspectral image according to the coordinate transformation parameters to obtain a corrected hyperspectral image;
step 1.4: and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
For example, the feature dimension of each point in the LiDAR point cloud is originally 5, namely the three-dimensional coordinates, the laser intensity and the echo count; if the hyperspectral image has q spectral bands, the feature dimension of each point in the hyperspectral point cloud is 3 + q, namely the three-dimensional coordinate information plus the q-dimensional spectral information.
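By way of non-limiting illustration, the look-up of step 1.4 can be sketched as follows, assuming the corrected hyperspectral image is already registered to the ortho-projection of the point cloud (steps 1.1 to 1.3); the function name fuse_lidar_hyperspectral and the parameters x0, y0 and gsd (image origin and ground sample distance) are hypothetical and not part of the patented method:

```python
import numpy as np

def fuse_lidar_hyperspectral(points, hsi, x0, y0, gsd):
    """points: (M, 5) array of x, y, z, laser intensity, echo count.
    hsi: (H, W, q) corrected hyperspectral image aligned with the ortho-projection.
    Returns an (M, 3 + q) hyperspectral point cloud aligned with the LiDAR cloud."""
    h, w, _ = hsi.shape
    # Look up the image pixel under each point's horizontal coordinate (step 1.4)
    cols = np.clip(((points[:, 0] - x0) / gsd).astype(int), 0, w - 1)
    rows = np.clip(((y0 - points[:, 1]) / gsd).astype(int), 0, h - 1)
    spectra = hsi[rows, cols, :]                     # (M, q) spectral reflectance
    # Keep the 3-D coordinates, replace the laser reflection attributes with spectra
    return np.concatenate([points[:, :3], spectra], axis=1)
```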
Step 2: Data grouping
The LiDAR point cloud and the hyperspectral point cloud are equally divided into F groups; each group of LiDAR point cloud data is (N, b) and each group of hyperspectral point cloud data is (N, q), wherein N is the number of points in each group of point cloud data, b is the feature dimension of each point in each group of LiDAR point cloud data, and q is the spectral feature dimension of each point in each group of hyperspectral point cloud data. In this example, each group of LiDAR point cloud data is (N, 5) and each group of hyperspectral point cloud data is (N, q). Each group of LiDAR point cloud data and its corresponding group of hyperspectral point cloud data are input respectively, and steps 3 to 5 are executed.
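A minimal sketch of this grouping, under the assumption that the two aligned clouds are stored as NumPy arrays with points in the same order (trailing points that do not fill a complete group are simply dropped in this sketch):

```python
import numpy as np

def group_point_clouds(lidar_pts, hyper_pts, n_per_group):
    """lidar_pts: (M, 5) LiDAR points; hyper_pts: (M, 3 + q) hyperspectral points.
    Returns (F, N, 5) LiDAR groups and (F, N, 3 + q) hyperspectral groups, whose
    last q columns are the spectral feature dimensions of each point."""
    f = lidar_pts.shape[0] // n_per_group            # number of groups F
    m = f * n_per_group
    lidar_groups = lidar_pts[:m].reshape(f, n_per_group, -1)
    hyper_groups = hyper_pts[:m].reshape(f, n_per_group, -1)
    return lidar_groups, hyper_groups
```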
Step 3: Feature extraction using geometric perception techniques
Geometric perception is performed with the GAM (geometry-aware module) on the coordinate information (N, 3) in each group of LiDAR point cloud data (N, 5): 18-dimensional geometric features are extracted by the GP (geometric primitive, spatial geometric description) module to obtain the first feature vector (N, 32). As shown in FIG. 1, the specific implementation process is as follows:
Step 3.1: for each point p_i in each set of LiDAR point cloud data, search for its k nearest neighbor points, construct a covariance matrix M from p_i and its k nearest neighbor points, and solve for the three non-negative eigenvalues λ1, λ2 and λ3 of the covariance matrix M and the corresponding eigenvectors.
Step 3.2: construct the geometric features from the eigenvalues of the covariance matrix.
The eigenvalues are used to construct the linearity, the flatness (planarity), the dispersion (scattering), the omnivariance, the characteristic entropy (eigenentropy), the anisotropy, the sum of the eigenvalues and the rate of change of curvature; the specific structural formulas are shown in Tables 1 and 2.
Table 1 Structural formulas of the geometric features
Table 2 Structural formulas of the geometric features
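For reference only, a common eigenvalue-based formulation of these descriptors (an assumption about the contents of Tables 1 and 2, with the eigenvalues sorted as λ1 ≥ λ2 ≥ λ3 and typically normalized to sum to 1) is: linearity L = (λ1 - λ2)/λ1; planarity P = (λ2 - λ3)/λ1; scattering S = λ3/λ1; omnivariance O = (λ1·λ2·λ3)^(1/3); eigenentropy E = -(λ1·ln λ1 + λ2·ln λ2 + λ3·ln λ3); anisotropy A = (λ1 - λ3)/λ1; sum of eigenvalues Σλ = λ1 + λ2 + λ3; change of curvature C = λ3/(λ1 + λ2 + λ3).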
Step 3.3: construct the elevation features from the elevation values of the point p_i and its k nearest neighbor points, and form the feature vector of the point p_i from the geometric features and the elevation features.
The elevation features are characterized by the maximum elevation value in the neighborhood, the minimum elevation value in the neighborhood, the elevation range within the neighborhood, the difference between the elevation of the current point and the minimum elevation of its neighborhood points (where Z is the elevation value of the current point), the mean elevation within the neighborhood and the elevation variance within the neighborhood.
Combining the geometric features and the elevation features gives the point cloud spatial-structure descriptor of each point.
For this 15-dimensional feature vector, an MLP layer is used to perform high-dimensional mapping on the feature vector of the point p_i; a softmax function is then applied along the feature dimension for normalization, yielding a weight coefficient matrix for the deep features; finally, the optimized deep features are obtained through element-by-element multiplication, which strengthens the network's perception of the local geometric structure of the point cloud. The specific formula is:
F_out = N(M(F)) ⊙ M(F)
wherein F is the spatial-structure feature vector of a point, M is the GAM mapping composed of multi-layer perceptron layers, BN layers and ReLU activation functions, N represents the normalization layer based on the softmax activation function, and ⊙ represents point-by-point multiplication.
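By way of non-limiting illustration, the re-weighting performed by the GAM can be sketched in PyTorch as below; the 15-dimensional input is the geometric-plus-elevation descriptor of steps 3.1 to 3.3, and the layer widths are illustrative assumptions rather than the patented configuration:

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Map the per-point descriptor to a higher dimension with an MLP (M),
    normalize along the feature dimension with softmax (N), and re-weight
    the deep features by element-wise multiplication."""
    def __init__(self, in_dim=15, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
        )

    def forward(self, feats):                      # feats: (N, 15) per-point descriptors
        deep = self.mlp(feats)                     # (N, 32) deep features M(F)
        weights = torch.softmax(deep, dim=-1)      # weight coefficient matrix N(M(F))
        return deep * weights                      # optimized deep features (first feature vector)
```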
Step 4: Encoding and decoding with the PointSIFT network
Following the PointSIFT network concept, the encoding process is carried out separately for the hyperspectral point cloud and the LiDAR point cloud, while in the decoding process the two are spliced and then decoded together. The encoding process combines the point-feature-description PointSIFT module with the set abstraction (SA) module in the down-sampling stage, and the decoding process uses the feature propagation (FP) module together with the point-feature-description PointSIFT module. Different from the general point cloud semantic segmentation technique, the hyperspectral point cloud and LiDAR point cloud information are fused during encoding and decoding: the two are encoded separately and then fused together for decoding, so that the laser reflection information and the hyperspectral information of the same location area are fused; at the same time, compared with a method that merges the information into a single high-dimensional point cloud, the computational cost is greatly reduced.
The encoding-decoding process comprises an encoding process and a decoding process. As shown in FIG. 2, the PointSIFT network structure is also called an encoding-decoding structure: the left and right branches, from top to bottom, are the encoders/encoding processes for the LiDAR point cloud data and the hyperspectral point cloud data respectively, and the middle bottom-up flow is the decoder/decoding process. The encoder consists of convolution layers and down-sampling layers stacked multiple times; in its output the spatial dimension becomes smaller and smaller while the channel dimension becomes larger and larger (in (N, c), N decreases and c increases), i.e. the original point cloud is encoded into high-dimensional (channel) features. The decoder is the opposite, gradually restoring the spatial scale.
The specific implementation process of the encoding process of each group of LiDAR point cloud data or hyperspectral point cloud data is as follows:
step 4.1: calculating each group of point cloud data (N, 5) by adopting a full connection layer FC to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
step 4.2: down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
step 4.3: down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second coding stage;
step 4.4: carrying out down-sampling and feature extraction on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
step 4.5: and (5) performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
The specific implementation process of decoding is as follows:
step 4.6: splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector by an MLP layer to obtain a first fusion feature vector;
step 4.7: splicing a feature vector (N/64, 256) of LiDAR point cloud data in a third coding stage with a feature vector (N/64, 256) of hyperspectral point cloud data in the third coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage;
step 4.8: splicing the feature vector (N/16, 128) of the LiDAR point cloud data in the second encoding stage with the feature vector (N/16, 128) of the hyperspectral point cloud data in the second encoding stage, and then fusing the spliced feature vector with the feature vector subjected to upsampling on the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in the second decoding stage;
step 4.9: splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage with the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of the third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector (N, 32).
The PointSIFT network uses feature descriptions at different scales, which can broaden the multi-scale perception capability of the network; however, because it fuses them with a fixed weight ratio, the differences between multi-scale features are not taken into account. To this end, the invention adopts an AFF (attentional feature fusion) module based on conventional channel attention enhancement, which combines feature fusion with channel attention. The AFF module considers features of two different scale levels, X and Y, both of size (N, C), where X is the small-scale feature extracted during the PointSIFT network encoding process (i.e., the feature vector of each encoding stage), Y is the large-scale feature propagated during the PointSIFT network decoding process (i.e., the fused feature vector of each decoding stage), and C and N respectively denote the number of feature channels and the size of the point cloud. Its structural diagram is shown in FIG. 3, in which CAM is the channel attention module whose structural diagram is shown in FIG. 4. In FIG. 4, AvgPool is the average pooling layer, which averages over the whole spatial scale, i.e. (N, C) becomes (1, C) after this layer; Conv is a convolution layer; ReLU is the ReLU activation function, a non-linear function that makes the inference process of the whole network non-linear; BN is a Batch Norm layer, i.e. a normalization layer used to normalize the data distribution; Reshape (which may also be denoted Repeat) is a shape transformation that converts the original (1, C) into (N, C) so that the two tensors have the same structure and can be subjected to operations such as addition; Sigmoid is the sigmoid activation function, which maps the data distribution to [0, 1] so that it can represent probabilities, weights and the like.
The CAM is used to convert a given feature into a weight for each of its elements; on this basis, the AFF module solves for the weights of the two features and applies weighting. The specific implementation process of fusing the spliced feature vector and the up-sampled feature vector with the CAFF module (CAM module + AFF module) at each decoding stage, as shown in FIG. 3, is as follows:
Step a: respectively solve for the weight coefficients w_x and w_y of the spliced feature vector X and the up-sampled feature vector Y by using the CAM module;
Step b: redistribute the weight coefficients w_x and w_y to obtain the redistributed weight coefficients w_gx and w_gy, wherein the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
Step c: fuse the spliced feature vector X and the up-sampled feature vector Y using the redistributed weight coefficients w_gx and w_gy to obtain the corresponding fused feature vector, wherein the specific fusion formula is:
Z = w_gx ⊙ X + w_gy ⊙ Y
wherein Z is the fused feature vector.
The difference from the general approach is that the small-scale features X extracted by the network during encoding and the large-scale features Y from the feature propagation process are fused using weight coefficients obtained from the channel attention enhancement module and then redistributed. For a multi-scale point cloud, the AFF module takes context information into account and cross-fuses the small-scale and large-scale weights, assigning different weight coefficients to features at different levels; through continuous iterative training, the network can assign higher shallow-feature weight coefficients to small-scale targets and higher deep-feature weight coefficients to large-scale targets.
As shown in FIG. 4, the specific implementation process of solving for the weight coefficient with the CAM module is as follows:
Step a1: process the feature vector X or the feature vector Y through the average pooling layer to obtain a feature vector (1, C), wherein the size of the feature vector X or the feature vector Y is (N, C);
Step a2: process the feature vector (1, C) and the feature vector (N, C) respectively and sequentially through the first convolution layer, the first normalization layer, the activation layer, the second convolution layer and the second normalization layer, and then perform an addition operation to obtain the weight coefficient w_x or w_y.
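By way of non-limiting illustration, the CAM weight computation (steps a1 and a2) and the CAFF fusion (steps a to c) can be sketched in PyTorch as below; the channel-reduction ratio, the use of two separate CAM instances and the 1x1 convolutions are assumptions of this sketch rather than the patented configuration:

```python
import torch
import torch.nn as nn

class CAM(nn.Module):
    """Channel attention: a global branch (average pooling over all points) and a
    local branch are each passed through conv-BN-ReLU-conv-BN, added, and squashed."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 1)
        def branch():
            return nn.Sequential(
                nn.Conv1d(channels, mid, 1), nn.BatchNorm1d(mid), nn.ReLU(),
                nn.Conv1d(mid, channels, 1), nn.BatchNorm1d(channels),
            )
        self.global_branch = branch()            # operates on the pooled (1, C) vector
        self.local_branch = branch()             # operates on the full (N, C) feature
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                        # x: (B, C, N)
        g = self.global_branch(x.mean(dim=-1, keepdim=True))   # AvgPool -> (B, C, 1)
        l = self.local_branch(x)                                # (B, C, N)
        return self.sigmoid(g + l)               # per-element weights in [0, 1]

class CAFF(nn.Module):
    """A-MLP fusion: CAM weights for X and Y, redistribution, weighted combination."""
    def __init__(self, channels):
        super().__init__()
        self.cam_x, self.cam_y = CAM(channels), CAM(channels)

    def forward(self, x, y):                     # x: spliced encoder features, y: up-sampled decoder features
        wx, wy = self.cam_x(x), self.cam_y(y)
        wgx = (wx + 1 - wy) / 2                  # redistributed weights, wgx + wgy = 1
        wgy = (wy + 1 - wx) / 2
        return wgx * x + wgy * y                 # fused feature vector Z
```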
Step 5: Concatenate the first feature vector (N, 32) and the second feature vector (N, 32), obtain a third feature vector (N, 9) through the fully connected layer FC, and segment and label each point according to the third feature vector (N, 9). Among the 9 dimensions of the third feature vector (N, 9), three dimensions are the coordinate information and the other 6 dimensions are the segmentation classes.
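Step 5 amounts to a per-point classification head; a minimal sketch is given below, in which everything except the (N, 32) + (N, 32) -> (N, 9) shapes described above is illustrative:

```python
import torch
import torch.nn as nn

fc = nn.Linear(64, 9)           # fully connected layer FC: (N, 64) -> (N, 9)

def classify(first_fv, second_fv):
    """first_fv: (N, 32) GAM output; second_fv: (N, 32) decoder output."""
    fused = torch.cat([first_fv, second_fv], dim=-1)   # (N, 64)
    third_fv = fc(fused)                               # (N, 9): 3 coordinate dims + 6 class scores
    return third_fv[:, 3:].argmax(dim=-1)              # per-point segmentation label
```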
The method is based on point feature description, combines a GAM module, considers characteristics of laser point reflection attributes, surface feature spectral attributes and the like, and adds a geometric perception module (GAM), a CAM module and an AFF module (namely an A-MLP layer comprises the CAM module and the AFF module) on the basis of conventional PointSIFT to construct a new network model CAFF-PointNet.
Note: F1 score: F-score = 2 × precision × recall / (precision + recall)
The above disclosure is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or modifications within the technical scope of the present invention, and shall be covered by the scope of the present invention.

Claims (7)

1. An airborne LiDAR point cloud semantic segmentation method is characterized by comprising the following steps:
fusing the acquired LiDAR point cloud and the hyperspectral image to obtain a hyperspectral point cloud aligned with the LiDAR point cloud;
equally dividing the LiDAR point clouds and the hyperspectral point clouds into F groups, wherein each group of LiDAR point cloud data is (N, b), each group of hyperspectral point cloud data is (N, q), N is the number of points in each group of point cloud data, b is the characteristic dimension of each point in each group of LiDAR point cloud data, and q is the spectral characteristic dimension of each point in each group of hyperspectral point cloud data;
performing feature extraction on each group of LiDAR point cloud data by using a GAM (geometry-aware module) model to obtain a first feature vector; the specific implementation process of extracting the features of each group of LiDAR point cloud data by adopting the GAM model comprises the following steps:
for each point p_i in each set of LiDAR point cloud data, searching for its k nearest neighbor points, constructing a covariance matrix from the point p_i and its k nearest neighbor points, and solving for an eigenvalue and an eigenvector of the covariance matrix;
constructing geometric features from the eigenvalues of the covariance matrix, constructing elevation features from the elevation values of the point p_i and its k nearest neighbor points, and forming the feature vector of the point p_i from the geometric features and the elevation features;
using an MLP layer to perform high-dimensional mapping on the feature vector of the point p_i, and then performing normalization processing on the feature dimension by adopting an activation function to obtain the optimized feature of each point p_i, thereby further obtaining a first feature vector corresponding to the set of LiDAR point cloud data;
respectively encoding each group of LiDAR point cloud data and each group of hyperspectral point cloud data by adopting a PointSIFT network to respectively obtain LiDAR point cloud characteristic vectors and hyperspectral characteristic vectors of each encoding stage;
in the decoding process, fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, inputting the fused feature vectors into the next decoding stage for decoding, and obtaining a second feature vector after decoding is completed; the specific implementation process of the fusion is as follows: splicing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage to obtain a spliced feature vector X; performing up-sampling on the fused feature vector of the previous decoding stage to obtain a feature vector Y; respectively solving the weight coefficients w_x and w_y of the feature vector X and the feature vector Y by adopting a CAM module; redistributing the weight coefficients w_x and w_y to obtain the redistributed weight coefficients w_gx and w_gy, wherein the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
fusing the spliced feature vector X and the up-sampled feature vector Y by using the redistributed weight coefficients w_gx and w_gy to obtain the corresponding fused feature vector, wherein the specific fusion formula is:
Z = w_gx ⊙ X + w_gy ⊙ Y
wherein Z is the fused feature vector, and ⊙ denotes element-wise multiplication;
and splicing the corresponding first eigenvector and the second eigenvector, then calculating through a full connection layer to obtain a third eigenvector, and segmenting and labeling each point according to the third eigenvector.
2. The method according to claim 1, wherein the fusing of the acquired LiDAR point cloud and the hyperspectral image is implemented by:
performing orthographic projection on the LiDAR point cloud in the elevation direction to form an orthographic projection image;
matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
correcting the hyperspectral image according to the coordinate transformation parameter to obtain a corrected hyperspectral image;
and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
3. The method of semantic segmentation of airborne liDAR point clouds of claim 1, wherein the encoding of liDAR point cloud data or hyper-spectral point cloud data using a PointSIFT network is performed by:
calculating each group of point cloud data by adopting a full-connection layer to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second encoding stage;
down-sampling and feature extracting are carried out on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
and performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
4. The method of semantic segmentation of airborne liDAR point cloud according to claim 3, wherein the decoding is performed by:
splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector and the hyperspectral point cloud data in an MLP layer to obtain a first fusion feature vector;
splicing a feature vector (N/64, 256) of the LiDAR point cloud data in a third encoding stage with a feature vector (N/64, 256) of the hyperspectral point cloud data in the third encoding stage, and then fusing the spliced feature vector with a feature vector obtained by upsampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage; wherein, the A-MLP layer comprises a CAM module and an AFF module;
splicing a feature vector (N/16, 128) of LiDAR point cloud data in a second coding stage with a feature vector (N/16, 128) of hyperspectral point cloud data in the second coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in a second decoding stage;
splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage and the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of a third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector.
5. The method for semantic segmentation of airborne liDAR point clouds of claim 1, wherein the specific implementation process of solving the weight coefficient of the feature vector X or the feature vector Y by using the CAM module is as follows:
processing the feature vector X or the feature vector Y through an average pooling layer to obtain a feature vector (1, C); wherein the size of the feature vector X or the feature vector Y is (N, C);
processing the feature vector (1, C) and the feature vector (N, C) respectively and sequentially through a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer, and then performing an addition operation to obtain the weight coefficient w_x or w_y.
6. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, the processor when executing the computer program performing the steps of the onboard liDAR point cloud semantic segmentation method of any of claims 1-5.
7. A computer-readable storage medium, being a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, performs the steps of the onboard liDAR point cloud semantic segmentation method according to any of claims 1 to 5.
CN202210764481.8A 2022-07-01 2022-07-01 Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium Active CN114821074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210764481.8A CN114821074B (en) 2022-07-01 2022-07-01 Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210764481.8A CN114821074B (en) 2022-07-01 2022-07-01 Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114821074A CN114821074A (en) 2022-07-29
CN114821074B true CN114821074B (en) 2022-10-25

Family

ID=82523201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210764481.8A Active CN114821074B (en) 2022-07-01 2022-07-01 Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114821074B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416586B (en) * 2022-12-19 2024-04-02 香港中文大学(深圳) Map element sensing method, terminal and storage medium based on RGB point cloud
CN116597148B (en) * 2023-06-02 2025-09-19 哈尔滨工业大学 Method, storage medium and device for semantic segmentation of point cloud of airborne multispectral laser radar

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846352B (en) * 2018-06-08 2020-07-14 广东电网有限责任公司 A kind of vegetation classification and identification method
CN111784699B (en) * 2019-04-03 2024-06-18 Tcl科技集团股份有限公司 Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
CN112287939B (en) * 2020-10-29 2024-05-31 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium
CN112749662B (en) * 2021-01-14 2022-08-05 东南大学 A lidar-based method for extracting drivable areas in unstructured environments
CN113496495B (en) * 2021-06-25 2022-04-26 华中科技大学 Medical image segmentation model building method capable of realizing missing input and segmentation method
CN113936011B (en) * 2021-10-21 2024-08-02 上海交通大学 CT image lung lobe image segmentation system based on attention mechanism
CN114120102B (en) * 2021-11-03 2025-11-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN114092534B (en) * 2021-11-19 2022-08-30 哈尔滨工业大学 Hyperspectral image and laser radar data registration method and registration system
CN114255456B (en) * 2021-11-23 2024-11-05 金陵科技学院 Natural scene text detection method and system based on attention mechanism feature fusion and enhancement
CN114241110B (en) * 2022-02-23 2022-06-03 北京邮电大学 Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation

Also Published As

Publication number Publication date
CN114821074A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN113283435B (en) A Semantic Segmentation Method of Remote Sensing Image Based on Multi-scale Attention Fusion
CN110728192B (en) High-resolution remote sensing image classification method based on novel characteristic pyramid depth network
CN114187450B (en) Remote sensing image semantic segmentation method based on deep learning
CN112464745B (en) Feature identification and classification method and device based on semantic segmentation
CN117314811A (en) SAR-optical image fusion method based on hybrid model
CN113344103B (en) Hyperspectral remote sensing image ground object classification method based on hypergraph convolution neural network
CN117576402B (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method
CN119399545B (en) Complex target recognition method based on deep twin self-attention network
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114821074B (en) Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium
CN107423747A (en) A kind of conspicuousness object detection method based on depth convolutional network
CN118470333B (en) Geographic environment semantic segmentation method and system based on remote sensing image
CN116012349A (en) Hyperspectral Image Unmixing Method Based on Minimum Simplex Volume Constraint and Transformer Structure
CN117576483A (en) Multi-source data fusion feature classification method based on multi-scale convolutional autoencoder
CN115527062A (en) Hyperspectral graph classification method based on long and short graph convolution network
CN116469095B (en) A Weather-Adaptive 3D Target Detection Method Based on Vision-Sensory Fusion
Qian et al. C3DGS: Compressing 3D Gaussian Model for Surface Reconstruction of Large-Scale Scenes Based on Multi-View UAV Images
CN111798460A (en) Satellite image segmentation method
CN112017159A (en) Ground target reality simulation method in remote sensing scene
CN115546638A (en) Change detection method based on Siamese cascaded differential neural network
CN115937704A (en) Remote sensing image road segmentation method based on topology perception neural network
CN115909045A (en) An Intelligent Recognition Method of Two-stage Landslide Atlas Features Based on Contrastive Learning
CN105719323A (en) Hyperspectral dimension reducing method based on map optimizing theory
CN116503746B (en) Infrared small target detection method based on multi-layer nested non-full mapping U-shaped network
CN116844070B (en) Point cloud data processing method for 3D modeling of UAV aerial photography based on multi-dimensional shape adaptive convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant