CN114821074B - Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium - Google Patents
Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium
- Publication number
- CN114821074B CN114821074B CN202210764481.8A CN202210764481A CN114821074B CN 114821074 B CN114821074 B CN 114821074B CN 202210764481 A CN202210764481 A CN 202210764481A CN 114821074 B CN114821074 B CN 114821074B
- Authority
- CN
- China
- Prior art keywords
- point cloud
- feature vector
- hyperspectral
- cloud data
- lidar point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a semantic segmentation method for airborne LiDAR point clouds. The method fuses the LiDAR point cloud with a hyperspectral image to obtain a hyperspectral point cloud; performs feature extraction on each group of LiDAR point cloud data to obtain a first feature vector; encodes each group of LiDAR point cloud data and hyperspectral point cloud data to obtain LiDAR point cloud feature vectors and hyperspectral feature vectors for each encoding stage; during decoding, fuses the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage using an A-MLP layer and feeds the result into the next decoding stage, obtaining a second feature vector once decoding is finished; and, after splicing the first and second feature vectors, computes a third feature vector through a fully connected layer and segments and labels each point according to the third feature vector, improving classification accuracy in complex scenes.
Description
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to an airborne LiDAR point cloud semantic segmentation method based on hyperspectral-space enhancement, electronic equipment and a storage medium.
Background
With the wide application of unmanned aerial vehicles, laser radar and spectral technologies, directly acquiring three-dimensional spatial and hyperspectral information with low-altitude remote sensing has gradually become a research hotspot in the surveying, mapping and agriculture fields. Airborne LiDAR technology has been widely used for digital elevation model (DEM) extraction, digital surface model (DSM) generation and forest vegetation statistics. Point cloud (three-dimensional) semantic segmentation is the basis of point cloud processing; high-accuracy segmentation and classification of ground objects in airborne LiDAR point clouds is a key step in generating subsequent surveying and mapping products and plays an important role in airborne LiDAR point cloud processing. Airborne LiDAR point clouds have the following three characteristics:
(1) The airborne LiDAR point cloud contains a large number of ground points and ground-object points such as trees and buildings, and the geometric structure of the point cloud is intricate;
(2) The airborne LiDAR point cloud covers a large area and contains rich and complex types of ground objects, including not only large-scale buildings but also small-scale trees, automobiles, telegraph poles and the like, so its multi-scale characteristics are pronounced;
(3) The elevation of the airborne LiDAR point cloud contains not only the terrain variation of the earth surface but also the elevation features of the ground objects, and the features in the elevation dimension are highly distinctive.
Because airborne LiDAR point cloud targets have intricate spatial geometric structures and pronounced multi-scale variation, current point cloud semantic segmentation algorithms suffer from low accuracy and poor robustness. At present, general semantic segmentation algorithms use only the three-dimensional coordinates and RGB information of the point cloud and lack both local geometric structure information and the ability to perceive fine ground-object categories. Faced with the complex geometric structure of airborne LiDAR point clouds, a model fed with such single-attribute input loses classification accuracy; improving the network's perception of the point cloud geometric structure and of fine ground-object categories is therefore the key to further improving classification accuracy. Meanwhile, the scales of different ground-object categories in airborne LiDAR point clouds differ; a deep learning network that neglects these multi-scale differences yields low classification accuracy for small-scale targets, making the overall accuracy difficult to improve.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an airborne LiDAR point cloud semantic segmentation method, electronic equipment and a storage medium. An airborne LiDAR point cloud and hyperspectral image fusion technique is adopted to enlarge the dimensionality of the airborne LiDAR point cloud and fuse hyperspectral attributes into the point cloud, which greatly enriches the semantic content of the LiDAR point cloud, enhances the amount of auxiliary information in the channel attention module, and improves classification accuracy.
The invention solves the technical problems through the following technical scheme: an airborne LiDAR point cloud semantic segmentation method comprises the following steps:
fusing the acquired LiDAR point cloud and the hyperspectral image to obtain a hyperspectral point cloud aligned with the LiDAR point cloud;
equally dividing the LiDAR point clouds and the hyperspectral point clouds into F groups, where each group of LiDAR point cloud data is (N, b) and each group of hyperspectral point cloud data is (N, q), N being the number of points in each group of point cloud data, b the feature dimension of each point in each group of LiDAR point cloud data, and q the spectral feature dimension of each point in each group of hyperspectral point cloud data;
performing feature extraction on each group of LiDAR point cloud data by using a GAM (geometry-aware module) model to obtain a first feature vector;
respectively encoding each group of LiDAR point cloud data and each group of hyperspectral point cloud data by adopting a PointSIFT network to respectively obtain LiDAR point cloud feature vectors and hyperspectral feature vectors of each encoding stage;
in the decoding process, fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, inputting the fused feature vector into the next decoding stage for decoding, and obtaining a second feature vector after decoding is finished;
and splicing the corresponding first eigenvector and the second eigenvector, then calculating through a full connection layer to obtain a third eigenvector, and segmenting and labeling each point according to the third eigenvector.
Further, the concrete implementation process of fusing the acquired LiDAR point cloud and the hyperspectral image is as follows:
performing orthographic projection on the LiDAR point cloud in the elevation direction to form an orthographic projection image;
matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
correcting the hyperspectral image according to the coordinate transformation parameter to obtain a corrected hyperspectral image;
and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
Further, the specific implementation process of extracting the features of each group of LiDAR point cloud data by using the GAM model is as follows:
for each point p_i in each set of LiDAR point cloud data, searching for its k nearest neighbor points, constructing a covariance matrix from p_i and its k nearest neighbors, and solving the eigenvalues and eigenvectors of the covariance matrix;
constructing geometric features from the eigenvalues of the covariance matrix, constructing elevation features from the elevation values of the point p_i and its k nearest neighbor points, and forming the feature vector of the point p_i from the geometric features and the elevation features;
using MLPs layers to perform high-dimensional mapping on the feature vector of the point p_i, and then performing normalization over the feature dimension with an activation function to obtain the optimized feature of each point p_i, and further obtaining a first feature vector corresponding to the set of LiDAR point cloud data.
Further, the specific implementation process of encoding LiDAR point cloud data or hyperspectral point cloud data by using the PointSIFT network is as follows:
calculating each group of point cloud data by adopting a full-connection layer to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second encoding stage;
down-sampling and feature extracting are carried out on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
and performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
Further, the specific implementation process of the decoding is as follows:
splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector by an MLP layer to obtain a first fusion feature vector;
splicing a feature vector (N/64, 256) of the LiDAR point cloud data in a third encoding stage with a feature vector (N/64, 256) of the hyperspectral point cloud data in the third encoding stage, and then fusing the spliced feature vector with a feature vector obtained by upsampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage;
splicing a feature vector (N/16, 128) of LiDAR point cloud data in a second coding stage with a feature vector (N/16, 128) of hyperspectral point cloud data in the second coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in a second decoding stage;
splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage and the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of a third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector.
Further, the specific implementation process of fusing the LiDAR point cloud feature vector and the hyperspectral feature vector at the current encoding stage and the fused feature vector at the previous decoding stage is as follows:
splicing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage to obtain a spliced feature vector X; upsampling the fused feature vector of the previous decoding stage to obtain a feature vector Y;
redistributing the weight coefficients w_x and w_y of the feature vector X and the feature vector Y to obtain the distributed weight coefficients w_gx and w_gy, where the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
and using the distributed weight coefficients w_gx and w_gy to fuse the spliced feature vector X and the upsampled feature vector Y, obtaining the corresponding fused feature vector Z, where Z is the fused feature vector.
Further, the specific implementation process of solving the weight coefficient of the feature vector X or the feature vector Y with the CAM module is as follows:
the feature vector X or the feature vector Y is processed by an average pooling layer to obtain a feature vector (1, C), where the size of the feature vector X or the feature vector Y is (N, C);
the feature vector (1, C) and the feature vector (N, C) are each processed in sequence by a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer, and are then added to obtain the weight coefficient w_x or w_y.
The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the onboard liDAR point cloud semantic segmentation method when running the computer program.
The present invention also provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the onboard liDAR point cloud semantic segmentation method as described above.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
1. Aiming at the intricate geometric structure of airborne LiDAR point clouds, and considering that apart from the elevation information the current airborne remote sensing LiDAR point cloud carries very limited information for enhanced semantic understanding, an airborne LiDAR point cloud and hyperspectral image fusion technique is adopted: the dimensionality of the airborne LiDAR point cloud is enlarged and hyperspectral attributes are fused into the point cloud, which greatly enriches the semantic content of the LiDAR point cloud, increases the amount of auxiliary information in the channel attention module (CAM module), and greatly strengthens the semantic understanding of the intricate geometric structure of the point cloud.
2. Aiming at complex terrain and landform scenes, and considering the low performance of current conventional PointNet-type networks based on a spatial attention enhancement mechanism when performing airborne remote sensing LiDAR point cloud semantic segmentation, a spectral-attribute channel attention module is added and fused with the LiDAR point cloud for processing; the fusion of laser intensity, echo and spectral attributes is emphasized on the network channels, which greatly improves the adaptability to intricate scenes as well as the segmentation and classification accuracy;
3. Considering the pronounced multi-scale variation of airborne LiDAR point cloud targets in intricate scenes, conventional attention-enhancement-based deep learning networks adopt a classical encoding-decoding structure; although different scale features are used and the scale perception of the network is enhanced within a certain range, the fixed weight ratio ignores the differences between features of different scales. Aiming at the poor convergence when training targets of different scales and the poor adaptability to multi-scale scenes with large differences, the invention assigns different weight coefficients to features of different levels for targets of different scales; through continuous training, the deep learning network assigns higher weight coefficients to the low-level features of small-scale targets and higher weight coefficients to the deep features of large-scale targets, thereby greatly improving the adaptability of the network.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, it is obvious that the drawings in the following description are only one embodiment of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a GAM module in an embodiment of the present invention;
FIG. 2 is a flow chart of encoding and decoding according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an AFF module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a CAM bank structure according to an embodiment of the present invention.
Detailed Description
The technical solutions in the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an airborne LiDAR point cloud semantic segmentation method. First, LiDAR point cloud data and hyperspectral data are acquired by an airborne platform; taking the hyperspectral image as the reference coordinate system, the LiDAR point cloud is projected onto the hyperspectral image (i.e., coordinate alignment), and the hyperspectral data are back-projected onto the LiDAR point cloud through the projection relation so as to expand the feature dimension of the point cloud. Then, the point cloud data and the hyperspectral data are encoded by two separate encoders; the corresponding outputs of the two encoders are spliced, input into the decoder, fused with the output of the corresponding decoder layer using an attention mechanism, and decoding then continues. Meanwhile, the point cloud data passes through a geometry-aware module (GAM module) to obtain a better expression of the local geometric structure of the point cloud. Finally, the final output of the decoder is fused with the output of the GAM module, and the fused features are input into a classifier to obtain the final classification score (i.e., classification probability). The invention improves the accuracy of classification in complex scenes.
The semantic segmentation method specifically comprises the following steps:
step 1: preliminary fusion of airborne LiDAR point cloud and hyperspectral image
The LiDAR point cloud only reflects the three-dimensional positions and laser reflection information of the ground and ground-object points, whereas the hyperspectral image reflects the reflection information of each spectral band of the ground and ground-object points, and this spectral reflection information can distinguish different kinds of materials. Therefore, the airborne LiDAR point cloud and the hyperspectral image are fused so as to assign the hyperspectral image information to the LiDAR point cloud. The specific implementation process is as follows:
step 1.1: performing orthographic projection on the LiDAR point cloud in an elevation direction (namely the Z-axis direction of a coordinate system where the point cloud is located) to form an orthographic projection image;
step 1.2: matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
step 1.3: correcting the hyperspectral image according to the coordinate transformation parameters to obtain a corrected hyperspectral image;
step 1.4: and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
The feature dimension of each point in the original LiDAR point cloud is 5, namely the three-dimensional coordinates, the laser intensity and the echo number; if the hyperspectral image has q bands, the feature dimension of each point in the hyperspectral point cloud is 3 + q, namely the three-dimensional coordinate information and the q-dimensional spectral information.
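To make the coordinate lookup of step 1.4 concrete, the following is a minimal numpy sketch, assuming the hyperspectral image has already been rectified into the point cloud's horizontal coordinate frame (steps 1.2-1.3); the function name, the `origin_xy`/`gsd` parameters and the nearest-pixel lookup are illustrative assumptions rather than the patent's prescribed implementation.

```python
import numpy as np

def fuse_lidar_hyperspectral(points_xyz, corrected_hsi, origin_xy, gsd):
    """Attach spectral reflectance to each LiDAR point by horizontal-coordinate lookup.

    points_xyz    : (P, 3) LiDAR x, y, z coordinates
    corrected_hsi : (H, W, q) hyperspectral image rectified into the point
                    cloud's horizontal coordinate frame (steps 1.2-1.3)
    origin_xy     : map coordinate (x, y) of the image's upper-left pixel (assumed)
    gsd           : ground sampling distance of the corrected image (assumed)
    """
    cols = np.clip(((points_xyz[:, 0] - origin_xy[0]) / gsd).astype(int),
                   0, corrected_hsi.shape[1] - 1)
    rows = np.clip(((origin_xy[1] - points_xyz[:, 1]) / gsd).astype(int),
                   0, corrected_hsi.shape[0] - 1)
    spectra = corrected_hsi[rows, cols, :]                 # (P, q) spectral reflectance per point
    return np.concatenate([points_xyz, spectra], axis=1)   # (P, 3 + q) hyperspectral point cloud
```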
Step 2: Data grouping
The LiDAR point clouds and the hyperspectral point clouds are equally divided into F groups; each group of LiDAR point cloud data is (N, b) and each group of hyperspectral point cloud data is (N, q), where N is the number of points in each group of point cloud data, b is the feature dimension of each point in each group of LiDAR point cloud data, and q is the spectral feature dimension of each point in each group of hyperspectral point cloud data. In this example, each group of LiDAR point cloud data is (N, 5) and each group of hyperspectral point cloud data is (N, q). Each group of LiDAR point cloud data and the corresponding group of hyperspectral point cloud data are input together, and steps 3 to 5 are executed.
Step 3: Feature extraction using geometric perception
Geometric perception is performed on the coordinate information (N, 3) in each group of LiDAR point cloud data (N, 5) using a GAM (geometry-aware module): 18-dimensional geometric features are extracted by a GP (geometric primitive, a spatial geometric description module) to obtain a first feature vector (N, 32), as shown in Fig. 1. The specific implementation process is as follows:
step 3.1: for each point p_i in each group of LiDAR point cloud data, search for its k nearest neighbor points; construct a covariance matrix M from p_i and its k nearest neighbors, and solve the three non-negative eigenvalues λ1, λ2, λ3 of the covariance matrix M together with the corresponding eigenvectors.
step 3.2: construct the geometric features from the eigenvalues of the covariance matrix.
The eigenvalues are used to construct the linearity, planarity (flatness), scattering (dispersion), omnivariance, eigenentropy, anisotropy, sum of eigenvalues and change of curvature; the specific construction formulas are shown in Tables 1 and 2.
TABLE 1 structural formula of geometric characteristics
TABLE 2 structural formula of geometric characteristics
Step 3.3: according to the pointAnd itkConstructing elevation features from the elevation values of the nearest neighbors, and forming points by the geometric features and the elevation featuresThe feature vector of (2).
Using the maximum elevation value in the neighborhoodInter-neighborhood elevation minimumWithin a neighborhood elevation rangeDifference value of elevation minimum value of current point and neighborhood point(Z is the elevation value of the current point), the average value of the elevations in the neighborhoodAnd variance of elevation within neighborhoodTo characterize elevation features.
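The construction formulas of Tables 1 and 2 are not reproduced in this text. The sketch below therefore uses the widely used eigenvalue-based definitions of linearity, planarity, scattering, omnivariance, eigenentropy, anisotropy, eigenvalue sum and change of curvature as an assumption, together with the six neighbourhood elevation statistics described above; it yields 14 values, whereas the text refers to a 15-dimensional vector, so the exact composition of the patent's feature vector may differ slightly.

```python
import numpy as np

def point_features(p, neighbors, eps=1e-8):
    """Geometric + elevation features of point p from its k nearest neighbours.

    p         : (3,) query point
    neighbors : (k, 3) its k nearest neighbour points
    Returns 8 geometric + 6 elevation values; the geometric formulas are the
    standard eigenvalue-based definitions, assumed here because Tables 1 and 2
    are not reproduced in the text.
    """
    pts = np.vstack([p, neighbors])
    lam = np.linalg.eigvalsh(np.cov(pts.T))          # eigenvalues of the covariance matrix
    lam = np.sort(lam)[::-1] + eps                   # lam1 >= lam2 >= lam3 > 0

    geometric = np.array([
        (lam[0] - lam[1]) / lam[0],                  # linearity
        (lam[1] - lam[2]) / lam[0],                  # planarity (flatness)
        lam[2] / lam[0],                             # scattering (dispersion)
        np.cbrt(lam[0] * lam[1] * lam[2]),           # omnivariance
        -np.sum(lam * np.log(lam)),                  # eigenentropy
        (lam[0] - lam[2]) / lam[0],                  # anisotropy
        lam.sum(),                                   # sum of eigenvalues
        lam[2] / lam.sum(),                          # change of curvature
    ])
    z = pts[:, 2]
    elevation = np.array([
        z.max(),                                     # maximum elevation in the neighbourhood
        z.min(),                                     # minimum elevation in the neighbourhood
        z.max() - z.min(),                           # elevation range in the neighbourhood
        p[2] - z.min(),                              # current point minus neighbourhood minimum
        z.mean(),                                    # mean elevation in the neighbourhood
        z.var(),                                     # elevation variance in the neighbourhood
    ])
    return np.concatenate([geometric, elevation])
```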
Combining the geometric features and the elevation features gives the spatial structure description of the point cloud.
For the 15-dimensional feature vector, MLPs layers are used to map the feature vector of the point p_i to a high dimension; normalization is then performed over the feature dimension with a softmax function to obtain the weighting coefficient matrix of the deep features, and the optimized deep features are finally obtained through element-by-element multiplication, which enhances the network's perception of the local geometric structure of the point cloud. Here the GAM mapping is composed of multiple perceptron (MLP) layers, BN layers and ReLU activation functions, the normalization layer is based on the softmax activation function, and the multiplication is performed point by point.
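A minimal PyTorch reading of this weighting scheme is sketched below; layer widths and depth are assumptions, and only the lift-softmax-reweight pattern is taken from the text.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Geometry-aware module sketch: MLP lifting followed by softmax reweighting.

    The handcrafted per-point features are lifted to a higher dimension by shared
    MLP layers (Linear + BatchNorm + ReLU), a softmax over the feature dimension
    gives the weighting coefficient matrix, and the deep features are reweighted
    element by element. Layer widths and depth are assumptions.
    """
    def __init__(self, in_dim=15, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
        )

    def forward(self, feats):                  # feats: (N, in_dim) handcrafted features
        deep = self.mlp(feats)                 # (N, out_dim) deep features
        weights = torch.softmax(deep, dim=1)   # normalisation over the feature dimension
        return deep * weights                  # element-by-element reweighted first feature vector
```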
Step 4: Encoding and decoding with the PointSIFT network
Following the idea of the PointSIFT network, the hyperspectral point cloud and the LiDAR point cloud are encoded separately, and in the decoding process they are spliced and decoded together. The encoding process combines a PointSIFT point-feature-description module with a set abstraction (SA) module in the down-sampling stage, and the decoding process uses a feature propagation (FP) module together with a PointSIFT point-feature-description module. Different from general point cloud semantic segmentation techniques, the hyperspectral point cloud and the LiDAR point cloud information are fused within the encoding-decoding process: the two are encoded separately and then fused for joint decoding, so that the laser reflection information and the hyperspectral information of the same location area are fused; at the same time, compared with a method that fuses the information into a single high-dimensional point cloud, the computational cost is greatly reduced.
The encoding and decoding process consists of an encoding process and a decoding process. As shown in Fig. 2, the PointSIFT network structure is also called an encoding-decoding structure: the left and right sides, flowing from top to bottom, are the encoders/encoding processes for the LiDAR point cloud data and the hyperspectral point cloud data respectively, and the middle bottom-up flow is the decoder/decoding process. The encoder consists of convolutional layers and down-sampling layers stacked multiple times, and its output is characterized by a shrinking spatial dimension and a growing channel dimension (in (N, c), N becomes smaller and smaller while c becomes larger and larger); that is, the original point cloud is encoded into high-dimensional (channel) features. The decoder is the opposite, i.e., it gradually restores the spatial scale.
The specific implementation process of the encoding process of each group of LiDAR point cloud data or hyperspectral point cloud data is as follows:
step 4.1: calculating each group of point cloud data (N, 5) by adopting a full connection layer FC to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
step 4.2: down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
step 4.3: down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second coding stage;
step 4.4: carrying out down-sampling and feature extraction on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
step 4.5: and (5) performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
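The shape flow of one encoder branch can be summarised as in the sketch below; `fc` and the four stage callables stand for the fully connected layer and the PointSIFT + set-abstraction blocks, whose internals are not specified here, so only the tensor shapes follow the text.

```python
def encode(points, fc, stages):
    """Shape flow of one encoder branch (steps 4.1-4.5).

    points : (N, 5) LiDAR group or (N, 3 + q) hyperspectral group
    fc     : fully connected layer producing the (N, 32) feature vector
    stages : four down-sampling + feature-extraction callables (placeholders)
    """
    feat = fc(points)              # (N, 32)
    outputs = []
    for stage in stages:           # four down-sampling + feature-extraction stages
        feat = stage(feat)         # (N/4, 64) -> (N/16, 128) -> (N/64, 256) -> (N/128, 512)
        outputs.append(feat)
    return outputs                 # per-stage feature vectors kept for the decoder
```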
The specific implementation process of decoding is as follows:
step 4.6: splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector by an MLP layer to obtain a first fusion feature vector;
step 4.7: splicing a feature vector (N/64, 256) of LiDAR point cloud data in a third coding stage with a feature vector (N/64, 256) of hyperspectral point cloud data in the third coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage;
step 4.8: splicing the feature vector (N/16, 128) of the LiDAR point cloud data in the second encoding stage with the feature vector (N/16, 128) of the hyperspectral point cloud data in the second encoding stage, and then fusing the spliced feature vector with the feature vector subjected to upsampling on the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in the second decoding stage;
step 4.9: splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage with the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of the third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector (N, 32).
The PointSIFT network describes points with features of different scales, which can expand the multi-scale perception capability of the network, but because it fuses them with a fixed weight ratio it does not consider the differences between multi-scale features. For this reason, the invention employs an AFF (attentional feature fusion) module based on traditional channel attention enhancement, which combines feature fusion with channel attention. The AFF module considers two groups of features of different scale levels, X and Y, of size (N, C): X is the small-scale feature extracted during PointSIFT encoding (i.e., the feature vector of the corresponding encoding stage), Y is the large-scale feature propagated during PointSIFT decoding (i.e., the fused feature vector of the corresponding decoding stage), and C and N denote the number of feature channels and the size of the point cloud respectively. The structure is shown in Fig. 3, where CAM is the channel attention module; the structure of the CAM is shown in Fig. 4, where AvgPool is average pooling computed over the entire spatial scale, i.e., (N, C) becomes (1, C) after this layer; Conv is a convolutional layer; ReLU is the ReLU activation function, a nonlinear function that makes the inference of the whole network nonlinear; BN is a batch norm layer, i.e., a normalization layer used to normalize the data distribution; Reshape (also denoted Repeat) transforms the (1, C) tensor into (N, C) so that the two branches have a consistent structure for addition and other arithmetic operations; and Sigmoid is the sigmoid activation function, which maps the data distribution into (0, 1) so that its values can represent probabilities or weights.
The CAM converts a feature into a weight for each of its elements; on this basis, the AFF module computes the weights of the two features and performs weighted fusion. The specific implementation process of fusing the spliced feature vector and the upsampled feature vector with the CAFF module (CAM module + AFF module) at each decoding stage, shown in Fig. 3, is as follows:
step a: use the CAM module to solve the weight coefficients w_x and w_y of the spliced feature vector X and the upsampled feature vector Y respectively;
step b: redistribute the weight coefficients w_x and w_y to obtain the distributed weight coefficients w_gx and w_gy, where the redistribution formulas are: w_gx = (w_x + 1 - w_y)/2 and w_gy = (w_y + 1 - w_x)/2;
step c: use the distributed weight coefficients w_gx and w_gy to fuse the spliced feature vector X and the upsampled feature vector Y, obtaining the corresponding fused feature vector Z, where Z is the fused feature vector.
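A sketch of this fusion for one decoding stage is given below, assuming a `cam` callable that returns per-element weights in (0, 1); the redistribution rule follows the formulas above, while the final weighted sum is the natural reading of the fusion step rather than a verbatim quotation of the patent's fusion formula.

```python
def aff_fuse(X, Y, cam):
    """Attentional fusion of one decoding stage (Fig. 3 sketch).

    X   : spliced encoder feature vector of the current stage, shape (N, C)
    Y   : upsampled fused feature vector of the previous decoding stage, (N, C)
    cam : channel attention module returning per-element weights in (0, 1)
    """
    w_x, w_y = cam(X), cam(Y)       # weight coefficients from the CAM module
    w_gx = (w_x + 1 - w_y) / 2      # redistributed weight for X
    w_gy = (w_y + 1 - w_x) / 2      # redistributed weight for Y (w_gx + w_gy = 1)
    return w_gx * X + w_gy * Y      # fused feature vector Z (assumed weighted sum)
```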
The difference from the general approach is that the small-scale feature X extracted by the network during encoding and the large-scale feature Y from the feature propagation process are fused after their weight coefficients, obtained from the channel attention enhancement module, have been redistributed. For multi-scale point clouds the AFF module takes context information into account, cross-fuses the small-scale and large-scale weights, and assigns different weight coefficients to features of different levels; through continuous iterative training, the network can thus assign higher low-level feature weight coefficients to small-scale targets and higher deep-level feature weight coefficients to large-scale targets.
As shown in Fig. 4, the specific implementation process of solving the weight coefficients with the CAM module is as follows:
step a1: the feature vector X or the feature vector Y is processed by an average pooling layer to obtain a feature vector (1, C), where the size of the feature vector X or the feature vector Y is (N, C);
step a2: the feature vector (1, C) and the feature vector (N, C) are each processed in sequence by a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer, and are then added to obtain the weight coefficient w_x or w_y.
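A possible PyTorch realisation of the CAM structure of Fig. 4 is sketched below; the batch dimension and the channel-reduction ratio are added for convenience and are assumptions.

```python
import torch
import torch.nn as nn

class CAM(nn.Module):
    """Channel attention module sketch after Fig. 4.

    A global branch acts on the average-pooled descriptor and a local branch on
    the full features; each is Conv-BN-ReLU-Conv-BN, the two are added and passed
    through a sigmoid to give per-element weights in (0, 1).
    """
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 1)
        def branch():
            return nn.Sequential(
                nn.Conv1d(channels, mid, 1), nn.BatchNorm1d(mid), nn.ReLU(),
                nn.Conv1d(mid, channels, 1), nn.BatchNorm1d(channels),
            )
        self.global_branch = branch()   # processes the pooled (B, C, 1) descriptor
        self.local_branch = branch()    # processes the full (B, C, N) features

    def forward(self, feats):                   # feats: (B, N, C); B > 1 needed when training
        x = feats.transpose(1, 2)               # (B, C, N) for Conv1d
        g = x.mean(dim=2, keepdim=True)         # average pooling over all N points -> (B, C, 1)
        w = self.global_branch(g) + self.local_branch(x)   # broadcast addition -> (B, C, N)
        return torch.sigmoid(w).transpose(1, 2)            # (B, N, C) weights in (0, 1)
```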
Step 5: splice the first feature vector (N, 32) and the second feature vector (N, 32), compute a third feature vector (N, 9) through a fully connected layer FC, and segment and label each point according to the third feature vector (N, 9). Of the 9 dimensions of the third feature vector (N, 9), three dimensions are coordinate information and the other 6 dimensions are segmentation classes.
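A minimal sketch of this final step follows, assuming a single fully connected layer and the 3 + 6 split of the output dimensions described above; the layer itself and its width are assumptions.

```python
import torch
import torch.nn as nn

head = nn.Linear(64, 9)   # fully connected layer FC; input width assumed from the (N, 64) splice

def classify(first_feat, second_feat):                      # each (N, 32)
    fused = torch.cat([first_feat, second_feat], dim=1)     # spliced (N, 64) feature vector
    out = head(fused)                                        # third feature vector (N, 9)
    coords, scores = out[:, :3], out[:, 3:]                  # 3 coordinate values + 6 class scores
    return coords, scores.argmax(dim=1)                      # per-point segmentation label
```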
The method is based on point feature description combined with a GAM module and takes into account characteristics such as the laser reflection attributes of the points and the spectral attributes of the ground objects; on the basis of the conventional PointSIFT network it adds a geometry-aware module (GAM), a CAM module and an AFF module (i.e., the A-MLP layer comprises the CAM module and the AFF module) to construct a new network model, CAFF-PointNet.
Note: the F1 score is F-score = 2 × precision × recall / (precision + recall).
The above disclosure is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or modifications within the technical scope of the present invention, and shall be covered by the scope of the present invention.
Claims (7)
1. An airborne LiDAR point cloud semantic segmentation method is characterized by comprising the following steps:
fusing the acquired LiDAR point cloud and the hyperspectral image to obtain a hyperspectral point cloud aligned with the LiDAR point cloud;
equally dividing the LiDAR point clouds and the hyperspectral point clouds into F groups, wherein each group of LiDAR point cloud data is (N, b), each group of hyperspectral point cloud data is (N, q), N is the number of points in each group of point cloud data, b is the characteristic dimension of each point in each group of LiDAR point cloud data, and q is the spectral characteristic dimension of each point in each group of hyperspectral point cloud data;
performing feature extraction on each group of LiDAR point cloud data by using a GAM (geometry-aware module) model to obtain a first feature vector; the specific implementation process of extracting the features of each group of LiDAR point cloud data by adopting the GAM model comprises the following steps:
for each point p_i in each set of LiDAR point cloud data, searching its k nearest neighbors, constructing a covariance matrix from the point p_i and its k nearest neighbors, and solving the eigenvalues and eigenvectors of the covariance matrix;
constructing geometric features from the eigenvalues of the covariance matrix, constructing elevation features from the elevation values of the point p_i and its k nearest neighbor points, and forming the feature vector of the point p_i from the geometric features and the elevation features;
using MLPs layers to perform high-dimensional mapping on the feature vector of the point p_i, and then performing normalization over the feature dimension with an activation function to obtain the optimized feature of each point p_i, and further obtaining the first feature vector corresponding to the set of LiDAR point cloud data;
respectively encoding each group of LiDAR point cloud data and each group of hyperspectral point cloud data by adopting a PointSIFT network to respectively obtain LiDAR point cloud characteristic vectors and hyperspectral characteristic vectors of each encoding stage;
in the decoding process, fusing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage with the fused feature vector of the previous decoding stage, inputting the result into the next decoding stage for decoding, and obtaining a second feature vector after decoding is completed; the specific implementation process of the fusion is as follows: splicing the LiDAR point cloud feature vector and the hyperspectral feature vector of the current encoding stage to obtain a spliced feature vector X; up-sampling the fused feature vector of the previous decoding stage to obtain a feature vector Y; solving the weight coefficients w_x and w_y of the feature vector X and the feature vector Y respectively with a CAM module; redistributing the weight coefficients w_x and w_y to obtain the distributed weight coefficients w_gx and w_gy, where the redistribution formulas are:
w_gx = (w_x + 1 - w_y)/2, w_gy = (w_y + 1 - w_x)/2
and using the distributed weight coefficients w_gx and w_gy to fuse the spliced feature vector X and the upsampled feature vector Y to obtain the corresponding fused feature vector Z, where Z is the fused feature vector;
and splicing the corresponding first feature vector and second feature vector, then calculating a third feature vector through a full connection layer, and segmenting and labeling each point according to the third feature vector.
2. The method according to claim 1, wherein the fusing of the acquired LiDAR point cloud and the hyperspectral image is implemented by:
performing orthographic projection on the LiDAR point cloud in the elevation direction to form an orthographic projection image;
matching the orthographic projection image with the hyperspectral image to obtain a coordinate transformation parameter;
correcting the hyperspectral image according to the coordinate transformation parameter to obtain a corrected hyperspectral image;
and searching spectral reflection information of the corrected hyperspectral image according to the horizontal coordinate of the LiDAR point cloud, and replacing laser reflection information of each point in the LiDAR point cloud with the spectral reflection information to form a hyperspectral point cloud aligned with the LiDAR point cloud.
3. The method of semantic segmentation of airborne liDAR point clouds of claim 1, wherein the encoding of liDAR point cloud data or hyper-spectral point cloud data using a PointSIFT network is performed by:
calculating each group of point cloud data by adopting a full-connection layer to obtain a characteristic vector (N, 32), wherein each group of point cloud data is LiDAR point cloud data or hyperspectral point cloud data;
down-sampling and feature extracting the feature vector (N, 32) to obtain a feature vector (N/4, 64) of a first coding stage;
down-sampling and feature extracting the feature vector (N/4, 64) to obtain a feature vector (N/16, 128) of a second encoding stage;
down-sampling and feature extracting are carried out on the feature vector (N/16, 128) to obtain a feature vector (N/64, 256) of a third coding stage;
and performing down-sampling and feature extraction on the feature vector (N/64, 256) to obtain a feature vector (N/128, 512) of a fourth encoding stage.
4. The method of semantic segmentation of airborne liDAR point cloud according to claim 3, wherein the decoding is performed by:
splicing the feature vector (N/128, 512) of the LiDAR point cloud data in the fourth encoding stage with the feature vector (N/128, 512) of the hyperspectral point cloud data in the fourth encoding stage, and then processing the spliced feature vector through an MLP layer to obtain a first fusion feature vector;
splicing a feature vector (N/64, 256) of the LiDAR point cloud data in a third encoding stage with a feature vector (N/64, 256) of the hyperspectral point cloud data in the third encoding stage, and then fusing the spliced feature vector with a feature vector obtained by upsampling the first fused feature vector by adopting an A-MLP layer to obtain a second fused feature vector of a first decoding stage; wherein, the A-MLP layer comprises a CAM module and an AFF module;
splicing a feature vector (N/16, 128) of LiDAR point cloud data in a second coding stage with a feature vector (N/16, 128) of hyperspectral point cloud data in the second coding stage, and then fusing the spliced feature vector with a feature vector obtained by up-sampling the second fused feature vector by adopting an A-MLP layer to obtain a third fused feature vector in a second decoding stage;
splicing the characteristic vector (N/4, 64) of the LiDAR point cloud data in the first coding stage and the characteristic vector (N/4, 64) of the hyperspectral point cloud data in the first coding stage, and then fusing the spliced characteristic vector and the characteristic vector subjected to upsampling on the third fused characteristic vector by adopting an A-MLP layer to obtain a fourth fused characteristic vector of a third decoding stage, wherein the fourth fused characteristic vector is the second characteristic vector.
5. The method for semantic segmentation of airborne liDAR point clouds of claim 1, wherein the specific implementation process of solving the weight coefficients of the feature vector X or the feature vector Y by using the CAM module is as follows:
processing the characteristic vector X or the characteristic vector Y through an average pooling layer to obtain a characteristic vector (1, C); wherein the size of the feature vector X or the feature vector Y is (N, C);
the characteristic vectors (1, C) and the characteristic vectors (N, C) are respectively processed by a first convolution layer, a first normalization layer, an activation layer, a second convolution layer and a second normalization layer in sequence and then added to obtain a weight coefficient w_x or w_y.
6. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, the processor when executing the computer program performing the steps of the onboard liDAR point cloud semantic segmentation method of any of claims 1-5.
7. A computer-readable storage medium, being a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, performs the steps of the onboard liDAR point cloud semantic segmentation method according to any of claims 1 to 5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210764481.8A CN114821074B (en) | 2022-07-01 | 2022-07-01 | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210764481.8A CN114821074B (en) | 2022-07-01 | 2022-07-01 | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114821074A CN114821074A (en) | 2022-07-29 |
| CN114821074B true CN114821074B (en) | 2022-10-25 |
Family
ID=82523201
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210764481.8A Active CN114821074B (en) | 2022-07-01 | 2022-07-01 | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114821074B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116416586B (en) * | 2022-12-19 | 2024-04-02 | 香港中文大学(深圳) | Map element sensing method, terminal and storage medium based on RGB point cloud |
| CN116597148B (en) * | 2023-06-02 | 2025-09-19 | 哈尔滨工业大学 | Method, storage medium and device for semantic segmentation of point cloud of airborne multispectral laser radar |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108846352B (en) * | 2018-06-08 | 2020-07-14 | 广东电网有限责任公司 | A kind of vegetation classification and identification method |
| CN111784699B (en) * | 2019-04-03 | 2024-06-18 | Tcl科技集团股份有限公司 | Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment |
| CN112287939B (en) * | 2020-10-29 | 2024-05-31 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method, device, equipment and medium |
| CN112749662B (en) * | 2021-01-14 | 2022-08-05 | 东南大学 | A lidar-based method for extracting drivable areas in unstructured environments |
| CN113496495B (en) * | 2021-06-25 | 2022-04-26 | 华中科技大学 | Medical image segmentation model building method capable of realizing missing input and segmentation method |
| CN113936011B (en) * | 2021-10-21 | 2024-08-02 | 上海交通大学 | CT image lung lobe image segmentation system based on attention mechanism |
| CN114120102B (en) * | 2021-11-03 | 2025-11-11 | 中国华能集团清洁能源技术研究院有限公司 | Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium |
| CN114092534B (en) * | 2021-11-19 | 2022-08-30 | 哈尔滨工业大学 | Hyperspectral image and laser radar data registration method and registration system |
| CN114255456B (en) * | 2021-11-23 | 2024-11-05 | 金陵科技学院 | Natural scene text detection method and system based on attention mechanism feature fusion and enhancement |
| CN114241110B (en) * | 2022-02-23 | 2022-06-03 | 北京邮电大学 | Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation |
-
2022
- 2022-07-01 CN CN202210764481.8A patent/CN114821074B/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN114821074A (en) | 2022-07-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113283435B (en) | A Semantic Segmentation Method of Remote Sensing Image Based on Multi-scale Attention Fusion | |
| CN110728192B (en) | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network | |
| CN114187450B (en) | Remote sensing image semantic segmentation method based on deep learning | |
| CN112464745B (en) | Feature identification and classification method and device based on semantic segmentation | |
| CN117314811A (en) | SAR-optical image fusion method based on hybrid model | |
| CN113344103B (en) | Hyperspectral remote sensing image ground object classification method based on hypergraph convolution neural network | |
| CN117576402B (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
| CN119399545B (en) | Complex target recognition method based on deep twin self-attention network | |
| CN116524189A (en) | High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization | |
| CN114821074B (en) | Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium | |
| CN107423747A (en) | A kind of conspicuousness object detection method based on depth convolutional network | |
| CN118470333B (en) | Geographic environment semantic segmentation method and system based on remote sensing image | |
| CN116012349A (en) | Hyperspectral Image Unmixing Method Based on Minimum Simplex Volume Constraint and Transformer Structure | |
| CN117576483A (en) | Multi-source data fusion feature classification method based on multi-scale convolutional autoencoder | |
| CN115527062A (en) | Hyperspectral graph classification method based on long and short graph convolution network | |
| CN116469095B (en) | A Weather-Adaptive 3D Target Detection Method Based on Vision-Sensory Fusion | |
| Qian et al. | C3DGS: Compressing 3D Gaussian Model for Surface Reconstruction of Large-Scale Scenes Based on Multi-View UAV Images | |
| CN111798460A (en) | Satellite image segmentation method | |
| CN112017159A (en) | Ground target reality simulation method in remote sensing scene | |
| CN115546638A (en) | Change detection method based on Siamese cascaded differential neural network | |
| CN115937704A (en) | Remote sensing image road segmentation method based on topology perception neural network | |
| CN115909045A (en) | An Intelligent Recognition Method of Two-stage Landslide Atlas Features Based on Contrastive Learning | |
| CN105719323A (en) | Hyperspectral dimension reducing method based on map optimizing theory | |
| CN116503746B (en) | Infrared small target detection method based on multi-layer nested non-full mapping U-shaped network | |
| CN116844070B (en) | Point cloud data processing method for 3D modeling of UAV aerial photography based on multi-dimensional shape adaptive convolution |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |