Laser radar target detection method based on space-time attention mechanism
Technical Field
The invention relates to the field of laser radar target detection, in particular to a laser radar target detection method based on a space-time attention mechanism.
Background
At present, high-level intelligent driving technology is developing rapidly, and as laser radar sensors mature, target perception based on laser radar point clouds has become an important component of high-level intelligent driving. 3D target perception based on laser radar point clouds mainly takes two forms. The first is the traditional approach: filtering the original laser radar point cloud, segmenting the ground, and clustering the points to perceive 3D targets; its drawbacks are that the specific category of a target cannot be identified and that the clustering result is easily affected by point cloud noise and by the ground segmentation effect. The second builds a neural network model for 3D target perception from laser radar point clouds using convolutional neural network techniques such as PointPillars and 3DSSD: tens of thousands of frames of laser radar point cloud data are collected and labeled with targets, and target-related features are detected through model training. Target perception based on a convolutional neural network model greatly improves perception accuracy and achieves good results across different scenes, so it is currently the most widely used technique for target perception in intelligent driving.
A first prior art discloses a 4D millimeter wave three-dimensional target detection method based on a self-attention mechanism, which collects 4D millimeter wave radar point cloud data in real time, preprocesses it, and inputs the preprocessed 4D millimeter wave radar point cloud data into a pre-trained three-dimensional target detection model to output a target detection result. The three-dimensional target detection model comprises a bird's eye view voxelization module, a pillar self-attention feature extraction module, a CNN backbone network, and an RPN detection head: the bird's eye view voxelization module voxelizes the 4D millimeter wave radar point cloud data from a bird's eye view perspective to extract the feature information F of the whole space; the pillar self-attention feature extraction module extracts global point cloud features with the self-attention mechanism to generate a BEV pseudo-image; the CNN backbone network extracts features from the BEV pseudo-image and outputs a feature map; and the RPN detection head performs target detection on the feature map and outputs a 3D target detection result.
The disadvantages of this first prior art include: 1) the bird's eye view voxelization module loses part of the millimeter wave radar point cloud information during the down-sampling that generates the pillars, so the subsequent target detection result cannot reach a high level; 2) attention features are extracted over the partitioned pillars without exploiting the current frame's detection result as prior information for the next frame's point cloud, so the gains in detection precision and speed from attention feature extraction are limited.
A second prior art discloses a three-dimensional target detection method with multi-view feature fusion of 4D millimeter wave and laser point clouds, which simultaneously collects millimeter wave radar point cloud data and laser radar point cloud data, inputs them into a pre-established and trained millimeter wave and laser radar fusion network, and outputs a three-dimensional target detection result. The fusion network learns the interaction information of the laser radar and the millimeter wave radar from the BEV view and from the perspective view respectively, and splices the two, thereby fusing the millimeter wave radar point cloud data with the laser radar point cloud data.
The disadvantages of this second prior art include: 1) the voxelization module outputs Pillar features of the millimeter wave radar point cloud data and Pillar features of the laser radar point cloud data and then converts them into BEV views for target detection, so detection accuracy and speed are limited by the size of the partitioned Pillar features; 2) multi-scale targets are detected poorly under a single-scale feature map.
Disclosure of Invention
The invention aims to provide a laser radar target detection method based on a space-time attention mechanism, so as to improve the precision and the rate of laser radar target detection.
In order to solve the above technical problems, the invention provides the following technical scheme: a laser radar target detection method based on a space-time attention mechanism builds a target detection model and comprises a model training stage and a target detection stage;
in the model training stage, a data set with target labels is used for training, and the following steps are carried out on the data set frame by frame:
S101, if the current frame is the first frame of laser radar point cloud, sampling the first frame laser radar point cloud according to a certain sampling rule to obtain a sampling initial seed set P_seed; if the current frame is not the first frame, using the attention map F_ROI containing a plurality of key target areas obtained in S107 and sampling the current frame laser radar point cloud according to the convolution result of F_ROI to obtain the sampling initial seed set P_seed;
S102, carrying out fusion sampling on the sampling initial seed set P_seed to obtain a feature point cloud P_FS;
S103, mapping the feature point cloud P_FS to a BEV view;
S104, constructing a backbone network as a multi-scale feature pyramid based on an FPN (feature pyramid network), inputting the BEV view M_bev into the backbone network, and extracting feature maps of different scales through convolution operations;
S105, constructing a multi-scale detection head with a fully connected layer under the feature map of each scale;
S106, selecting the next frame of laser radar point cloud;
S107, calculating a rotation transformation matrix T through motion transformation from the motion information v of the laser radar carrier and the time interval t between the current frame and the previous adjacent frame, and then extracting the key target areas in the current frame laser radar point cloud according to the target pose information p and the dimension information s in the target detection result of S105 combined with the obtained rotation transformation matrix T, so as to form an attention map F_ROI corresponding to the plurality of key target areas;
S108, returning to S101;
In the model training stage, S101-S108 are iterated cyclically; the loss is calculated from the target detection result of the target detection model and the target labeling ground truth in the training data set, the model weights are updated a number of times, and the final model weights are obtained, including an attention weight matrix F_w;
the target detection stage configures the target detection model with the final model weights obtained in the model training stage, and the target detection model performs the following steps in the target detection stage:
S201, if the current frame is the first frame of laser radar point cloud, sampling the first frame laser radar point cloud according to a certain sampling rule to obtain a sampling initial seed set P_seed; if the current frame is not the first frame, extracting features from the attention map F_ROI containing a plurality of key target areas obtained in S207 using the attention weight matrix F_w obtained in the model training stage, and sampling the current frame laser radar point cloud according to the feature extraction result to obtain the sampling initial seed set P_seed;
Subsequently, S202-S207 are executed in sequence, and their specific operation steps are the same as those of S102-S107;
and finally, S208 is executed to return to S201.
According to the above scheme, the data set adopted in the model training stage is labeled with targets including motor vehicles, non-motor vehicles, and pedestrians.
According to the above scheme, in S101, when the current frame is not the first frame of laser radar point cloud, the sampling initial seed set P_seed is obtained as follows: the attention map F_ROI of the key target areas is convolved to obtain the feature F_out,
F_out = Conv(F_ROI)
then the sampling initial seed set index S_index is obtained from F_out and the current frame point cloud set P,
S_index = F_out * P
and then the sampling initial seed set P_seed is extracted from the current frame point cloud set P according to the sampling initial seed set index S_index,
P_seed = P[S_index].
According to the above scheme, the fusion sampling in S102 specifically comprises: sampling the sampling initial seed set P_seed with the Euclidean-distance-based furthest point sampling and the feature-distance-based furthest point sampling of the 3DSSD backbone network to obtain the sampling results Ld(P) and Lf(P) respectively, and then obtaining the fusion sampling result P_FS of the sampling initial seed set P_seed from the fusion policy C(P), where
C(P) = Ld(P) + Lf(P).
According to the above scheme, the model training stage calculates the loss using the loss function of PointPillars.
According to the above scheme, the model weights are updated using an Adam optimizer in the model training stage.
The beneficial effects of the invention are as follows:
1. Extracting the target areas and the sampling seed set based on a space-time attention mechanism improves the sampling precision and speed and retains feature-rich point clouds, so the resolution of the BEV view can be raised without additional computational cost, and the high-resolution BEV view in turn improves the target detection precision.
2. The multi-scale feature pyramid backbone network and the multi-scale detection head improve the detection rate of target detection and cover targets of different scales.
Drawings
FIG. 1 is a flow chart of a laser radar target detection method based on a space-time attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a backbone network architecture of a multi-scale feature pyramid in accordance with one embodiment of the present invention.
In FIG. 2, conv1, conv2, and conv3 are each multi-layer convolutional neural networks, and M_L, M_M, and M_S are the feature maps for detecting large-scale, middle-scale, and small-scale targets, respectively.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by one of ordinary skill in the art from the described embodiments without inventive effort fall within the scope of the present disclosure.
A laser radar target detection method based on a space-time attention mechanism is provided; the method builds a target detection model and comprises a model training stage and a target detection stage;
the data set used for training in the model training stage consists of laser radar point clouds collected in actual scenes and labeled with targets frame by frame, the labeled targets including motor vehicles, non-motor vehicles, and pedestrians; in the model training stage, the target detection model is implemented as follows:
S101, if the current frame is the first frame of laser radar point cloud, calculating the center of gravity of the first frame laser radar point cloud and selecting the points of the first frame laser radar point cloud farthest from the center of gravity as the sampling initial seed set P_seed; if the current frame is not the first frame, convolving the attention map F_ROI containing the plurality of key target areas obtained in S107 to obtain the feature F_out, namely F_out = Conv(F_ROI), then obtaining the sampling initial seed set index S_index from F_out and the current frame point cloud set P, specifically S_index = F_out * P, and then extracting the sampling initial seed set P_seed from the current frame point cloud set P according to the sampling initial seed set index S_index, namely P_seed = P[S_index];
S102, sampling the sampling initial seed set P_seed with the Euclidean-distance-based furthest point sampling and the feature-distance-based furthest point sampling of the 3DSSD backbone network to obtain the sampling results Ld(P) and Lf(P) respectively, and obtaining the fusion sampling result of the sampling initial seed set P_seed from the fusion policy C(P), namely the feature point cloud P_FS, whose size is (N_m, C_m), where N_m is the number of points after fusion sampling and C_m is the number of point cloud feature channels after fusion sampling;
S103, mapping the feature point cloud P_FS to a BEV view at a certain mapping resolution according to the (X, Y) coordinate information of P_FS (in this embodiment the mapping resolution f = 0.10 m; since the original laser radar point cloud has undergone fusion sampling, this mapping resolution improves the precision of the BEV view), generating a BEV view M_bev of size (H, W, C_m), where H and W denote the length and width of the BEV view respectively; the pixel value of M_bev is M_bev(i, j) = max(P_cell), where i and j are the coordinates of a pixel in the BEV view and P_cell is the set of points falling within the range of that pixel, as sketched below;
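A minimal Python sketch of this BEV mapping follows; the grid origin is an assumption, and the channel-wise maximum implements M_bev(i, j) = max(P_cell).

```python
import numpy as np

def map_to_bev(p_fs, h, w, c_m, f=0.10, x_min=0.0, y_min=-40.0):
    """Project the feature point cloud P_FS (N, D), with its C_m feature
    channels in the last columns, onto a BEV view M_bev of size (H, W, C_m)
    at mapping resolution f."""
    m_bev = np.zeros((h, w, c_m), dtype=np.float32)
    i = np.clip(((p_fs[:, 0] - x_min) / f).astype(int), 0, h - 1)
    j = np.clip(((p_fs[:, 1] - y_min) / f).astype(int), 0, w - 1)
    # Channel-wise maximum over the points P_cell falling in each pixel.
    np.maximum.at(m_bev, (i, j), p_fs[:, -c_m:])
    return m_bev
```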
S104, constructing a backbone network as a multi-scale feature pyramid based on an FPN (feature pyramid network), inputting the BEV view M_bev into the backbone network, and extracting feature maps of different scales through convolution operations;
S105, constructing a multi-scale detection head with a fully connected layer under the feature map of each scale, carrying out target detection on the current frame laser radar point cloud with the multi-scale detection head, and obtaining the target detection result after NMS processing, as sketched below;
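The following sketch shows one plausible wiring of the backbone and heads of FIG. 2: conv1, conv2 and conv3 produce three scales, and a 1x1 convolution acts as a per-cell fully connected detection head. Channel counts, strides, the scale-to-target assignment, and the head output (class score plus 7-DoF box) are assumptions; the top-down fusion of a full FPN is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    def __init__(self, c_in):
        super().__init__()
        def block(ci, co):  # one multi-layer convolutional stage
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(co), nn.ReLU())
        self.conv1 = block(c_in, 64)
        self.conv2 = block(64, 128)
        self.conv3 = block(128, 256)
        # One detection head per scale (S105): class score + 7-DoF box.
        self.heads = nn.ModuleList(nn.Conv2d(c, 1 + 7, 1) for c in (64, 128, 256))

    def forward(self, m_bev):
        m_s = self.conv1(m_bev)  # finest map M_S: small-scale targets
        m_m = self.conv2(m_s)    # middle map M_M: middle-scale targets
        m_l = self.conv3(m_m)    # coarsest map M_L: large-scale targets
        return [head(m) for head, m in zip(self.heads, (m_s, m_m, m_l))]
```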
S106, selecting the next frame of laser radar point cloud;
S107, calculating a rotation transformation matrix T through motion transformation from the motion information v of the laser radar carrier and the time interval t between the current frame and the previous adjacent frame, and then extracting the key target areas in the current frame laser radar point cloud according to the target pose information p and the dimension information s in the target detection result of S105 combined with the obtained rotation transformation matrix T, so as to form an attention map F_ROI corresponding to the plurality of key target areas, as sketched below;
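By way of illustration, the following sketch propagates the detections into the next frame with the ego-motion transform and marks each predicted key target area in F_ROI; the planar constant-velocity, constant-yaw-rate motion model and the binary map are assumptions of this sketch.

```python
import numpy as np

def build_f_roi(dets, v, t, h, w, f=0.10, x_min=0.0, y_min=-40.0):
    """dets: list of (p, s) with pose p = (x, y) and size s = (l, w) from
    the S105 detection result; v = (vx, vy, yaw_rate) is the carrier motion
    information; t is the inter-frame time interval. Returns F_ROI (H, W)."""
    vx, vy, yaw_rate = v
    c, s = np.cos(yaw_rate * t), np.sin(yaw_rate * t)
    T = np.array([[c, -s, -vx * t],   # rotation transformation matrix T;
                  [s,  c, -vy * t],   # scene points move opposite to the ego
                  [0., 0., 1.]])
    f_roi = np.zeros((h, w), dtype=np.float32)
    for p, size in dets:
        x, y, _ = T @ np.array([p[0], p[1], 1.0])
        i0 = int(np.clip((x - size[0] / 2 - x_min) / f, 0, h - 1))
        i1 = int(np.clip((x + size[0] / 2 - x_min) / f, 0, h - 1))
        j0 = int(np.clip((y - size[1] / 2 - y_min) / f, 0, w - 1))
        j1 = int(np.clip((y + size[1] / 2 - y_min) / f, 0, w - 1))
        f_roi[i0:i1 + 1, j0:j1 + 1] = 1.0  # one key target area per detection
    return f_roi
```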
S108, returning to S101;
In the model training stage, S101-S108 are iterated cyclically; the loss between the target detection result of the target detection model and the target labeling ground truth in the training data set is calculated with the loss function of PointPillars, the model weights are updated with an Adam optimizer, and the final model weights, including the attention weight matrix F_w, are obtained after multiple cyclic iterations; a minimal sketch of this loop follows;
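In the sketch below, the model and loss interfaces are stand-in assumptions; only the frame-by-frame cyclic iteration, the PointPillars-style loss, and the Adam weight updates come from the scheme.

```python
import torch

def train_model(model, frames, loss_fn, epochs=80, lr=1e-3):
    """frames: labeled data set iterated frame by frame (S101-S108);
    loss_fn: a PointPillars-style detection loss; both are stand-ins."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        f_roi = None                             # first frame: rule-based sampling
        for points, gt in frames:
            preds, f_roi = model(points, f_roi)  # detections + next-frame F_ROI
            loss = loss_fn(preds, gt)            # vs. target labeling ground truth
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```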
the target detection stage configures the target detection model with the final model weights obtained in the model training stage, and the target detection model performs the following steps in the target detection stage:
S201, if the current frame is the first frame of laser radar point cloud, calculating the center of gravity of the first frame laser radar point cloud and selecting the points of the first frame laser radar point cloud farthest from the center of gravity as the sampling initial seed set P_seed; if the current frame is not the first frame, extracting features from the attention map F_ROI of the key target areas obtained in S207 using the attention weight matrix F_w obtained in the model training stage, namely F_out = F_w(F_ROI), then obtaining the sampling initial seed set index S_index from F_out and the current frame point cloud set P, specifically S_index = F_out * P, and extracting the sampling initial seed set P_seed from the current frame point cloud set P according to S_index, namely P_seed = P[S_index], as sketched below;
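The sketch below reads the learned attention weight matrix F_w as an elementwise weighting of F_ROI, which is one possible interpretation of F_out = F_w(F_ROI), and reuses the cell-lookup scoring assumed earlier.

```python
import torch

def infer_seeds(f_w, f_roi, points, num_seeds, f=0.10, x_min=0.0, y_min=-40.0):
    """f_w, f_roi: (H, W) learned weights and attention map; points: (N, D)."""
    f_out = f_w * f_roi                       # F_out = F_w(F_ROI)
    i = ((points[:, 0] - x_min) / f).long().clamp(0, f_out.shape[0] - 1)
    j = ((points[:, 1] - y_min) / f).long().clamp(0, f_out.shape[1] - 1)
    s_index = f_out[i, j].topk(min(num_seeds, points.shape[0])).indices
    return points[s_index]                    # P_seed = P[S_index]
```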
Subsequently, S202-S207 are executed in sequence, and their specific operation steps are the same as those of S102-S107;
and finally, S208 is executed to return to S201.
The foregoing description is only illustrative of the present invention and is not intended to limit its scope; all equivalent structures or equivalent process transformations, and all direct or indirect applications in other related technical fields, are likewise included within the scope of the present invention.