Laser radar target detection method based on space-time attention mechanism
Technical Field
The invention relates to the field of laser radar target detection, in particular to a laser radar target detection method based on a space-time attention mechanism.
Background
At present, high-level intelligent driving technology is developing rapidly, and as laser radar sensors mature, target perception based on laser radar point clouds has become an important component of high-level intelligent driving. 3D target perception based on laser radar point clouds mainly takes two forms. The first is the traditional approach: filtering the original laser radar point cloud, segmenting the ground, and clustering the points to perceive 3D targets; its drawbacks are that the specific category of a target cannot be identified and that the clustering result is easily affected by point cloud noise and by the ground segmentation effect. The second builds a neural network model for 3D target perception from laser radar point clouds using convolutional neural network techniques such as PointPillars and 3DSSD: tens of thousands of frames of laser radar point cloud data are collected and labeled with targets, and target-related features are detected through model training. Target perception based on a convolutional neural network model greatly improves perception accuracy and achieves good results across different scenes, so it is currently the most widely used technique for target perception in intelligent driving.
A first prior art discloses a 4D millimeter wave three-dimensional target detection method based on a self-attention mechanism, which collects 4D millimeter wave radar point cloud data in real time, preprocesses it, and inputs the preprocessed 4D millimeter wave radar point cloud data into a pre-trained three-dimensional target detection model to output a target detection result. The three-dimensional target detection model comprises a bird's eye view voxelization module, a pillar self-attention feature extraction module, a CNN backbone network, and an RPN detection head: the bird's eye view voxelization module voxelizes the 4D millimeter wave radar point cloud data from a bird's eye view perspective to extract the feature information F of the whole space; the pillar self-attention feature extraction module extracts global point cloud features with the self-attention mechanism to generate a BEV pseudo-image; the CNN backbone network extracts features from the BEV pseudo-image and outputs a feature map; and the RPN detection head performs target detection on the feature map and outputs a 3D target detection result.
The disadvantages of this first prior art include: 1) the bird's eye view voxelization module loses part of the millimeter wave radar point cloud information during the down-sampling that generates the pillars, so the subsequent target detection result cannot reach a high level; 2) attention features are extracted over the partitioned pillars without exploiting the current frame's detection result as prior information for the next frame's point cloud, so the gains in detection precision and speed from attention feature extraction are limited.
A second prior art discloses a three-dimensional target detection method with multi-view feature fusion of 4D millimeter wave and laser point clouds, which simultaneously collects millimeter wave radar point cloud data and laser radar point cloud data, inputs them into a pre-established and trained millimeter wave and laser radar fusion network, and outputs a three-dimensional target detection result. The fusion network learns the interaction information of the laser radar and the millimeter wave radar from the BEV view and from the perspective view respectively, and splices the two, thereby fusing the millimeter wave radar point cloud data with the laser radar point cloud data.
The disadvantages of this second prior art include: 1) the voxelization module outputs Pillar features of the millimeter wave radar point cloud data and Pillar features of the laser radar point cloud data and then converts them into BEV views for target detection, so detection accuracy and speed are limited by the size of the partitioned Pillar features; 2) multi-scale targets are detected poorly under a single-scale feature map.
Disclosure of Invention
The invention aims to provide a laser radar target detection method based on a space-time attention mechanism, so as to improve the precision and the rate of laser radar target detection.
In order to solve the above technical problems, the invention provides the following technical scheme: a laser radar target detection method based on a space-time attention mechanism builds a target detection model and comprises a model training stage and a target detection stage;
in the model training stage, a data set with target labels is used for training, and the following steps are carried out on the data set frame by frame:
S101, if the current frame is the first frame of laser radar point cloud, sampling the first frame laser radar point cloud according to a certain sampling rule to obtain a sampling initial seed set P_seed; if the current frame is not the first frame, using the attention map F_ROI containing a plurality of key target areas obtained in S107 and sampling the current frame laser radar point cloud according to the convolution result of F_ROI to obtain the sampling initial seed set P_seed;
S102, carrying out fusion sampling on the sampling initial seed set P_seed to obtain a feature point cloud P_FS;
S103, mapping the feature point cloud P_FS to a BEV view;
S104, constructing a backbone network as a multi-scale feature pyramid based on an FPN (feature pyramid network), inputting the BEV view M_bev into the backbone network, and extracting feature maps of different scales through convolution operations;
S105, constructing a multi-scale detection head with a fully connected layer under the feature map of each scale;
S106, selecting the next frame of laser radar point cloud;
S107, calculating a rotation transformation matrix T through motion transformation from the motion information v of the laser radar carrier and the time interval t between the current frame and the previous adjacent frame, and then extracting the key target areas in the current frame laser radar point cloud according to the target pose information p and the dimension information s in the target detection result of S105 combined with the obtained rotation transformation matrix T, so as to form an attention map F_ROI corresponding to the plurality of key target areas;
S108, returning to S101;
In the model training stage, S101-S108 are iterated cyclically; the loss is calculated from the target detection result of the target detection model and the target labeling ground truth in the training data set, the model weights are updated a number of times, and the final model weights are obtained, including an attention weight matrix F_w;
the target detection stage configures the target detection model with the final model weights obtained in the model training stage, and the target detection model performs the following steps in the target detection stage:
S201, if the current frame is the first frame of laser radar point cloud, sampling the first frame laser radar point cloud according to a certain sampling rule to obtain a sampling initial seed set P_seed; if the current frame is not the first frame, extracting features from the attention map F_ROI containing a plurality of key target areas obtained in S207 using the attention weight matrix F_w obtained in the model training stage, and sampling the current frame laser radar point cloud according to the feature extraction result to obtain the sampling initial seed set P_seed;
Subsequently, S202-S207 are executed in sequence, and their specific operation steps are the same as those of S102-S107;
and finally, S208 is executed to return to S201.
According to the above scheme, the data set adopted in the model training stage is labeled with targets including motor vehicles, non-motor vehicles, and pedestrians.
According to the above scheme, in S101, when the current frame is not the first frame of laser radar point cloud, the sampling initial seed set P_seed is obtained as follows: the attention map F_ROI of the key target areas is convolved to obtain the feature F_out,
F_out = Conv(F_ROI)
then the sampling initial seed set index S_index is obtained from F_out and the current frame point cloud set P,
S_index = F_out * P
and then the sampling initial seed set P_seed is extracted from the current frame point cloud set P according to the sampling initial seed set index S_index,
P_seed = P[S_index].
According to the above scheme, the fusion sampling in S102 specifically comprises: sampling the sampling initial seed set P_seed with the Euclidean-distance-based furthest point sampling and the feature-distance-based furthest point sampling of the 3DSSD backbone network to obtain the sampling results Ld(P) and Lf(P) respectively, and then obtaining the fusion sampling result P_FS of the sampling initial seed set P_seed from the fusion policy C(P), where
C(P) = Ld(P) + Lf(P).
According to the above scheme, the model training stage calculates the loss using the loss function of PointPillars.
According to the above scheme, the model weights are updated using an Adam optimizer in the model training stage.
The beneficial effects of the invention are as follows:
1. Extracting the target areas and the sampling seed set based on a space-time attention mechanism improves the sampling precision and speed and retains feature-rich point clouds, so the resolution of the BEV view can be raised without additional computational cost, and the high-resolution BEV view in turn improves the target detection precision.
2. The multi-scale feature pyramid backbone network and the multi-scale detection head improve the detection rate of target detection and cover targets of different scales.
Drawings
FIG. 1 is a flow chart of a laser radar target detection method based on a space-time attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a backbone network architecture of a multi-scale feature pyramid in accordance with one embodiment of the present invention.
In FIG. 2, conv1, conv2, and conv3 are each multi-layer convolutional neural networks, and M_L, M_M, and M_S are the feature maps for detecting large-scale, middle-scale, and small-scale targets, respectively.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by one of ordinary skill in the art from the described embodiments without inventive effort fall within the scope of the present disclosure.
A laser radar target detection method based on a space-time attention mechanism is provided; the method builds a target detection model and comprises a model training stage and a target detection stage;
the data set used for training in the model training stage consists of laser radar point clouds collected in actual scenes and labeled with targets frame by frame, the labeled targets including motor vehicles, non-motor vehicles, and pedestrians; in the model training stage, the target detection model is implemented as follows:
S101, if the current frame is the first frame of laser radar point cloud, calculating the center of gravity of the first frame laser radar point cloud and selecting the points of the first frame laser radar point cloud farthest from the center of gravity as the sampling initial seed set P_seed; if the current frame is not the first frame, convolving the attention map F_ROI containing the plurality of key target areas obtained in S107 to obtain the feature F_out, namely F_out = Conv(F_ROI), then obtaining the sampling initial seed set index S_index from F_out and the current frame point cloud set P, specifically S_index = F_out * P, and then extracting the sampling initial seed set P_seed from the current frame point cloud set P according to the sampling initial seed set index S_index, namely P_seed = P[S_index];
S102, sampling the sampling initial seed set P_seed with the Euclidean-distance-based furthest point sampling and the feature-distance-based furthest point sampling of the 3DSSD backbone network to obtain the sampling results Ld(P) and Lf(P) respectively, and obtaining the fusion sampling result of the sampling initial seed set P_seed from the fusion policy C(P), namely the feature point cloud P_FS, whose size is (N_m, C_m), where N_m is the number of points after fusion sampling and C_m is the number of point cloud feature channels after fusion sampling;
S103, mapping the feature point cloud P_FS to a BEV view at a certain mapping resolution according to the (X, Y) coordinate information of P_FS (in this embodiment the mapping resolution f = 0.10 m; since the original laser radar point cloud has undergone fusion sampling, this mapping resolution improves the precision of the BEV view), generating a BEV view M_bev of size (H, W, C_m), where H and W denote the length and width of the BEV view respectively; the pixel value of M_bev is M_bev(i, j) = max(P_cell), where i and j are the coordinates of a pixel in the BEV view and P_cell is the set of points falling within the range of that pixel, as sketched below;
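A minimal Python sketch of this BEV mapping follows; the grid origin is an assumption, and the channel-wise maximum implements M_bev(i, j) = max(P_cell).

```python
import numpy as np

def map_to_bev(p_fs, h, w, c_m, f=0.10, x_min=0.0, y_min=-40.0):
    """Project the feature point cloud P_FS (N, D), with its C_m feature
    channels in the last columns, onto a BEV view M_bev of size (H, W, C_m)
    at mapping resolution f."""
    m_bev = np.zeros((h, w, c_m), dtype=np.float32)
    i = np.clip(((p_fs[:, 0] - x_min) / f).astype(int), 0, h - 1)
    j = np.clip(((p_fs[:, 1] - y_min) / f).astype(int), 0, w - 1)
    # Channel-wise maximum over the points P_cell falling in each pixel.
    np.maximum.at(m_bev, (i, j), p_fs[:, -c_m:])
    return m_bev
```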
S104, constructing a backbone network as a multi-scale feature pyramid based on an FPN (feature pyramid network), inputting the BEV view M_bev into the backbone network, and extracting feature maps of different scales through convolution operations;
S105, constructing a multi-scale detection head with a fully connected layer under the feature map of each scale, carrying out target detection on the current frame laser radar point cloud with the multi-scale detection head, and obtaining the target detection result after NMS processing, as sketched below;
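The following sketch shows one plausible wiring of the backbone and heads of FIG. 2: conv1, conv2 and conv3 produce three scales, and a 1x1 convolution acts as a per-cell fully connected detection head. Channel counts, strides, the scale-to-target assignment, and the head output (class score plus 7-DoF box) are assumptions; the top-down fusion of a full FPN is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    def __init__(self, c_in):
        super().__init__()
        def block(ci, co):  # one multi-layer convolutional stage
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(co), nn.ReLU())
        self.conv1 = block(c_in, 64)
        self.conv2 = block(64, 128)
        self.conv3 = block(128, 256)
        # One detection head per scale (S105): class score + 7-DoF box.
        self.heads = nn.ModuleList(nn.Conv2d(c, 1 + 7, 1) for c in (64, 128, 256))

    def forward(self, m_bev):
        m_s = self.conv1(m_bev)  # finest map M_S: small-scale targets
        m_m = self.conv2(m_s)    # middle map M_M: middle-scale targets
        m_l = self.conv3(m_m)    # coarsest map M_L: large-scale targets
        return [head(m) for head, m in zip(self.heads, (m_s, m_m, m_l))]
```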
S106, selecting the next frame of laser radar point cloud;
S107, calculating a rotation transformation matrix T through motion transformation from the motion information v of the laser radar carrier and the time interval t between the current frame and the previous adjacent frame, and then extracting the key target areas in the current frame laser radar point cloud according to the target pose information p and the dimension information s in the target detection result of S105 combined with the obtained rotation transformation matrix T, so as to form an attention map F_ROI corresponding to the plurality of key target areas, as sketched below;
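By way of illustration, the following sketch propagates the detections into the next frame with the ego-motion transform and marks each predicted key target area in F_ROI; the planar constant-velocity, constant-yaw-rate motion model and the binary map are assumptions of this sketch.

```python
import numpy as np

def build_f_roi(dets, v, t, h, w, f=0.10, x_min=0.0, y_min=-40.0):
    """dets: list of (p, s) with pose p = (x, y) and size s = (l, w) from
    the S105 detection result; v = (vx, vy, yaw_rate) is the carrier motion
    information; t is the inter-frame time interval. Returns F_ROI (H, W)."""
    vx, vy, yaw_rate = v
    c, s = np.cos(yaw_rate * t), np.sin(yaw_rate * t)
    T = np.array([[c, -s, -vx * t],   # rotation transformation matrix T;
                  [s,  c, -vy * t],   # scene points move opposite to the ego
                  [0., 0., 1.]])
    f_roi = np.zeros((h, w), dtype=np.float32)
    for p, size in dets:
        x, y, _ = T @ np.array([p[0], p[1], 1.0])
        i0 = int(np.clip((x - size[0] / 2 - x_min) / f, 0, h - 1))
        i1 = int(np.clip((x + size[0] / 2 - x_min) / f, 0, h - 1))
        j0 = int(np.clip((y - size[1] / 2 - y_min) / f, 0, w - 1))
        j1 = int(np.clip((y + size[1] / 2 - y_min) / f, 0, w - 1))
        f_roi[i0:i1 + 1, j0:j1 + 1] = 1.0  # one key target area per detection
    return f_roi
```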
S108, returning to S101;
In the model training stage, S101-S108 are iterated cyclically; the loss between the target detection result of the target detection model and the target labeling ground truth in the training data set is calculated with the loss function of PointPillars, the model weights are updated with an Adam optimizer, and the final model weights, including the attention weight matrix F_w, are obtained after multiple cyclic iterations; a minimal sketch of this loop follows;
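In the sketch below, the model and loss interfaces are stand-in assumptions; only the frame-by-frame cyclic iteration, the PointPillars-style loss, and the Adam weight updates come from the scheme.

```python
import torch

def train_model(model, frames, loss_fn, epochs=80, lr=1e-3):
    """frames: labeled data set iterated frame by frame (S101-S108);
    loss_fn: a PointPillars-style detection loss; both are stand-ins."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        f_roi = None                             # first frame: rule-based sampling
        for points, gt in frames:
            preds, f_roi = model(points, f_roi)  # detections + next-frame F_ROI
            loss = loss_fn(preds, gt)            # vs. target labeling ground truth
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```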
the target detection stage configures the target detection model with the final model weights obtained in the model training stage, and the target detection model performs the following steps in the target detection stage:
S201, if the current frame is the first frame of laser radar point cloud, calculating the center of gravity of the first frame laser radar point cloud and selecting the points of the first frame laser radar point cloud farthest from the center of gravity as the sampling initial seed set P_seed; if the current frame is not the first frame, extracting features from the attention map F_ROI of the key target areas obtained in S207 using the attention weight matrix F_w obtained in the model training stage, namely F_out = F_w(F_ROI), then obtaining the sampling initial seed set index S_index from F_out and the current frame point cloud set P, specifically S_index = F_out * P, and extracting the sampling initial seed set P_seed from the current frame point cloud set P according to S_index, namely P_seed = P[S_index], as sketched below;
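The sketch below reads the learned attention weight matrix F_w as an elementwise weighting of F_ROI, which is one possible interpretation of F_out = F_w(F_ROI), and reuses the cell-lookup scoring assumed earlier.

```python
import torch

def infer_seeds(f_w, f_roi, points, num_seeds, f=0.10, x_min=0.0, y_min=-40.0):
    """f_w, f_roi: (H, W) learned weights and attention map; points: (N, D)."""
    f_out = f_w * f_roi                       # F_out = F_w(F_ROI)
    i = ((points[:, 0] - x_min) / f).long().clamp(0, f_out.shape[0] - 1)
    j = ((points[:, 1] - y_min) / f).long().clamp(0, f_out.shape[1] - 1)
    s_index = f_out[i, j].topk(min(num_seeds, points.shape[0])).indices
    return points[s_index]                    # P_seed = P[S_index]
```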
Subsequently, S202-S207 are executed in sequence, and their specific operation steps are the same as those of S102-S107;
and finally, S208 is executed to return to S201.
The foregoing description is only illustrative of the present invention and is not intended to limit its scope; all equivalent structures or equivalent process transformations, and all direct or indirect applications in other related technical fields, are likewise included within the scope of the present invention.