CN115294539A

CN115294539A - Multitask detection method and device, storage medium and terminal

Info

Publication number: CN115294539A
Application number: CN202210582263.2A
Authority: CN
Inventors: 黄超; 姚为龙; 郑伟伟
Original assignee: Shanghai Xiantu Intelligent Technology Co Ltd
Current assignee: Shanghai Xiantu Intelligent Technology Co Ltd
Priority date: 2022-05-26
Filing date: 2022-05-26
Publication date: 2022-11-04

Abstract

A multi-task detection method and device, storage medium and terminal, the method includes: acquiring input data, and inputting the input data into a multi-task detection model obtained by training, the multi-task detection model includes: a single feature extraction network, multiple Adjust the network and multiple prediction networks, wherein the prediction network corresponds to the task one-to-one, and the adjustment network corresponds to the prediction network one-to-one; the feature extraction network is used to extract features from the input data to obtain a general feature vector; the kth adjustment is used The network performs feature extraction and/or dimension transformation on the general feature vector to obtain the feature vector corresponding to the k-th task, denoted as the k-th feature vector, where 1≤k≤M, k, M are positive integers, M is the task The kth prediction network is used to calculate the kth feature vector to obtain the detection result of the kth task. Through the above solution, the calculation amount of the multi-task detection process can be reduced, and the detection efficiency can be improved.

Description

Multi-task detection method and device, storage medium, terminal

技术领域technical field

本发明涉及深度学习技术领域，尤其涉及一种多任务检测方法及装置、存储介质，终端。The present invention relates to the technical field of deep learning, in particular to a multi-task detection method and device, a storage medium, and a terminal.

背景技术Background technique

近年来，自动驾驶技术得到快速发展。目前，自动驾驶系统主要由环境感知系统、定位导航系统、路径规划系统、速度控制系统和运动控制系统等组成。其中，环境感知系统是自动驾驶系统的重要组成部分，作为整个自动驾驶的上游环节，其检测性能直接影响规划、决策等自动驾驶的后续环节。In recent years, autonomous driving technology has developed rapidly. At present, the automatic driving system is mainly composed of environment perception system, positioning and navigation system, path planning system, speed control system and motion control system. Among them, the environmental perception system is an important part of the automatic driving system. As the upstream link of the entire automatic driving, its detection performance directly affects the subsequent links of automatic driving such as planning and decision-making.

雷达作为感知设备，具有较好的稳定性和适应性，现有技术中通常将雷达部署于自动驾驶车辆上来进行3D目标检测，以实现环境感知。自动驾驶的环境感知会涉及障碍物检测、车道线检测和可行驶区域分割等多个检测任务，现有方案中通常是分别独立地处理这些检测任务，考虑到有限的计算资源，单独处理这些任务极大地增加了车载计算资源的消耗，检测效率较低。As a perception device, radar has good stability and adaptability. In the prior art, radar is usually deployed on self-driving vehicles for 3D target detection to realize environment perception. Environmental perception for autonomous driving involves multiple detection tasks such as obstacle detection, lane line detection, and drivable area segmentation. In existing solutions, these detection tasks are usually processed independently. Considering limited computing resources, these tasks are processed separately. It greatly increases the consumption of on-board computing resources, and the detection efficiency is low.

因此，亟需一种多任务检测方法，能够减少多任务检测过程的计算量，提高检测效率。Therefore, there is an urgent need for a multi-task detection method, which can reduce the calculation amount of the multi-task detection process and improve detection efficiency.

发明内容Contents of the invention

本发明解决的技术问题是如何减少多任务检测过程的计算量，提高检测效率。The technical problem solved by the invention is how to reduce the calculation amount in the multi-task detection process and improve the detection efficiency.

为解决上述技术问题，本发明实施例提供一种多任务检测方法，所述方法包括：获取输入数据，并将所述输入数据输入至训练得到的多任务检测模型，所述多任务检测模型包括：单个特征提取网络、多个调整网络和多个预测网络，其中，所述预测网络和所述任务一一对应，且所述调整网络和所述预测网络一一对应；采用所述特征提取网络对所述输入数据进行特征提取，以得到通用特征向量；采用第k个调整网络对所述通用特征向量进行特征提取和/或维度变换，以得到第k个任务对应的特征向量，记为第k特征向量，其中，1≤k≤M，k、M为正整数，M为所述任务的数量；采用第k个预测网络对所述第k特征向量进行计算，以得到第k个任务的检测结果。In order to solve the above-mentioned technical problems, an embodiment of the present invention provides a multi-task detection method, the method comprising: acquiring input data, and inputting the input data into a trained multi-task detection model, the multi-task detection model includes : a single feature extraction network, a plurality of adjustment networks and a plurality of prediction networks, wherein the prediction network corresponds to the task one-to-one, and the adjustment network corresponds to the prediction network; the feature extraction network is adopted Perform feature extraction on the input data to obtain a general feature vector; use the kth adjustment network to perform feature extraction and/or dimension transformation on the general feature vector to obtain the feature vector corresponding to the kth task, which is denoted as k eigenvectors, wherein, 1≤k≤M, k and M are positive integers, and M is the number of tasks; use the kth prediction network to calculate the kth eigenvectors to obtain the kth task Test results.

可选的，所述输入数据为雷达采集的点云数据，所述特征提取网络包括卷积层，所述卷积层执行稀疏卷积计算。Optionally, the input data is point cloud data collected by radar, and the feature extraction network includes a convolutional layer, and the convolutional layer performs sparse convolution calculation.

可选的，所述多任务检测模型是预先采用训练数据对预设模型进行训练得到的，所述预设模型包括：单个初始特征提取网络、多个初始调整网络和多个初始预测网络，所述训练数据包括：第k个任务对应的样本数据，获取输入数据之前，所述方法还包括：步骤一：采用所述训练数据对所述预设模型进行有监督训练，当满足第一预设条件时，得到中间检测模型，其中，所述中间检测模型包括所述特征提取网络、多个中间调整网络和多个中间预测网络；步骤二：采用第k个任务对应的样本数据对第k个中间调整网络和第k个中间预测网络进行有监督训练，当满足第二预设条件时，得到所述多任务检测网络；其中，所述第一预设条件包括：总损失小于或等于第一预设损失，所述第二预设条件包括：每个任务的损失小于或等于第二预设损失，所述总损失是根据M个任务的损失计算得到的。Optionally, the multi-task detection model is obtained by using training data to train a preset model in advance, and the preset model includes: a single initial feature extraction network, multiple initial adjustment networks, and multiple initial prediction networks, so The training data includes: sample data corresponding to the kth task. Before obtaining the input data, the method further includes: Step 1: Using the training data to perform supervised training on the preset model, when the first preset is satisfied conditions, an intermediate detection model is obtained, wherein the intermediate detection model includes the feature extraction network, a plurality of intermediate adjustment networks and a plurality of intermediate prediction networks; Step 2: use the sample data corresponding to the kth task to The intermediate adjustment network and the kth intermediate prediction network perform supervised training, and when the second preset condition is met, the multi-task detection network is obtained; wherein, the first preset condition includes: the total loss is less than or equal to the first The preset loss, the second preset condition includes: the loss of each task is less than or equal to the second preset loss, and the total loss is calculated based on the losses of M tasks.

可选的，采用下列公式计算所述总损失：Optionally, the total loss is calculated using the following formula:

L为所述总损失，σ_k为所述第k个初始调整网络的权重，σ_k为可学习的参数，L_k为所述第k个任务的损失。L is the total loss, σ _k is the weight of the k-th initial adjustment network, σ _k is a learnable parameter, and L _k is the loss of the k-th task.

可选的，所述步骤一采用的学习率为第一学习率，步骤二采用的学习率为第二学习率，所述第一学习率大于第二学习率。Optionally, the learning rate used in step 1 is a first learning rate, the learning rate used in step 2 is a second learning rate, and the first learning rate is greater than the second learning rate.

可选的，所述样本数据集还包括：无标注样本数据，步骤一之前，所述方法还包括：采用所述无标注样本数据对所述预设模型进行无监督训练，当满足第三预设条件时，得到用于进行所述有监督训练的预设模型。Optionally, the sample data set further includes: unlabeled sample data. Before step 1, the method further includes: using the unlabeled sample data to perform unsupervised training on the preset model, when the third preset When the conditions are set, a preset model for performing the supervised training is obtained.

可选的，所述无标注样本数据包括多帧第一点云数据，所述无监督训练的约束条件包括：匹配点之间的特征距离最小化和/或非匹配点之间的特征距离最大化，采用所述无标注样本数据对所述预设模型进行无监督训练之前，所述方法还包括：对每帧第一点云数据进行数据增强处理，以得到该帧第一点云数据对应的第二点云数据，其中，所述第一点云数据中的点和所述第二点云数据中的点一一对应；在每帧第一点云数据及其对应的第二点云数据中确定匹配点和非匹配点；其中，所述匹配点是指具有对应关系的点，所述非匹配点是指不具有对应关系的点。Optionally, the unlabeled sample data includes multiple frames of first point cloud data, and the constraints of the unsupervised training include: the minimum feature distance between matching points and/or the maximum feature distance between non-matching points Before performing unsupervised training on the preset model using the unlabeled sample data, the method further includes: performing data enhancement processing on the first point cloud data of each frame, so as to obtain the first point cloud data corresponding to the frame. The second point cloud data, wherein, the points in the first point cloud data and the points in the second point cloud data correspond one-to-one; in each frame of the first point cloud data and its corresponding second point cloud Matching points and non-matching points are determined in the data; wherein, the matching points refer to points with a corresponding relationship, and the non-matching points refer to points without a corresponding relationship.

为解决上述技术问题，本发明实施例还提供一种多任务检测方法装置，所述装置包括：获取模块，用于获取输入数据，并将所述输入数据输入至训练得到的多任务检测模型，所述多任务检测模型包括：单个特征提取网络、多个调整网络和多个预测网络，其中，所述预测网络和所述任务一一对应，且所述调整网络和所述预测网络一一对应；特征提取模块，用于采用特征提取网络对所述输入数据进行特征提取，以得到通用特征向量；调整模块，用于采用第k个调整网络对所述通用特征向量进行特征提取和/或维度变换，以得到第k个任务对应的特征向量，记为第k特征向量，其中，1≤k≤M，k、M为正整数，M为所述任务的数量；检测模块，用于采用第k个预测网络对所述第k特征向量进行计算，以得到第k个任务的检测结果。In order to solve the above-mentioned technical problems, an embodiment of the present invention also provides a multi-task detection method device, the device includes: an acquisition module, configured to acquire input data, and input the input data to the trained multi-task detection model, The multi-task detection model includes: a single feature extraction network, multiple adjustment networks and multiple prediction networks, wherein the prediction network corresponds to the task one by one, and the adjustment network corresponds to the prediction network one by one ; The feature extraction module is used to perform feature extraction on the input data using a feature extraction network to obtain a general feature vector; the adjustment module is used to perform feature extraction and/or dimensionality on the general feature vector using the kth adjustment network Transform to obtain the feature vector corresponding to the kth task, which is recorded as the kth feature vector, wherein, 1≤k≤M, k and M are positive integers, and M is the number of tasks; the detection module is used to adopt the kth task The k prediction networks calculate the kth feature vector to obtain the detection result of the kth task.

本发明实施例还提供一种存储介质，其上存储有计算机程序，所述计算机程序被处理器运行时，执行上述的多任务检测方法的步骤。An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned multitasking detection method are executed.

本发明实施例还提供一种终端，包括存储器和处理器，所述存储器上存储有可在所述处理器上运行的计算机程序，所述处理器运行所述计算机程序时执行上述的多任务检测方法的步骤。An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory stores a computer program that can run on the processor, and the processor executes the above-mentioned multitasking detection when running the computer program method steps.

与现有技术相比，本发明实施例的技术方案具有以下有益效果：Compared with the prior art, the technical solutions of the embodiments of the present invention have the following beneficial effects:

在本发明实施例的方案中，获取输入数据并将输入数据输入至训练得到的多任务检测模型。其中，多任务检测模型包括：单个特征提取网络、多个调整网络和多个预测网络，其中，预测网络和所述任务一一对应，且调整网络和所述预测网络一一对应。进一步地，采用所述特征提取网络对所述输入数据进行特征提取，以得到通用特征向量；采用第k个调整网络对通用特征向量进行特征提取和/或维度变换，以得到第k个任务对应的特征向量，记为第k特征向量；采用第k个预测网络对所述第k特征向量进行计算，以得到第k个任务的检测结果。由此，本发明实施例的方案中，可以仅对每帧的输入数据进行单次的特征提取，然后基于提取到的通用特征向量得到多个检测任务的检测结果，从而完成多个检测任务。也即，多个检测任务共享特征提取网络。一方面，相较于现有的多任务检测模型，多个检测任务共用特征提取网络，减少了模型容量和计算复杂度。另一方面，针对每个检测任务采用相应的调整网络，以对通用特征向量进行域适应转换，从而确保每个预测网络的检测效果。因此，本发明实施例的方案可以在确保检测效果的情况下，减少计算复杂度，提高检测效率。In the solution of the embodiment of the present invention, the input data is acquired and input into the trained multi-task detection model. Wherein, the multi-task detection model includes: a single feature extraction network, multiple adjustment networks and multiple prediction networks, wherein the prediction network corresponds to the tasks one by one, and the adjustment network corresponds to the prediction network one by one. Further, the feature extraction network is used to perform feature extraction on the input data to obtain a general feature vector; the kth adjustment network is used to perform feature extraction and/or dimension transformation on the general feature vector to obtain the kth task corresponding to The eigenvector of is denoted as the kth eigenvector; the kth prediction network is used to calculate the kth eigenvector to obtain the detection result of the kth task. Therefore, in the solution of the embodiment of the present invention, only a single feature extraction can be performed on the input data of each frame, and then the detection results of multiple detection tasks can be obtained based on the extracted common feature vector, thereby completing multiple detection tasks. That is, multiple detection tasks share the feature extraction network. On the one hand, compared with the existing multi-task detection model, multiple detection tasks share the feature extraction network, which reduces the model capacity and computational complexity. On the other hand, a corresponding tuned network is adopted for each detection task to perform domain-adaptive transformation on the common feature vectors to ensure the detection performance of each prediction network. Therefore, the solutions in the embodiments of the present invention can reduce computational complexity and improve detection efficiency while ensuring detection effects.

进一步，本发明实施例的方案中，采用训练数据对预设模型进行有监督训练，当满足第一预设条件时，得到中间检测模型；然后采用第k个任务对应的样本数据对第k个中间调整网络以及第k个中间预测网络进行有监督训练，当满足第二预设条件时，得到多任务检测网络；其中，第一预设条件包括：总损失小于或等于第一预设损失，第二预设条件包括：每个任务的损失小于或等于第二预设损失，总损失是根据M个任务的损失计算得到的。相较于现有技术中各个任务检测模型是互相独立训练的，采用上述方案可以对采用联合训练的方法训练得到特征提取网络，能够挖掘出各个检测任务之间的关联信息，有利于提高特征提取网络提取特征的能力，从而提升了多任务检测的准确度和泛化能力。Further, in the solution of the embodiment of the present invention, the training data is used to carry out supervised training on the preset model, and when the first preset condition is met, an intermediate detection model is obtained; then the sample data corresponding to the kth task is used to test the kth task The intermediate adjustment network and the kth intermediate prediction network perform supervised training, and when the second preset condition is met, a multi-task detection network is obtained; wherein, the first preset condition includes: the total loss is less than or equal to the first preset loss, The second preset condition includes: the loss of each task is less than or equal to the second preset loss, and the total loss is calculated based on the losses of M tasks. Compared with the prior art, each task detection model is trained independently of each other. Using the above scheme, the joint training method can be used to train the feature extraction network, which can dig out the correlation information between each detection task, which is conducive to improving feature extraction. The ability of the network to extract features, thereby improving the accuracy and generalization ability of multi-task detection.

进一步，本发明实施例的方案中，在采用有监督训练之前，先进行无监督训练，以得到用于进行有监督训练的预设模型。采用这样的方案，将无监督训练和有监督训练进行结合，有利于可以减少对有标注样本数据的需求，也有利于提高网络的准确度和泛化能力。Furthermore, in the solutions of the embodiments of the present invention, before using supervised training, unsupervised training is performed first to obtain a preset model for supervised training. Using such a scheme, combining unsupervised training with supervised training will help reduce the need for labeled sample data and improve the accuracy and generalization capabilities of the network.

进一步，本发明实施例的方案中，采用下列公式计算总损失：

其中，σ_k为第k个初始调整网络的权重，且σ_k为可学习的参数。采用这样的方案，对每个调整网络采用可学习的自适应权重，可以减少人为超参数设定，有利于网络自动优化权重，从而提高特征提取网络的训练效果。Further, in the solution of the embodiment of the present invention, the following formula is used to calculate the total loss:

Among them, σ _k is the weight of the kth initial adjustment network, and σ _k is a learnable parameter. Adopting such a scheme, adopting learnable adaptive weights for each adjustment network can reduce artificial hyperparameter settings, which is conducive to the automatic optimization of weights by the network, thereby improving the training effect of the feature extraction network.

附图说明Description of drawings

图1是现有技术中一种多任务检测模型的结构示意图；Fig. 1 is a schematic structural diagram of a multi-task detection model in the prior art;

图2是本发明实施例中一种多任务检测方法的流程示意图；Fig. 2 is a schematic flow chart of a multi-task detection method in an embodiment of the present invention;

图3是本发明实施例中一种多任务检测模型的结构示意图；Fig. 3 is a schematic structural diagram of a multi-task detection model in an embodiment of the present invention;

图4是本发明实施例中一种多任务检测模型的训练方法的流程示意图；4 is a schematic flow diagram of a training method for a multi-task detection model in an embodiment of the present invention;

图5是本发明实施例中一种多任务检测装置的结构示意图。Fig. 5 is a schematic structural diagram of a multi-task detection device in an embodiment of the present invention.

具体实施方式Detailed ways

如背景技术所述，亟需一种多任务检测方法，能够减少多任务检测过程的计算量，提高检测效率。As mentioned in the background art, there is an urgent need for a multi-task detection method that can reduce the amount of calculation in the multi-task detection process and improve detection efficiency.

在自动驾驶感知系统的应用场景中，由于自动驾驶感知系统对于实时性的要求，通常需要对于同一帧点云数据同时执行多个检测任务。现有的多任务检测模型通常包括多个互相独立的检测模型。如图1所示，第一检测模型11、第二检测模型12和第三检测模型13可以分别用于执行不同的检测任务。例如，可以将同一帧点云数据分别输入至第一检测模型11、第二检测模型12和第三检测模型13中，以得到不同检测任务的检测结果，从而完成多任务检测。In the application scenario of the automatic driving perception system, due to the real-time requirements of the automatic driving perception system, it is usually necessary to simultaneously perform multiple detection tasks on the same frame of point cloud data. Existing multi-task detection models usually include multiple independent detection models. As shown in FIG. 1 , the first detection model 11 , the second detection model 12 and the third detection model 13 can be used to perform different detection tasks respectively. For example, the same frame of point cloud data can be input into the first detection model 11 , the second detection model 12 and the third detection model 13 to obtain detection results of different detection tasks, thereby completing multi-task detection.

另外，执行多任务检测时，需要将多个检测模型均读取至内存中同时进行任务检测。因此，现有的方案需占用大量的内存，计算量也十分庞大，容易造成卡顿，检测效率较低。In addition, when performing multi-task detection, it is necessary to read multiple detection models into the memory and perform task detection simultaneously. Therefore, the existing solutions need to occupy a large amount of memory, and the amount of calculation is also very large, which is easy to cause freezes and low detection efficiency.

为了解决上述技术问题，本发明实施例提供一种多任务检测方法，在本发明实施例的方案中，可以仅对每帧的输入数据进行单次的特征提取，然后基于提取到的通用特征向量得到多个检测任务的检测结果，从而完成多个检测任务。也即，多个检测任务共享特征提取网络。一方面，相较于现有技术中的多任务检测模型，本发明实施例的方案中多个检测任务可以共用特征提取网络，减少了模型容量和计算复杂度。另一方面，针对每个检测任务采用相应的调整网络，以对通用特征向量进行域适应转换，从而确保每个预测网络的检测效果。因此，本发明实施例的方案可以在确保检测效果的情况下，减少计算复杂度，提高检测效率。In order to solve the above technical problems, an embodiment of the present invention provides a multi-task detection method. In the solution of the embodiment of the present invention, only a single feature extraction can be performed on the input data of each frame, and then based on the extracted general feature vector The detection results of multiple detection tasks are obtained, so as to complete multiple detection tasks. That is, multiple detection tasks share the feature extraction network. On the one hand, compared with the multi-task detection model in the prior art, multiple detection tasks in the solution of the embodiment of the present invention can share the feature extraction network, which reduces the model capacity and computational complexity. On the other hand, a corresponding tuned network is adopted for each detection task to perform domain-adaptive transformation on the common feature vectors to ensure the detection performance of each prediction network. Therefore, the solutions in the embodiments of the present invention can reduce computational complexity and improve detection efficiency while ensuring detection effects.

为使本发明的上述目的、特征和有益效果能够更为明显易懂，下面结合附图对本发明的具体实施例做详细的说明。In order to make the above objects, features and beneficial effects of the present invention more comprehensible, specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

参照图2，图2是本发明实施例中一种多任务检测方法的流程示意图。图2示出的方法可以由终端执行，所述终端可以是现有的各种具有数据接收和数据处理功能的设备。Referring to FIG. 2 , FIG. 2 is a schematic flowchart of a multi-task detection method in an embodiment of the present invention. The method shown in FIG. 2 can be executed by a terminal, and the terminal can be various existing devices with data receiving and data processing functions.

在一个应用场景中，所述终端可以是车载终端，例如，可以是车辆的电子控制单元(Electronic Control Unit，ECU)，其中，所述车辆可以配置有雷达，本发明实施例对于车辆的类型并不进行限制，例如，可以是自动引导车(Automated Guided Vehicles，AGV)、自动驾驶汽车(Autonomous vehicles)等，但并不限于此。In an application scenario, the terminal may be a vehicle-mounted terminal, for example, it may be an electronic control unit (Electronic Control Unit, ECU) of a vehicle, wherein the vehicle may be equipped with a radar, and the embodiment of the present invention does not apply to the type of the vehicle Without limitation, for example, it may be automatic guided vehicles (Automated Guided Vehicles, AGV), automatic driving vehicles (Autonomous vehicles), etc., but is not limited thereto.

在另一个应用场景中，所述终端还可以是服务器，例如，服务器与车辆通信连接，所述车辆可以配置有雷达，服务器可以从车辆处接收雷达采集的点云数据，并执行本发明实施例提供的多任务检测方法，以得到检测结果。In another application scenario, the terminal can also be a server, for example, the server is connected to a vehicle in communication, the vehicle can be equipped with a radar, and the server can receive point cloud data collected by the radar from the vehicle and execute the embodiment of the present invention A multi-task detection method is provided to obtain detection results.

需要说明的是，本发明实施例的方案中，所述雷达可以是激光雷达，也可以是毫米波雷达。It should be noted that, in the solution of the embodiment of the present invention, the radar may be a laser radar or a millimeter wave radar.

还需要说明的是，本发明实施例中多任务检测是指同时处理多个检测任务。具体而言，对同一个输入数据(例如，同一帧点云数据)进行处理，以得到多个检测任务的检测结果，而并非是在得到其中一个检测任务的检测结果之后再进行下一个检测任务，也并不是对不同的点云数据分别执行不同的检测任务。It should also be noted that multi-task detection in this embodiment of the present invention refers to processing multiple detection tasks at the same time. Specifically, the same input data (for example, the same frame of point cloud data) is processed to obtain the detection results of multiple detection tasks, instead of performing the next detection task after obtaining the detection results of one detection task , and it is not to perform different detection tasks on different point cloud data.

还需要说明的是，本发明实施例提供的方法能够应用于多个技术领域，例如，自动驾驶、增强现实、虚拟现实和智能机器人等技术领域，本发明实施例仅以自动驾驶领域为例进行描述，并不构成对本发明实施例提供方法的应用场景的限制。It should also be noted that the method provided by the embodiment of the present invention can be applied to multiple technical fields, such as automatic driving, augmented reality, virtual reality and intelligent robot, and the embodiment of the present invention only takes the field of automatic driving as an example. The description does not constitute a limitation on the application scenarios of the methods provided in the embodiments of the present invention.

图2示出的多任务检测方法可以包括以下步骤：The multitasking detection method shown in Figure 2 may comprise the following steps:

步骤S11：获取输入数据，并将所述输入数据输入至训练得到的多任务检测模型；Step S11: Acquiring input data, and inputting the input data into the trained multi-task detection model;

步骤S12：采用所述特征提取网络对所述输入数据进行特征提取，以得到通用特征向量；Step S12: Using the feature extraction network to perform feature extraction on the input data to obtain a general feature vector;

步骤S13：采用第k个调整网络对所述通用特征向量进行特征提取和/或维度变换，以得到第k个任务对应的特征向量，记为第k特征向量；Step S13: Use the kth adjustment network to perform feature extraction and/or dimension transformation on the general feature vector to obtain the feature vector corresponding to the kth task, which is denoted as the kth feature vector;

步骤S14：采用第k个预测网络对所述第k特征向量进行计算，以得到第k个任务的检测结果。Step S14: Using the kth prediction network to calculate the kth feature vector to obtain the detection result of the kth task.

可以理解的是，在具体实施中，所述方法可以采用软件程序的方式实现，该软件程序运行于芯片或芯片模组内部集成的处理器中；或者，该方法可以采用硬件或者软硬结合的方式来实现。It can be understood that, in a specific implementation, the method can be implemented in the form of a software program, and the software program runs in a processor integrated inside the chip or chip module; or, the method can be implemented by using hardware or a combination of hardware and software way to achieve.

参照图3，图3是本发明实施例中一种多任务检测模型的结构示意图。如图3所示，多任务检测模型可以包括：单个特征提取网络31、多个调整网络32和多个预测网络33。Referring to FIG. 3 , FIG. 3 is a schematic structural diagram of a multi-task detection model in an embodiment of the present invention. As shown in FIG. 3 , the multi-task detection model may include: a single feature extraction network 31 , multiple adjustment networks 32 and multiple prediction networks 33 .

下面结合图2和图3对本发明实施例提供的多任务检测方法进行非限制性的说明。The multi-task detection method provided by the embodiment of the present invention will be described in a non-limiting manner below with reference to FIG. 2 and FIG. 3 .

在步骤S21的具体实施中，可以获取当前时刻待处理的输入数据，所述输入数据可以是雷达采集到的点云数据。在车辆的行驶过程中，雷达可以对行驶环境进行感知，以得到点云数据。也即，输入数据可以是点云数据。In the specific implementation of step S21, the input data to be processed at the current moment may be obtained, and the input data may be point cloud data collected by the radar. During the driving process of the vehicle, the radar can sense the driving environment to obtain point cloud data. That is, the input data may be point cloud data.

在其他实施例中，输入数据也可以是摄像头采集到的图像，本发明实施例对此并不进行限制。In other embodiments, the input data may also be images collected by a camera, which is not limited in this embodiment of the present invention.

在输入数据为点云数据的情况下，在将点云数据输入至多任务检测模型之前，需要先对点云数据进行预处理，以得到处理后的点云数据，并将处理后的点云数据输入至多任务检测模型。When the input data is point cloud data, before inputting the point cloud data into the multi-task detection model, the point cloud data needs to be preprocessed to obtain the processed point cloud data, and the processed point cloud data Input to the multi-task detection model.

具体而言，点云数据为N×C维数据，其中，N为点云数据中点的数量，C为点云数据的维度。对点云数据进行预处理可以包括：将三维空间进行体素分块，以得到多个体素格，然后将点云数据中的点投影至体素格中，以得到体素特征图；将点云数据沿鸟瞰方向进行投影，以得到鸟瞰图(Bird-Eye View，BEV)；将点云数据沿深度方向进行投影，以得到深度图(Range Image)。由此，处理后的点云数据可以包括：体素特征图、鸟瞰图和深度图。Specifically, the point cloud data is N×C dimensional data, where N is the number of points in the point cloud data, and C is the dimension of the point cloud data. Preprocessing the point cloud data may include: dividing the three-dimensional space into voxel blocks to obtain multiple voxel grids, and then projecting the points in the point cloud data into the voxel grids to obtain a voxel feature map; The cloud data is projected along the bird's-eye view direction to obtain a bird's-eye view (Bird-Eye View, BEV); the point cloud data is projected along the depth direction to obtain a depth map (Range Image). Thus, the processed point cloud data can include: voxel feature map, bird's eye view map and depth map.

相应的，还可以一并保存点云数据和处理后的点云数据之间的投影关系，以便后续处理根据所述投影关系进行反变换。其中，所述投影关系可以包括点云数据和体素特征图之间的投影关系、点云数据和鸟瞰图之间的投影关系以及点云数据和深度图之间的投影关系。更具体地，在语义分割任务对应的调整网络中，可以根据所述投影关系进行反变换，以便计算每个点的预测结果。Correspondingly, the projection relationship between the point cloud data and the processed point cloud data can also be saved together, so that subsequent processing can perform inverse transformation according to the projection relationship. Wherein, the projection relationship may include a projection relationship between point cloud data and a voxel feature map, a projection relationship between point cloud data and a bird's-eye view image, and a projection relationship between point cloud data and a depth map. More specifically, in the adjustment network corresponding to the semantic segmentation task, the inverse transformation can be performed according to the projection relationship, so as to calculate the prediction result of each point.

进一步地，可以将输入数据输入至预先训练得到的多任务检测模型中。也即，可以将输入数据输入至图3示出的多任务检测模型中。Further, the input data can be input into the pre-trained multi-task detection model. That is, the input data can be input into the multi-task detection model shown in FIG. 3 .

所述多任务检测模型可以包括：单个特征提取网络31，特征提取网络31用于对输入数据进行特征提取。The multi-task detection model may include: a single feature extraction network 31, and the feature extraction network 31 is used to extract features from input data.

多任务检测模型还可以包括：多个调整网络32和多个预测网络33。其中，调整网络32和预测网络33的数量是相同的，调整网络32和预测网络33一一对应。更具体地，每个调整网络32的输入端和特征提取网络31的输出端连接，输出端和其对应的预测网络33的输入端连接。The multi-task detection model may also include: multiple adjustment networks 32 and multiple prediction networks 33 . Wherein, the number of the adjustment network 32 and the prediction network 33 are the same, and the adjustment network 32 and the prediction network 33 correspond one to one. More specifically, the input end of each adjustment network 32 is connected to the output end of the feature extraction network 31 , and the output end is connected to the input end of its corresponding prediction network 33 .

其中，调整网络32和预测网络33的数量M可以是根据实际的应用需求确定。更具体地，调整网络32和预测网络33的数量是根据检测任务的数量确定的，预测网络33和检测任务一一对应，也即，调整网络32也和检测任务一一对应。Wherein, the number M of the adjustment network 32 and the prediction network 33 may be determined according to actual application requirements. More specifically, the numbers of the adjustment network 32 and the prediction network 33 are determined according to the number of detection tasks, and the prediction network 33 is in one-to-one correspondence with the detection tasks, that is, the adjustment network 32 is also in one-to-one correspondence with the detection tasks.

其中，调整网络32的结构和预测网络33的结构是根据对应的检测任务确定的。Wherein, the structure of the adjustment network 32 and the structure of the prediction network 33 are determined according to the corresponding detection tasks.

在步骤S22的具体实施中，可以采用多任务检测模型中的特征提取网络对输入数据进行特征提取，以得到通用特征向量。In the specific implementation of step S22, the feature extraction network in the multi-task detection model can be used to perform feature extraction on the input data to obtain a general feature vector.

在具体实施中，特征提取网络31的结构可以是输入数据的类型来确定。在一个具体的例子中，输入数据为点云数据的情况下，特征提取网络包括卷积层，且卷积层执行稀疏卷积(Sparse Convolution)计算。进一步地，特征提取网络输出的通用特征向量可以是四维的特征数据，其中，通用特征向量的通道方向可以用于指示点云数据中的点的属性信息。In a specific implementation, the structure of the feature extraction network 31 may be determined by the type of input data. In a specific example, when the input data is point cloud data, the feature extraction network includes a convolutional layer, and the convolutional layer performs sparse convolution (Sparse Convolution) calculation. Further, the general feature vector output by the feature extraction network may be four-dimensional feature data, wherein the channel direction of the general feature vector may be used to indicate the attribute information of the points in the point cloud data.

在另一个具体的例子中，输入数据为图像的情况下，特征提取网络31的卷积层可以执行全卷积(Fully Convolution)计算。In another specific example, when the input data is an image, the convolutional layer of the feature extraction network 31 can perform a fully convolution (Fully Convolution) calculation.

更具体地，相较于图1示出的多个特征提取网络，图3中示出的特征提取网络可以是图1中多个特征提取网络的结构相同的部分。More specifically, compared with the multiple feature extraction networks shown in FIG. 1 , the feature extraction network shown in FIG. 3 may be a structurally identical part of the multiple feature extraction networks in FIG. 1 .

在步骤S23的具体实施中，可以将特征提取网络输出的通用特征向量传输至多个调整网络中，并采用每个调整网络执行对应的计算，以得到该调整网络对应的检测任务所需的特征向量。In the specific implementation of step S23, the general feature vector output by the feature extraction network can be transmitted to multiple adjustment networks, and each adjustment network is used to perform corresponding calculations to obtain the feature vector required for the detection task corresponding to the adjustment network .

更具体地，可以采用第k个调整网络对通用特征向量进行特征提取和/或维度变化，以得到第k个任务对应的特征向量，记为第k特征向量。其中，1≤k≤M，k、M为正整数，M为检测任务的数量。More specifically, the kth adjustment network can be used to perform feature extraction and/or dimension change on the general feature vector to obtain the feature vector corresponding to the kth task, which is denoted as the kth feature vector. Among them, 1≤k≤M, k and M are positive integers, and M is the number of detection tasks.

一方面，可以采用第k个调整网络对通用特征向量进行特征提取，以得到第k特征向量。需要说明的是，与特征提取网络进行特征提取相比，第k个调整网络是对通用特征向量进行进一步的特征提取，以得到更深层次的特征向量。On the one hand, the kth adjustment network can be used to perform feature extraction on the general feature vector to obtain the kth feature vector. It should be noted that, compared with feature extraction performed by the feature extraction network, the k-th adjustment network performs further feature extraction on the general feature vectors to obtain deeper feature vectors.

在具体实施中，多个检测任务可以包括：目标检测任务。相应的，目标检测任务对应的调整网络可以包括：特征金字塔网络(Feature Pyramid Network，FPN)和跨阶段局部网络(Cross Stage Partial Networks，CSPNet)。可以将通用特征向量输入至目标检测任务对应的调整网络，FPN和CSPNet可以对通用特征向量进行进一步地特征提取，以得到该调整网络输出的特征向量，该特征向量可以是二维的特征图。进一步地，目标检测任务对应的预测网络可以包括分类网络和回归网络，分类网络可以根据调整网络输出的特征向量得到目标中心点的位置，回归网络可以根据调整网络输出的特征向量得到目标的长宽高和朝向角度，从而得到每个目标的检测结果(x，y，z，dx，dy，dz，heading)。其中，x，y，z为目标中心点的三维位置，dx，dy和dz分别为目标外接三维矩形框的长宽高，heading为目标的旋转角度(也即上述的朝向角度)。In a specific implementation, the multiple detection tasks may include: a target detection task. Correspondingly, the adjustment network corresponding to the target detection task may include: Feature Pyramid Network (FPN) and Cross Stage Partial Networks (CSPNet). The general feature vector can be input to the adjustment network corresponding to the target detection task, and FPN and CSPNet can further extract features from the general feature vector to obtain the feature vector output by the adjustment network, which can be a two-dimensional feature map. Furthermore, the prediction network corresponding to the target detection task can include a classification network and a regression network. The classification network can obtain the position of the center point of the target according to the feature vector output by the adjustment network, and the regression network can obtain the length and width of the target according to the feature vector output by the adjustment network. Height and heading angle to get the detection result (x, y, z, dx, dy, dz, heading) of each target. Among them, x, y, and z are the three-dimensional positions of the center point of the target, dx, dy, and dz are the length, width, and height of the three-dimensional rectangular frame circumscribing the target, respectively, and heading is the rotation angle of the target (that is, the above-mentioned orientation angle).

另一方面，可以采用第k个调整网络对通用特征向量进行维度变化，以得到第k特征向量。On the other hand, the kth adjustment network can be used to change the dimension of the general feature vector to obtain the kth feature vector.

在一个具体的例子中，多个检测任务也可以包括：语义分割任务。相应的，语义分割任务对应的调整网络可以包括特征投影网络，特征投影网络可以用于将特征提取网络输出的通用特征向量投影到点云数据中的每一个点。采用调整网络对通用特征向量进行维度变换可以是指采用特征投影网络将通用特征向量投影到点云数据中的每一个点。In a specific example, multiple detection tasks may also include: a semantic segmentation task. Correspondingly, the adjustment network corresponding to the semantic segmentation task can include a feature projection network, which can be used to project the general feature vector output by the feature extraction network to each point in the point cloud data. Using the adjustment network to perform dimension transformation on the general feature vector may refer to using a feature projection network to project the general feature vector to each point in the point cloud data.

在具体实施中，特征投影网络可以根据对点云数据进行预处理时采用的投影关系对通用特征向量进行反变换，以将通用特征向量投影到点云数据中的每一个点，从而得到第k特征向量。其中，第k特征向量包括N个点的特征向量。进一步地，可以采用语义分割任务对应的预测网络根据每个点的特征向量，计算得到该点的预测类别。进一步地，语义分割任务对应的预测网络可以为2层全连接网络，该预测网络可以根据调整网络输出的特征向量得到每个点的类别。所述类别可以是以下任意一项：水雾、灰尘、树枝和其他。In the specific implementation, the feature projection network can inversely transform the general feature vector according to the projection relationship used when preprocessing the point cloud data, so as to project the general feature vector to each point in the point cloud data, so as to obtain the kth Feature vector. Wherein, the kth eigenvector includes the eigenvectors of N points. Furthermore, the prediction network corresponding to the semantic segmentation task can be used to calculate the predicted category of the point according to the feature vector of each point. Furthermore, the prediction network corresponding to the semantic segmentation task can be a 2-layer fully connected network, and the prediction network can obtain the category of each point according to the feature vector output by the adjustment network. The category can be any of the following: mist, dust, twigs, and others.

在步骤S24的具体实施中，可以采用第k个预测网络对第k特征向量进行计算，以得到第k个任务的检测结果。其中，第k个预测网络的具体结构可以是根据对应的检测任务确定的，本实施例对此并不进行限制。In the specific implementation of step S24, the kth prediction network may be used to calculate the kth feature vector to obtain the detection result of the kth task. Wherein, the specific structure of the k-th prediction network may be determined according to the corresponding detection task, which is not limited in this embodiment.

由上，本发明实施例提供的方案中，可以仅对每帧的输入数据(例如，点云数据)进行单次的特征提取，然后基于提取到的通用特征向量得到多个检测任务的检测结果，从而完成多个检测任务。一方面，相较于图1示出的现有的多任务检测模型，本发明实施例的方案中多个检测任务可以共用特征提取网络，减少了模型容量和计算复杂度。另一方面，本发明实施例的方案中，针对每个检测任务采用相应的调整网络，以对通用特征向量进行域适应转换，从而提高每个预测网络的训练效果和检测效果。因此，本发明实施例的方案可以在确保检测效果的情况下，减少计算复杂度，提高检测效率。From the above, in the solution provided by the embodiment of the present invention, only a single feature extraction can be performed on the input data (for example, point cloud data) of each frame, and then the detection results of multiple detection tasks can be obtained based on the extracted general feature vector , so as to complete multiple detection tasks. On the one hand, compared with the existing multi-task detection model shown in FIG. 1 , multiple detection tasks in the solution of the embodiment of the present invention can share a feature extraction network, which reduces model capacity and computational complexity. On the other hand, in the solution of the embodiment of the present invention, a corresponding adjustment network is used for each detection task to perform domain-adaptive transformation on the general feature vector, so as to improve the training effect and detection effect of each prediction network. Therefore, the solutions in the embodiments of the present invention can reduce computational complexity and improve detection efficiency while ensuring detection effects.

在实际应用中，在无人驾驶过程中，车载激光雷达会实时获取点云数据，输入到图3示出的多任务检测模型，得到各个任务的检测结果。进一步地，可以对检测结果进行格式转换后输入给预测模块和规划模块等下游模块。In practical applications, in the process of unmanned driving, the vehicle-mounted lidar will acquire point cloud data in real time, input it into the multi-task detection model shown in Figure 3, and obtain the detection results of each task. Furthermore, the detection results can be format-converted and then input to downstream modules such as the prediction module and the planning module.

下面对图3示出的多任务检测模型的训练过程进行非限制性的说明。The training process of the multi-task detection model shown in FIG. 3 is described below in a non-limiting manner.

具体而言，可以采用训练数据对预测模型进行训练，其中，预设模型为训练之前的模型，预测模型可以包括：单个初始特征提取网络、多个初始调整网络和多个初始预测网络。所述训练数据可以包括多个任务对应的样本数据。其中，初始特征提取网络为训练之前的特征提取网络，初始调整网络和调整网络一一对应，初始调整网络是训练之前的调整网络，初始预测网络和预测网络一一对应，初始预测网络是训练之前的预测网络。Specifically, training data can be used to train the prediction model, wherein the preset model is a model before training, and the prediction model can include: a single initial feature extraction network, multiple initial adjustment networks, and multiple initial prediction networks. The training data may include sample data corresponding to multiple tasks. Among them, the initial feature extraction network is the feature extraction network before training, the initial adjustment network corresponds to the adjustment network one by one, the initial adjustment network is the adjustment network before training, the initial prediction network corresponds to the prediction network, and the initial prediction network is before training prediction network.

参照图4，图4是本发明实施例中一种多任务检测模型的训练方法的流程示意图。下面结合图4对图3中示出的多任务检测模型的训练过程进行非限制性的说明。图4示出的训练方法可以包括以下步骤：Referring to FIG. 4 , FIG. 4 is a schematic flowchart of a training method for a multi-task detection model in an embodiment of the present invention. The training process of the multi-task detection model shown in FIG. 3 will be described in a non-limiting manner below with reference to FIG. 4 . The training method shown in Fig. 4 can comprise the following steps:

步骤S41：采用无标注样本数据对预设模型进行无监督训练，当满足第三预设条件时，得到用于进行有监督训练的预设模型；Step S41: Perform unsupervised training on the preset model using unlabeled sample data, and obtain a preset model for supervised training when the third preset condition is met;

步骤S42：采用训练数据对预设模型进行有监督训练，当满足第一预设条件时，得到中间检测模型；Step S42: Using the training data to perform supervised training on the preset model, and when the first preset condition is met, an intermediate detection model is obtained;

步骤S43：采用第k个任务对应的样本数据对第k个中间调整网络和第k个中间预测网络进行有监督训练，当满足第二预设条件时，得到所述多任务检测网络。Step S43: Perform supervised training on the k-th intermediate adjustment network and the k-th intermediate prediction network by using the sample data corresponding to the k-th task, and obtain the multi-task detection network when the second preset condition is met.

图4示出的训练方法中，用于训练的训练数据可以包括：多个任务对应的样本数据，更具体的，每个任务对应的样本数据可以包括：无标注样本数据和有标注样本数据。更具体地，每个任务对应的有标注样本数据可以是对该任务对应的无标注样本数据进行标注得到的，也可以是对该任务对应的无标注样本数据以外的其他样本数据进行标注得到的，本实施例对此并不进行限制。In the training method shown in FIG. 4 , the training data used for training may include: sample data corresponding to multiple tasks, more specifically, the sample data corresponding to each task may include: unlabeled sample data and labeled sample data. More specifically, the labeled sample data corresponding to each task can be obtained by labeling the unlabeled sample data corresponding to the task, or can be obtained by labeling other sample data than the unlabeled sample data corresponding to the task , which is not limited in this embodiment.

在步骤S41的具体实施中，可以先确定匹配点和非匹配点。In the specific implementation of step S41, matching points and non-matching points may be determined first.

具体而言，无标注样本数据可以包括多帧第一点云数据，可以先对每帧第一点云数据进行数据增强处理，以得到该帧点云对应的第二点云数据。其中，数据增强处理可以包括以下一项或多项：加入随机噪声、随机翻转、随机噪声和随机的尺度变换(例如，随机缩放)。Specifically, the unlabeled sample data may include multiple frames of first point cloud data, and data enhancement processing may be performed on each frame of first point cloud data to obtain second point cloud data corresponding to the frame of point cloud. Wherein, the data enhancement processing may include one or more of the following: adding random noise, random flipping, random noise and random scale transformation (for example, random scaling).

由于第二点云数据是由第一点云数据生成的，第一点云数据中的点和第二点云数据中的点一一对应，因此，可以在每帧第一点云数据及其对应的第二点云数据中确定匹配点和非匹配点。其中，匹配点可以是指具有对应关系的点，非匹配点是指不具有对应关系的点。Since the second point cloud data is generated by the first point cloud data, the points in the first point cloud data correspond to the points in the second point cloud data, therefore, the first point cloud data and its Matching points and non-matching points are determined in the corresponding second point cloud data. Wherein, a matching point may refer to a point having a corresponding relationship, and a non-matching point refers to a point not having a corresponding relationship.

进一步地，可以对预设模型进行无监督训练。其中，无监督训练的损失函数可以为三元组损失函数，无监督训练的约束条件可以包括：匹配点之间的特征距离最小化和/或非匹配点之间的特征距离最大化。其中，特征距离可以是余弦距离、切比雪夫距离和曼哈顿距离等，本实施例对此并不进行限制。Furthermore, unsupervised training can be performed on the preset model. Wherein, the loss function of the unsupervised training may be a triplet loss function, and the constraint conditions of the unsupervised training may include: minimizing a feature distance between matching points and/or maximizing a feature distance between non-matching points. Wherein, the characteristic distance may be cosine distance, Chebyshev distance, Manhattan distance, etc., which is not limited in this embodiment.

进一步地，当满足第三预设条件时，可以停止无监督训练，并得到用于有监督训练的预设模型。其中，第三预设条件可以包括无监督训练的训练损失小于或等于预设值，本实施例对此并不进行限制。Further, when the third preset condition is met, unsupervised training can be stopped, and a preset model for supervised training can be obtained. Wherein, the third preset condition may include that the training loss of the unsupervised training is less than or equal to a preset value, which is not limited in this embodiment.

在步骤S42的具体实施中，可以采用训练数据对预设模型进行有监督训练。其中，训练数据可以包括多个任务对应的有标注样本数据。需要说明的是，步骤S42中的预设模型是指执行步骤S41之后得到的预设模型。换言之，有监督训练可以将无监督训练得到的预设模型作为训练的初始模型。In the specific implementation of step S42, the training data may be used to perform supervised training on the preset model. Wherein, the training data may include labeled sample data corresponding to multiple tasks. It should be noted that the preset model in step S42 refers to the preset model obtained after step S41 is executed. In other words, supervised training can use the preset model obtained from unsupervised training as the initial model for training.

进一步地，在训练过程中，每批训练数据可以包括多个任务对应的有标注样本数据，将所有任务的有标注样本数据输入至初始特征提取网络，然后将初始特征提取网络输出的每个任务对应的样本通用特征向量分别输入至对应的初始调整网络和初始预测网络中，以得到每个任务的样本预测结果。Further, in the training process, each batch of training data can include labeled sample data corresponding to multiple tasks, input the labeled sample data of all tasks to the initial feature extraction network, and then output each task from the initial feature extraction network The corresponding sample universal feature vectors are respectively input into the corresponding initial adjustment network and initial prediction network to obtain the sample prediction results of each task.

进一步地，对于每个检测任务，可以根据该任务对应的样本数据的标签和样本预测结果，计算该任务的损失。Further, for each detection task, the loss of the task can be calculated according to the label of the sample data corresponding to the task and the sample prediction result.

进一步地，可以根据多个任务的损失计算总损失，并根据总损失更新预设网络模型的参数。Furthermore, the total loss can be calculated according to the losses of multiple tasks, and the parameters of the preset network model can be updated according to the total loss.

在一个非限制性的例子中，可以采用下列公式计算所述总损失：In a non-limiting example, the total loss may be calculated using the following formula:

其中，L为所述总损失，σ_k为所述第k个初始调整网络的权重，σ_k为可学习的参数，L_k为所述第k个任务的损失，M为任务的总数。Wherein, L is the total loss, σ _k is the weight of the k-th initial adjustment network, σ _k is a learnable parameter, L _k is the loss of the k-th task, and M is the total number of tasks.

需要说明的是，σ_k为可学习的参数。具体而言，在进行第一次有监督训练(也即，步骤S42)的过程中，每次更新各个网络的参数时，也会一并更新各个调整网络的权重σ_k。其中，σ_k的初始值可以是预先设置的。需要说明的是，σ_k用于计算总损失，通过优化σ_k来优化总损失，从而优化特征提取网络的训练效果。It should be noted that σ _k is a learnable parameter. Specifically, during the first supervised training (ie, step S42 ), each time the parameters of each network are updated, the weights σ _k of each adjustment network are also updated together. Wherein, the initial value of σ _k may be preset. It should be noted that σ _k is used to calculate the total loss, and the total loss is optimized by optimizing σ _k , thereby optimizing the training effect of the feature extraction network.

进一步地，在步骤S42的训练过程中，可以采用随机梯度下降的方法(也即，SGD优化器)更新参数。Further, during the training process of step S42, the stochastic gradient descent method (that is, the SGD optimizer) may be used to update parameters.

进一步地，当满足第一预设条件时，得到中间检测模型，其中，所述中间检测模型包括所述特征提取网络、多个中间调整网络和多个中间预测网络。其中，第一预设条件包括：总损失小于或等于第一预设损失。Further, when the first preset condition is satisfied, an intermediate detection model is obtained, wherein the intermediate detection model includes the feature extraction network, multiple intermediate adjustment networks and multiple intermediate prediction networks. Wherein, the first preset condition includes: the total loss is less than or equal to the first preset loss.

在步骤S42中，对每个调整网络采用可学习的自适应权重，可以减少人为超参数设定，有利于网络自动优化参数梯度权重，从而提高特征提取网络训练效果。In step S42, adopting learnable adaptive weights for each adjustment network can reduce artificial hyperparameter settings, which is beneficial for the network to automatically optimize parameter gradient weights, thereby improving the training effect of the feature extraction network.

在步骤S43的具体实施中，采用第k个任务对应的样本数据对第k个中间调整网络和第k个中间预测网络进行有监督训练，当满足第二预设条件时，得到所述多任务检测网络。In the specific implementation of step S43, the sample data corresponding to the kth task is used to perform supervised training on the kth intermediate adjustment network and the kth intermediate prediction network. When the second preset condition is met, the multi-task Detect network.

需要说明的是，执行完成步骤S42时(也即，满足第一预设条件时)，可以得到特征提取网络，也即，特征提取网络的训练完成。在步骤S43中，仅对调整网络和预测网络进行训练，并不对特征提取网络进行训练。也即，特征提取网络的参数在满足第一预设条件时已经被固定，在步骤S43中仅对调整网络和预测网络的参数进行更新。It should be noted that when step S42 is completed (that is, when the first preset condition is met), the feature extraction network can be obtained, that is, the training of the feature extraction network is completed. In step S43, only the adjustment network and the prediction network are trained, and the feature extraction network is not trained. That is, the parameters of the feature extraction network are fixed when the first preset condition is satisfied, and only the parameters of the adjustment network and the prediction network are updated in step S43.

进一步地，由于步骤S43中分别采用各个任务对应的样本数据对该任务对应的调整网络和预测网络进行训练，因此，在步骤S43中也无需计算总损失，而是分别计算每个任务的损失，并根据每个任务的损失更新该任务对应的中间调整网络和中间预测网络的参数。Further, since in step S43, the sample data corresponding to each task is used to train the adjustment network and prediction network corresponding to the task, therefore, in step S43, there is no need to calculate the total loss, but the loss of each task is calculated separately, And update the parameters of the intermediate adjustment network and intermediate prediction network corresponding to the task according to the loss of each task.

在具体实施中，步骤S42采用的学习率为第一学习率，步骤S43采用的学习率为第二学习率，第一学习率大于第二学习率。In a specific implementation, the learning rate adopted in step S42 is the first learning rate, the learning rate adopted in step S43 is the second learning rate, and the first learning rate is greater than the second learning rate.

进一步地，当满足第二预设条件时，即可得到上述的多任务检测模型。其中，第二预设条件包括：每个任务的损失小于或等于第二预设损失。Further, when the second preset condition is satisfied, the above-mentioned multi-task detection model can be obtained. Wherein, the second preset condition includes: the loss of each task is less than or equal to the second preset loss.

在一个具体的例子中，在每个训练周期后可以采用验证集数据进行验证，观察验证效果和损失变化，如果验证效果不再提升且loss上升，即时停止训练训练。In a specific example, the verification set data can be used for verification after each training cycle, and the verification effect and loss changes can be observed. If the verification effect does not improve and the loss increases, the training is stopped immediately.

由上，本发明实施例的方案中，采用无监督和有监督训练结合的方法，可以减少对样本数据的需求，有利于提高网络的准确度和泛化能力。From the above, in the solution of the embodiment of the present invention, the method of combining unsupervised and supervised training can reduce the demand for sample data and help improve the accuracy and generalization ability of the network.

在训练数据为点云数据的情况下，容易出现难以标注的问题。具体而言，现有的标注通常是人为的标注，在自动驾驶场景下，点云数据中的点通常包括水雾、灰尘对应的点，人为标注时通常难以判断是否为水雾或灰尘对应的点。为此，采用上述将无监督和有监督训练结合的方法，也可以有效应对有标注样本缺乏的问题。In the case where the training data is point cloud data, it is prone to problems that are difficult to label. Specifically, the existing annotations are usually artificial. In the autonomous driving scenario, the points in the point cloud data usually include points corresponding to water mist and dust. It is usually difficult to judge whether they correspond to water mist or dust point. For this reason, the above-mentioned method of combining unsupervised and supervised training can also effectively deal with the problem of lack of labeled samples.

需要说明的是，在其他实施例中，也可以仅采用有监督的训练方法。也即，可以通过执行步骤S42和步骤S43得到多任务检测模型。It should be noted that, in other embodiments, only a supervised training method may also be used. That is, the multi-task detection model can be obtained by executing steps S42 and S43.

如上文所述，现有的任务检测模型之间是互相独立的，训练过程也是互相独立的。然而，在自动驾驶的感知场景中，不同的检测任务之间具有关联性。例如，车道往往是可行驶区域的边界，可行驶区域通常紧密围绕着交通目标。本发明实施例中构建图3所示的多任务检测模型，并进行联合训练。这样的训练过程可以通过多个任务之间的共享信息来学习更好的表示，训练得到的特征提取网络能够提取更多的信息，从而提升每一个检测任务的性能。As mentioned above, the existing task detection models are independent of each other, and the training process is also independent of each other. However, in the perception scenario of autonomous driving, there are correlations between different detection tasks. For example, lanes are often the boundaries of drivable areas, which usually closely surround traffic objects. In the embodiment of the present invention, the multi-task detection model shown in FIG. 3 is constructed and jointly trained. Such a training process can learn better representations by sharing information between multiple tasks, and the trained feature extraction network can extract more information, thereby improving the performance of each detection task.

在实际应用中，可以采用pytorch进行训练，训练完成后可以通过tensorrt库对训练得到的多任务检测模型进行量化加速，用C++语言进行模型实现和线上部署。In practical applications, pytorch can be used for training. After the training is completed, the trained multi-task detection model can be quantified and accelerated through the tensorrt library, and the model can be implemented and deployed online in C++ language.

参照图5，图5是本发明实施例中一种多任务检测装置的结构示意图。图5示出的装置可以包括：Referring to FIG. 5 , FIG. 5 is a schematic structural diagram of a multi-task detection device in an embodiment of the present invention. The device shown in Figure 5 may include:

获取模块51，用于获取输入数据，并将所述输入数据输入至训练得到的多任务检测模型，所述多任务检测模型包括：单个特征提取网络、多个调整网络和多个预测网络，其中，所述预测网络和所述任务一一对应，且所述调整网络和所述预测网络一一对应；The obtaining module 51 is used to obtain input data, and input the input data into the trained multi-task detection model, the multi-task detection model includes: a single feature extraction network, multiple adjustment networks and multiple prediction networks, wherein , the prediction network is in one-to-one correspondence with the tasks, and the adjustment network is in one-to-one correspondence with the prediction network;

特征提取模块52，用于采用所述特征提取网络对所述输入数据进行特征提取，以得到通用特征向量；A feature extraction module 52, configured to extract features from the input data using the feature extraction network to obtain a general feature vector;

调整模块53，用于采用第k个调整网络对所述通用特征向量进行特征提取和/或维度变换，以得到第k个任务对应的特征向量，记为第k特征向量，其中，1≤k≤M，k、M为正整数，M为所述任务的数量；The adjustment module 53 is configured to use the kth adjustment network to perform feature extraction and/or dimension transformation on the general feature vector to obtain the feature vector corresponding to the kth task, which is denoted as the kth feature vector, where 1≤k ≤M, k and M are positive integers, and M is the number of tasks;

检测模块54，用于采用第k个预测网络对所述第k特征向量进行计算，以得到第k个任务的检测结果。The detection module 54 is configured to use the k-th prediction network to calculate the k-th feature vector, so as to obtain the detection result of the k-th task.

在具体实施中，上述的多任务检测装置可以对应于终端内具有数据处理功能的芯片；或者对应于终端中具有数据处理功能的芯片模组，或者对应于终端。In a specific implementation, the above-mentioned multitasking detection device may correspond to a chip with data processing function in the terminal; or correspond to a chip module with data processing function in the terminal, or correspond to the terminal.

关于图5示出的多任务检测装置的工作原理、工作方式和有益效果等更多内容，可以参照上文关于图1至图4的相关描述，在此不再赘述。For more details about the working principle, working mode and beneficial effects of the multi-tasking detection device shown in FIG. 5 , reference may be made to the relevant descriptions in FIG. 1 to FIG. 4 above, which will not be repeated here.

本发明实施例还提供一种存储介质，其上存储有计算机程序，所述计算机程序被处理器运行时，执行上述的多任务检测方法的步骤。所述存储介质可以包括ROM、RAM、磁盘或光盘等。所述存储介质还可以包括非挥发性存储器(non-volatile)或者非瞬态(non-transitory)存储器等。An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned multitasking detection method are executed. The storage medium may include ROM, RAM, magnetic or optical disks, and the like. The storage medium may also include a non-volatile memory (non-volatile) or a non-transitory (non-transitory) memory, and the like.

本发明实施例还提供一种终端，包括存储器和处理器，所述存储器上存储有可在所述处理器上运行的计算机程序，所述处理器运行所述计算机程序时执行上述的多任务检测方法的步骤。所述终端可以是车载终端。An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory stores a computer program that can run on the processor, and the processor executes the above-mentioned multitasking detection when running the computer program method steps. The terminal may be a vehicle-mounted terminal.

本发明实施例还提供一种车辆，该车辆可以包括上述的终端，所述终端可以执行上述的多任务检测方法。An embodiment of the present invention also provides a vehicle, which may include the above-mentioned terminal, and the terminal may execute the above-mentioned multi-task detection method.

应理解，本申请实施例中，所述处理器可以为中央处理单元(central processingunit，简称CPU)，该处理器还可以是其他通用处理器、数字信号处理器(digital signalprocessor，简称DSP)、专用集成电路(application specific integrated circuit，简称ASIC)、现成可编程门阵列(field programmable gate array，简称FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that in the embodiment of the present application, the processor may be a central processing unit (CPU for short), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP for short), dedicated Integrated circuit (application specific integrated circuit, referred to as ASIC), off-the-shelf programmable gate array (field programmable gate array, referred to as FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the like.

还应理解，本申请实施例中的存储器可以是易失性存储器或非易失性存储器，或可包括易失性和非易失性存储器两者。其中，非易失性存储器可以是只读存储器(read-only memory，简称ROM)、可编程只读存储器(programmable ROM，简称PROM)、可擦除可编程只读存储器(erasable PROM，简称EPROM)、电可擦除可编程只读存储器(electricallyEPROM，简称EEPROM)或闪存。易失性存储器可以是随机存取存储器(random accessmemory，简称RAM)，其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的随机存取存储器(random access memory，简称RAM)可用，例如静态随机存取存储器(staticRAM，简称SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronousDRAM，简称SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM，简称DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM，简称ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM，简称SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM，简称DR RAM)。It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories. Wherein, the non-volatile memory can be read-only memory (read-only memory, referred to as ROM), programmable read-only memory (programmable ROM, referred to as PROM), erasable programmable read-only memory (erasable PROM, referred to as EPROM) , Electrically Erasable Programmable Read-Only Memory (electrically EPROM, referred to as EEPROM) or flash memory. The volatile memory may be random access memory (RAM for short), which is used as an external cache. By way of illustration and not limitation, many forms of random access memory (RAM) are available such as static random access memory (static RAM (SRAM), dynamic random access memory (DRAM), synchronous dynamic Random access memory (synchronousDRAM, referred to as SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, referred to as DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, referred to as ESDRAM), synchronous connection Dynamic random access memory (synchlink DRAM, SLDRAM for short) and direct memory bus random access memory (direct rambus RAM, DR RAM for short).

上述实施例，可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时，上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机程序可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，所述计算机程序可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。The above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations. When implemented using software, the above-described embodiments may be implemented in whole or in part in the form of computer program products. The computer program product comprises one or more computer instructions or computer programs. When the computer instruction or computer program is loaded or executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices. The computer program can be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer program can be downloaded from a website, computer, server or data center Wired or wireless transmission to another website site, computer, server or data center.

在本申请所提供的几个实施例中，应该理解到，所揭露的方法、装置和系统，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的；例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式；例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。In the several embodiments provided in this application, it should be understood that the disclosed methods, devices and systems can be implemented in other ways. For example, the device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components can be combined or May be integrated into another system, or some features may be ignored, or not implemented. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本发明各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理包括，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用硬件加软件功能单元的形式实现。例如，对于应用于或集成于芯片的各个装置、产品，其包含的各个模块/单元可以都采用电路等硬件的方式实现，或者，至少部分模块/单元可以采用软件程序的方式实现，该软件程序运行于芯片内部集成的处理器，剩余的(如果有)部分模块/单元可以采用电路等硬件方式实现；对于应用于或集成于芯片模组的各个装置、产品，其包含的各个模块/单元可以都采用电路等硬件的方式实现，不同的模块/单元可以位于芯片模组的同一组件(例如芯片、电路模块等)或者不同组件中，或者，至少部分模块/单元可以采用软件程序的方式实现，该软件程序运行于芯片模组内部集成的处理器，剩余的(如果有)部分模块/单元可以采用电路等硬件方式实现；对于应用于或集成于终端的各个装置、产品，其包含的各个模块/单元可以都采用电路等硬件的方式实现，不同的模块/单元可以位于终端内同一组件(例如，芯片、电路模块等)或者不同组件中，或者，至少部分模块/单元可以采用软件程序的方式实现，该软件程序运行于终端内部集成的处理器，剩余的(如果有)部分模块/单元可以采用电路等硬件方式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may be physically included separately, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional units. For example, for each device or product applied to or integrated into a chip, each module/unit contained therein may be realized by hardware such as a circuit, or at least some modules/units may be realized by a software program, and the software program Running on the integrated processor inside the chip, the remaining (if any) modules/units can be realized by means of hardware such as circuits; They are all realized by means of hardware such as circuits, and different modules/units can be located in the same component (such as chips, circuit modules, etc.) or different components of the chip module, or at least some modules/units can be realized by means of software programs, The software program runs on the processor integrated in the chip module, and the remaining (if any) modules/units can be realized by hardware such as circuits; /Units can be realized by means of hardware such as circuits, and different modules/units can be located in the same component (such as chips, circuit modules, etc.) or different components in the terminal, or at least some modules/units can be implemented in the form of software programs Realization, the software program runs on the processor integrated in the terminal, and the remaining (if any) modules/units can be implemented by means of hardware such as circuits.

应理解，本文中术语“和/或”，仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，本文中字符“/”，表示前后关联对象是一种“或”的关系。It should be understood that the term "and/or" in this article is only an association relationship describing associated objects, which means that there may be three relationships, for example, A and/or B may mean: A exists alone, and A and B exist at the same time , there are three cases of B alone. In addition, the character "/" in this article indicates that the contextual objects are an "or" relationship.

本申请实施例中出现的“多个”是指两个或两个以上。"Multiple" appearing in the embodiments of the present application means two or more.

本申请实施例中出现的第一、第二等描述，仅作示意与区分描述对象之用，没有次序之分，也不表示本申请实施例中对设备个数的特别限定，不能构成对本申请实施例的任何限制。The first, second, etc. descriptions that appear in the embodiments of this application are only for illustration and to distinguish the description objects. Any limitations of the examples.

虽然本发明披露如上，但本发明并非限定于此。任何本领域技术人员，在不脱离本发明的精神和范围内，均可作各种更动与修改，因此本发明的保护范围应当以权利要求所限定的范围为准。Although the present invention is disclosed above, the present invention is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention, so the protection scope of the present invention should be based on the scope defined in the claims.

Claims

1. A method of multi-tasking detection, the method comprising:

acquiring input data, and inputting the input data into a multi-task detection model obtained by training, wherein the multi-task detection model comprises: the task scheduling system comprises a single feature extraction network, a plurality of adjustment networks and a plurality of prediction networks, wherein the prediction networks correspond to the tasks one to one, and the adjustment networks correspond to the prediction networks one to one;

extracting the features of the input data by adopting the feature extraction network to obtain a universal feature vector;

performing feature extraction and/or dimension transformation on the general feature vector by adopting a kth adjusting network to obtain a feature vector corresponding to a kth task, and marking the feature vector as the kth feature vector, wherein k is more than or equal to 1 and is less than or equal to M, k and M are positive integers, and M is the number of the tasks;

and calculating the kth characteristic vector by adopting a kth prediction network to obtain a detection result of the kth task.

2. The multi-tasking detection method of claim 1, wherein the input data is radar-acquired point cloud data, the feature extraction network comprises convolutional layers, and the convolutional layers perform sparse convolution calculations.

3. The multitask detection method according to claim 2, wherein the multitask detection model is obtained by training a preset model in advance by using training data, and the preset model comprises: a single initial feature extraction network, a plurality of initial adjustment networks, and a plurality of initial prediction networks, the training data comprising: before the sample data corresponding to the kth task is obtained, the method further includes:

the method comprises the following steps: carrying out supervised training on the preset model by adopting the training data, and obtaining an intermediate detection model when a first preset condition is met, wherein the intermediate detection model comprises the feature extraction network, a plurality of intermediate adjustment networks and a plurality of intermediate prediction networks;

step two: carrying out supervised training on a kth intermediate adjusting network and a kth intermediate predicting network by adopting sample data corresponding to the kth task, and obtaining the multi-task detecting network when a second preset condition is met;

wherein the first preset condition comprises: the total loss is less than or equal to a first preset loss, and the second preset condition comprises that: and the loss of each task is less than or equal to a second preset loss, and the total loss is calculated according to the losses of the M tasks.

4. A method of multitasking detection according to claim 3, characterized in that the total loss is calculated using the following formula:

wherein L is the total loss, σ _k For the kth initial adjustment network weight, σ _k As a learnable parameter, L _k Is the loss of the k-th task.

5. The multitask detection method according to claim 3, wherein the learning rate used in the first step is a first learning rate, the learning rate used in the second step is a second learning rate, and the first learning rate is larger than the second learning rate.

6. The multitask detection method according to claim 3, characterized in that said sample data set further comprises: no labeled sample data, before the first step, the method further comprises:

and performing unsupervised training on the preset model by adopting the unlabeled sample data, and obtaining the preset model for performing the supervised training when a third preset condition is met.

7. The multitask detection method according to claim 6, wherein the unlabeled sample data comprises a plurality of frames of first point cloud data, and the unsupervised training constraints comprise: minimizing the feature distance between the matching points and/or maximizing the feature distance between the non-matching points, wherein before performing unsupervised training on the preset model by using the unlabeled sample data, the method further comprises:

performing data enhancement processing on each frame of first point cloud data to obtain second point cloud data corresponding to the frame of first point cloud data, wherein points in the first point cloud data correspond to points in the second point cloud data one to one;

determining matching points and non-matching points in each frame of first point cloud data and corresponding second point cloud data; the matching points refer to points with corresponding relations, and the non-matching points refer to points without corresponding relations.

8. A method and apparatus for multi-task detection, the apparatus comprising:

an obtaining module, configured to obtain input data and input the input data into a trained multi-task detection model, where the multi-task detection model includes: the system comprises a single feature extraction network, a plurality of adjustment networks and a plurality of prediction networks, wherein the prediction networks correspond to the tasks one to one, and the adjustment networks correspond to the prediction networks one to one;

the characteristic extraction module is used for extracting the characteristics of the input data by adopting the characteristic extraction network so as to obtain a universal characteristic vector;

the adjusting module is used for performing feature extraction and/or dimension transformation on the general feature vector by adopting a kth adjusting network to obtain a feature vector corresponding to a kth task, and the feature vector is marked as the kth feature vector, wherein k is more than or equal to 1 and less than or equal to M, k and M are positive integers, and M is the number of the tasks;

and the detection module is used for calculating the kth characteristic vector by adopting the kth prediction network so as to obtain a detection result of the kth task.

9. A storage medium having a computer program stored thereon, the computer program, when executed by a processor, performing the steps of the multitask detection method according to any one of claims 1 to 7.

10. A terminal comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the multitask detection method according to any one of claims 1-7.