CN115035357A

CN115035357A - Object detection model construction method, object detection method, device and computing device

Info

Publication number: CN115035357A
Application number: CN202110194080.9A
Authority: CN
Inventors: 杨陈弘毅; 黄泽昊; 王乃岩
Original assignee: Beijing Tusimple Technology Co Ltd
Current assignee: Beijing Tusimple Technology Co Ltd
Priority date: 2021-02-20
Filing date: 2021-02-20
Publication date: 2022-09-09

Abstract

The disclosure provides a method for constructing a target detection model, a target detection method, an apparatus and a computing device, which address the poor detection performance of existing single-stage target detectors on small objects. The method for constructing the target detection model comprises: constructing a feature extraction network, which performs feature extraction on an input picture to obtain multi-layer feature maps including a first feature map and a second feature map; and constructing a target detection network, which comprises a plurality of network layers corresponding to the multi-layer feature maps, the plurality of network layers including a first network layer and a second network layer. The first network layer performs a query operation on the first feature map and transmits the obtained query result to the second network layer, the query result including a query point of a specific target in the first feature map. The second network layer determines a mapping area of the query point in the second feature map and performs a detection operation in the mapping area to obtain a detection result.

Description

Object detection model construction method, object detection method, device and computing device

Technical Field

The present disclosure relates to the field of target detection, and in particular to a method for constructing a target detection model, a target detection method, an apparatus and a computing device.

Background

Object detectors based on deep neural networks have achieved great success in recent years. Common object detectors can be divided into single-stage detectors and two-stage detectors. Single-stage detectors have the advantages of a simple structure and fast inference. For example, the anchor-based RetinaNet object detector pre-defines candidate boxes of specific sizes and shapes in the picture, and then uses a neural network to classify these candidate boxes and refine them by regression, thereby performing object detection. However, existing single-stage object detectors perform unsatisfactorily on small objects. To improve small-object detection, high-resolution input pictures and features are usually used, but this greatly increases the amount of computation and severely slows down inference. Therefore, a more efficient and faster target detection method is needed.

Summary of the Invention

Embodiments of the present disclosure provide a method for constructing a target detection model, a target detection method, an apparatus, and a computing device, so as to improve the detection performance of a single-stage target detector on small objects.

To achieve the above object, the embodiments of the present disclosure adopt the following technical solutions:

A first aspect of the embodiments of the present disclosure provides a method for constructing a target detection model, including: constructing a feature extraction network, where the feature extraction network is used to perform feature extraction on an input picture to obtain multi-layer feature maps with different sizes, and the multi-layer feature maps include a first feature map and a second feature map; and constructing a target detection network, where the target detection network includes multiple network layers corresponding to the multi-layer feature maps, and the multiple network layers include a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the obtained query result to the second network layer, where the query result includes a query point of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine a mapping area of the query point in the second feature map and perform a detection operation in the mapping area to obtain a detection result.

A second aspect of the embodiments of the present disclosure provides a target detection method, including: inputting a picture to be detected into a target detection model, where the target detection model includes a feature extraction network and a target detection network; using the feature extraction network to perform feature extraction on the picture to be detected to obtain multi-layer feature maps with different sizes, where the multi-layer feature maps include a first feature map and a second feature map; and using the target detection network to output a detection result for the multi-layer feature maps, where the target detection network includes multiple network layers corresponding to the multi-layer feature maps, and the multiple network layers include a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the obtained query result to the second network layer, where the query result includes a query point of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine a mapping area of the query point in the second feature map and perform a detection operation in the mapping area to obtain the detection result.

A third aspect of the embodiments of the present disclosure provides an apparatus for constructing a target detection model, including: a first construction unit for constructing a feature extraction network, where the feature extraction network is used to perform feature extraction on an input picture to obtain multi-layer feature maps with different sizes, and the multi-layer feature maps include a first feature map and a second feature map; and a second construction unit for constructing a target detection network, where the target detection network includes multiple network layers corresponding to the multi-layer feature maps, and the multiple network layers include a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the obtained query result to the second network layer, where the query result includes a query point of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine a mapping area of the query point in the second feature map and perform a detection operation in the mapping area to obtain a detection result.

A fourth aspect of the embodiments of the present disclosure provides a target detection apparatus, including: an input unit for inputting a picture to be detected into a target detection model, where the target detection model includes a feature extraction network and a target detection network; a feature extraction unit for using the feature extraction network to perform feature extraction on the picture to be detected to obtain multi-layer feature maps with different sizes, where the multi-layer feature maps include a first feature map and a second feature map; and a target detection unit for using the target detection network to output a detection result for the multi-layer feature maps, where the target detection network includes multiple network layers corresponding to the multi-layer feature maps, and the multiple network layers include a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the obtained query result to the second network layer, where the query result includes a query point of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine a mapping area of the query point in the second feature map and perform a detection operation in the mapping area to obtain the detection result.

A fifth aspect of the embodiments of the present disclosure provides a computing device, including: a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor, when running the computer program, executes the method for constructing a target detection model and/or the target detection method described above.

A sixth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, implements the method for constructing a target detection model and/or the target detection method described above.

A seventh aspect of the embodiments of the present disclosure provides a vehicle, including the computing device described above.

According to the technical solution of the present disclosure, a novel single-stage target detector based on a query mechanism is proposed, which can quickly detect small objects on high-resolution features. The basic idea is to first find the approximate position of a small object on the low-resolution feature map, that is, the query point of a specific target, and then to map that query point into the high-resolution feature map and perform detection there. The present disclosure may also add a query network to the detection head, where the query network is used to determine whether an object smaller than the size threshold of the layer exists at a given position. In addition, the present disclosure may use sparse convolution to improve inference speed: a sparse feature tensor is constructed from the high-resolution features based on these query points, and the result is computed with sparse convolution.
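The query-then-map idea in the preceding paragraph can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the disclosure: the function names, the score threshold of 0.5, and in particular the assumption that the higher-resolution map is exactly `scale` times larger per axis, so that one query point maps to a `scale` x `scale` block. The last step only gathers features at the mapped positions; it stands in for building a sparse feature tensor and does not perform a sparse convolution.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (row, col) on a feature map


def find_query_points(scores: List[List[float]], threshold: float) -> List[Point]:
    """Return low-resolution positions whose query score exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(scores)
        for c, s in enumerate(row)
        if s > threshold
    ]


def map_to_high_res(points: List[Point], scale: int, height: int, width: int) -> List[Point]:
    """Map each query point to a scale x scale block on the higher-resolution map.

    Assumption: the high-resolution map is `scale` times larger per axis.
    """
    mapped = set()
    for r, c in points:
        for dr in range(scale):
            for dc in range(scale):
                rr, cc = r * scale + dr, c * scale + dc
                if rr < height and cc < width:  # stay inside the map
                    mapped.add((rr, cc))
    return sorted(mapped)


def gather_sparse(features: List[List[float]], positions: List[Point]) -> Dict[Point, float]:
    """Keep only the high-resolution features inside the mapped area."""
    return {(r, c): features[r][c] for r, c in positions}


# Toy data: a 2x2 query score map and a 4x4 single-channel feature map.
low_res_scores = [
    [0.1, 0.9],
    [0.2, 0.3],
]
high_res_features = [[float(4 * r + c) for c in range(4)] for r in range(4)]

query_points = find_query_points(low_res_scores, threshold=0.5)
area = map_to_high_res(query_points, scale=2, height=4, width=4)
sparse = gather_sparse(high_res_features, area)
print(query_points, area, sparse)
```

In this toy case only 4 of the 16 high-resolution positions are kept, which is where the computational saving would come from if detection were run only inside the mapped area.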

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a structural diagram of a vehicle 100 according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of a method 200 for constructing a target detection model according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a target detection model according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of another target detection model according to an embodiment of the present disclosure;

FIG. 5 is a flowchart of another target detection method 500 according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of the detection effect of a target detection method according to an embodiment of the present disclosure;

FIG. 7 is a structural diagram of an apparatus 700 for constructing a target detection model according to an embodiment of the present disclosure;

FIG. 8 is a structural diagram of another target detection model apparatus 800 according to an embodiment of the present disclosure;

FIG. 9 is a structural diagram of a computing device 900 according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present disclosure without creative effort fall within the protection scope of the present disclosure.

It should be noted that the terms "first", "second" and the like in the description and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described herein can be implemented. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

FIG. 1 is a schematic diagram of a vehicle 100 in which the various techniques disclosed herein may be implemented. The vehicle 100 may be a car, truck, motorcycle, bus, boat, airplane, helicopter, lawn mower, excavator, snowmobile, aircraft, recreational vehicle, amusement park vehicle, farm equipment, construction equipment, tram, golf cart, train, trolleybus, or other vehicle. The vehicle 100 may operate fully or partially in an autonomous driving mode. In the autonomous driving mode the vehicle 100 may control itself. For example, the vehicle 100 may determine the current state of the vehicle and of the environment in which it is located, determine the predicted behavior of at least one other vehicle in the environment, determine a confidence level corresponding to the likelihood that the at least one other vehicle will perform the predicted behavior, and control the vehicle 100 itself based on the determined information. When in the autonomous driving mode, the vehicle 100 may operate without human interaction.

The vehicle 100 may include various vehicle systems, such as a drive system 142, a sensor system 144, a control system 146, a user interface system 148, a control computer system 150, and a communication system 152. The vehicle 100 may include more or fewer systems, and each system may include multiple units. Further, the systems and units of the vehicle 100 may be interconnected. For example, the control computer system 150 can be in data communication with one or more of the vehicle systems 142-148 and 152. Thus, one or more of the described functions of the vehicle 100 may be divided into additional functional or physical components, or combined into a smaller number of functional or physical components. In a further example, additional functional or physical components may be added to the example shown in FIG. 1.

The drive system 142 may include a number of operable components (or units) that provide kinetic energy to the vehicle 100. In one embodiment, the drive system 142 may include an engine or electric motor, wheels, a transmission, an electronic system, and power (or a power source). The engine or electric motor may be any combination of the following: an internal combustion engine, an electric motor, a steam engine, a fuel cell engine, a propane engine, or another form of engine or electric motor. In some embodiments, the engine may convert a power source into mechanical energy. In some embodiments, the drive system 142 may include multiple kinds of engines or electric motors. For example, a gasoline-electric hybrid vehicle may include a gasoline engine and an electric motor, and other configurations are also possible.

The wheels of the vehicle 100 may be standard wheels. The vehicle 100 may have wheels in various configurations, including one-wheel, two-wheel, three-wheel, or four-wheel configurations, such as the four wheels on a car or truck. Other numbers of wheels are possible, such as six or more wheels. One or more wheels of the vehicle 100 may be operated to rotate in a different direction from the other wheels. At least one wheel may be fixedly connected to the transmission. The wheels may include a combination of metal and rubber, or a combination of other materials. The transmission may include a unit operable to transmit the mechanical power of the engine to the wheels. For this purpose, the transmission may include a gearbox, a clutch, a differential gear and a drive shaft. The transmission may also include other units. The drive shaft may include one or more axles that mate with the wheels. The electronic system may include a unit for transmitting or controlling electronic signals of the vehicle 100. These electronic signals may be used to activate lights, servos, electric motors, and other electronic drive or control devices in the vehicle 100. The power source may be an energy source that powers the engine or electric motor in whole or in part. That is, the engine or electric motor can convert the power source into mechanical energy. Illustratively, the power source may include gasoline, petroleum, petroleum-based fuels, propane, other compressed gas fuels, ethanol, fuel cells, solar panels, batteries, and other sources of electrical energy. The power source may additionally or alternatively include any combination of fuel tanks, batteries, capacitors, or flywheels. The power source may also provide energy to other systems of the vehicle 100.

The sensor system 144 may include a plurality of sensors for sensing information about the environment and conditions of the vehicle 100. For example, the sensor system 144 may include an inertial measurement unit (IMU), a global positioning system (GPS) transceiver, a radar (RADAR) unit, a laser rangefinder/LIDAR unit (or other distance measurement device), an acoustic sensor, and a camera or image capture device. The sensor system 144 may include a number of sensors for monitoring the vehicle 100 (e.g., an oxygen (O2) monitor, a fuel gauge sensor, an engine oil pressure sensor, etc.). Other sensors may also be configured. One or more sensors included in the sensor system 144 may be actuated individually or collectively to update the position, the orientation, or both, of the one or more sensors.

The IMU may include a combination of sensors (e.g., accelerometers and gyroscopes) for sensing changes in position and orientation of the vehicle 100 based on inertial acceleration. The GPS transceiver may be any sensor used to estimate the geographic location of the vehicle 100. For this purpose, the GPS transceiver may include a receiver/transmitter to provide position information of the vehicle 100 relative to the earth. It should be noted that GPS is one example of a global navigation satellite system; therefore, in some embodiments, the GPS transceiver may be replaced by a BeiDou satellite navigation system transceiver or a Galileo satellite navigation system transceiver. The radar unit may use radio signals to sense objects in the environment in which the vehicle 100 is located. In some embodiments, in addition to sensing objects, the radar unit may also be used to sense the speed and heading of objects approaching the vehicle 100. The laser rangefinder or LIDAR unit (or other distance measurement device) may be any sensor that uses a laser to sense objects in the environment in which the vehicle 100 is located. In one embodiment, the laser rangefinder/LIDAR unit may include a laser source, a laser scanner, and a detector. The laser rangefinder/LIDAR unit is configured to operate in a continuous (e.g., using heterodyne detection) or discontinuous detection mode. The camera may include a device for capturing multiple images of the environment in which the vehicle 100 is located. The camera may be a still image camera or a video camera.
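As a rough illustration of how a position change can be derived from inertial acceleration, the sketch below integrates one-dimensional acceleration samples twice using forward Euler steps. This is an assumption-laden toy, not a method described in the disclosure: the function name, the sample values and the 0.5 s step are hypothetical, and a real IMU pipeline would also handle sensor bias, gravity compensation and orientation.

```python
from typing import List, Tuple


def integrate_acceleration(samples: List[float], dt: float,
                           v0: float = 0.0, x0: float = 0.0) -> Tuple[float, float]:
    """Integrate 1-D acceleration samples (m/s^2) into velocity and position.

    Forward Euler only; bias, gravity and orientation handling are omitted.
    """
    v, x = v0, x0
    for a in samples:
        x += v * dt  # advance position with the velocity at the start of the step
        v += a * dt  # then update velocity with the measured acceleration
    return v, x


# Hypothetical readings: accelerate for two steps, then coast for two steps.
velocity, position = integrate_acceleration([1.0, 1.0, 0.0, 0.0], dt=0.5)
print(velocity, position)
```

Because errors accumulate with every integration step, such estimates drift over time, which is one reason an IMU is typically used together with other sensors such as the GPS transceiver mentioned above.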

The control system 146 is used to control the operation of the vehicle 100 and its components (or units). Accordingly, the control system 146 may include various units, such as a steering unit, a power control unit, a braking unit, and a navigation unit.

The steering unit may be a combination of mechanisms that adjusts the direction of travel of the vehicle 100. The power control unit (which may be, for example, a throttle) may be used to control the operating speed of the engine, and thus the speed of the vehicle 100. The braking unit may include a combination of mechanisms for decelerating the vehicle 100. The braking unit may use friction to decelerate the vehicle in a standard manner. In other embodiments, the braking unit may convert the kinetic energy of the wheels into electrical current. The braking unit may also take other forms. The navigation unit may be any system that determines a driving path or route for the vehicle 100. The navigation unit may also dynamically update the driving path while the vehicle 100 is traveling. The control system 146 may additionally or alternatively include other components (or units) not shown or described.

The user interface system 148 may be used to allow interaction between the vehicle 100 and external sensors, other vehicles, other computer systems, and/or a user of the vehicle 100. For example, the user interface system 148 may include a standard visual display device (e.g., a plasma display, liquid crystal display (LCD), touch screen display, head-mounted display, or other similar display), a speaker or other audio output device, and a microphone or other audio input device. The user interface system 148 may also include a navigation interface and an interface for controlling the interior environment of the vehicle 100 (e.g., temperature, fans, etc.).

The communication system 152 may provide a means for the vehicle 100 to communicate with one or more devices or other surrounding vehicles. In an exemplary embodiment, the communication system 152 may communicate with one or more devices directly or through a communication network. The communication system 152 may be, for example, a wireless communication system. For example, the communication system may use 3G cellular communications (e.g., CDMA, EVDO, GSM/GPRS) or 4G cellular communications (e.g., WiMAX or LTE), and may also use 5G cellular communications. Alternatively, the communication system may communicate with a wireless local area network (WLAN) (e.g., using […]). In some embodiments, the communication system 152 may communicate directly with one or more devices or other surrounding vehicles, e.g., using infrared, […], or ZIGBEE. Other wireless protocols, such as various in-vehicle communication systems, are also within the scope of this disclosure. For example, the communication system may include one or more Dedicated Short Range Communication (DSRC) devices, V2V devices, or V2X devices that conduct public or private data communications with vehicles and/or roadside stations.

The control computer system 150 can control some or all of the functions of the vehicle 100. The autonomous driving control unit in the control computer system 150 may be used to identify, evaluate, and avoid or negotiate potential obstacles in the environment in which the vehicle 100 is located. In general, the autonomous driving control unit may be used to control the vehicle 100 without a driver, or to assist a driver in controlling the vehicle. In some embodiments, the autonomous driving control unit is used to combine data from the GPS transceiver, radar data, LIDAR data, camera data, and data from other vehicle systems to determine the travel path or trajectory of the vehicle 100. The autonomous driving control unit may be activated to enable the vehicle 100 to be driven in the autonomous driving mode.

The control computer system 150 may include at least one processor (which may include at least one microprocessor) that executes processing instructions (i.e., machine-executable instructions) stored in a non-volatile computer-readable medium (e.g., a data storage device or memory). The memory stores at least one machine-executable instruction, and the processor executes the at least one machine-executable instruction to implement functions including a map engine, a positioning module, a perception module, a navigation or path module, and an automatic control module. The map engine and the positioning module are used to provide map information and positioning information. The perception module is used to perceive things in the environment in which the vehicle is located, based on the information obtained by the sensor system and the map information provided by the map engine. The navigation or path module is used to plan a driving path for the vehicle based on the processing results of the map engine, the positioning module and the perception module. The automatic control module parses the decision information input from modules such as the navigation or path module, converts it into control commands for the vehicle control system, and sends the control commands to the corresponding components in the vehicle control system through the in-vehicle network (for example, an internal electronic network system of the vehicle implemented via a CAN bus, Local Interconnect Network (LIN), Media Oriented Systems Transport (MOST), or the like), thereby realizing automatic control of the vehicle. The automatic control module can also obtain information about each component in the vehicle through the in-vehicle network.
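The data flow between the modules described above (perception, navigation or path planning, automatic control, and the in-vehicle network) can be sketched as follows. This is only a toy illustration under stated assumptions: all class and function names, the lane-width filter, the stop-or-cruise planning rule and the component names "brake" and "power" are hypothetical, and the `InVehicleNetwork` class merely records commands instead of talking to a real CAN bus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControlCommand:
    target: str   # hypothetical component name, e.g. "brake" or "power"
    value: float  # requested speed change in m/s


def perception_module(detections: List[Tuple[float, float]],
                      lane_half_width: float) -> List[Tuple[float, float]]:
    """Keep detections ahead of the vehicle that lie inside the mapped lane."""
    return [(x, y) for x, y in detections if x > 0 and abs(y) <= lane_half_width]


def navigation_module(obstacles: List[Tuple[float, float]],
                      cruise_speed: float, safe_distance: float) -> float:
    """Plan a target speed: stop if any obstacle is closer than safe_distance."""
    nearest = min((x for x, _ in obstacles), default=None)
    if nearest is not None and nearest < safe_distance:
        return 0.0
    return cruise_speed


def automatic_control_module(target_speed: float,
                             current_speed: float) -> List[ControlCommand]:
    """Translate the planning decision into commands for control-system units."""
    if target_speed < current_speed:
        return [ControlCommand("brake", current_speed - target_speed)]
    return [ControlCommand("power", target_speed - current_speed)]


class InVehicleNetwork:
    """Stand-in for the in-vehicle network; it only records what was sent."""

    def __init__(self) -> None:
        self.sent: List[ControlCommand] = []

    def send(self, command: ControlCommand) -> None:
        self.sent.append(command)


# Hypothetical detections as (forward distance, lateral offset) in metres.
detections = [(12.0, 0.5), (40.0, 6.0)]
obstacles = perception_module(detections, lane_half_width=1.8)
target = navigation_module(obstacles, cruise_speed=15.0, safe_distance=20.0)
bus = InVehicleNetwork()
for cmd in automatic_control_module(target, current_speed=10.0):
    bus.send(cmd)
print(bus.sent)
```

Keeping each module as a plain function with explicit inputs mirrors the separation of responsibilities in the paragraph above and makes each stage easy to test in isolation.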

控制计算机系统150也可以是多个计算装置，这些计算装置分布式地控制车辆100的部件或者系统。在一些实施例中，存储器中可以包含被处理器执行来实现车辆100的各种功能的处理指令(例如，程序逻辑)。在一个实施例中，控制计算机系统150能够与系统142、144、146、148和/或152进行数据通信。控制计算机系统中的接口用于促进控制计算机系统150和系统142、144、146、148以及152之间的数据通信。Control computer system 150 may also be a plurality of computing devices that control components or systems of vehicle 100 in a distributed manner. In some embodiments, the memory may contain processing instructions (eg, program logic) that are executed by the processor to implement various functions of the vehicle 100 . In one embodiment, control computer system 150 is capable of data communication with systems 142 , 144 , 146 , 148 and/or 152 . Interfaces in the control computer system are used to facilitate data communication between control computer system 150 and systems 142 , 144 , 146 , 148 , and 152 .

The memory may also include other instructions, including instructions for sending data, instructions for receiving data, instructions for interaction, or instructions for controlling the drive system 142, the sensor system 144, the control system 146, or the user interface system 148.

除存储处理指令之外，存储器可以存储多种信息或数据，例如图像处理参数、道路地图、和路径信息。在车辆100以自动方式、半自动方式和/或手动模式运行的期间，这些信息可以被车辆100和控制计算机系统150所使用。In addition to storing processing instructions, the memory may store various information or data, such as image processing parameters, road maps, and route information. This information may be used by the vehicle 100 and the control computer system 150 during operation of the vehicle 100 in automatic, semi-automatic, and/or manual modes.

Although the autonomous driving control unit is shown as separate from the processor and memory, it should be understood that in some embodiments some or all of the functions of the autonomous driving control unit may be implemented using program code instructions that reside in one or more memories (or data storage devices) and are executed by one or more processors, and that the autonomous driving control unit may in some cases be implemented using the same processor and/or memory (or data storage device). In some embodiments, the autonomous driving control unit may be implemented at least in part using various dedicated circuit logic, various processors, various field-programmable gate arrays ("FPGAs"), various application-specific integrated circuits ("ASICs"), and various real-time controllers and hardware.

控制计算机系统150可以根据从各种车辆系统(例如，驱动系统142，传感器系统144，以及控制系统146)接收到的输入，或者从用户接口系统148接收到的输入，来控制车辆100的功能。例如，控制计算机系统150可以使用来自控制系统146的输入来控制转向单元，来避开由传感器系统144检测到的障碍物。在一个实施例中，控制计算机系统150可以用来控制车辆100及其系统的多个方面。Control computer system 150 may control the functions of vehicle 100 based on input received from various vehicle systems (eg, drive system 142 , sensor system 144 , and control system 146 ), or input received from user interface system 148 . For example, control computer system 150 may use input from control system 146 to control the steering unit to avoid obstacles detected by sensor system 144 . In one embodiment, the control computer system 150 may be used to control various aspects of the vehicle 100 and its systems.

Although FIG. 1 shows various components (or units) integrated into the vehicle 100, one or more of these components (or units) may be mounted on the vehicle 100 or separately associated with it. For example, the control computer system may exist partially or entirely independently of the vehicle 100. The vehicle 100 can thus exist in the form of separate or integrated equipment units. The equipment units constituting the vehicle 100 may communicate with each other by wired or wireless communication. In some embodiments, additional components or units may be added to the various systems, or one or more of the above components or units (e.g., the LiDAR or radar shown in FIG. 1) may be removed from the systems.

As mentioned above, existing single-stage target detectors perform poorly on small objects. Although the introduction of feature pyramids has considerably improved small-object performance, the results remain unsatisfactory: the highest-resolution feature in a typical feature pyramid is 1/8 of the input resolution, which is still insufficient for very small objects. Introducing higher-resolution features can further alleviate this problem, but at a huge computational cost. In RetinaNet, for example, introducing a feature at 1/4 of the input resolution increases the computation on the detection head by 300%, which severely slows inference. To this end, the embodiments of the present disclosure aim to provide a more efficient target detection scheme.
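
The 300% figure can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that are not stated in the disclosure: the baseline pyramid uses strides 8, 16, 32, 64 and 128, and the detection head costs the same at every feature-map position, so that its cost is proportional to the number of positions.

```python
# Rough cost model (assumption): a shared detection head costs the same at
# every position, so head cost is proportional to the number of positions.

def positions_per_input_pixel(stride: int) -> float:
    """Feature-map positions per input pixel for a level with the given stride."""
    return 1.0 / (stride * stride)

# Assumed baseline pyramid: strides 8..128 (highest resolution = 1/8 of input).
baseline_strides = [8, 16, 32, 64, 128]
baseline_cost = sum(positions_per_input_pixel(s) for s in baseline_strides)

# Extra level at 1/4 of the input resolution (stride 4).
extra_cost = positions_per_input_pixel(4)

relative_increase = extra_cost / baseline_cost
print(f"additional head computation: {relative_increase:.0%}")
```

Under these assumptions the stride-4 level alone has about three times as many positions as all baseline levels combined, which is consistent with the stated increase.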

如图2所示，本公开实施例提供的一种目标检测模型的构建方法200，包括：As shown in FIG. 2 , a method 200 for constructing a target detection model provided by an embodiment of the present disclosure includes:

步骤S201、构建特征提取网络，该特征提取网络用于对输入图片进行特征提取，得到具有不同尺寸的多层特征图，多层特征图包括第一特征图和第二特征图。Step S201 , constructing a feature extraction network, the feature extraction network is used to perform feature extraction on the input picture to obtain multi-layer feature maps with different sizes, and the multi-layer feature maps include a first feature map and a second feature map.

步骤S202、构建目标检测网络，该目标检测网络包括与多层特征图对应的多个网络层，且多个网络层包括第一网络层和第二网络层。Step S202 , constructing a target detection network, where the target detection network includes multiple network layers corresponding to the multi-layer feature maps, and the multiple network layers include a first network layer and a second network layer.

其中，第一网络层与第一特征图对应，用于对第一特征图进行查询操作，并将所得到的查询结果传输给第二网络层，该查询结果包括所述第一特征图中特定目标的查询点。第二网络层与第二特征图对应，用于确定查询点在第二特征图中的映射区域，并在映射区域内进行检测操作，得到检测结果。The first network layer corresponds to the first feature map, and is used to perform a query operation on the first feature map, and transmit the obtained query result to the second network layer, where the query result includes the specific information in the first feature map. The query point for the target. The second network layer corresponds to the second feature map, and is used to determine the mapping area of the query point in the second feature map, and perform a detection operation in the mapped area to obtain a detection result.

为了使本领域的技术人员更好的了解本公开，下面结合附图、实例等对本公开实施例做更为详细的阐述。In order for those skilled in the art to better understand the present disclosure, the embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings and examples.

在本公开的一实施例中，目标检测模型如图3和图4所示，包括特征提取网络和目标检测网络。其中，特征提取网络包括骨干网络和特征金字塔网络。In an embodiment of the present disclosure, the target detection model, as shown in FIG. 3 and FIG. 4 , includes a feature extraction network and a target detection network. Among them, the feature extraction network includes a backbone network and a feature pyramid network.

The backbone network performs feature extraction on the input picture to obtain initial multi-layer feature maps. That is, the backbone network obtains the initial multi-layer feature maps by downsampling the input picture. The feature maps obtained by downsampling have a corresponding size-scaling relationship: in the backbone network of FIG. 3, for example, the P2 feature map has 1/4 the resolution of the input picture, and the P3 feature map has 1/2 the resolution of the P2 feature map.

The feature pyramid network upsamples the initial feature maps and performs feature fusion to obtain improved multi-layer feature maps. The feature maps in the feature pyramid network of FIG. 3 are obtained by fusing the feature maps of the backbone network, and the resulting improved multi-layer feature maps still have a corresponding size-scaling relationship. Here, a first feature map is set; a plurality of additional feature maps are obtained by downsampling from the first feature map, and, starting from the first feature map, upsampling and fusion with the same-level feature maps of the backbone network yield a plurality of higher-resolution feature maps in turn. In FIG. 3, layers P7 and P8 of the feature pyramid network are additional feature maps used for conventional target detection, and layers P1-P6 are query feature maps used for query-based target detection. A target is a target object in the image data and may be a static or dynamic object, such as a pedestrian, vehicle, animal, obstacle, signal light, or road sign. Target detection uses an algorithm to find the position of a target object in raw sensor data, generally representing the position occupied by the object in 2D or 3D space with a rectangle or a cuboid.

需要说明的是，本公开可以对由骨干网络得到的初始的多层特征图进行目标检测，也可以对由特征金字塔网络得到的改进后的多层特征图进行目标检测，本公开对其检测的特征图来源不作限制。It should be noted that the present disclosure can perform target detection on the initial multi-layer feature map obtained by the backbone network, and can also perform target detection on the improved multi-layer feature map obtained by the feature pyramid network. The source of the feature map is not limited.

每层特征图具有对应的尺寸阈值，例如每层特征图所能检测出的最小目标尺寸，或者每层特征图所能检测出的目标尺寸区间。对于有锚点的特征图，尺寸阈值包括该层特征图的最小锚点尺寸和/或最大锚点尺寸。这样，不同层特征图输出不同大小的目标，将所输出的这些不同大小的目标进行合并，即可得到输入图片的目标检测结果。Each layer of feature maps has a corresponding size threshold, such as the minimum target size that can be detected by each layer of feature maps, or the target size range that can be detected by each layer of feature maps. For feature maps with anchors, the size threshold includes the minimum anchor size and/or the maximum anchor size of the feature map of this layer. In this way, the feature maps of different layers output targets of different sizes, and the output targets of different sizes can be combined to obtain the target detection result of the input image.

目标检测网络包括多个网络层，每个网络层均与一层特征图对应，与附加特征图对应的为附加网络层，与查询特征图对应的为查询层。如图3中的目标检测网络包括两个附加网络层和四个查询层。The target detection network includes a plurality of network layers, each network layer corresponds to a layer of feature maps, the additional network layer corresponds to the additional feature map, and the query layer corresponds to the query feature map. The object detection network in Figure 3 includes two additional network layers and four query layers.

An additional network layer is a conventional network layer, which may also be called a non-query layer, and performs a detection operation on its corresponding additional feature map to obtain a detection result. The detection network of an additional network layer includes a classification network and/or a regression network, which respectively perform classification detection and regression detection of targets on the corresponding additional feature map to obtain a classification result (for example, a class label) and a regression result (for example, a regression box).

查询层为基于查询操作的网络层，查询层中的每个网络层，可实现以下任意一种或多种功能：检测本层级特征图中特定目标的查询点，并将检测到的查询点传输给低层级的网络层；接收高层级的网络层传输过来的查询点，并将该查询点映射为本层特征图中的映射区域。这里，低分辨率特征图所在层为高层级，高分辨率特征图所在层为低层级，如图3中，P2层的层级比P3层的层级低。因此，本公开的多层级特征图从高层级到低层级的分辨率逐级增大，且上下两层特征图之间均具有对应的比例关系。The query layer is a network layer based on query operations. Each network layer in the query layer can implement any one or more of the following functions: Detect query points of specific targets in the feature map of this layer, and transmit the detected query points. To the low-level network layer; receive the query point transmitted by the high-level network layer, and map the query point to the mapping area in the feature map of the layer. Here, the layer where the low-resolution feature map is located is the high-level layer, and the layer where the high-resolution feature map is located is the low-level layer. As shown in Figure 3, the level of the P2 layer is lower than that of the P3 layer. Therefore, the resolution of the multi-level feature map of the present disclosure increases from high level to low level, and there is a corresponding proportional relationship between the upper and lower level feature maps.

According to one embodiment, the query layers include a first network layer corresponding to the first feature map, a second network layer corresponding to the second feature map, a third network layer corresponding to a third feature map, ..., an i-th network layer corresponding to an i-th feature map, ..., and an n-th network layer corresponding to an n-th feature map.

The first network layer is the starting layer of the query layers and performs a query operation on the first feature map to obtain a query result. The query result of a query layer includes the query points of specific targets in that layer's feature map. For example, the query result of the first network layer includes the query points of specific targets in the first feature map. Optionally, the query result includes a query result map. The query result map of a network layer has the same size as the feature map corresponding to that layer and contains, for each position in the feature map, the probability that a specific target exists there. Correspondingly, a query point is a position in the feature map whose probability value is greater than or equal to a preset threshold. The preset threshold can be set as needed, for example to 0.5; the present disclosure does not limit this.
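
As an illustration of the thresholding just described, the following minimal sketch selects query points from a query result map. The function name `extract_query_points`, the nested-list representation of the map, and the sample values are hypothetical and are not part of the disclosure.

```python
def extract_query_points(prob_map, threshold=0.5):
    """Return (row, col) positions whose probability is >= threshold.

    prob_map is a 2D list with the same height and width as the feature map;
    each entry is the probability that a specific (small) target exists there.
    """
    return [
        (y, x)
        for y, row in enumerate(prob_map)
        for x, p in enumerate(row)
        if p >= threshold
    ]

# Hypothetical 2x3 query result map.
prob_map = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.9],
]
query_points = extract_query_points(prob_map, threshold=0.5)
print(query_points)  # [(0, 1), (1, 0), (1, 2)]
```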

其中，查询操作可由查询网络执行，该查询网络也是检测网络(也即检测头)的一种。特定目标为尺寸小于等于对应层特征图的尺寸阈值的目标。以第一特征图为例，假设第一特征图的最小尺寸阈值为a，则特定目标是尺寸小于等于a的目标。当然，特定目标也可以是尺寸小于等于该层特征图的尺寸阈值的预定倍数b(b＞1)的目标，此时第一特征图的特定目标是尺寸小于等于a*b的目标。本公开对每层特征图的特定目标尺寸设置这样的扩展区间，有利于生成更大范围的查询点，提高下层网络层的检测准确率。The query operation may be performed by a query network, which is also a type of detection network (ie, detection head). The specific target is the target whose size is less than or equal to the size threshold of the corresponding layer feature map. Taking the first feature map as an example, assuming that the minimum size threshold of the first feature map is a, the specific target is a target whose size is less than or equal to a. Of course, the specific target can also be a target whose size is less than or equal to a predetermined multiple b (b>1) of the size threshold of the feature map of the layer. In this case, the specific target of the first feature map is a target whose size is less than or equal to a*b. The present disclosure sets such an expansion interval for the specific target size of the feature map of each layer, which is conducive to generating a larger range of query points and improving the detection accuracy of the lower network layer.

A query point may be a position contained in the specific target, a position within the detection box of the specific target, or a point within a preset range of the center point of the specific target; the present disclosure does not specifically limit the range of query points for a specific target. A query point can be represented by coordinates, namely its coordinate position in the first feature map.

The first network layer transmits the query result to the second network layer. The second network layer extracts the query points from the received query result and maps each query point to a plurality of mapped points in the second feature map to obtain the mapped region. Generally, the second feature map is obtained by upsampling the first feature map and performing feature fusion, and its size is m times that of the first feature map, with m > 1. The second network layer therefore maps each query point to a plurality of points in the second feature map according to the magnification m between the first and second feature maps. If the second feature map is 2 times the size of the first feature map, the second network layer maps each received query point to its four neighboring points in the second feature map according to the coordinate mapping relationship between the images; if the second feature map is 3 times the size of the first feature map, the second network layer maps each received query point to its eight neighboring points in the second feature map, and so on. The mapping of query points in the second network layer may be performed by a mapping module. A query operation is then performed on the mapped region of the second feature map to obtain the corresponding query result.
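
The mapping for the 2x case can be sketched as follows. This is one possible reading, not the disclosed implementation: it assumes that a query point (y, x) covers the block of positions (2y + i, 2x + j) with i, j in {0, 1} in the second feature map. The name `map_query_points` and the `scale` parameter are hypothetical. For `scale=2` this yields the four neighboring points; the neighborhoods used for other magnifications may be defined differently in the disclosure, so the sketch should not be relied on for them.

```python
def map_query_points(points, height, width, scale=2):
    """Map (row, col) query points to positions in the next, larger feature map.

    Assumption: each query point covers a scale x scale block in the larger map.
    height and width are the size of the larger feature map; positions that fall
    outside it are dropped, and duplicates are merged.
    """
    mapped = set()
    for y, x in points:
        for dy in range(scale):
            for dx in range(scale):
                ny, nx = scale * y + dy, scale * x + dx
                if ny < height and nx < width:
                    mapped.add((ny, nx))
    return sorted(mapped)

mapped_region = map_query_points([(1, 2)], height=4, width=6)
print(mapped_region)  # [(2, 4), (2, 5), (3, 4), (3, 5)]
```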

The obtained query result is transmitted to the third network layer, so that the third network layer maps the query points in the query result to a mapped region in the corresponding third feature map and performs a detection operation within that mapped region to obtain a detection result.

The second network layer then performs a detection operation on the determined mapped region. This detection operation is likewise performed by a detection network, such as a classification network and/or a regression network, which perform classification detection and regression detection respectively. Generally, a network layer that receives a query result from a higher-level network layer has both a mapping module and a detection network: the mapping module maps the query points in the query result, and the detection network performs detection on the mapped region. Similarly, a network layer that performs a query operation passes its query result to the lower level.

Here, the first network layer mainly includes a query network for performing the query operation, and the second network layer mainly includes a detection network for performing the detection operation. Optionally, the first network layer may also include a detection network that performs a detection operation on the first feature map to obtain a detection result. The second network layer may also include a query network that performs a query operation on the corresponding second feature map and transmits the query result to the third network layer. The query result output by each network layer consists of the query points of specific targets in that layer's feature map. In this case, the third network layer necessarily has a mapping module and a detection network, so as to map the query points received from the second network layer to a mapped region and perform a detection operation within it.

Similarly, in addition to the mapping module and the detection network, the third network layer may also have a query network that performs a query operation on the corresponding third feature map and transmits the query result to a fourth network layer, which maps the query result to a mapped region and then performs target detection, and so on until the bottom query layer is reached. The bottom query layer includes only the mapping module and the detection network: it only needs to obtain the mapped region and output the detection result, and no longer needs a query network.

需要说明的是，本公开将接收同一组查询点的网络层认为是同一组网络层，也就是目标检测网络中的第二网络层、第三网络层、第i网络层等均可以为多个(图4中以两个为例，实际还可以有更多个)。而与第二网络层同层级的特征图均称之为第二特征图，与第三网络层同层级的均称之为第三特征图，与第i网络层同层级的特征图均称之为第i特征图。其中，每个第二网络层均从第一网络层中接收查询结果，并将查询结果映射为对应的第二特征图中的映射区域；每个第三网络层均从第二网络层中接收查询结果，并将查询结果映射为对应的第三特征图中的映射区域；以此类推。It should be noted that the present disclosure considers the network layers that receive the same group of query points to be the same group of network layers, that is, the second network layer, the third network layer, the i-th network layer, etc. in the target detection network can be multiple. (Two are taken as an example in FIG. 4, and there may actually be more). The feature maps at the same level as the second network layer are called second feature maps, those at the same level as the third network layer are called third feature maps, and the feature maps at the same level as the i-th network layer are called is the i-th feature map. Wherein, each second network layer receives the query result from the first network layer, and maps the query result to the mapping area in the corresponding second feature map; each third network layer receives the query result from the second network layer query results, and map the query results to the corresponding mapping regions in the third feature map; and so on.

具体而言，第一网络层对第一特征图进行查询操作后得到第一组查询点，并将该第一组查询点传给一个或多个第二网络层。也就是该一个或多个第二网络层接收同一组查询点，并分别将该第一次查询结果映射为对应层级特征图中的映射区域，以在该映射区域内进行检测操作。Specifically, the first network layer obtains a first group of query points after performing a query operation on the first feature map, and transmits the first group of query points to one or more second network layers. That is, the one or more second network layers receive the same set of query points, and respectively map the first query result to a mapping area in the feature map of the corresponding level, so as to perform detection operations in the mapping area.

同理，某个第二网络层对同层级的第二特征图进行查询操作后，得到第二组查询点，传给一个或多个第三网络层。此时该一个或多个第三网络层分别将该第二组查询点映射为对应层级特征图中的映射区域，以在该映射区域内进行检测操作。Similarly, after a certain second network layer performs a query operation on the second feature map of the same level, a second set of query points are obtained, and are passed to one or more third network layers. At this time, the one or more third network layers respectively map the second group of query points to a mapping area in the corresponding hierarchical feature map, so as to perform a detection operation in the mapping area.

Here, a plurality of second network layers may each perform a query operation and obtain a corresponding query result. The query results are then combined according to the size-scaling relationship between feature maps of different levels to obtain a final merged result. For example, based on the query point coordinates of each second network layer and the mapping relationship between the coordinates, the combined query points of the plurality of second network layers are obtained and transmitted to the third network layer.

同理，第i网络层对同层级的第i特征图进行查询操作后，得到第i组查询点，并传给一个或多个第i+1网络层。此时该一个或多个第i+1网络层分别将该第i组查询点映射为对应层级特征图中的映射区域，以在该映射区域内进行检测操作。Similarly, after the i-th network layer performs a query operation on the i-th feature map at the same level, it obtains the i-th group of query points and transmits it to one or more i+1-th network layers. At this time, the one or more i+1 th network layers respectively map the i th group of query points to a mapping area in the corresponding hierarchical feature map, so as to perform detection operations in the mapping area.

Preferably, each query layer has both a query network and a detection network, and transmits its query result to the next lower-level network layer in turn. In this case, the top query layer, namely the first network layer, outputs the query result of the first feature map and transmits it to the second network layer so that the second network layer can perform the mapping and detection operations. At the same time, the second network layer performs a query operation on the second feature map and transmits the query result to the third network layer so that the third network layer can perform the mapping and detection operations. That is, an intermediate query layer both obtains the query result from the higher-level network layer for mapping and detection and performs a query operation on the feature map of its own level, transmitting the query result to the lower-level network layer. This continues until the bottom query layer is reached, which performs only the mapping and detection operations and no query operation. In other words, for the top and intermediate query layers, the detection network includes a classification network, a regression network, and a query network, which output a classification result, a regression result, and a query result; for the bottom query layer, the detection network includes only a classification network and a regression network, which output a classification result and a regression result.
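
The top-to-bottom flow described above can be sketched as a simple loop. Everything in the sketch is an illustrative assumption rather than the disclosed implementation: feature maps are 2D lists of scores ordered from the top (lowest-resolution) query layer to the bottom one, adjacent levels differ by a factor of 2, and the toy `query_fn` and `detect_fn` merely threshold the scores in place of the query, classification and regression networks.

```python
def run_query_cascade(feature_maps, query_fn, detect_fn, scale=2):
    """Run detection from the top query layer down to the bottom query layer."""
    detections = []
    top = feature_maps[0]
    # The top layer has no incoming query points, so it is processed densely.
    positions = [(y, x) for y in range(len(top)) for x in range(len(top[0]))]
    for level, fmap in enumerate(feature_maps):
        detections.extend(detect_fn(level, fmap, positions))
        if level == len(feature_maps) - 1:
            break  # the bottom layer only detects; it has no query network
        query_points = query_fn(fmap, positions)
        nxt = feature_maps[level + 1]
        # Map each query point to a scale x scale block in the next feature map.
        positions = sorted({
            (scale * y + dy, scale * x + dx)
            for y, x in query_points
            for dy in range(scale) for dx in range(scale)
            if scale * y + dy < len(nxt) and scale * x + dx < len(nxt[0])
        })
    return detections

# Toy stand-ins for the query network and the detection network.
def query_fn(fmap, positions):
    return [(y, x) for y, x in positions if fmap[y][x] >= 0.5]

def detect_fn(level, fmap, positions):
    return [(level, y, x) for y, x in positions if fmap[y][x] >= 0.8]

top = [[0.9, 0.1]]
mid = [[0.2, 0.6, 0.9, 0.9],
       [0.1, 0.85, 0.9, 0.9]]
bottom = [[0.0] * 8 for _ in range(4)]
bottom[1][3] = 0.95   # inside the mapped region
bottom[0][7] = 0.99   # outside the mapped region, never visited

detections = run_query_cascade([top, mid, bottom], query_fn, detect_fn)
print(detections)  # [(0, 0, 0), (1, 1, 1), (2, 1, 3)]
```

In this toy run the high scores on the right-hand side of `mid` and at `bottom[0][7]` produce no detections, because no query point from the level above maps onto those positions.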

根据本公开一个实施例，为了提高目标检测网络的整体检测效率，本公开采用稀疏卷积的方式来对特征图进行检测操作。具体地，每个在进行检测操作的网络层(如第二网络层)，提取映射区域的图像特征来构建稀疏张量，并采用稀疏卷积在映射区域内进行检测操作和/或查询操作。According to an embodiment of the present disclosure, in order to improve the overall detection efficiency of the target detection network, the present disclosure uses a sparse convolution method to perform a detection operation on a feature map. Specifically, each network layer (eg, the second network layer) performing detection operations extracts image features of the mapped region to construct a sparse tensor, and uses sparse convolution to perform detection operations and/or query operations in the mapped region.

The input of a convolutional neural network is usually a four-dimensional tensor whose dimensions correspond to batch, channel, height, and width, and the convolution kernel computes a result at every position of the height-width plane. Sometimes, however, results are needed only at specific positions of this plane, so computing over the entire plane is wasteful. Sparse convolution was proposed for this reason, to reduce the amount of computation and speed up inference. In addition to the usual four-dimensional tensor input, it takes the positions at which results are required, so that the convolution kernel computes only at the specified positions.

稀疏卷积在查询层中应用时，对于查询层的最高层(即第一网络层)，由于其没有上层查询点的输入，采取普通卷积来计算结果。对于之后层的特征，先将输入的查询点映射到该层的多临近位置上，然后对所有位置抽取特征并构建出稀疏特征，最后计算所查询位置的结果。When sparse convolution is applied in the query layer, for the highest layer of the query layer (ie, the first network layer), since it has no input from the upper query point, ordinary convolution is used to calculate the result. For the features of the subsequent layers, the input query points are first mapped to the multiple adjacent positions of the layer, then the features are extracted from all positions and sparse features are constructed, and finally the results of the queried positions are calculated.
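
The idea of computing a convolution only at the queried positions can be illustrated with a single-channel sketch in plain Python. It is a simplification under stated assumptions (one channel, a 3x3 kernel, zero padding, and cross-correlation as commonly used in CNN frameworks); a practical system would use a sparse convolution library, and the function name `conv3x3_at` is hypothetical.

```python
def conv3x3_at(feature, kernel, positions):
    """Evaluate a 3x3 convolution (zero padding) only at the given positions.

    feature: 2D list (height x width); kernel: 3x3 list; positions: (row, col) pairs.
    Returns a dict mapping each requested position to its output value.
    """
    h, w = len(feature), len(feature[0])
    out = {}
    for y, x in positions:
        acc = 0.0
        for ky in range(3):
            for kx in range(3):
                yy, xx = y + ky - 1, x + kx - 1
                if 0 <= yy < h and 0 <= xx < w:  # outside the map counts as zero
                    acc += kernel[ky][kx] * feature[yy][xx]
        out[(y, x)] = acc
    return out

feature = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
ones = [[1, 1, 1]] * 3
sparse_out = conv3x3_at(feature, ones, [(1, 1), (0, 0)])
print(sparse_out)  # {(1, 1): 45.0, (0, 0): 12.0}
```

The cost grows with the number of queried positions rather than with the full height-width plane, which is the source of the saving when only a small mapped region is evaluated.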

可选地，该稀疏卷积也可更换为裁切操作，即对查询到的映射区域进行特征裁切，并对裁切出的特征计算卷积结果。Optionally, the sparse convolution can also be replaced with a cropping operation, that is, feature cropping is performed on the queried mapping region, and a convolution result is calculated on the cropped features.

从前文中可以看出，目标检测网络包括有多个能够输出检测结果的网络层，如附加网络层、第二网络层、第三网络层等。基于此，方法200还可以包括步骤：构建输出网络，该输出网络用于将各网络层进行检测操作后所得到的检测结果进行合并输出。合并输出的方式有多种，例如采用加权非极大值抑制方法对多个回归框进行合并，本公开对此不作限制。As can be seen from the foregoing, the target detection network includes multiple network layers capable of outputting detection results, such as an additional network layer, a second network layer, a third network layer, and the like. Based on this, the method 200 may further include the step of: constructing an output network, where the output network is used to combine and output the detection results obtained after each network layer performs the detection operation. There are various ways to combine outputs, for example, a weighted non-maximum suppression method is used to combine multiple regression boxes, which is not limited in the present disclosure.

另外，如图3所示，从检测类型来区分，目标检测网络包括三个网络分支：分类网络、回归网络、查询网络，分别用于执行分类操作、回归操作和查询操作。为了提高模型训练的整体效率，本公开的分类网络、回归网络、查询网络在多个网络层之间参数共享，也就是所有分类网络具有相同的参数类型和参数值，所有回归网络具有相同的参数类型和参数值，所有查询网络具有相同的参数类型和参数值，以实现同一网络分支的同步训练。In addition, as shown in Figure 3, in terms of detection types, the target detection network includes three network branches: classification network, regression network, and query network, which are used to perform classification operations, regression operations, and query operations, respectively. In order to improve the overall efficiency of model training, the classification network, regression network, and query network of the present disclosure share parameters among multiple network layers, that is, all classification networks have the same parameter types and parameter values, and all regression networks have the same parameters Type and parameter value, all query networks have the same parameter type and parameter value to achieve synchronous training of the same network branch.
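
Parameter sharing across network layers can be pictured as applying one head object to every level. The sketch below is a toy illustration only: the `LinearHead` class, its weights, and the level names are hypothetical, and a linear function stands in for a real classification, regression or query network.

```python
class LinearHead:
    """Toy head: a single linear scoring function over a feature vector."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, feature_vector):
        return sum(w * f for w, f in zip(self.weights, feature_vector)) + self.bias

# One head instance per branch, reused by every level (hypothetical level names).
shared_cls_head = LinearHead([0.5, -0.25], bias=0.1)
cls_heads = {level: shared_cls_head for level in ("P2", "P3", "P4")}

before = [cls_heads[level]([1.0, 2.0]) for level in ("P2", "P3", "P4")]

# A single parameter update is seen by all levels at once.
shared_cls_head.weights[0] = 1.0
after = [cls_heads[level]([1.0, 2.0]) for level in ("P2", "P3", "P4")]
print(before, after)
```

Because every level refers to the same parameters, gradients computed on any level would update the one shared set, which matches the synchronous training of each branch described above.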

进一步地，本公开还包括对目标检测模型进行训练的步骤：生成训练样本集，并根据该训练样本集对所构建的目标检测模型进行训练，得到训练后的目标检测模型。其中，训练样本集包括多张训练图像和每张训练图像的标注，该标注包括分类标注和位置标注中的至少一种。此时可根据每张训练图像的分类标注和所预测的分类结果，对分类网络分支进行迭代更新，根据每张训练图像的回归标注和所预测的回归结果，对回归网络分支进行迭代更新。Further, the present disclosure further includes the step of training the target detection model: generating a training sample set, and training the constructed target detection model according to the training sample set to obtain a trained target detection model. Wherein, the training sample set includes a plurality of training images and a label of each training image, and the label includes at least one of a classification label and a position label. At this time, the classification network branch can be iteratively updated according to the classification label of each training image and the predicted classification result, and the regression network branch can be iteratively updated according to the regression label of each training image and the predicted regression result.

In addition, the label of each training image may further include query point labels for each layer of feature map generated from that training image. In this case, a loss function is calculated from the query point labels of each layer of feature map and the query points predicted for that layer, and the constructed target detection network is trained according to the loss function, so that the query network branch is iteratively updated. When labeling the query points of a training image, for any layer of feature map: when the distance from a position point to the center of a specific target in that layer of feature map is less than a preset threshold, the position point is labeled as a query point; or, when the degree of overlap between the anchor box corresponding to a position point and a specific target in that layer of feature map is greater than a preset threshold, the position point is labeled as a query point. The degree of overlap may be the intersection over union (IoU), that is, the ratio of the area of the intersection to the area of the union.
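The distance-based labeling rule can be sketched as below. The coordinate convention is an assumption made for this example: each feature map position `(x, y)` is taken to represent the input-image point `((x + 0.5) * stride, (y + 0.5) * stride)`, and target centers and the threshold are expressed in input-image pixels. The function name and parameters are hypothetical.

```python
import math


def label_query_points(height, width, stride, target_centers, distance_threshold):
    """Return an H x W grid of 0/1 query point labels for one feature map.

    target_centers: (cx, cy) centers, in input-image pixels, of the specific
    targets assigned to this layer (assumed to be given).
    """
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Assumed mapping from a feature map cell to image coordinates.
            px, py = (x + 0.5) * stride, (y + 0.5) * stride
            for cx, cy in target_centers:
                if math.hypot(px - cx, py - cy) < distance_threshold:
                    labels[y][x] = 1
                    break
    return labels
```

The resulting grid can serve as the per-layer training target for the query network branch.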

Optionally, the present disclosure trains the classification network branch and the query network branch with the focal loss, and trains the regression network branch with the smooth L1 loss.
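For reference, the two losses named above can be written per element as in the following sketch. These are the commonly used definitions; the hyperparameter defaults `alpha=0.25`, `gamma=2.0`, and `beta=1.0` are assumptions for this example, since the disclosure does not state values.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class; y: ground-truth label (0 or 1).
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1 loss for one regression value: quadratic near zero, linear elsewhere."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta
```

The focal loss down-weights well-classified examples, which suits the query branch because query points are typically far fewer than background positions.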

Once the target detection model has been trained, it can be used for target detection. FIG. 5 shows a target detection method 500 according to an embodiment of the present disclosure. As shown in FIG. 5, the target detection method 500 includes:

Step S501: inputting a picture to be detected into the trained target detection model. The target detection model includes a feature extraction network and a target detection network; the specific network structure is shown in FIG. 3 and FIG. 4 and is not described again here.

Step S502: performing feature extraction on the picture to be detected by using the feature extraction network to obtain multi-layer feature maps of different sizes, the multi-layer feature maps including a first feature map and a second feature map.

Step S503: outputting a detection result for the multi-layer feature maps by using the target detection network. The target detection network includes a plurality of network layers corresponding to the multi-layer feature maps, the plurality of network layers including a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the resulting query result to the second network layer, the query result including query points of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine the mapping area of the query points in the second feature map and perform a detection operation within the mapping area to obtain a detection result.
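The query and mapping operations of step S503 can be sketched as follows. Selecting positions whose probability is greater than or equal to a threshold follows claim 13, and mapping by the magnification between the two feature maps follows claim 10. Treating the magnification as an integer `scale` and mapping each query point to a `scale x scale` block is an assumption of this example, as are the function names.

```python
def select_query_points(query_map, threshold):
    """Return (x, y) positions whose predicted probability is at least threshold."""
    return [(x, y)
            for y, row in enumerate(query_map)
            for x, p in enumerate(row)
            if p >= threshold]


def map_to_region(query_points, scale):
    """Map each query point of the first feature map to a scale x scale block
    of positions in the second (higher-resolution) feature map."""
    region = set()
    for x, y in query_points:
        for dy in range(scale):
            for dx in range(scale):
                region.add((x * scale + dx, y * scale + dy))
    return sorted(region)
```

The detection operation then only needs to run on the returned positions rather than on the whole second feature map.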

As shown in FIG. 3 and FIG. 4, the target detection network further includes an additional network layer and a third network layer, a fourth network layer, ..., an n-th network layer, each of which has a detection network for obtaining a corresponding detection result. Therefore, in step S503, outputting the detection result for the multi-layer feature maps by using the target detection network includes: merging the detection results of the network layers to obtain the detection result of the input picture.

FIG. 6 is a schematic diagram showing the effect of target detection using the target detection model according to an embodiment of the present disclosure. The low-resolution feature map is mainly used to detect large objects and to transmit query points to the high-resolution feature map, so that the high-resolution feature map can determine the mapping area corresponding to each query point and then detect small objects within that mapping area. Performing target detection with the target detection model of the present disclosure offers advantages such as high robustness, fast inference, and high efficiency.

FIG. 7 shows an apparatus 700 for constructing a target detection model according to an embodiment of the present disclosure. As shown in FIG. 7, the apparatus 700 includes:

A first construction unit 701, configured to construct a feature extraction network. The feature extraction network is used to perform feature extraction on an input picture to obtain multi-layer feature maps of different sizes, the multi-layer feature maps including a first feature map and a second feature map.

A second construction unit 702, configured to construct a target detection network. The target detection network includes a plurality of network layers corresponding to the multi-layer feature maps, the plurality of network layers including a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the resulting query result to the second network layer, the query result including query points of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine the mapping area of the query points in the second feature map and perform a detection operation within the mapping area to obtain a detection result.

Optionally, the apparatus 700 for constructing a target detection model may further include a third construction unit and a model training unit (neither is shown in the figures). The third construction unit is configured to construct an output network, which merges and outputs the detection results obtained after each network layer performs its detection operation. The model training unit is configured to generate a training sample set and train the constructed target detection model on the training sample set to obtain a trained target detection model. The training sample set includes a plurality of training images and a label for each training image, the label including at least one of a classification label and a position label.

In addition, the label of each training image further includes query point labels for each layer of feature map generated from that training image. In this case, the model training unit calculates a loss function from the query point labels of each layer of feature map and the query points predicted for that layer, and trains the constructed target detection network according to the loss function. In general, for any layer of feature map: when the distance from a position point to the center of a specific target in that layer of feature map is less than a preset threshold, the position point is labeled as a query point; or, when the intersection over union (IoU) between the anchor box corresponding to a position point and a specific target in that layer of feature map is greater than a preset threshold, the position point is labeled as a query point.
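The IoU-based labeling rule can be sketched as follows. The box format `(x1, y1, x2, y2)` and the helper names are assumptions for this example; allowing several anchor boxes per position and labeling the position when any anchor exceeds the threshold is likewise an assumed reading of the rule.

```python
def iou(box_a, box_b):
    """Intersection over union: intersection area divided by union area."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_position_by_iou(anchor_boxes, target_boxes, iou_threshold):
    """Return True if any anchor box at this position overlaps any specific
    target of this layer with an IoU greater than iou_threshold."""
    return any(iou(a, t) > iou_threshold
               for a in anchor_boxes
               for t in target_boxes)
```

Either this rule or the distance-based rule can be used to produce the query point labels; the disclosure presents them as alternatives.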

FIG. 8 shows a target detection apparatus 800 according to an embodiment of the present disclosure. As shown in FIG. 8, the apparatus 800 includes:

An input unit 801, configured to input a picture to be detected into a target detection model, the target detection model including a feature extraction network and a target detection network.

A feature extraction unit 802, configured to perform feature extraction on the picture to be detected by using the feature extraction network to obtain multi-layer feature maps of different sizes, the multi-layer feature maps including a first feature map and a second feature map.

A target detection unit 803, configured to output a detection result for the multi-layer feature maps by using the target detection network. The target detection network includes a plurality of network layers corresponding to the multi-layer feature maps, the plurality of network layers including a first network layer and a second network layer. The first network layer corresponds to the first feature map and is used to perform a query operation on the first feature map and transmit the resulting query result to the second network layer, the query result including query points of a specific target in the first feature map. The second network layer corresponds to the second feature map and is used to determine the mapping area of the query points in the second feature map and perform a detection operation within the mapping area to obtain a detection result.

Optionally, the target detection unit 803 is configured to merge and output the detection results obtained after each network layer (including non-query layers and query layers that have a detection network) performs its detection operation.

It should be noted that the specific implementations of the apparatus 700 for constructing a target detection model and the target detection apparatus 800 provided by the embodiments of the present disclosure have already been disclosed in detail in the description based on FIG. 1 to FIG. 6, and are not repeated here.

In addition, an embodiment of the present disclosure further provides a computer-readable storage medium including a program or instructions which, when run on a computer, implement the method for constructing a target detection model or the target detection method described above.

In addition, an embodiment of the present disclosure further provides a computing device 900 as shown in FIG. 9, including a memory 901 and one or more processors 902 communicatively connected to the memory. The memory 901 stores instructions executable by the one or more processors 902; when executed, the instructions cause the one or more processors 902 to implement the method for constructing a target detection model or the target detection method described above. The computing device 900 may further include a communication interface 903, which may implement one or more communication protocols (LTE, Wi-Fi, etc.).

According to the technical solution of the present disclosure, the detection performance for small objects is improved by introducing higher-resolution features, and a query head is added to the detection head to realize a target detection method based on queries and sparse convolution, which addresses the slowdown in inference caused by introducing high-resolution features. By deploying the target detection method of the present disclosure on an autonomous vehicle, small objects can be detected both quickly and accurately.
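To illustrate why restricting computation to the mapping area reduces cost, the following is a simplified, single-channel sketch of sparse convolution. It is a stand-in for illustration only, not the sparse convolution implementation used in the disclosure: the sparse tensor is modeled as a dictionary from `(x, y)` to a value, outputs are produced only at active positions, and inactive neighbors are treated as zero. All names are hypothetical.

```python
def build_sparse_tensor(feature, active_positions):
    """Keep only the features at the active (x, y) positions of an H x W grid."""
    return {(x, y): feature[y][x] for x, y in active_positions}


def sparse_conv3x3(sparse, kernel):
    """Apply a 3x3 kernel at active positions only.

    Neighbors that are not stored in the sparse tensor contribute zero,
    and no output is produced for inactive positions.
    """
    out = {}
    for (x, y) in sparse:
        total = 0.0
        for ky in range(3):
            for kx in range(3):
                neighbor = (x + kx - 1, y + ky - 1)
                total += kernel[ky][kx] * sparse.get(neighbor, 0.0)
        out[(x, y)] = total
    return out
```

In this sketch the number of kernel evaluations scales with the number of active positions rather than with the full size of the high-resolution feature map.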

Those skilled in the art will appreciate that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Specific embodiments have been used herein to explain the principles and implementations of the present disclosure. The description of the above embodiments is intended only to help in understanding the method of the present disclosure and its core idea. At the same time, those of ordinary skill in the art may, based on the idea of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims

1. A method for constructing a target detection model, comprising:

constructing a feature extraction network, wherein the feature extraction network is used for carrying out feature extraction on an input picture to obtain multilayer feature maps with different sizes, and the multilayer feature maps comprise a first feature map and a second feature map;

constructing a target detection network, wherein the target detection network comprises a plurality of network layers corresponding to the multilayer feature maps, and the plurality of network layers comprise a first network layer and a second network layer;

the first network layer corresponds to the first feature map and is used for carrying out query operation on the first feature map and transmitting an obtained query result to the second network layer, wherein the query result comprises a query point of a specific target in the first feature map;

the second network layer corresponds to the second feature map and is used for determining a mapping area of the query point in the second feature map and performing detection operation in the mapping area to obtain a detection result.

2. The method of claim 1, wherein the first network layer is further configured to perform a detection operation on the first feature map to obtain a detection result.

3. The method of claim 1, wherein the target detection network has a plurality of second network layers, each second network layer receiving the query result from the first network layer and mapping the query result to a mapping region in a corresponding second feature map.

4. The method of claim 1, wherein the multilayer feature maps further comprise a third feature map, the target detection network further comprises a third network layer corresponding to the third feature map, and the second network layer is further configured to:

perform a query operation on the mapping area in the second feature map to obtain a corresponding query result; and

transmit the obtained query result to the third network layer, so that the third network layer maps the query point in the query result to a mapping area in the corresponding third feature map and performs a detection operation in that mapping area to obtain a detection result.

5. The method of claim 1, wherein the second network layer is further configured to:

extract image features of the mapping region to construct a sparse tensor; and

perform the detection operation and/or the query operation in the mapping region by using sparse convolution.

6. The method of claim 1, wherein,

the multi-layer feature map further comprises an additional feature map;

the target detection network further comprises an additional network layer corresponding to the additional feature map;

and the additional network layer is used for performing a detection operation on the additional feature map to obtain a detection result.

7. The method of any of claims 1-6, further comprising:

constructing an output network, wherein the output network is used for merging and outputting the detection results obtained after each network layer performs the detection operation.

8. The method of claim 1, wherein the feature extraction network comprises:

a backbone network, configured to extract features from an input picture to obtain initial multilayer feature maps; and

a feature pyramid network, configured to perform up-sampling and feature fusion on the initial feature maps to obtain improved multilayer feature maps.

9. The method of claim 1, wherein the second feature map is obtained by performing up-sampling and feature fusion on the first feature map, the size of the second feature map is m times that of the first feature map, and m > 1.

10. The method of claim 9, wherein the second network layer maps the query point to a plurality of points in the second feature map according to a magnification between the first feature map and the second feature map, resulting in the mapped region.

11. The method of claim 1, wherein,

the detecting operation comprises at least one of a regression operation and a classification operation;

the detection result includes at least one of a regression result and a classification result.

12. The method of claim 11, wherein,

the target detection network comprises a classification network, a regression network and a query network;

the classification operation, the regression operation and the query operation are respectively executed by a classification network, a regression network and a query network.

13. The method of claim 1, wherein,

the query result comprises a query result map;

the query result map comprises the probability that the specific target exists at each position point in the corresponding feature map;

the query point is a position point with the probability value being greater than or equal to a preset threshold value.

14. The method of claim 1, wherein each layer of feature map has a corresponding size threshold, and the particular target of each layer of feature map is a target having a size equal to or less than the size threshold of the layer of feature map.

15. The method of claim 1, further comprising:

generating a training sample set, wherein the training sample set comprises a plurality of training images and labels of each training image, and the labels comprise at least one of classification labels and position labels;

and training the constructed target detection model according to the training sample set to obtain the trained target detection model.

16. The method of claim 15, wherein the label of each training image further includes a query point label of each layer of feature map generated by the training image, and the training of the constructed target detection model according to the training sample set includes:

calculating a loss function according to the query point label of each layer of feature map and the predicted query point of that layer of feature map, and training the constructed target detection network according to the loss function.

17. The method of claim 16, wherein, for any layer of feature map:

when the distance from a position point to the center of a specific target in that layer of feature map is smaller than a preset threshold, the position point is labeled as a query point; or

when the intersection over union between the anchor box corresponding to a position point and a specific target in that layer of feature map is greater than a preset threshold, the position point is labeled as a query point.

18. A method of target detection, comprising:

inputting a picture to be detected into a target detection model, wherein the target detection model comprises a feature extraction network and a target detection network;

performing feature extraction on the picture to be detected by adopting the feature extraction network to obtain multilayer feature maps with different sizes, wherein the multilayer feature maps comprise a first feature map and a second feature map; and

outputting a detection result for the multilayer feature maps by adopting the target detection network, wherein the target detection network comprises a plurality of network layers corresponding to the multilayer feature maps, and the plurality of network layers comprise a first network layer and a second network layer; wherein,

19. An apparatus for constructing a target detection model, comprising:

a first construction unit, configured to construct a feature extraction network, wherein the feature extraction network is used for performing feature extraction on an input picture to obtain multilayer feature maps with different sizes, and the multilayer feature maps comprise a first feature map and a second feature map;

a second construction unit, configured to construct a target detection network, wherein the target detection network comprises a plurality of network layers corresponding to the multilayer feature maps, and the plurality of network layers comprise a first network layer and a second network layer;

20. A target detection apparatus, comprising:

an input unit, configured to input a picture to be detected into a target detection model, wherein the target detection model comprises a feature extraction network and a target detection network;

a feature extraction unit, configured to perform feature extraction on the picture to be detected by adopting the feature extraction network to obtain multilayer feature maps with different sizes, wherein the multilayer feature maps comprise a first feature map and a second feature map; and

a target detection unit, configured to output a detection result for the multilayer feature maps by adopting the target detection network, wherein the target detection network comprises a plurality of network layers corresponding to the multilayer feature maps, and the plurality of network layers comprise a first network layer and a second network layer; wherein,

the first network layer corresponds to the first feature map and is used for performing a query operation on the first feature map and transmitting an obtained query result to the second network layer, wherein the query result comprises a query point of a specific target in the first feature map;

21. A computing device comprising a memory, and one or more processors communicatively connected with the memory;

the memory has stored therein instructions executable by the one or more processors to cause the one or more processors to implement a method according to any one of claims 1 to 18.

22. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1-18.

23. A vehicle comprising the computing device of claim 21.