
WO2020258077A1 - Pedestrian detection method and device - Google Patents

Pedestrian detection method and device

Info

Publication number
WO2020258077A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
pedestrian
pedestrian detection
detected
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/093024
Other languages
French (fr)
Chinese (zh)
Inventor
李国法
杨一帆
陈耀昱
谢恒
李盛龙
赖伟鉴
李晓航
朱方平
颜伟荃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to PCT/CN2019/093024
Publication of WO2020258077A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N3/0495 - Quantised networks; Sparse networks; Compressed networks
    • G06N3/08 - Learning methods
    • G06N3/09 - Supervised learning

Definitions

  • This application belongs to the field of computer application technology, and in particular relates to a pedestrian detection method and device.
  • the embodiments of the present application provide a pedestrian detection method and device to solve the problem of inaccurate pedestrian detection results in the prior art.
  • the first aspect of the embodiments of the present application provides a pedestrian detection method, including:
  • acquiring an image to be detected in real time; inputting the image to be detected into a pre-trained pedestrian detection model to identify pedestrian data contained in the image to be detected, the pedestrian detection model being trained according to a preset depthwise separable convolution method;
  • performing non-maximum suppression processing on the pedestrian data, and determining a portrait frame corresponding to the pedestrian data in the image to be detected.
  • a second aspect of the embodiments of the present application provides a pedestrian detection device, including:
  • the acquiring unit is used to acquire the image to be detected in real time
  • a recognition unit, configured to input the image to be detected into a pre-trained pedestrian detection model to identify pedestrian data contained in the image to be detected; the pedestrian detection model is trained according to a preset depthwise separable convolution method;
  • the determining unit is configured to perform non-maximum value suppression processing on the pedestrian data, and determine a portrait frame corresponding to the pedestrian data in the image to be detected.
  • The third aspect of the embodiments of the present application provides a pedestrian detection device, including a processor, an input device, an output device, and a memory, which are connected to each other.
  • The memory is used to store a computer program that supports the device in executing the above method; the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method of the first aspect.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium. The computer storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect described above.
  • Compared with the prior art, the embodiments of the present application have the following beneficial effects: the image to be detected is acquired in real time; the image to be detected is input into a pre-trained pedestrian detection model to identify the pedestrian data it contains; and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • Because the pedestrian detection model is trained on the basis of the depthwise separable convolution method, recognizing the acquired image to be detected with this model and determining the portrait frame of each pedestrian it contains not only improves the efficiency of portrait detection, so that the user can immediately choose a corresponding processing strategy based on the detected portrait frame, but also improves the accuracy of portrait detection, ensuring that the current pedestrian situation can be clearly detected even in low-visibility environments such as haze.
  • FIG. 1 is a flowchart of a pedestrian detection method provided in Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of a pedestrian detection method provided in Embodiment 2 of the present application.
  • FIG. 3 is an application example of the image enhancement technique provided in Embodiment 2 of the present application.
  • FIG. 4 is a schematic diagram of the training process and application process of the training model provided in Embodiment 2 of the present application;
  • FIG. 5 is a schematic structural diagram of a pedestrian detection method provided in Embodiment 2 of the present application.
  • FIG. 6 is a schematic diagram of comparison between standard convolution and depth separable convolution provided in the second embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of the bottleneck layer in the pedestrian detection method provided in the second embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a weight connection layer provided in Embodiment 2 of the present application.
  • FIG. 9 is a schematic diagram of the squeeze-and-excitation (compression-excitation) mechanism provided in Embodiment 2 of the present application.
  • FIG. 10 is a distribution diagram of a priori frames of historical images provided in Embodiment 2 of the present application.
  • FIG. 11 is a schematic diagram of the MNPB-YOLO label provided in Embodiment 2 of the present application.
  • FIG. 12 is an example of the detection result provided in Embodiment 2 of the present application.
  • FIG. 13 is a schematic diagram of a pedestrian detection device provided in Embodiment 3 of the present application.
  • FIG. 14 is a schematic diagram of a pedestrian detection device provided in Embodiment 4 of the present application.
  • FIG. 1 is a flowchart of a pedestrian detection method according to Embodiment 1 of the present application.
  • the execution subject of the pedestrian detection method in this embodiment is a device with a pedestrian detection function, including but not limited to devices such as computers, servers, tablets, or terminals.
  • the pedestrian detection method shown in the figure may include the following steps:
  • S101 Acquire images to be detected in real time.
  • In the process of detecting pedestrians, the image to be detected is first acquired in real time.
  • The image to be detected in this embodiment can take the form of a single image or a piece of video; after a video is obtained, it can be sampled at a preset period to obtain single frames as the images to be detected.
  • The image to be detected in this embodiment may be a color image, a black-and-white image, an infrared image, etc., which is not limited here.
  • For example, an image or video can be captured in real time by a camera device installed at the front of the vehicle, such as a driving recorder; a captured image is used directly as the image to be detected, while a video is sampled to obtain single frames as the images to be detected.
  • S102: Input the image to be detected into a pre-trained pedestrian detection model, and identify the pedestrian data included in the image to be detected; the pedestrian detection model is trained according to a preset depthwise separable convolution method.
  • In this embodiment, a pedestrian detection model is pre-trained using depthwise separable convolution and linear bottleneck layer techniques to reduce the amount of computation and the number of parameters and to improve the operating efficiency of the network.
  • the acquired image to be detected is input into the pedestrian detection model, and the pedestrian data contained in the image to be detected is identified.
  • The pedestrian data in this embodiment may include the position of the pedestrian in the image to be processed, the positions of the corresponding pixels, the set of candidate boxes found by the pedestrian detection model for each pedestrian, and so on, which is not limited here.
  • S103 Perform non-maximum suppression processing on the pedestrian data, and determine a portrait frame corresponding to the pedestrian data in the image to be detected.
  • non-maximum suppression processing is performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • Non-maximum suppression is used to suppress elements that are not local maxima.
  • The edges of candidate regions in the image are found by local-maximum search over a neighborhood; the neighborhood has two variable parameters: its dimensionality and its size.
  • Features are extracted with a sliding window and classified, and each window receives a score from the classifier.
  • Sliding windows cause many windows to contain, or largely overlap, other windows, so non-maximum suppression is needed to select the neighborhoods with the highest scores (those most likely to contain pedestrians) and suppress the low-scoring windows.
  • Let B be the set of bounding boxes corresponding to each pedestrian. The detection box M with the highest score is selected, removed from B, and added to the final detection result; for each bounding box remaining in B, the Intersection over Union (IoU) with M is calculated, and every box whose IoU is greater than or equal to the preset overlap threshold is removed from B. This process is repeated until B is empty, and the retained bounding boxes are used as the portrait frames corresponding to the pedestrian data in the image to be detected.
  • The non-maximum suppression in this embodiment proceeds as follows. First, assume there are a preset number of rectangular boxes (bounding boxes), sorted by the classifier's classification probability; assuming six boxes, ordered from lowest to highest pedestrian probability as A, B, C, D, E, and F. Starting from the highest-probability box F, check whether the IoU of each of A through E with F exceeds a set overlap threshold; if B and D overlap F beyond the threshold, discard B and D, and keep F marked as the first retained box. A sketch of this procedure follows.
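  • For concreteness, here is a minimal Python sketch of the non-maximum suppression procedure just described (the 0.5 threshold and the corner-format boxes are assumptions; the patent only specifies a preset overlap threshold):

```python
def iou(a, b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, overlap_threshold=0.5):
    # Keep the highest-scoring box M, drop boxes whose IoU with M reaches
    # the threshold, and repeat on the remainder until no boxes are left.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)          # detection box M with the highest score
        kept.append(best)
        order = [i for i in order
                 if iou(boxes[i], boxes[best]) < overlap_threshold]
    return [boxes[i] for i in kept]
```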
  • In this embodiment, the image to be detected is acquired in real time and input into a pre-trained pedestrian detection model to identify the pedestrian data it contains, the model being trained according to the preset depthwise separable convolution method; non-maximum suppression is then performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • Because the pedestrian detection model is trained with the depthwise separable convolution method, recognizing the acquired image with this model and determining the portrait frames of the pedestrians it contains not only improves the efficiency of portrait detection, letting the user immediately choose a processing strategy based on the detected portrait frame, but also improves the accuracy of portrait detection, ensuring that pedestrians can be clearly detected even in low-visibility environments such as haze.
  • FIG. 2 is a flowchart of a pedestrian detection method according to Embodiment 2 of the present application.
  • the execution subject of the pedestrian detection method in this embodiment is a device with a pedestrian detection function, including but not limited to devices such as computers, servers, tablets, or terminals.
  • the pedestrian detection method shown in the figure may include the following steps:
  • S201 Acquire an image to be detected in real time.
  • In the process of detecting pedestrians, the image to be detected is first acquired in real time.
  • The image to be detected in this embodiment can take the form of a single image or a piece of video; after a video is obtained, it can be sampled at a preset period to obtain single frames as the images to be detected.
  • The image to be detected in this embodiment may be a color image, a black-and-white image, an infrared image, etc., which is not limited here.
  • For example, an image or video can be captured in real time by a camera device installed at the front of the vehicle, such as a driving recorder; a captured image is used directly as the image to be detected, while a video is sampled to obtain single frames as the images to be detected.
  • S202: Acquire historical images containing pedestrians. In this embodiment, recognition is performed by a pre-trained pedestrian detection model; therefore, before the acquired image to be detected is recognized, historical images containing pedestrians are acquired first, so that the pedestrian detection model can be trained on them.
  • the form of the historical image in this embodiment is the same as the form of the image to be detected.
  • the historical image in this embodiment may be a color image, a black and white image, an infrared image, etc., which are not limited here.
  • The historical image in this embodiment can take the form of a single image or a piece of video; after a video is obtained, it can be sampled at a preset period to obtain single frames as the historical images.
  • S203 Construct a training model according to a preset weight connection layer, and train the training model according to the historical image to obtain the pedestrian detection model.
  • After the historical images are acquired, a training model is constructed according to the preset weight connection layer and trained on the historical images, yielding a pedestrian detection model used to detect pedestrians in the image to be detected.
  • Optionally, the training model in this embodiment includes a first training model and a second training model, where the first training model denotes the model trained on the historical images together with their expanded images, and the second training model denotes the model trained through steps S2031 to S2034.
  • the two training models can realize training and image recognition separately, or they can be combined for training and image recognition.
  • step S203 may specifically include:
  • a first training model is constructed according to a preset weight connection layer, and the first training model is trained according to the historical image and the corresponding extended image to obtain the pedestrian detection model.
  • FIG. 3 shows some application examples of the image enhancement technique, where (a) to (f) show the original historical image and examples of its expansions, such as random flipping and random contrast changes.
  • In this embodiment, image enhancement is added during training: the original data set is expanded through random cropping, flipping, color changes, affine transformation, Gaussian noise, and other operations, ensuring a strong data foundation, increasing the amount of training data, and improving the accuracy of model training.
  • the first training model is constructed according to the preset weight connection layer, and the first training model is trained according to the historical image and its corresponding expanded image to obtain a pedestrian detection model.
  • FIG. 4 is a schematic diagram of the training process and application process of the training model provided in this embodiment.
  • First, historical images in Red-Green-Blue (RGB) form are obtained; image expansion, for example image cropping and image enhancement, is performed on the RGB images; the historical images and the expanded images are then recognized by the proposed algorithm architecture, namely the MNPB-YOLO pedestrian detection method; the loss function between the recognition result and the original image is calculated according to the preset truth-label determination method; and finally the loss function is used to update the model parameters of the MNPB-YOLO pedestrian detection method until the parameters and weights of the training model are fixed, yielding an efficient and accurate training model.
  • step S203 may specifically include steps S2031 to S2034:
  • S2031: Construct a second training model based on a preset weight connection layer, a preset depthwise separable convolution method, and a preset linear bottleneck layer technique.
  • FIG. 5 is a schematic diagram of the structure of the pedestrian detection method provided by this embodiment, where the multiplicative expressions between the numbers in FIG. 5 indicate the amount of data participating in each computation.
  • Specifically, this embodiment proposes a new YOLO-based deep learning method, which includes a basic convolution model part, a weight connection layer, a detection module, and a classification module. The basic convolution model is built with depthwise separable convolution and linear bottleneck layer techniques, which reduce the amount of computation and the number of parameters and improve the operating efficiency of the network.
  • Figure 6 shows the standard convolution and depth separable convolution provided by this embodiment.
  • H, W are used to represent the height and width of the convolution kernel
  • M is used to represent the number of channels of the input feature map or the number of channels of the convolution kernel
  • N is used to represent the number of convolution kernels.
  • Compared with ordinary convolution, depthwise separable convolution effectively reduces the number of network parameters and the amount of computation.
  • To allow MNPB-YOLO to run at higher speed on a general-purpose processor, depthwise separable convolution is used to build the overall MNPB-YOLO model, as sketched below.
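  • As an illustration of the parameter savings, the following PyTorch sketch (the concrete sizes are assumptions, not values from the patent) counts the weights of a standard convolution against a depthwise separable one using the H, W, M, N notation above:

```python
import torch.nn as nn

# Hypothetical sizes: H x W = 3 x 3 kernels, M input channels, N output channels.
M, N, H, W = 32, 64, 3, 3

# Standard convolution: one H x W x M kernel per output channel -> H*W*M*N weights.
standard = nn.Conv2d(M, N, kernel_size=(H, W), padding=1, bias=False)

# Depthwise separable convolution: a per-channel H x W depthwise convolution
# (groups=M) followed by a 1x1 pointwise convolution that mixes channels.
depthwise = nn.Conv2d(M, M, kernel_size=(H, W), padding=1, groups=M, bias=False)
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)

def n_params(*modules):
    return sum(p.numel() for m in modules for p in m.parameters())

print(n_params(standard))              # H*W*M*N = 18432
print(n_params(depthwise, pointwise))  # H*W*M + M*N = 288 + 2048 = 2336
# Parameter ratio: 1/N + 1/(H*W) ~ 0.127, i.e. roughly an 8x reduction here.
```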
  • FIG. 7 is a schematic diagram of the structure of the bottleneck layer in the pedestrian detection method provided by this embodiment.
  • Depthwise separable convolution with the ReLU activation function causes a certain amount of information loss.
  • Therefore, a linear bottleneck layer technique is needed.
  • The implementation process of the bottleneck layer technique in this solution is shown in Table 1.
  • The image first undergoes a 3×3 convolution to output a feature map, which serves as the Input of the bottleneck layer structure shown in FIG. 7.
  • Depending on whether the convolution stride is 1 or 2, different methods are used to perform the convolution.
  • When the convolution stride is 1, a 1×1 convolution with the ReLU6 activation function is applied first to increase the dimensionality (Conv 1×1, ReLU6 in FIG. 7); the expansion multiple is reflected in the "expansion coefficient" column of Table 1.
  • After dimension expansion, information can spread more widely across the feature map, preventing the information loss that depthwise convolution with the ReLU6 activation would otherwise cause. A 3×3 depthwise convolution is then applied to the expanded feature map (Dwise 3×3, ReLU6 in FIG. 7); next, a 1×1 ordinary convolution with a linear activation function reduces the dimensionality back and fuses the information of different channels (Conv 1×1, Linear in FIG. 7); the input and this output are then added element-wise (ADD in FIG. 7); finally, this feature map is output as the input of the next layer.
  • When the convolution stride is 2, the difference from stride 1 is that there is no element-wise addition. This is because with stride 2 the main purpose is to down-sample the feature map: its height and width are halved, and the reduced feature map can no longer be added element-wise to the original feature map. A sketch of this block follows.
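  • A minimal PyTorch sketch of this stride-1/stride-2 bottleneck block (a MobileNetV2-style inverted residual, which is what the text describes); layer names follow FIG. 7, and the default expansion coefficient t=6 is an assumption, since Table 1 is not reproduced here:

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, t=6):
        super().__init__()
        mid = in_ch * t  # expanded width = input channels x expansion coefficient
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),      # Conv 1x1, ReLU6: expand
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),          # Dwise 3x3, ReLU6
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),      # Conv 1x1, Linear: project
            nn.BatchNorm2d(out_ch),                     # no ReLU: linear bottleneck
        )

    def forward(self, x):
        out = self.block(x)
        # ADD in FIG. 7: element-wise addition only when the stride is 1
        return x + out if self.use_residual else out
```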
  • FIG. 8 is a schematic diagram of the structure of the weight connection layer provided by this embodiment.
  • The weight connection layer can automatically weigh the importance of feature maps from different feature scales and filter out unimportant information, improving the performance of the network.
  • Feature maps of different sizes are first adjusted to the same size via a space-to-depth transformation; all adjusted feature maps are then spliced together to form multi-scale feature information (7x7x256 + 7x7x192 + 7x7x320 in the figure); finally, the squeeze-and-excitation model screens out the important features.
  • FIG. 9 is a schematic diagram of the squeeze-and-excitation (compression-excitation) mechanism provided by this embodiment.
  • The squeeze-and-excitation mechanism is in fact a channel attention mechanism: important features are selected by assigning a different weight α to each channel, and the weight α is obtained by learning, updated in the direction that reduces the loss. As shown in the figure, a 1×1 convolution first performs dimensionality reduction; global pooling then extracts the feature value of each channel; a fully connected layer computes the weight α; the weight α is multiplied with the compressed feature map; and the re-calibrated features are finally obtained.
  • H, W, and C denote the height, width, and number of channels of a feature map; the primed symbols H', W', and C' have the same meanings for the transformed feature map;
  • F_tr denotes the dimensionality-reduction operation of a 1×1 convolution, whose purpose is to reduce the number of channels and thereby the computation required in the following steps, improving the efficiency with which the method of this embodiment detects pedestrians;
  • F_sq(·) denotes the squeeze (compression) operation, implemented in practice as global average pooling;
  • F_ex(·, W) denotes the excitation operation, which maps feature information of shape 1×1×C to another feature vector of shape 1×1×C; the mapped feature information represents the importance coefficients of the channels, and the mapping is implemented with a multi-layer perceptron;
  • W denotes the weights of the multi-layer perceptron, updated in the direction that reduces the loss; a sketch combining these operations follows.
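  • The following PyTorch sketch puts the pieces above together: concatenated multi-scale features pass through F_tr (1×1 reduction), F_sq (global average pooling), and F_ex (a small multi-layer perceptron) to produce per-channel weights. The channel sizes follow FIG. 8; the reduction ratio r and the hidden sizes are assumptions:

```python
import torch
import torch.nn as nn

class WeightConnectionLayer(nn.Module):
    def __init__(self, in_ch=256 + 192 + 320, mid_ch=256, r=16):
        super().__init__()
        self.f_tr = nn.Conv2d(in_ch, mid_ch, 1, bias=False)  # F_tr: 1x1 reduce
        self.f_sq = nn.AdaptiveAvgPool2d(1)                   # F_sq: global pooling
        self.f_ex = nn.Sequential(                            # F_ex: MLP -> weights
            nn.Linear(mid_ch, mid_ch // r), nn.ReLU(inplace=True),
            nn.Linear(mid_ch // r, mid_ch), nn.Sigmoid(),
        )

    def forward(self, feats):
        # feats: same-sized maps after space-to-depth, e.g. 7x7x256, 7x7x192, 7x7x320
        x = self.f_tr(torch.cat(feats, dim=1))
        b, c, _, _ = x.shape
        alpha = self.f_ex(self.f_sq(x).view(b, c)).view(b, c, 1, 1)  # weights in (0, 1)
        return x * alpha  # re-calibrated features
```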
  • S2032: Input the historical images into the second training model, detect pedestrian images in the historical images, and determine the truth label corresponding to each pedestrian image; the truth label indicates the transform coefficients from a prior box of the recognized historical image to the truth box.
  • The MNPB-YOLO detection idea is inspired by YOLO: the image is divided into N×N grid cells, each cell predicts B detection boxes, and each detection box consists of parameters describing its location plus an encoding of the object type.
  • In the first version (YOLOv1) and the second version (YOLOv2), the grid cell in which an object's center falls is responsible for predicting that object's detection box.
  • In MNPB-YOLO, the IoU values between the prior boxes of a grid cell and the truth box are first calculated and sorted in descending order, and the first k prior boxes are selected to jointly predict the object. A sketch of this selection, together with the label encoding, follows.
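  • The following sketch illustrates the two label-construction steps just described: ranking a cell's prior boxes by IoU with the truth box, and encoding the prior-to-truth transform coefficients. The encoding formula is a common parameterization and an assumption here, since the source does not spell out its exact form:

```python
import math

def iou(a, b):
    # IoU of two boxes given as (x, y, w, h), with (x, y) the distances from
    # the image's left and upper edges, as in FIG. 11.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[0] + a[2], b[0] + b[2]), min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def top_k_priors(priors, truth, k=2):
    # Sort a cell's prior boxes by IoU with the truth box and keep the first k.
    return sorted(priors, key=lambda p: iou(p, truth), reverse=True)[:k]

def encode(prior, truth):
    # Transform coefficients from prior box (x_a, y_a, w_a, h_a) to truth box
    # (x_g, y_g, w_g, h_g); an assumed but common parameterization.
    xa, ya, wa, ha = prior
    xg, yg, wg, hg = truth
    return ((xg - xa) / wa, (yg - ya) / ha,
            math.log(wg / wa), math.log(hg / ha))
```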
  • FIG. 10 is a distribution diagram of a priori frames of historical images provided in this embodiment.
  • the thick black boxes are two a priori boxes located at the center of the grid.
  • The black grid lines represent the N×N grid, where N is 7.
  • FIG. 11 is a schematic diagram of the MNPB-YOLO label provided in this embodiment.
  • Each target person corresponds to one truth label, which is described by six parameters representing the transformation from the prior box to the truth box.
  • The box drawn with a thin line is the prior box;
  • the box drawn with a thick line is the truth box.
  • w_a, h_a denote the width and height of the prior box; x_a, y_a denote the distances of the prior box from the left and upper edges of the image.
  • w_g, h_g denote the width and height of the truth box; x_g, y_g denote the distances of the truth box from the left and upper edges of the image.
  • a, b, c, and d denote values that are set arbitrarily here.
  • the YOLO loss function is as follows:
  • L_center, L_size, and L_score denote the losses for the box center, the box size, and the confidence of whether an object is present in the box;
  • L_class denotes the classification loss;
  • c_{i,j,k} and its ground-truth counterpart denote the predicted confidence and the true value of whether an object is present in detection box k at grid cell (i, j);
  • x_{i,j,k}, y_{i,j,k}, w_{i,j,k}, h_{i,j,k} denote the position and size of detection box k at grid cell (i, j) as predicted by the network.
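  • The formula itself did not survive extraction. For reference, the standard YOLOv1-style loss that these terms describe has the following general form (a reconstruction under that assumption, not the patent's exact expression; hats denote ground-truth values and the indicator marks the box responsible for an object):

$$
\begin{aligned}
L &= L_{center} + L_{size} + L_{score} + L_{class},\\
L_{center} &= \lambda_{coord}\sum_{i,j,k}\mathbb{1}^{obj}_{i,j,k}\Big[(x_{i,j,k}-\hat{x}_{i,j,k})^2+(y_{i,j,k}-\hat{y}_{i,j,k})^2\Big],\\
L_{size} &= \lambda_{coord}\sum_{i,j,k}\mathbb{1}^{obj}_{i,j,k}\Big[\big(\sqrt{w_{i,j,k}}-\sqrt{\hat{w}_{i,j,k}}\big)^2+\big(\sqrt{h_{i,j,k}}-\sqrt{\hat{h}_{i,j,k}}\big)^2\Big],\\
L_{score} &= \sum_{i,j,k}\mathbb{1}^{obj}_{i,j,k}\big(c_{i,j,k}-\hat{c}_{i,j,k}\big)^2+\lambda_{noobj}\sum_{i,j,k}\mathbb{1}^{noobj}_{i,j,k}\big(c_{i,j,k}-\hat{c}_{i,j,k}\big)^2 .
\end{aligned}
$$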
  • the MNPB-YOLO loss is as follows:
  • class_{i,j,k} and its ground-truth counterpart denote the one-hot encoding of the object type predicted by the network and the true object type; the predicted and true transform values denote the transformation from the prior box to the truth box as output by the network and its ground truth, respectively.
  • λ_class and λ_box are two constants that balance the different loss terms; CrossEntropy denotes the cross-entropy function; an indicator denotes whether detection box k at grid cell (i, j) is used to detect an object; and x denotes the input of the function.
  • The features filtered by the weight connection layer are further extracted, and a tensor of a specific shape is finally output; this tensor is compared with the truth labels to form the loss.
  • The final output tensor shape means: the image is divided into 7×7 grid cells, each cell predicts 2 detection boxes, and each detection box is described by 6 parameters: 4 describing the position and size of the box and 2 one-hot codes distinguishing background from person. A decoding sketch follows.
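  • In code, decoding that output tensor looks roughly like this (the channel ordering and the person-class index are assumptions):

```python
import torch

# Final output: 7x7 grid cells x (2 boxes x 6 parameters) = 7x7x12.
S, B, P = 7, 2, 6                    # grid size, boxes per cell, params per box
output = torch.randn(S, S, B * P)    # stand-in for a real network prediction

boxes  = output.view(S, S, B, P)
xywh   = boxes[..., :4]              # transform coefficients for position and size
logits = boxes[..., 4:]              # 2-way one-hot scores: background vs. person
is_person = logits.argmax(dim=-1) == 1   # index 1 = person, an assumption
```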
  • The parameter count of the method in this embodiment is much smaller than that of the original YOLO, which means the method can run more efficiently.
  • MNPB-YOLO makes major changes to the network structure: depthwise separable convolution and the bottleneck layer technique greatly reduce the parameter count and computation, and the proposed weight connection layer improves network performance, giving the network advantages in both speed and accuracy.
  • The network structure of YOLOv1 is inspired by GoogLeNet: the input is a 224×224×3 image, the output grid is divided into 7×7, and each grid cell predicts two detection boxes.
  • the detection idea is to use the mapping ability of the convolutional neural network to directly construct the mapping process from the image to the parameters of the detection frame.
  • the detection frame is composed of four parameters: the object center position (x, y) and the object height and width (h, w).
  • the entire network is equivalent to learning how to label the object from scratch.
  • YOLOv2 incorporates prior knowledge about object sizes into the network prediction; this is the prior box technique.
  • The input size is adjusted to 448×448×3, and the grid is divided into 13×13.
  • Each grid cell predicts two detection boxes.
  • The size of each prior box is obtained from the label sizes in the training set through clustering.
  • The network structure is also greatly changed: the darknet backbone is proposed, with roughly half the original parameter count.
  • A comparison of YOLOv1, YOLOv2, and the MNPB-YOLO method of this embodiment is given in Table 3, covering batch normalization, the prior box technique, the convolution method, the weight connection layer, and the parameter count.
  • S204: Input the image to be detected into the pre-trained pedestrian detection model, and identify the pedestrian data contained in the image to be detected; the pedestrian detection model is trained according to the preset depthwise separable convolution method.
  • The implementation of S204 in this embodiment is exactly the same as that of S102 in the embodiment corresponding to FIG. 1.
  • S205 Perform non-maximum value suppression processing on the pedestrian data, and determine a portrait frame corresponding to the pedestrian data in the image to be detected.
  • Figure 4 is a schematic diagram of the training process and application process of the training model provided in this embodiment.
  • After training is completed, the model can be put into application; that is, pedestrian detection is performed on the image to be detected by the trained model.
  • The acquired real-time image (that is, the image to be detected) is input into the trained model, and at least one box corresponding to each pedestrian in the real-time image is detected.
  • The non-maximum suppression method then evaluates each box and determines the pedestrian frame corresponding to each pedestrian in the image to be detected, yielding an accurate detection result.
  • The original YOLO is a multi-target detection model; the models modified here for pedestrian-only detection are called S-YOLOv1 and S-YOLOv2.
  • In addition, two traditional methods were compared: a pedestrian detection algorithm based on HAAR features with an AdaBoost classifier, and a pedestrian detection algorithm based on HOG features with an SVM classifier. The test results of the different methods are shown in Table 4.
  • AP denotes average precision, specifically the area enclosed by the precision-recall (PR) curve and the coordinate axes;
  • P denotes precision;
  • R denotes recall;
  • FPS denotes processing speed, that is, the number of frames processed per second. Table 4 shows that the method of this embodiment is not only more accurate than the other methods but also much faster. The experiments were measured on a computer platform with an Intel i7-6700K CPU and a GTX 1080 GPU.
  • FIG. 12 shows some detection results of this embodiment. As can be seen from FIG. 12, the method of this embodiment can detect pedestrians in images under haze or other low-visibility conditions and can accurately determine the pedestrian frames.
  • S206 Detect the orientation of the pedestrian corresponding to the portrait frame relative to the current vehicle, and the distance between the current vehicle and the pedestrian corresponding to the portrait frame.
  • the pedestrian in the current field of view can be determined according to the portrait frame, and corresponding control operations can be performed.
  • After the portrait frame is determined, the pedestrian's orientation relative to the vehicle and the current distance between the vehicle and the pedestrian can be detected.
  • The orientation can be determined from the region the pedestrian occupies in the image to be processed; for example, if the portrait frame lies in the lower right of the image, the pedestrian is determined to be to the front right of the current vehicle (see the sketch below).
  • The distance can be measured by infrared ranging: an infrared ranging device determines the distance between the pedestrian's position and the current vehicle.
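  • A minimal sketch of mapping the portrait frame's horizontal position to a coarse bearing, assuming a forward-facing camera and an even three-way split of the image (both are assumptions, not specified by the source):

```python
def pedestrian_bearing(frame_center_x, image_width):
    # Coarse bearing relative to the vehicle from the frame's horizontal position.
    third = image_width / 3
    if frame_center_x < third:
        return "front-left"
    if frame_center_x < 2 * third:
        return "ahead"
    return "front-right"

# Example: a frame centered at x=600 in a 640-pixel-wide image -> "front-right".
print(pedestrian_bearing(600, 640))
```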
  • S207 Generate reminder information according to the direction and the distance, and broadcast it; the reminder information is used to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait frame.
  • A reminder message is generated and broadcast to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait frame.
  • In addition, a vehicle control instruction can be generated according to the pedestrian's orientation and distance relative to the current vehicle, controlling the vehicle's speed and direction to avoid hitting the pedestrian.
  • In this embodiment, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed according to the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model; the image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained according to the preset depthwise separable convolution method; and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • FIG. 13 is a schematic diagram of a pedestrian detection device provided in Embodiment 3 of the present application.
  • the pedestrian detection device 1300 may be a mobile terminal such as a smart phone or a tablet computer.
  • Each unit included in the pedestrian detection device 1300 of this embodiment is used to execute the steps in the embodiment corresponding to FIG. 1.
  • the pedestrian detection device 1300 of this embodiment includes:
  • the acquiring unit 1301 is configured to acquire the image to be detected in real time
  • the recognition unit 1302 is configured to input the image to be detected into a pre-trained pedestrian detection model to identify pedestrian data contained in the image to be detected; the pedestrian detection model is trained according to a preset depthwise separable convolution method;
  • the determining unit 1303 is configured to perform non-maximum value suppression processing on the pedestrian data, and determine a portrait frame corresponding to the pedestrian data in the image to be detected.
  • the pedestrian detection device further includes:
  • the historical acquisition unit is used to acquire historical images containing pedestrians
  • the training unit is configured to construct a training model according to a preset weight connection layer, and train the training model according to the historical image to obtain the pedestrian detection model.
  • the training unit includes:
  • An expansion unit configured to perform image enhancement processing on the historical image to obtain at least two expanded images corresponding to the historical image
  • the first training unit is configured to construct a first training model according to a preset weight connection layer, and train the first training model according to the historical image and the corresponding extended image to obtain the pedestrian detection model.
  • the training unit includes:
  • the second training unit is used to construct a second training model based on the preset weight connection layer, the preset depthwise separable convolution method, and the preset linear bottleneck layer technique;
  • the truth value unit is used to input the historical image into the second training model, detect pedestrian images in the historical image, and determine the truth value label corresponding to each pedestrian image; the truth value label is used to indicate The a priori box of the recognized historical image is based on the transform coefficient of the truth box;
  • a loss function unit configured to determine a loss function corresponding to the second training model according to the truth label
  • the optimization unit is configured to optimize the second training model according to the loss function to obtain the pedestrian detection model.
  • the pedestrian detection device further includes:
  • a positioning unit for detecting the orientation of the pedestrian corresponding to the portrait frame relative to the current vehicle, and the distance between the current vehicle and the pedestrian corresponding to the portrait frame;
  • the reminding unit is used to generate and broadcast reminding information according to the orientation and the distance; the reminding information is used to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait frame.
  • In this embodiment, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed according to the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model; the image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained according to the preset depthwise separable convolution method; and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • FIG. 14 is a schematic diagram of a pedestrian detection device provided in Embodiment 4 of the present application.
  • the pedestrian detection device 1400 in this embodiment as shown in FIG. 14 may include a processor 1401, a memory 1402, and a computer program 1403 that is stored in the memory 1402 and can run on the processor 1401.
  • the processor 1401 executes the computer program 1403, the steps in the foregoing embodiments of the pedestrian detection method are implemented.
  • the memory 1402 is used to store a computer program, and the computer program includes program instructions.
  • The processor 1401 is configured to execute the program instructions stored in the memory 1402; specifically, the processor 1401 is configured to call the program instructions to perform the following operations:
  • the processor 1401 is used to:
  • acquire the image to be detected in real time; input the image to be detected into a pre-trained pedestrian detection model to identify the pedestrian data contained in the image to be detected, the pedestrian detection model being trained according to a preset depthwise separable convolution method;
  • perform non-maximum suppression processing on the pedestrian data, and determine a portrait frame corresponding to the pedestrian data in the image to be detected.
  • processor 1401 is specifically configured to:
  • a training model is constructed according to a preset weight connection layer, and the training model is trained according to the historical images to obtain the pedestrian detection model.
  • processor 1401 is specifically configured to:
  • a first training model is constructed according to a preset weight connection layer, and the first training model is trained according to the historical image and the corresponding extended image to obtain the pedestrian detection model.
  • processor 1401 is specifically configured to:
  • Input the historical images into the second training model, detect pedestrian images in the historical images, and determine the truth label corresponding to each pedestrian image; the truth label indicates the transform coefficients from a prior box of the recognized historical image to the truth box;
  • the second training model is optimized according to the loss function to obtain the pedestrian detection model.
  • processor 1401 is specifically configured to:
  • a reminder message is generated and broadcast; the reminder message is used to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait frame.
  • In this embodiment, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed according to the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model; the image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained according to the preset depthwise separable convolution method; and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • The processor 1401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 1402 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1401. A part of the memory 1402 may also include a non-volatile random access memory. For example, the memory 1402 may also store device type information.
  • The processor 1401, the memory 1402, and the computer program 1403 described in the embodiments of the present application can execute the implementations described in Embodiments 1 and 2 of the pedestrian detection method provided in the embodiments of the present application, and can also execute the terminal implementations described in the embodiments of the present application; details are not repeated here.
  • In an embodiment of the present application, a computer-readable storage medium stores a computer program; the computer program includes program instructions that, when executed by a processor, implement the following:
  • acquiring the image to be detected in real time; inputting the image to be detected into a pre-trained pedestrian detection model to identify the pedestrian data contained in the image to be detected, the pedestrian detection model being trained according to a preset depthwise separable convolution method;
  • performing non-maximum suppression processing on the pedestrian data, and determining a portrait frame corresponding to the pedestrian data in the image to be detected.
  • a training model is constructed according to a preset weight connection layer, and the training model is trained according to the historical images to obtain the pedestrian detection model.
  • a first training model is constructed according to a preset weight connection layer, and the first training model is trained according to the historical image and the corresponding extended image to obtain the pedestrian detection model.
  • Input the historical images into the second training model, detect pedestrian images in the historical images, and determine the truth label corresponding to each pedestrian image; the truth label indicates the transform coefficients from a prior box of the recognized historical image to the truth box;
  • the second training model is optimized according to the loss function to obtain the pedestrian detection model.
  • a reminder message is generated and broadcast; the reminder message is used to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait frame.
  • In this embodiment, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed according to the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model; the image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained according to the preset depthwise separable convolution method; and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.
  • the computer-readable storage medium may be the internal storage unit of the terminal described in any of the foregoing embodiments, such as the hard disk or memory of the terminal.
  • The computer-readable storage medium may also be an external storage device of the terminal, for example, a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal.
  • the computer-readable storage medium may also include both an internal storage unit of the terminal and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the terminal.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the disclosed terminal and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of units is only a logical functional division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • The technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: USB flash drives, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application is applicable to the technical field of computer applications, and provides a pedestrian detection method and device. The method comprises: obtaining an image to be detected in real time; inputting the image into a pre-trained pedestrian detection model and identifying the pedestrian data contained in the image; and performing non-maximum suppression on the pedestrian data to determine a portrait frame corresponding to the pedestrian data in the image. In this embodiment, the pedestrian detection model is trained on the basis of a depthwise separable convolution mode; the obtained image to be detected is identified by the pedestrian detection model, and the portrait frame of each pedestrian it contains is determined, which not only improves the efficiency of portrait detection, so that a user can immediately determine a corresponding processing mode from the detected portrait frame, but also improves the accuracy of portrait detection, ensuring that the current pedestrian situation can be clearly detected even in low-visibility environments such as haze.

Description

Pedestrian detection method and device

Technical field

This application belongs to the field of computer application technology, and in particular relates to a pedestrian detection method and device.

Background technique

Walking is one of the basic modes of transportation. Surveys show that in Europe more than 7,000 pedestrians die each year, accounting for 27% of all deaths; therefore, effectively detecting pedestrians in various environments will significantly improve the driving safety of autonomous vehicles. However, due to the diversity and complexity of pedestrian postures, positions, clothing, and weather conditions, pedestrian detection remains a difficult problem.

Most detection models in the prior art have only been tested under well-lit conditions. In general, they are not capable of detecting pedestrians under insufficient light, for example on foggy days, because bad weather reduces visibility and color reflection, blurring pedestrians' outlines and appearance and making them hard to distinguish from the background. Therefore, in the prior art, it is difficult to separate pedestrians from the background when the environment is blurry, which causes inaccurate pedestrian detection results.

Technical problem

In view of this, the embodiments of the present application provide a pedestrian detection method and device to solve the problem of inaccurate pedestrian detection results in the prior art.

Technical solutions

The first aspect of the embodiments of the present application provides a pedestrian detection method, including:

acquiring an image to be detected in real time;

inputting the image to be detected into a pre-trained pedestrian detection model to identify pedestrian data contained in the image to be detected, the pedestrian detection model being trained according to a preset depthwise separable convolution method;

performing non-maximum suppression processing on the pedestrian data, and determining a portrait frame corresponding to the pedestrian data in the image to be detected.

A second aspect of the embodiments of the present application provides a pedestrian detection device, including:

an acquiring unit, configured to acquire the image to be detected in real time;

a recognition unit, configured to input the image to be detected into a pre-trained pedestrian detection model to identify pedestrian data contained in the image to be detected, the pedestrian detection model being trained according to a preset depthwise separable convolution method;

a determining unit, configured to perform non-maximum suppression processing on the pedestrian data and determine a portrait frame corresponding to the pedestrian data in the image to be detected.

The third aspect of the embodiments of the present application provides a pedestrian detection device, including a processor, an input device, an output device, and a memory, which are connected to each other. The memory is used to store a computer program that supports the device in executing the above method; the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method of the first aspect.

The fourth aspect of the embodiments of the present application provides a computer-readable storage medium. The computer storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect described above.

Beneficial Effects

Compared with the prior art, the embodiments of the present application have the following beneficial effects: an image to be detected is acquired in real time; the image to be detected is input into a pre-trained pedestrian detection model to identify the pedestrian data it contains; and non-maximum suppression is performed on the pedestrian data to determine the portrait bounding box corresponding to the pedestrian data in the image to be detected. In this embodiment, the pedestrian detection model is trained based on depthwise separable convolution, the acquired image to be detected is recognized by the model, and the portrait bounding box of each pedestrian it contains is determined. This not only improves the efficiency of pedestrian detection, so that the user can immediately choose the appropriate response based on the detected bounding boxes, but also improves detection accuracy, ensuring that the current pedestrian situation can be clearly detected even in low-visibility environments such as haze.

Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.

FIG. 1 is a flowchart of the pedestrian detection method provided in Embodiment 1 of the present application;

FIG. 2 is a flowchart of the pedestrian detection method provided in Embodiment 2 of the present application;

FIG. 3 shows application examples of the image augmentation techniques provided in Embodiment 2 of the present application;

FIG. 4 is a schematic diagram of the training process and application process of the training model provided in Embodiment 2 of the present application;

FIG. 5 is a schematic structural diagram of the pedestrian detection method provided in Embodiment 2 of the present application;

FIG. 6 is a schematic comparison of standard convolution and depthwise separable convolution provided in Embodiment 2 of the present application;

FIG. 7 is a schematic diagram of the bottleneck layer structure in the pedestrian detection method provided in Embodiment 2 of the present application;

FIG. 8 is a schematic structural diagram of the weighted connection layer provided in Embodiment 2 of the present application;

FIG. 9 is a schematic diagram of the squeeze-and-excitation mechanism provided in Embodiment 2 of the present application;

FIG. 10 is a distribution diagram of the prior boxes of a historical image provided in Embodiment 2 of the present application;

FIG. 11 is a schematic diagram of the MNPB-YOLO label provided in Embodiment 2 of the present application;

FIG. 12 shows examples of detection results provided in Embodiment 2 of the present application;

FIG. 13 is a schematic diagram of the pedestrian detection device provided in Embodiment 3 of the present application;

FIG. 14 is a schematic diagram of the pedestrian detection device provided in Embodiment 4 of the present application.

Detailed Description of the Embodiments

In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set forth to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.

In order to illustrate the technical solutions described in the present application, specific embodiments are described below.

Embodiment 1:

Referring to FIG. 1, FIG. 1 is a flowchart of a pedestrian detection method provided in Embodiment 1 of the present application. The execution subject of the pedestrian detection method in this embodiment is a device with a pedestrian detection function, including but not limited to a computer, a server, a tablet computer, or a terminal. The pedestrian detection method shown in the figure may include the following steps:

S101: Acquire an image to be detected in real time.

Walking is one of the basic modes of transportation. Surveys show that in the European region more than 7,000 pedestrians die each year, accounting for 27% of all road deaths; effective detection of pedestrians in a variety of environments will therefore significantly improve the driving safety of autonomous vehicles. However, because of the diversity and complexity of pedestrian postures, positions, clothing, and weather conditions, pedestrian detection remains an open problem. With the continuous improvement of computer performance, detection methods based on deep architectures have been widely applied. Nevertheless, most detection models have only been tested under well-lit conditions and are generally incapable of detecting pedestrians when illumination is insufficient, for example in fog. Surveys show that the rate of traffic accidents in hazy weather is far higher than under sunny conditions, because haze shrinks the field of vision the human eye can perceive, making it difficult for drivers to see road signs and pedestrians clearly and thereby increasing the probability of traffic accidents. Generally, the more severe the haze, the higher the accident rate. Detecting pedestrians in haze is a particularly challenging task, because bad weather reduces visibility and color reflection and blurs pedestrian outlines and appearance, making pedestrians hard to distinguish from the background. Existing dehazing algorithms are mainly applied to daytime scenes with uniform atmospheric light; however, more severe haze usually occurs in dim light, so these algorithms cannot adapt to the harsher haze scenes.

In this embodiment, in the process of detecting pedestrians, the image to be detected is first acquired in real time. The image to be detected may take the form of a single image or of a video segment; after a video is obtained, the video may be sampled at a preset period to obtain single-frame images as images to be detected. The image to be detected in this embodiment may be a color image, a black-and-white image, an infrared image, or the like, which is not limited here.

Exemplarily, while the vehicle is traveling, images or video can be captured in real time by a camera device installed at the front of the vehicle, such as a dash camera; a captured image is used directly as the image to be detected, and a captured video is sampled to obtain single-frame images as images to be detected.
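As an illustration of this acquisition step, the following is a minimal sketch, assuming OpenCV as the capture library (the filing does not name one), of sampling single frames from a camera index or a video file:

```python
import cv2

def sample_frames(source=0, every_n=5):
    """Yield every n-th frame from a camera index or video file path as a
    candidate image to be detected; every_n is a placeholder for the preset
    sampling period."""
    cap = cv2.VideoCapture(source)
    index = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            yield frame
        index += 1
    cap.release()
```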

S102: Input the image to be detected into a pre-trained pedestrian detection model, and identify pedestrian data contained in the image to be detected; the pedestrian detection model is trained according to a preset depthwise separable convolution scheme.

In this embodiment, a pedestrian detection model is trained in advance. Depthwise separable convolution and the linear bottleneck layer technique are used to reduce the amount of computation and the number of parameters and to improve the operating efficiency of the network. In addition, multi-scale feature fusion is innovatively combined with the squeeze-and-excitation mechanism to propose a new feature fusion method, the weighted connection layer. Using these techniques, an efficient pedestrian detection method for hazy weather, the MNPB-YOLO pedestrian detection model, is proposed.

After the pedestrian detection model is trained, the acquired image to be detected is input into the model, and the pedestrian data contained in the image to be detected is identified. The pedestrian data in this embodiment may include the position of each pedestrian in the image to be processed, the positions of the corresponding pixels, and the set of candidate boxes the model produces for each pedestrian, which is not limited here.

S103: Perform non-maximum suppression on the pedestrian data, and determine the portrait bounding box corresponding to the pedestrian data in the image to be detected.

After the pedestrian data in the image to be processed is obtained through the pedestrian detection model, non-maximum suppression is performed on the pedestrian data to determine the portrait bounding box corresponding to the pedestrian data in the image to be detected. In this embodiment, non-maximum suppression suppresses elements that are not local maxima, for example determining the edge of a corresponding region in the image by local maximum search. Here "local" refers to a neighborhood with two variable parameters: the dimensionality of the neighborhood and its size. In the pedestrian detection of this embodiment, features are extracted through sliding windows and classified by a classifier, and each window receives a score. However, sliding windows cause many windows to contain, or largely overlap with, other windows. Non-maximum suppression is then needed to select the window with the highest score in each neighborhood, i.e., the one with the highest probability of being a pedestrian, and to suppress the low-scoring windows.

Specifically, among the identified bounding boxes corresponding to each piece of pedestrian data in the image to be detected, it must be determined which rectangular boxes are redundant. In this embodiment, let B be the set of bounding boxes for each pedestrian. The bounding box with the highest score is selected as the detection box M, removed from B, and added to the final detection result; the overlap (Intersection over Union, IOU) between each remaining bounding box in B and the detection box M is computed, and every bounding box whose IOU is greater than or equal to a preset overlap threshold is removed from B. This process is repeated until B is empty, and the retained bounding boxes are used as the portrait bounding boxes corresponding to the pedestrian data in the image to be detected.

Exemplarily, when a pedestrian is localized in the image to be detected, the pedestrian detection model produces a set of boxes, and it must be determined which rectangular boxes are redundant. The non-maximum suppression of this embodiment proceeds as follows: assume a preset number of rectangular boxes, i.e., bounding boxes, sorted by the classifier's class probability. Suppose there are 6 rectangular boxes whose probabilities of being a pedestrian are, from small to large, A, B, C, D, E, and F. Starting from the maximum-probability box F, determine whether the IOU of each of A to E with F exceeds a set overlap threshold; if the overlaps of B and D with F exceed the threshold, B and D are discarded, and F is marked as the first retained box. From the remaining boxes A, C, and E, the box E with the highest probability is selected, the overlaps of E with A and C are evaluated, boxes whose overlap exceeds the threshold are discarded, and E is marked as the second retained box. This is repeated until all retained boxes are found, and they serve as the portrait bounding boxes corresponding to the pedestrian data in the image to be detected.
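The procedure above can be sketched in NumPy as follows; the [x1, y1, x2, y2] box format and the threshold value are assumptions for illustration:

```python
import numpy as np

def iou(box, boxes):
    """IOU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, overlap_threshold=0.5):
    """Keep the highest-scoring box, drop remaining boxes whose overlap with
    it reaches the threshold, and repeat until no boxes remain."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < overlap_threshold]
    return keep
```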

In the above solution, an image to be detected is acquired in real time; the image to be detected is input into a pre-trained pedestrian detection model to identify the pedestrian data it contains, the pedestrian detection model being trained according to a preset depthwise separable convolution scheme; and non-maximum suppression is performed on the pedestrian data to determine the portrait bounding box corresponding to the pedestrian data in the image to be detected. In this embodiment, the pedestrian detection model is trained based on depthwise separable convolution, the acquired image to be detected is recognized by the model, and the portrait bounding box of each pedestrian it contains is determined. This not only improves the efficiency of pedestrian detection, so that the user can immediately choose the appropriate response based on the detected bounding boxes, but also improves detection accuracy, ensuring that the current pedestrian situation can be clearly detected even in low-visibility environments such as haze.

Embodiment 2:

Referring to FIG. 2, FIG. 2 is a flowchart of a pedestrian detection method provided in Embodiment 2 of the present application. The execution subject of the pedestrian detection method in this embodiment is a device with a pedestrian detection function, including but not limited to a computer, a server, a tablet computer, or a terminal. The pedestrian detection method shown in the figure may include the following steps:

S201: Acquire an image to be detected in real time.

In this embodiment, in the process of detecting pedestrians, the image to be detected is first acquired in real time. The image to be detected may take the form of a single image or of a video segment; after a video is obtained, the video may be sampled at a preset period to obtain single-frame images as images to be detected. The image to be detected may be a color image, a black-and-white image, an infrared image, or the like, which is not limited here. Exemplarily, while the vehicle is traveling, images or video can be captured in real time by a camera device installed at the front of the vehicle, such as a dash camera; a captured image is used as the image to be detected, and a captured video is sampled to obtain single-frame images as images to be detected.

S202: Acquire historical images containing pedestrians.

In this embodiment, the image to be detected is recognized according to a pre-trained pedestrian detection model. Therefore, before the acquired image to be detected is recognized, historical images containing pedestrians are first acquired so that the pedestrian detection model can be trained on them.

The historical images take the same form as the image to be detected: they may be color images, black-and-white images, infrared images, or the like, which is not limited here. A historical image may be a single image or a video segment; after a video is obtained, the video may be sampled at a preset period to obtain single-frame images as historical images.

S203: Construct a training model according to a preset weighted connection layer, and train the training model on the historical images to obtain the pedestrian detection model.

After the historical images are acquired, a training model is constructed according to the preset weighted connection layer and trained on the historical images, yielding the pedestrian detection model used to detect pedestrians in the image to be detected. It should be noted that, to distinguish the training models corresponding to different processing schemes, the training model in this embodiment includes a first training model and a second training model: the first training model denotes the training model obtained after the historical images are augmented, and the second training model denotes the training model obtained through the training of steps S2031 to S2034. The two training models can perform training and image recognition separately, or they can be combined for training and image recognition.

Further, step S203 may specifically include:

performing image augmentation on the historical images to obtain at least two augmented images corresponding to each historical image; and

constructing a first training model according to the preset weighted connection layer, and training the first training model on the historical images and their corresponding augmented images to obtain the pedestrian detection model.

Specifically, referring also to FIG. 3, FIG. 3 shows application examples of image augmentation techniques, in which (a) to (f) are, respectively, the original historical image, the image after random flipping, the image after random contrast change, the image after random cropping, the image after random color change, and the image after random affine transformation. To give the model better generalization performance, image augmentation is added during training: the original data set is expanded through operations such as random cropping, flipping, color changes, affine transformations, and Gaussian noise, so that, on the basis of a strong data set, the number of training iterations can be increased and the training accuracy improved. The first training model is constructed according to the preset weighted connection layer and trained on the historical images and their corresponding augmented images to obtain the pedestrian detection model.
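As a sketch of this augmentation step, the pipeline below uses torchvision (an assumed library choice) to apply the random flip, color change, affine transformation, crop, and Gaussian noise operations named above; for detection training, the geometric transforms would also have to be applied to the box labels, which is omitted here:

```python
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                      # random flipping
    T.ColorJitter(brightness=0.3, contrast=0.3,
                  saturation=0.3),                      # random color / contrast change
    T.RandomAffine(degrees=10, translate=(0.1, 0.1)),   # random affine transformation
    T.RandomResizedCrop(size=224, scale=(0.7, 1.0)),    # random cropping
    T.ToTensor(),
    T.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0, 1)),  # Gaussian noise
])
```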

Referring also to FIG. 4, FIG. 4 is a schematic diagram of the training process and application process of the training model provided in this embodiment. During training, a historical image in Red-Green-Blue (RGB) form, i.e., an RGB image, is first acquired; the RGB image is then augmented, for example by cropping and enhancement; the historical image and its augmented versions are recognized by the designed algorithm architecture, i.e., the MNPB-YOLO pedestrian detection method; the loss function between the recognition result and the original image is computed according to the preset ground-truth-label scheme; and the model parameters of the MNPB-YOLO method are finally updated through the loss function, yielding fixed parameters and weights and thus an efficient, accurate training model.

Further, step S203 may specifically include steps S2031 to S2034:

S2031: Construct a second training model according to the preset weighted connection layer, based on the preset depthwise separable convolution scheme and the preset linear bottleneck layer technique.

Referring also to FIG. 5, FIG. 5 is a schematic structural diagram of the pedestrian detection method provided in this embodiment, in which the multiplication expressions between the numbers denote the amount of data participating in the computation at each stage. To effectively realize pedestrian detection in hazy weather, this embodiment proposes a new YOLO-based deep learning method comprising a basic convolution model, a weighted connection layer, a detection module, and a classification module. Depthwise separable convolution and the linear bottleneck layer technique form the basic convolution model, reducing the amount of computation and the number of parameters and improving the operating efficiency of the network. In addition, multi-scale feature fusion via the space-to-depth transformation is combined with the squeeze-and-excitation mechanism to propose a new feature fusion method, the weighted connection layer; finally, pedestrian detection in the image to be processed is completed through the detection module and the classification module.

Using the above techniques, an efficient pedestrian detection method for hazy weather, MNPB-YOLO, is proposed. Referring also to FIG. 6, FIG. 6 is a schematic comparison of standard convolution and depthwise separable convolution provided in this embodiment, in which H and W denote the height and width of the convolution kernel, M denotes the number of channels of the input feature map (equivalently, of the convolution kernel), and N denotes the number of convolution kernels. The practical difference between depthwise separable convolution and ordinary convolution is that the former effectively reduces the number of network parameters and the amount of computation. So that MNPB-YOLO can run at higher speed on a general-purpose processor, depthwise separable convolution is used to build the entire MNPB-YOLO model. The comparison with ordinary convolution is shown in FIG. 6: depthwise separable convolution consists of two parts, a depthwise convolution and a pointwise convolution; the depthwise convolution convolves the feature map separately on each channel, and the pointwise convolution then convolves across all channels of the feature map.
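A minimal PyTorch sketch of the depthwise-plus-pointwise decomposition described here (layer sizes are illustrative, not those of Table 1):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution (one filter per channel, groups=in_channels)
    followed by a 1x1 pointwise convolution that mixes the channels."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```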

Referring also to FIG. 7, FIG. 7 is a schematic diagram of the bottleneck layer structure in the pedestrian detection method provided in this embodiment. Depthwise separable convolution paired with the ReLU activation function causes a certain loss of information; to reduce this loss, the bottleneck layer technique is used. Specifically, the bottleneck layer technique is applied in this solution as shown in Table 1. The image first passes through a 3×3 convolution to output a feature map, which serves as the Input of the bottleneck layer structure; different paths are then taken depending on whether the convolution stride is 1 or 2. When the stride is 1, the feature map first passes through a 1×1 convolution with the ReLU6 activation function to expand the dimensionality, i.e., Conv 1×1, ReLU6 in FIG. 7; the expansion multiple appears in the "expansion coefficient" column of Table 1, and this step spreads the information more broadly across the feature map to prevent the information loss caused by the depthwise convolution (with the ReLU6 activation function). The expanded feature map is then processed by a 3×3 depthwise convolution, i.e., Dwise 3×3, ReLU6 in FIG. 7; a 1×1 ordinary convolution with a linear activation function then fuses the information of the different channels, i.e., Conv 1×1, Linear in FIG. 7; the input is added element-wise to this output, i.e., ADD in FIG. 7; and the resulting feature map is passed to the next layer as its input. When the convolution stride is 2, i.e., Stride=2 in FIG. 7, the only difference from stride 1 is that there is no element-wise addition: with a stride of 2, the main purpose is to downsample the height and width of the feature map by a factor of two, and the reduced feature map cannot be added element-wise to the original feature map.
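The stride-1 and stride-2 paths described above can be sketched in PyTorch as follows (a minimal reading of FIG. 7; the batch-normalization placement is an assumption):

```python
import torch.nn as nn

class LinearBottleneck(nn.Module):
    """1x1 expansion (ReLU6) -> 3x3 depthwise (ReLU6) -> 1x1 linear projection,
    with the element-wise ADD only when stride == 1 and shapes match."""
    def __init__(self, in_ch, out_ch, stride, expansion):
        super().__init__()
        hidden = in_ch * expansion
        self.use_add = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),          # Conv 1x1, ReLU6
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),             # Dwise 3x3, ReLU6
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # Conv 1x1, Linear
            nn.BatchNorm2d(out_ch),                           # no activation: linear
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_add else out               # ADD only for stride 1
```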

The detailed network structure parameters of the pedestrian detection method are shown in Table 1. It should be noted that, in the text and drawings of this embodiment, Conv denotes convolution, and strings in the format "number × number × number" denote the height × width × number of channels of a feature map; this is not repeated below.

Table 1: MNPB-YOLO network configuration parameters

[Table 1 is provided as images in the original filing; it lists the layer-by-layer configuration, including the expansion coefficients of the bottleneck layers and the rows for the weighted connection layer.]

Referring also to FIG. 8, FIG. 8 is a schematic structural diagram of the weighted connection layer provided in this embodiment. The weighted connection layer can automatically weigh the importance of feature maps coming from different feature scales and then filter out the unimportant information, thereby improving network performance. First, information is collected from multiple feature maps of different depths and scales. These feature maps have inconsistent sizes, i.e., different height × width × channel counts, for example 28×28×16, 14×14×48, or 7×7×320 in the figure, and cannot be concatenated directly. A split-and-concatenate method is therefore adopted: the feature maps of different sizes are first adjusted to a uniform size through the space-to-depth transformation, and all adjusted feature maps are then concatenated to form multi-scale feature information, for example 7×7×256 + 7×7×192 + 7×7×320 in the figure. Finally, the squeeze-and-excitation model selects the important features.
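The space-to-depth adjustment can be sketched as follows; the block sizes 4, 2, and 1 reproduce the 28×28×16 → 7×7×256, 14×14×48 → 7×7×192, and unchanged 7×7×320 shapes of FIG. 8:

```python
import torch

def space_to_depth(x, block):
    """Move each block x block spatial neighborhood into the channel axis, so
    feature maps of different resolutions share the same height and width."""
    n, c, h, w = x.shape
    x = x.view(n, c, h // block, block, w // block, block)
    x = x.permute(0, 1, 3, 5, 2, 4).contiguous()
    return x.view(n, c * block * block, h // block, w // block)

# Fusing the three scales of FIG. 8 into one 7x7x768 tensor:
# fused = torch.cat([space_to_depth(f28, 4),   # 28x28x16 -> 7x7x256
#                    space_to_depth(f14, 2),   # 14x14x48 -> 7x7x192
#                    f7], dim=1)               # 7x7x320 stays as-is
```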

Referring also to FIG. 9, FIG. 9 is a schematic diagram of the squeeze-and-excitation mechanism provided in this embodiment. The squeeze-and-excitation mechanism is in fact a channel attention mechanism that selects important features by assigning a different weight ω to each channel; the weight ω is obtained by learning, and it is updated in the direction that decreases the loss. As shown in the figure, dimensionality reduction is first performed through a 1×1 convolution; the characteristic value of each channel is then obtained through global pooling; the weight ω is computed through a fully connected layer; the compressed feature map is multiplied by the weight ω; and the recalibrated features are finally obtained. In FIG. 9, H, W, and C denote the height, width, and number of channels of a feature map, with primed symbols having the same meaning. $F_{tr}$ denotes a 1×1 convolutional dimensionality-reduction operation whose purpose is to reduce the number of channels, which reduces the computation required by the subsequent steps and improves the efficiency with which the method of this embodiment detects pedestrians. $F_{sq}(\cdot)$ denotes a squeeze operation over the channels, implemented in practice by global average pooling. $F_{ex}(\cdot, W)$ maps feature information of shape 1×1×C to other feature information of shape 1×1×C, the mapped feature information representing the importance coefficient of each channel; the mapping is implemented by a multilayer perceptron, and W denotes the weights of the multilayer perceptron, updated in the direction of the loss gradient descent. $F_{scale}(\cdot,\cdot)$ denotes the channel-wise multiplication of the feature map U with the mapped feature information, i.e., the channel importance coefficients. The recalibrated feature $\tilde{X}$ is finally obtained. X denotes the input feature map; in this embodiment, X denotes the multi-scale feature information and $\tilde{X}$ the filtered information (see also FIG. 8).
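A minimal sketch of the squeeze-and-excitation step, with global average pooling for F_sq, a two-layer perceptron for F_ex, and channel-wise rescaling for F_scale (the reduction ratio is an assumed hyperparameter):

```python
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Learn a weight per channel and rescale the input by it."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(               # F_ex: multilayer perceptron
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        squeezed = x.mean(dim=(2, 3))           # F_sq: global average pooling
        weights = self.mlp(squeezed)            # channel importance coefficients
        return x * weights.view(n, c, 1, 1)     # F_scale: recalibrate channels
```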

S2032: Input the historical images into the second training model, detect the pedestrian images in the historical images, and determine the ground-truth label corresponding to each pedestrian image; the ground-truth label represents the transformation coefficients from a prior box of the recognized historical image to the truth box.

The detection idea of MNPB-YOLO is inspired by YOLO: the image is divided into an N×N grid, each cell predicts B detection boxes, and each detection box consists of parameters describing its position and an encoding of the object class. In YOLOv1 (the first version) and YOLOv2 (the second version), the detection boxes of whichever grid cell contains the object center are responsible for the prediction; in MNPB-YOLO, however, the IOU between each prior box in the cell and the truth box is first computed, the prior boxes are sorted in descending order of IOU, and the top k prior boxes are selected to predict the size of the object.

Referring also to FIG. 10, FIG. 10 is a distribution diagram of the prior boxes of a historical image provided in this embodiment. A clustering method is first used to obtain the cluster centers of the pedestrian sizes in the data set: the pedestrian sizes are divided into two classes, the parameters of the cluster centers are taken as the heights and widths of the prior boxes, and the prior boxes are then distributed uniformly over the N×N grid of the image. The thick black boxes are the two prior boxes located at the center of a cell, and the black grid lines represent the N×N grid, where N is taken as 7.
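The clustering of pedestrian sizes can be sketched with a k-means that uses 1 − IOU as its distance, the usual choice for anchor clustering (the filing does not fix the distance metric, so this is an assumption):

```python
import numpy as np

def cluster_prior_boxes(wh, k=2, iterations=100, seed=0):
    """Cluster ground-truth (width, height) pairs into k prior-box sizes.
    wh is an (N, 2) array of box sizes taken from the training labels."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh.prod(axis=1)[:, None] + anchors.prod(axis=1)[None, :] - inter
        assign = np.argmax(inter / union, axis=1)   # highest IOU = nearest center
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors                                   # k (width, height) priors
```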

Referring also to FIG. 11, FIG. 11 is a schematic diagram of the MNPB-YOLO label provided in this embodiment. In MNPB-YOLO, each target person corresponds to one ground-truth label described by 6 parameters: the x-direction and y-direction offsets from the prior box to the truth box, the transformation coefficients from the prior box to the truth box in height and width, and a feature vector distinguishing person from background, expressed in one-hot encoding. In the left image of FIG. 11, the thin-line box is a prior box and the thick-line box is the truth box. If the thin-line box is responsible for predicting the truth box, its label is $(t_x, t_y, t_w, t_h, 1, 0)$, where $t_x, t_y$ are the x- and y-direction offsets from the prior box to the truth box and $t_w, t_h$ are the width and height transformation coefficients; here $w_a, h_a$ denote the width and height of the prior box, $x_a, y_a$ denote the distances of the prior box from the left and top edges of the image, $w_g, h_g$ denote the width and height of the truth box, and $x_g, y_g$ denote the distances of the truth box from the left and top edges of the image. Otherwise, the label is $(a, b, c, d, 0, 1)$, where the values a, b, c, and d may be set arbitrarily.
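A hedged sketch of building one such 6-parameter label; the exact offset and coefficient formulas appear only as an image in the filing, so the normalized-offset and log-ratio forms below are assumptions borrowed from common anchor encodings:

```python
import numpy as np

def encode_label(prior, truth):
    """prior = (x_a, y_a, w_a, h_a), truth = (x_g, y_g, w_g, h_g), with x and y
    measured from the left and top image edges. Returns the 6-parameter label
    for a prior box responsible for this truth box: x/y offsets, w/h
    transformation coefficients, and the (person, background) one-hot pair.
    The offset/log forms are assumed, not taken from the filing."""
    x_a, y_a, w_a, h_a = prior
    x_g, y_g, w_g, h_g = truth
    return np.array([(x_g - x_a) / w_a, (y_g - y_a) / h_a,
                     np.log(w_g / w_a), np.log(h_g / h_a), 1.0, 0.0])

# A prior not responsible for any truth box would get (a, b, c, d, 0, 1) with
# a, b, c, d arbitrary, since only the one-hot part enters its loss.
```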

S2033: Determine the loss function corresponding to the second training model according to the ground-truth labels.

For the detection task (regression of the detection boxes), smooth L1 is used to evaluate the loss between the network output and the ground-truth labels; for the classification task (distinguishing pedestrians from background), cross-entropy is used as the loss, whereas YOLOv1 and v2 both use an L2 loss for detection and classification.

The YOLO loss function is as follows:

$$L_{center}=\lambda_{coord}\sum_{i,j}\sum_{k}\mathbb{1}^{obj}_{i,j,k}\left[\left(x_{i,j,k}-\hat{x}_{i,j,k}\right)^{2}+\left(y_{i,j,k}-\hat{y}_{i,j,k}\right)^{2}\right]$$

$$L_{size}=\lambda_{coord}\sum_{i,j}\sum_{k}\mathbb{1}^{obj}_{i,j,k}\left[\left(\sqrt{w_{i,j,k}}-\sqrt{\hat{w}_{i,j,k}}\right)^{2}+\left(\sqrt{h_{i,j,k}}-\sqrt{\hat{h}_{i,j,k}}\right)^{2}\right]$$

$$L_{score}=\sum_{i,j}\sum_{k}\mathbb{1}^{obj}_{i,j,k}\left(c_{i,j,k}-\hat{c}_{i,j,k}\right)^{2}+\lambda_{noobj}\sum_{i,j}\sum_{k}\mathbb{1}^{noobj}_{i,j,k}\left(c_{i,j,k}-\hat{c}_{i,j,k}\right)^{2}$$

$$L_{class}=\sum_{i,j}\sum_{k}\mathbb{1}^{obj}_{i,j,k}\left(p_{i,j,k}-\hat{p}_{i,j,k}\right)^{2}$$

$$L_{total}=L_{center}+L_{size}+L_{score}+L_{class}$$

Here, $L_{center}$, $L_{size}$, and $L_{score}$ denote the losses for the detection box center, the detection box size, and the confidence of whether the box contains an object, and $L_{class}$ denotes the classification loss; $\lambda_{coord}$ and $\lambda_{noobj}$ are constants balancing the coordinate and no-object terms. $c_{i,j,k}$ and $\hat{c}_{i,j,k}$ denote, respectively, the network-predicted confidence that detection box k at grid cell (i, j) contains an object and its ground truth; $x_{i,j,k}, y_{i,j,k}, w_{i,j,k}, h_{i,j,k}$ denote the position and size of detection box k at grid cell (i, j) predicted by the network, and $\hat{x}_{i,j,k}, \hat{y}_{i,j,k}, \hat{w}_{i,j,k}, \hat{h}_{i,j,k}$ denote the corresponding ground truths. $p_{i,j,k}$ and $\hat{p}_{i,j,k}$ denote, respectively, the network-predicted probability of the class of the object in detection box k at grid cell (i, j) and its ground truth. $\mathbb{1}^{obj}_{i,j,k}$ indicates whether detection box k at grid cell (i, j) is responsible for detecting an object, and $\mathbb{1}^{noobj}_{i,j,k}$ indicates whether it is responsible for detecting background.

The MNPB-YOLO loss of this embodiment is as follows:

$$L_{class}=\sum_{i,j}\sum_{k}\mathbb{1}^{obj}_{i,j,k}\,\mathrm{CrossEntropy}\left(class_{i,j,k},\,class^{*}_{i,j,k}\right)$$

$$L_{box}=\sum_{i,j}\sum_{k}\mathbb{1}^{obj}_{i,j,k}\,\mathrm{smooth}_{L1}\left(t_{i,j,k}-t^{*}_{i,j,k}\right)$$

$$L_{total}=\lambda_{class}L_{class}+\lambda_{box}L_{box}$$

$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$$

Here, $class_{i,j,k}$ denotes the one-hot encoding of the object class predicted by the network, and $class^{*}_{i,j,k}$ denotes the ground truth of the object class; $t_{i,j,k}$ denotes the network-predicted transformation values from the prior box to the truth box, and $t^{*}_{i,j,k}$ denotes the ground truth of that transformation. $\lambda_{class}$ and $\lambda_{box}$ are two constants used to balance the different types of loss, and CrossEntropy denotes the cross-entropy function; $\mathbb{1}^{obj}_{i,j,k}$ indicates whether detection box k at grid cell (i, j) is responsible for detecting an object, and x denotes the input of the smooth L1 function.
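A minimal sketch of this two-term loss in PyTorch; the tensor layouts and λ values are assumptions for illustration:

```python
import torch.nn.functional as F

def mnpb_yolo_loss(pred_class, true_class, pred_box, true_box, obj_mask,
                   lambda_class=1.0, lambda_box=5.0):
    """pred_class: (N, 2) person/background logits per prior; true_class: (N,)
    class indices; pred_box / true_box: (N, 4) box-transformation values;
    obj_mask: boolean mask of priors responsible for an object. The lambda
    values are placeholders; the filing only says two constants balance the
    terms."""
    l_class = F.cross_entropy(pred_class[obj_mask], true_class[obj_mask])
    l_box = F.smooth_l1_loss(pred_box[obj_mask], true_box[obj_mask])
    return lambda_class * l_class + lambda_box * l_box
```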

S2034: Optimize the second training model according to the loss function to obtain the pedestrian detection model.

The features selected by the weighted connection layer require further extraction before a tensor of the required shape can be output and compared with the ground-truth labels to form the loss. The detection and classification modules are built from a combination of 3×3 and 1×1 convolutions.

As shown in Table 2, the meaning of the final output tensor shape is: the image is divided into a 7×7 grid, each cell predicts 2 detection boxes, and each detection box is described by 6 parameters, including 4 parameters describing the position and size of the detection box and a 2-dimensional one-hot encoding distinguishing background from person.
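A minimal sketch of such a head; the intermediate channel width is an assumption, with only the 7×7×(2×6) output shape taken from the description above:

```python
import torch.nn as nn

class DetectionClassificationHead(nn.Module):
    """3x3 convolution for further feature extraction, then a 1x1 convolution
    projecting to 2 boxes x 6 parameters = 12 values per 7x7 grid cell."""
    def __init__(self, in_channels, boxes_per_cell=2, params_per_box=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
            nn.ReLU6(inplace=True),
            nn.Conv2d(256, boxes_per_cell * params_per_box, kernel_size=1),
        )

    def forward(self, x):        # x: (N, in_channels, 7, 7)
        return self.head(x)      # (N, 12, 7, 7)
```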

Comparing the techniques of the different methods, it can be seen that the parameter count of the method of this embodiment is far smaller than that of the original YOLO, which means the method can run more efficiently. Compared with YOLOv1 and YOLOv2, MNPB-YOLO makes substantial changes to the network structure: depthwise separable convolution and the bottleneck layer technique greatly reduce the parameter count and the computation, while the proposed weighted connection layer improves network performance, giving the network advantages in both speed and accuracy.

Table 2: Classification and detection modules

[Table 2 is provided as an image in the original filing; it lists the convolution layers of the classification and detection modules and the final 7×7×12 output tensor shape.]

The technical comparison of the different methods is shown in Table 3. The network structure of YOLOv1 is inspired by GoogLeNet: it takes a 224×224×3 image as input, divides the output into a 7×7 grid, and predicts 2 detection boxes per cell. Its detection idea is to use the mapping capability of a convolutional neural network to construct a direct mapping from the image to the detection box parameters; a detection box consists of 4 parameters, the object center position (x, y) and the object height and width (h, w), which, together with the confidence c of whether a given detection box in the cell contains an object and the one-hot encoding of the object class, constitute the network output. The whole network effectively learns from scratch how to annotate objects. In fact, for objects in certain scenes there is prior knowledge that can help the network identify objects better, such as the aspect-ratio information of the target object, which for pedestrians is generally around 3:1. YOLOv2 incorporates this prior knowledge into the network prediction; this is the prior-box technique. For YOLOv2, the input size is adjusted to 448×448×3, the grid is divided into 13×13, each cell predicts 2 detection boxes, and the sizes of the prior boxes are obtained by clustering the label sizes of the training set. In addition, the network structure is changed substantially: the darknet backbone is proposed, with roughly half the original parameter count. The comparison of YOLOv1 and YOLOv2 can be seen in Table 3, from which it is clear that the MNPB-YOLO method of this embodiment has obvious advantages in batch normalization, prior-box technique, convolution type, weighted connection layer, and parameter count:

Table 3: Technical comparison of different methods

[Table 3 is provided as images in the original filing; it compares YOLOv1, YOLOv2, and MNPB-YOLO in terms of batch normalization, prior-box technique, convolution type, weighted connection layer, and parameter count.]

S204: Input the image to be detected into the pre-trained pedestrian detection model, and identify the pedestrian data contained in the image to be detected; the pedestrian detection model is trained according to the preset depthwise separable convolution scheme.

In this embodiment, the implementation of S204 is identical to that of S102 in the embodiment corresponding to FIG. 1; for details, refer to the description of S102 in the embodiment corresponding to FIG. 1, which is not repeated here.

S205: Perform non-maximum suppression on the pedestrian data, and determine the portrait bounding box corresponding to the pedestrian data in the image to be detected.

Referring again to FIG. 4, FIG. 4 is a schematic diagram of the training process and application process of the training model provided in this embodiment. After the parameters and weights of the training model are determined from the historical images and their augmented versions, the training model can be put into application, i.e., pedestrian detection is performed on the image to be detected by the trained model. During detection, a real-time image, i.e., the image to be detected, is first acquired through the vehicle-mounted camera and input into the trained model, which detects at least one box for each pedestrian in the real-time image; the boxes are then evaluated by non-maximum suppression to determine the pedestrian bounding box corresponding to each pedestrian in the image to be detected, yielding an accurate detection result.

In the simulation experiments, because the original YOLO is a multi-object detection model, some modifications were made to compare YOLO with the single-object detection of MNPB-YOLO; the modified models are called S-YOLOV1 and S-YOLOV2. In addition, two traditional methods were compared: a pedestrian detection algorithm based on HAAR features and an Adaboost classifier, and a pedestrian detection algorithm based on HOG features and an SVM classifier. The detection results of the different methods are compared in Table 4, where AP denotes average precision, specifically the area enclosed by the PR curve and the axes; P denotes precision; R denotes recall; and FPS denotes processing speed, i.e., how many frames are processed per second. As can be seen from Table 4, the method of this embodiment is not only more accurate than the other methods but also much faster; the experiments were measured on Intel i7-6700K and GTX 1080 computer platforms.

Table 4: Comparison of detection results of different methods

[Table 4 is provided as an image in the original filing; it reports AP, P, R, and FPS for each compared method.]

Referring also to FIG. 12, FIG. 12 shows some examples of the detection results of this embodiment. As FIG. 12 shows, the approach of this embodiment can detect the pedestrians in an image in hazy weather or under low environmental visibility and can accurately determine each pedestrian's portrait bounding box.

S206: Detect the orientation of the pedestrian corresponding to the portrait bounding box relative to the current vehicle, and the distance between the current vehicle and that pedestrian.

After the portrait bounding boxes in the acquired image to be detected are determined, the pedestrians in the current field of view can be identified from the bounding boxes and corresponding control operations performed.

Exemplarily, when the portrait bounding box of a pedestrian in front of the vehicle is detected, i.e., it is determined that there is a person in front of the vehicle, the orientation of the pedestrian relative to the vehicle and the distance between the current vehicle and the pedestrian can be further detected. Specifically, the orientation can be determined from the region of the pedestrian in the image to be processed; for example, if the portrait bounding box lies in the lower right of the image, the pedestrian is judged to be to the front right of the current vehicle. The distance can be determined by infrared ranging: an infrared ranging device measures the distance between the pedestrian's position and the current vehicle.

S207: Generate reminder information according to the orientation and the distance, and broadcast it; the reminder information is used to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait bounding box.

After the orientation and distance of the pedestrian relative to the current vehicle are determined, reminder information is generated and broadcast to remind the driver of the current vehicle to pay attention to the pedestrian corresponding to the portrait bounding box.

Further, if the method of this embodiment is used in an autonomous-driving environment, after the orientation and distance of the pedestrian relative to the current vehicle are determined, a vehicle control instruction can also be generated according to that orientation and distance, so that the traveling speed and direction of the vehicle are controlled through the instruction to avoid hitting the pedestrian.

In the above solution, an image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed according to the preset weighted connection layer and trained on the historical images to obtain the pedestrian detection model; the image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the pedestrian detection model being trained according to the preset depthwise separable convolution scheme; and non-maximum suppression is performed on the pedestrian data to determine the portrait bounding box corresponding to the pedestrian data in the image to be detected. Processing the historical images enlarges the training base; constructing and training the model with the weighted connection layer yields the pedestrian detection model; the image to be detected is detected; and after the detection result is obtained, corresponding processing is performed according to it. This not only improves the detection efficiency and accuracy for the image to be detected but also improves the safety of vehicle driving and of pedestrians.

Embodiment 3:

Referring to FIG. 13, FIG. 13 is a schematic diagram of a pedestrian detection device provided in Embodiment 3 of the present application. The pedestrian detection device 1300 may be a mobile terminal such as a smartphone or a tablet computer. The units included in the pedestrian detection device 1300 of this embodiment are used to execute the steps in the embodiment corresponding to FIG. 1; for details, refer to FIG. 1 and the related descriptions in the corresponding embodiment, which are not repeated here. The pedestrian detection device 1300 of this embodiment includes:

the acquiring unit 1301, configured to acquire the image to be detected in real time;

the recognition unit 1302, configured to input the image to be detected into the pre-trained pedestrian detection model and identify the pedestrian data contained in the image to be detected, the pedestrian detection model having been trained with a preset depthwise separable convolution method; and

the determining unit 1303, configured to perform non-maximum suppression on the pedestrian data and determine the portrait frame corresponding to the pedestrian data in the image to be detected.
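For concreteness, a minimal NumPy sketch of the non-maximum suppression performed by the determining unit 1303 is given below; the IoU threshold of 0.45 and the [x1, y1, x2, y2] box layout are conventions assumed for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences."""
    order = scores.argsort()[::-1]            # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)                        # retain the current best box
        # Intersection of the retained box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep
```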

Further, the pedestrian detection device also includes:

a historical acquisition unit, configured to acquire historical images containing pedestrians; and

a training unit, configured to construct a training model from a preset weight connection layer and train the training model on the historical images to obtain the pedestrian detection model.

Further, the training unit includes:

an expansion unit, configured to perform image enhancement processing on a historical image to obtain at least two expanded images corresponding to it (see the sketch after this list); and

a first training unit, configured to construct a first training model from the preset weight connection layer and train the first training model on the historical images and their corresponding expanded images to obtain the pedestrian detection model.
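A minimal sketch of the image enhancement step follows, assuming OpenCV and three illustrative transforms (mirroring, brightening, and cropping); the disclosure does not fix the exact set of transforms.

```python
import cv2

def augment(image):
    """Return at least two enhanced copies of a BGR historical image."""
    flipped = cv2.flip(image, 1)                               # horizontal mirror
    brighter = cv2.convertScaleAbs(image, alpha=1.0, beta=30)  # lift brightness
    h, w = image.shape[:2]
    cropped = image[h // 10:, w // 10:]                        # corner crop
    cropped = cv2.resize(cropped, (w, h))                      # restore size
    return [flipped, brighter, cropped]
```

Note that when the historical images carry bounding-box annotations, the same geometric transforms (mirror and crop) must also be applied to the boxes so that the ground truth stays aligned.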

Further, the training unit includes:

a second training unit, configured to construct a second training model from the preset weight connection layer, using the preset depthwise separable convolution method and the preset linear bottleneck layer technique;
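The disclosure does not give the exact layer arrangement, but a depthwise separable convolution combined with a linear bottleneck is commonly realized as a MobileNetV2-style inverted residual block. The following PyTorch sketch illustrates the idea; the expansion factor and channel counts are assumptions.

```python
import torch.nn as nn

class LinearBottleneck(nn.Module):
    """Inverted residual: pointwise expand -> depthwise 3x3 -> linear project."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),          # pointwise expansion
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride, 1,              # depthwise 3x3
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),         # linear projection:
            nn.BatchNorm2d(out_ch),                        # no activation here
        )
        self.residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        return x + self.block(x) if self.residual else self.block(x)
```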

a truth-value unit, configured to input the historical images into the second training model, detect the pedestrian images in the historical images, and determine the ground-truth label corresponding to each pedestrian image, the ground-truth label representing the transform coefficients from the prior box of the recognized historical image to the ground-truth box;
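One common way to express such a ground-truth label is the SSD-style encoding of a prior box against its matched ground-truth box, sketched below; the variance terms are assumed values, and the disclosure does not specify the exact definition of the coefficients.

```python
import numpy as np

def encode(prior, gt, variances=(0.1, 0.2)):
    """prior, gt: [cx, cy, w, h]; returns the four transform coefficients."""
    t_x = (gt[0] - prior[0]) / (prior[2] * variances[0])  # center-x offset
    t_y = (gt[1] - prior[1]) / (prior[3] * variances[0])  # center-y offset
    t_w = np.log(gt[2] / prior[2]) / variances[1]         # width log-scale
    t_h = np.log(gt[3] / prior[3]) / variances[1]         # height log-scale
    return np.array([t_x, t_y, t_w, t_h])
```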

a loss function unit, configured to determine the loss function corresponding to the second training model from the ground-truth labels; and

an optimization unit, configured to optimize the second training model according to the loss function to obtain the pedestrian detection model.
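A minimal sketch of a loss that could be built from these labels follows: smooth L1 on the box transform coefficients plus cross-entropy on the class scores. The equal weighting of the two terms is an assumption, and the hard-negative mining often paired with prior-box detectors is omitted for brevity.

```python
import torch.nn.functional as F

def detection_loss(pred_loc, true_loc, pred_cls, true_cls, pos_mask):
    """pos_mask marks priors matched to a pedestrian ground-truth box."""
    loc_loss = F.smooth_l1_loss(pred_loc[pos_mask], true_loc[pos_mask],
                                reduction="sum")          # box regression term
    cls_loss = F.cross_entropy(pred_cls, true_cls,
                               reduction="sum")           # classification term
    num_pos = pos_mask.sum().clamp(min=1)                 # avoid divide by zero
    return (loc_loss + cls_loss) / num_pos
```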

Further, the pedestrian detection device also includes:

a positioning unit, configured to detect the bearing of the pedestrian corresponding to the portrait frame relative to the current vehicle and the distance between the current vehicle and that pedestrian; and

a reminder unit, configured to generate a reminder message from the bearing and the distance and broadcast it, the reminder message being used to alert the driver of the current vehicle to the pedestrian corresponding to the portrait frame.
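For illustration, the following sketch estimates the bearing and distance from the portrait frame with a pinhole camera model. The focal length, image width, and the assumed pedestrian height of 1.7 m are hypothetical calibration values, and remind() simply prints instead of broadcasting.

```python
import math

FOCAL_PX, IMG_W, PERSON_H = 800.0, 1280.0, 1.7  # assumed camera calibration

def locate(box):
    """box: [x1, y1, x2, y2] in pixels; returns (bearing_deg, distance_m)."""
    cx = (box[0] + box[2]) / 2.0
    bearing = math.degrees(math.atan((cx - IMG_W / 2.0) / FOCAL_PX))
    distance = PERSON_H * FOCAL_PX / max(box[3] - box[1], 1.0)
    return bearing, distance

def remind(box):
    bearing, distance = locate(box)
    side = "right" if bearing > 0 else "left"
    print(f"Pedestrian {distance:.1f} m ahead, {abs(bearing):.0f} deg {side}")
```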

In the above solution, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed from the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model. The image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained with the preset depthwise separable convolution method, and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to that data in the image to be detected. Processing the historical images enlarges the training base; building the training model on the weight connection layer and training it yields the pedestrian detection model, which detects the image to be detected, and once the detection result is obtained it is handled accordingly. This improves not only the detection efficiency and accuracy for the image to be detected but also the safety of the driving vehicle and of passers-by.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of this application.

Embodiment 4:

Referring to FIG. 14, FIG. 14 is a schematic diagram of a pedestrian detection device provided in Embodiment 4 of the present application. As shown in FIG. 14, the pedestrian detection device 1400 in this embodiment may include a processor 1401, a memory 1402, and a computer program 1403 stored in the memory 1402 and executable on the processor 1401. When the processor 1401 executes the computer program 1403, the steps in the foregoing pedestrian detection method embodiments are implemented. The memory 1402 is used to store the computer program, which includes program instructions, and the processor 1401 is used to execute the program instructions stored in the memory 1402.

Specifically, the processor 1401 is configured to call the program instructions to:

acquire the image to be detected in real time;

input the image to be detected into the pre-trained pedestrian detection model and identify the pedestrian data contained in the image to be detected, the pedestrian detection model having been trained with a preset depthwise separable convolution method; and

perform non-maximum suppression on the pedestrian data and determine the portrait frame corresponding to the pedestrian data in the image to be detected.
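Tying these operations together, a minimal sketch of the processor's loop is given below. It assumes a model callable that returns NumPy (boxes, scores) for each frame and reuses the nms() helper sketched in Embodiment 3; the OpenCV capture and drawing calls stand in for whatever acquisition and broadcast hardware the device actually uses.

```python
import cv2

def run(model, camera_index=0):
    cap = cv2.VideoCapture(camera_index)       # real-time image acquisition
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        boxes, scores = model(frame)           # pedestrian data for the frame
        for i in nms(boxes, scores):           # portrait frames after NMS
            x1, y1, x2, y2 = boxes[i].astype(int)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow("pedestrian detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop
            break
    cap.release()
```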

Further, the processor 1401 is specifically configured to:

acquire historical images containing pedestrians; and

construct a training model from a preset weight connection layer and train the training model on the historical images to obtain the pedestrian detection model.

Further, the processor 1401 is specifically configured to:

perform image enhancement processing on a historical image to obtain at least two expanded images corresponding to it; and

construct a first training model from the preset weight connection layer and train the first training model on the historical images and their corresponding expanded images to obtain the pedestrian detection model.

Further, the processor 1401 is specifically configured to:

construct a second training model from the preset weight connection layer, using the preset depthwise separable convolution method and the preset linear bottleneck layer technique;

input the historical images into the second training model, detect the pedestrian images in the historical images, and determine the ground-truth label corresponding to each pedestrian image, the ground-truth label representing the transform coefficients from the prior box of the recognized historical image to the ground-truth box;

determine the loss function corresponding to the second training model from the ground-truth labels; and

optimize the second training model according to the loss function to obtain the pedestrian detection model.

Further, the processor 1401 is specifically configured to:

detect the bearing of the pedestrian corresponding to the portrait frame relative to the current vehicle and the distance between the current vehicle and that pedestrian; and

generate a reminder message from the bearing and the distance and broadcast it, the reminder message being used to alert the driver of the current vehicle to the pedestrian corresponding to the portrait frame.

In the above solution, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed from the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model. The image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained with the preset depthwise separable convolution method, and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to that data in the image to be detected. Processing the historical images enlarges the training base; building the training model on the weight connection layer and training it yields the pedestrian detection model, which detects the image to be detected, and once the detection result is obtained it is handled accordingly. This improves not only the detection efficiency and accuracy for the image to be detected but also the safety of the driving vehicle and of passers-by.

It should be understood that, in the embodiments of this application, the processor 1401 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 1402 may include read-only memory and random access memory, and provides instructions and data to the processor 1401. A part of the memory 1402 may also include non-volatile random access memory; for example, the memory 1402 may also store information about the device type.

In a specific implementation, the processor 1401, the memory 1402, and the computer program 1403 described in the embodiments of this application can execute the implementations described in the first and second embodiments of the pedestrian detection method provided in the embodiments of this application, and can also execute the terminal implementation described in the embodiments of this application, which is not repeated here.

Another embodiment of this application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, implement the following:

acquiring the image to be detected in real time;

inputting the image to be detected into the pre-trained pedestrian detection model and identifying the pedestrian data contained in the image to be detected, the pedestrian detection model having been trained with a preset depthwise separable convolution method; and

performing non-maximum suppression on the pedestrian data and determining the portrait frame corresponding to the pedestrian data in the image to be detected.

Further, when executed by the processor, the computer program also implements:

acquiring historical images containing pedestrians; and

constructing a training model from a preset weight connection layer and training the training model on the historical images to obtain the pedestrian detection model.

Further, when executed by the processor, the computer program also implements:

performing image enhancement processing on a historical image to obtain at least two expanded images corresponding to it; and

constructing a first training model from the preset weight connection layer and training the first training model on the historical images and their corresponding expanded images to obtain the pedestrian detection model.

Further, when executed by the processor, the computer program also implements:

constructing a second training model from the preset weight connection layer, using the preset depthwise separable convolution method and the preset linear bottleneck layer technique;

inputting the historical images into the second training model, detecting the pedestrian images in the historical images, and determining the ground-truth label corresponding to each pedestrian image, the ground-truth label representing the transform coefficients from the prior box of the recognized historical image to the ground-truth box;

determining the loss function corresponding to the second training model from the ground-truth labels; and

optimizing the second training model according to the loss function to obtain the pedestrian detection model.

Further, when executed by the processor, the computer program also implements:

detecting the bearing of the pedestrian corresponding to the portrait frame relative to the current vehicle and the distance between the current vehicle and that pedestrian; and

generating a reminder message from the bearing and the distance and broadcasting it, the reminder message being used to alert the driver of the current vehicle to the pedestrian corresponding to the portrait frame.

In the above solution, the image to be detected is acquired in real time; historical images containing pedestrians are acquired; a training model is constructed from the preset weight connection layer and trained on the historical images to obtain the pedestrian detection model. The image to be detected is input into the pre-trained pedestrian detection model to identify the pedestrian data it contains, the model having been trained with the preset depthwise separable convolution method, and non-maximum suppression is performed on the pedestrian data to determine the portrait frame corresponding to that data in the image to be detected. Processing the historical images enlarges the training base; building the training model on the weight connection layer and training it yields the pedestrian detection model, which detects the image to be detected, and once the detection result is obtained it is handled accordingly. This improves not only the detection efficiency and accuracy for the image to be detected but also the safety of the driving vehicle and of passers-by.

The computer-readable storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as the terminal's hard disk or memory. It may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the terminal and an external storage device. The computer-readable storage medium is used to store the computer program and the other programs and data required by the terminal, and may also be used to temporarily store data that has been or will be output.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a logical functional division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person familiar with the technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (10)

1. A pedestrian detection method, characterized in that it comprises:

acquiring an image to be detected in real time;

inputting the image to be detected into a pre-trained pedestrian detection model to identify the pedestrian data contained in the image to be detected, the pedestrian detection model having been trained with a preset depthwise separable convolution method; and

performing non-maximum suppression on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected.

2. The pedestrian detection method according to claim 1, wherein, before the inputting of the image to be detected into the pre-trained pedestrian detection model to identify the pedestrian data contained in the image to be detected, the method comprises:

acquiring historical images containing pedestrians; and

constructing a training model from a preset weight connection layer and training the training model on the historical images to obtain the pedestrian detection model.

3. The pedestrian detection method according to claim 2, wherein the constructing of the training model from the preset weight connection layer and the training of the training model on the historical images to obtain the pedestrian detection model comprise:

performing image enhancement processing on a historical image to obtain at least two expanded images corresponding to it; and

constructing a first training model from the preset weight connection layer and training the first training model on the historical images and their corresponding expanded images to obtain the pedestrian detection model.

4. The pedestrian detection method according to claim 2, wherein the constructing of the training model from the preset weight connection layer and the training of the training model on the historical images to obtain the pedestrian detection model comprise:

constructing a second training model from the preset weight connection layer, using the preset depthwise separable convolution method and the preset linear bottleneck layer technique;

inputting the historical images into the second training model, detecting the pedestrian images in the historical images, and determining the ground-truth label corresponding to each pedestrian image, the ground-truth label representing the transform coefficients from the prior box of the recognized historical image to the ground-truth box;

determining the loss function corresponding to the second training model from the ground-truth labels; and

optimizing the second training model according to the loss function to obtain the pedestrian detection model.

5. The pedestrian detection method according to any one of claims 1 to 4, wherein, after the performing of non-maximum suppression on the pedestrian data to determine the portrait frame corresponding to the pedestrian data in the image to be detected, the method further comprises:

detecting the bearing of the pedestrian corresponding to the portrait frame relative to a current vehicle and the distance between the current vehicle and that pedestrian; and

generating a reminder message from the bearing and the distance and broadcasting it, the reminder message being used to alert the driver of the current vehicle to the pedestrian corresponding to the portrait frame.

6. A pedestrian detection device, characterized in that it comprises:

an acquiring unit, configured to acquire an image to be detected in real time;

a recognition unit, configured to input the image to be detected into a pre-trained pedestrian detection model and identify the pedestrian data contained in the image to be detected, the pedestrian detection model having been trained with a preset depthwise separable convolution method; and

a determining unit, configured to perform non-maximum suppression on the pedestrian data and determine the portrait frame corresponding to the pedestrian data in the image to be detected.

7. The pedestrian detection device according to claim 6, further comprising:

a historical acquisition unit, configured to acquire historical images containing pedestrians; and

a training unit, configured to construct a training model from a preset weight connection layer and train the training model on the historical images to obtain the pedestrian detection model.

8. The pedestrian detection device according to claim 6, wherein the training unit comprises:

an expansion unit, configured to perform image enhancement processing on a historical image to obtain at least two expanded images corresponding to it; and

a first training unit, configured to construct a first training model from the preset weight connection layer and train the first training model on the historical images and their corresponding expanded images to obtain the pedestrian detection model.

9. A pedestrian detection device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/093024 WO2020258077A1 (en) 2019-06-26 2019-06-26 Pedestrian detection method and device

Publications (1)

Publication Number Publication Date

WO2020258077A1 (en) 2020-12-30

Family ID: 74059863

