CN116935436A - Pedestrian target detection method based on IRD image, electronic equipment and computer readable storage medium - Google Patents
Pedestrian target detection method based on IRD image, electronic equipment and computer readable storage medium
- Publication number
- CN116935436A (application CN202310825206.7A)
- Authority
- CN
- China
- Prior art keywords
- ird
- image
- data
- target detection
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Image Processing (AREA)
- Traffic Control Systems (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of infrared and depth image processing and detection, and in particular to a pedestrian target detection method based on IRD images, an electronic device, and a computer-readable storage medium.
Background
Pedestrian target detection uses computer vision technology to determine whether pedestrians are present in an image or video sequence and to locate them precisely. Pedestrian detection is widely used in scenarios such as intelligent driver assistance, passenger flow statistics, and intelligent surveillance. It has developed rapidly in recent years, but many problems remain.
Pedestrian target detection mostly involves scenes with many pedestrians, frequent movement, and complex lighting and shadows. Traditional methods are usually based on images from RGB color cameras, which have high resolution. In complex pedestrian detection scenes, high image resolution severely slows inference, while detection accuracy is degraded by clothing color, illumination, and similar factors. Moreover, RGB color images contain personal privacy information such as faces, which poses security risks.
As the development of ToF cameras (which carry an infrared depth sensor and produce both grayscale and depth data) matures, they can be used to achieve better pedestrian target detection.
Summary of the Invention
The technical problem to be solved by the present invention is the set of issues that arise when RGB images are used for pedestrian target detection: slow inference, low detection accuracy, and the risk of leaking personal privacy. To solve these problems, the present invention adopts the following technical solution:
In view of this, the present invention performs pedestrian target detection based on IRD images (three-channel low-resolution images formed by fusing an IR (Infrared Radiation) image and a D (Depth) image, both based on ToF (Time of Flight) technology), which avoids the above problems. The technical solution provided by the present invention is as follows:
A pedestrian target detection method based on IRD images, characterized by comprising the following steps:
(1) Use a ToF camera to collect grayscale image (IR) and depth image (D) data of pedestrians;
(2) Fuse the grayscale image (IR) and the depth image (D) into a three-channel image (IRD);
(3) Use a trained model to infer and identify pedestrians in the three-channel image (IRD).
A single ToF camera is used to capture the raw pedestrian images, and the grayscale image (IR) and depth image (D) output after the sensor are consistent in time and aligned in space; they are therefore aligned temporally and pixel-wise, which guarantees the accuracy of the subsequent image fusion.
Preferably, the fusion method for the three-channel image (IRD) comprises:
① Normalize the effective values of the grayscale image (IR) to 0–255:
f(x_IR) = x_IR ÷ thresh_IR × 255, for 0 ≤ x_IR ≤ thresh_IR
where x_IR is the original effective value of the grayscale image (IR) and thresh_IR is the normalization threshold of the grayscale image (IR);
② Normalize the effective values of the depth image (D) to 0–255;
Specifically, the original effective value of the depth image (D) is first subtracted from the camera mounting height, and the result is then normalized to 0–255:
x_d = camera_height − x_D
f(x_d) = x_d ÷ thresh_D × 255, for 0 ≤ x_d ≤ thresh_D
where camera_height is the camera mounting height, x_D is the original effective value of the depth image (D), and thresh_D is the normalization threshold of the depth image (D);
③ Merge the grayscale image (IR) and the depth image (D) into the three-channel image (IRD);
The three-channel image (IRD) contains two channels of the grayscale image (IR) and one channel of the depth image (D).
Normalizing the image data with the camera mounting height taken into account improves recognition in real application scenarios.
Further, the trained model is obtained by the following training method:
a. Take three-channel image (IRD) data and annotate it;
b. Divide the annotated data proportionally into a training set, a validation set, and a test set;
c. Train the model with the training set and the validation set;
d. Evaluate the model with the test set:
if the evaluation results meet expectations, training of the model is complete;
if the evaluation results do not meet expectations, repeat steps a–d.
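As a sketch, steps a–d and the repeat-until-acceptable loop can be expressed as below. The callables `annotate`, `split`, `train`, and `evaluate` are hypothetical stand-ins for the actual tooling (for example, a YOLOv5 training pipeline); their names and signatures are assumptions, not an API defined by this document.

```python
# Hypothetical sketch of the train/evaluate loop in steps a-d above.

def training_loop(collect_ird_data, annotate, split, train, evaluate,
                  threshold=0.96, max_rounds=5):
    """Repeat steps a-d until P, R, and mAP@0.5 all reach the threshold."""
    model = None
    for _ in range(max_rounds):
        # Step a: annotate IRD data (pre-annotate with the previous model, if any).
        data = annotate(collect_ird_data(), premodel=model)
        # Step b: proportional split into train/val/test sets.
        train_set, val_set, test_set = split(data, ratios=(8, 1, 1))
        # Step c: train on the training set, validating on the validation set.
        model = train(train_set, val_set)
        # Step d: evaluate on the test set.
        metrics = evaluate(model, test_set)
        if all(metrics[k] >= threshold for k in ("P", "R", "mAP@0.5")):
            return model  # evaluation meets expectations; training is complete
    raise RuntimeError("evaluation metrics never reached the expected threshold")
```

The `premodel` argument mirrors the pre-annotation scheme described later: once a first (possibly below-expectation) model exists, it pre-labels newly collected data before manual fine-tuning.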
Preferably, at least two groups of three-channel image (IRD) data are annotated, where different groups correspond to different mounting heights of the ToF camera;
Collecting grayscale image (IR) and depth image (D) data of pedestrians at different ToF camera mounting heights, fusing them into three-channel image (IRD) data at those heights, and annotating the IRD data for the different heights improves the model's generalization ability, that is, its ability to predict on data it has never seen, while still performing well on the training data.
Preferably, the annotated data is divided proportionally into a training set, a validation set, and a test set at a ratio of 8:1:1.
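A minimal sketch of the 8:1:1 split; shuffling with a fixed seed is an assumption for reproducibility, not something specified here:

```python
import random

def split_8_1_1(samples, seed=0):
    """Shuffle annotated samples and slice them into train/val/test at 8:1:1."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * 0.8)
    n_val = int(len(samples) * 0.1)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

train_set, val_set, test_set = split_8_1_1(range(100))
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```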
Preferably, the model is the YOLOv5s model.
Further, when annotating the three-channel image (IRD) data, a pedestrian's head and upper body are annotated as a single bounding box. The annotation format is a txt annotation file in which each line has the format "class, center x-coordinate, center y-coordinate, box width, box height"; the center coordinates, box width, and box height are each normalized by dividing by the image width and height. This step handles inputs of different sizes.
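The per-line normalization described above can be sketched as follows. The helper name and the example box are illustrative; the line layout follows the "class x_center y_center width height" txt format given in the text.

```python
def to_label_line(cls, x_center, y_center, box_w, box_h, img_w, img_h):
    """One annotation line with coordinates normalized by image width/height."""
    return (f"{cls} {x_center / img_w:.6f} {y_center / img_h:.6f} "
            f"{box_w / img_w:.6f} {box_h / img_h:.6f}")

# A head-plus-upper-body box centered at (160, 120) in a 320x240 IRD image:
print(to_label_line(0, 160, 120, 64, 96, 320, 240))
# 0 0.500000 0.500000 0.200000 0.400000
```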
Further, the model is evaluated with the indicators P (Precision), R (Recall), and mAP@0.5 (mean Average Precision at an IoU threshold of 0.5); when P, R, and mAP@0.5 all reach 96% or above, the evaluation results are considered to meet expectations.
Further, the data annotation includes manual annotation: the first time three-channel image (IRD) data is collected, the data is annotated manually;
Further, the data annotation includes model pre-annotation: when the model evaluation results do not meet expectations, three-channel image (IRD) data is collected again, the below-expectation model is used to pre-annotate it, and manual annotation is then used to fine-tune the pre-annotated labels, improving annotation efficiency.
The present invention also provides an electronic device for pedestrian target detection based on IRD images, comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory stores a computer program, and the processor implements the method steps when executing the computer program.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the pedestrian target detection method based on IRD images.
The method fuses the IR and D images and uses a low image resolution of 320×240, which significantly reduces inference time, achieves real-time detection, and allows wide deployment on embedded and other low-compute platforms. At the same time, IRD images are insensitive to color, illumination, and similar information, so the detection results are little affected by environmental factors such as clothing color and lighting, which effectively improves detection accuracy and protects personal privacy information.
Brief Description of the Drawings
Figure 1 is a flow chart of a pedestrian target detection method based on IRD images provided by an embodiment of the present invention;
Figure 2 is a three-channel image (IRD) provided by an embodiment of the present invention;
Figure 3 is a detection result image provided by an embodiment of the present invention;
Figure 4 is a schematic structural diagram of an electronic device for pedestrian target detection based on IRD images provided by an embodiment of the present invention;
The numbers in Figure 3 are the probabilities of detection as a pedestrian.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. The following embodiments illustrate the present invention but do not limit its scope.
Embodiment 1:
As shown in Figures 1–3, this embodiment provides the flow of a pedestrian target detection method based on IRD images:
(1) Use a ToF camera to collect grayscale image (IR) and depth image (D) data of pedestrians;
(2) Fuse the grayscale image (IR) and the depth image (D) into a three-channel image (IRD). Specifically, taking one pixel as an example:
Since the grayscale image (IR) and the depth image (D) are aligned in time and pixels, the two images correspond at any given pixel position. Suppose the original effective IR value x_IR is 604, the original effective depth value x_D is 807, the camera mounting height camera_height is 2100, the IR normalization threshold thresh_IR is 3840, and the depth normalization threshold thresh_D is 2100. The data is processed as follows:
① Since 604 is between 0 and 3840 inclusive, f(604) = 604 ÷ 3840 × 255 ≈ 40;
② x_d = 2100 − 807 = 1293; since 1293 is between 0 and 2100 inclusive, f(1293) = 1293 ÷ 2100 × 255 ≈ 157;
③ The IRD three-channel image is produced by concatenating IR + D + IR, so the final three-channel value of this pixel is (40, 157, 40);
After every pixel of the grayscale image (IR) and the depth image (D) has gone through the above calculation, the images can be concatenated and fused into a three-channel image (IRD).
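The worked example above can be reproduced for whole frames. This is a sketch under the stated assumptions (thresh_IR = 3840, thresh_D = camera_height = 2100, and an IR + D + IR channel order), not an implementation taken from the patent:

```python
import numpy as np

THRESH_IR = 3840       # IR normalization threshold from the worked example
THRESH_D = 2100        # depth normalization threshold from the worked example
CAMERA_HEIGHT = 2100   # camera mounting height from the worked example

def fuse_ird(ir, depth):
    """Fuse aligned IR and depth frames into a three-channel IRD image."""
    # Step 1: clamp IR effective values to the threshold, scale to 0-255.
    ir8 = np.clip(ir, 0, THRESH_IR) / THRESH_IR * 255
    # Step 2: subtract depth from the mounting height (int32 guards against
    # unsigned underflow), clamp, then scale to 0-255.
    xd = np.clip(CAMERA_HEIGHT - depth.astype(np.int32), 0, THRESH_D)
    d8 = xd / THRESH_D * 255
    # Step 3: concatenate as IR + D + IR.
    return np.stack([ir8, d8, ir8], axis=-1).astype(np.uint8)

# Frames filled with the example values x_IR = 604 and x_D = 807:
ir = np.full((240, 320), 604, dtype=np.uint16)
depth = np.full((240, 320), 807, dtype=np.uint16)
ird = fuse_ird(ir, depth)
print(ird.shape, ird[0, 0])  # (240, 320, 3) [ 40 157  40]
```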
(3) Use the trained model to infer and identify pedestrians in the three-channel image (IRD).
Embodiment 2:
This embodiment provides a concrete model training method.
The model to be trained can be a target detection model such as YOLOv5s, YOLOv7-tiny, or SSD; training a YOLOv5s model is taken as the example here:
a. Before formal pedestrian detection, first take the processed three-channel image (IRD) data and annotate it manually. There may be multiple groups of data, at least two of which correspond to different ToF camera mounting heights. During annotation, a pedestrian's head and upper body are annotated as a single bounding box; the annotation format is a txt annotation file in which each line has the format "class, center x-coordinate, center y-coordinate, box width, box height", with the center coordinates, box width, and box height each normalized by dividing by the image width and height;
b. Divide the annotated data into a training set, a validation set, and a test set at a ratio of 8:1:1;
c. Train the YOLOv5s model with the training set and the validation set; the training parameters include the number of training steps, the learning rate, and so on;
d. Evaluate the YOLOv5s model with the test set;
If any of the evaluation indicators P (Precision), R (Recall), or mAP@0.5 (mean Average Precision) fails to reach 96%, the evaluation results are considered not to meet expectations; call this model V0.
Next, collect processed three-channel image (IRD) data again, use model V0 to pre-annotate the images, and then fine-tune the labels manually; during pre-annotation, HSV augmentation is turned off and mosaic is enabled.
Repeat steps b, c, and d above. If the evaluation indicators still fall short, continue these steps until P (Precision), R (Recall), and mAP@0.5 (mean Average Precision) all reach 96%, at which point the evaluation results are considered to meet expectations.
With this model, pedestrians in the three-channel image (IRD) can be inferred and identified in step (3) of Embodiment 1.
As shown in Figures 2–3, since the grayscale image (IR) used in the present invention is a grayscale image containing no color information, the corresponding IRD image is insensitive to color information, and the detection results are little affected by clothing color. The ToF camera used in the present invention ranges by continuously emitting light pulses toward the target, receiving the light returned from the object with a sensor, and deriving the target distance from the measured round-trip flight time of the pulses. The camera lens carries a band-pass filter that admits only light at the wavelength of the illumination source, which suppresses incoherent light sources and improves the signal-to-noise ratio, so the camera is little affected by lighting conditions; the corresponding IRD image is likewise little affected by lighting.
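The ranging principle described above corresponds to distance = c × round-trip time ÷ 2. A minimal illustration; the 14 ns round trip is an example figure, not a value from the text:

```python
C = 299_792_458.0  # speed of light in m/s

def tof_distance_m(round_trip_s):
    """Target distance from the measured round-trip time of a light pulse."""
    return C * round_trip_s / 2.0

# A 14 ns round trip corresponds to roughly 2.1 m:
print(round(tof_distance_m(14e-9), 3))  # 2.099
```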
Embodiment 3:
As shown in Figure 4, an embodiment of the present invention provides an electronic device for pedestrian target detection based on IRD images, comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus.
Embodiment 4:
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the pedestrian target detection method based on IRD images.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310825206.7A CN116935436A (en) | 2023-07-06 | 2023-07-06 | Pedestrian target detection method based on IRD image, electronic equipment and computer readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116935436A true CN116935436A (en) | 2023-10-24 |
Family
ID=88388757
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310825206.7A Pending CN116935436A (en) | 2023-07-06 | 2023-07-06 | Pedestrian target detection method based on IRD image, electronic equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116935436A (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102288613A (en) * | 2011-05-11 | 2011-12-21 | 北京科技大学 | Surface defect detecting method for fusing grey and depth information |
| CN102737235A (en) * | 2012-06-28 | 2012-10-17 | 中国科学院自动化研究所 | Head posture estimation method based on depth information and color image |
| CN111222468A (en) * | 2020-01-08 | 2020-06-02 | 浙江光珀智能科技有限公司 | People stream detection method and system based on deep learning |
| CN112697042A (en) * | 2020-12-07 | 2021-04-23 | 深圳市繁维科技有限公司 | Handheld TOF camera and strong-adaptability wrapping volume measuring method thereof |
| CN112950517A (en) * | 2021-02-25 | 2021-06-11 | 浙江光珀智能科技有限公司 | Method and device for fusing high dynamic range depth map and gray scale map of depth camera |
| WO2022126917A1 (en) * | 2020-12-18 | 2022-06-23 | 平安科技(深圳)有限公司 | Deep learning-based face image evaluation method and apparatus, device, and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2024-12-31 | TA01 | Transfer of patent application right | Effective date of registration: 20241231. Address after: Room 803, 8th Floor, Building F, Innovation Park Phase II, No.1 Keyuan Weiyi Road, Laoshan District, Qingdao City, Shandong Province, China 266000. Applicant after: Qingdao Weigan Zhitong Technology Co.,Ltd. (China). Address before: Building 3, No. 393 Songling Road, Laoshan District, Qingdao City, Shandong Province, 266100, 3303. Applicant before: Qingdao Weigan Technology Co.,Ltd. (China). |