
CN111723602A - Driver's behavior recognition method, device, equipment and storage medium - Google Patents

Driver's behavior recognition method, device, equipment and storage medium

Info

Publication number
CN111723602A
Authority
CN
China
Prior art keywords
image
behavior
target
area image
driver
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910207840.8A
Other languages
Chinese (zh)
Other versions
CN111723602B (en)
Inventor
乔梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910207840.8A priority Critical patent/CN111723602B/en
Publication of CN111723602A publication Critical patent/CN111723602A/en
Application granted granted Critical
Publication of CN111723602B publication Critical patent/CN111723602B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

本申请公开了一种驾驶员的行为识别方法、装置、设备及存储介质,属于智能交通技术领域。所述方法包括:获取目标图像,从目标图像中获取第一区域图像,第一区域图像包括驾驶员的预设行为,在目标图像中,按照第一比例阈值对第一区域图像的四周区域进行缩减处理,得到第二区域图像,以及按照第二比例阈值对第一区域图像的四周区域进行扩充处理,得到第三区域图像,基于第一区域图像、第二区域图像和第三区域图像,识别出驾驶员的行为。本申请对该第一区域图像进行内截和外扩后,使得得到的第二区域图像和第三区域图像能够包括与预设行为相关的周边物体的更多信息,因此,基于该三个区域图像对驾驶员的行为进行识别,可以提高识别的准确性。


The present application discloses a driver behavior recognition method, apparatus, device and storage medium, belonging to the technical field of intelligent transportation. The method includes: acquiring a target image; acquiring a first area image from the target image, where the first area image includes a preset behavior of the driver; in the target image, reducing the surrounding area of the first area image according to a first proportional threshold to obtain a second area image, and expanding the surrounding area of the first area image according to a second proportional threshold to obtain a third area image; and recognizing the driver's behavior based on the first area image, the second area image and the third area image. Because the first area image is cropped inward and expanded outward in the present application, the resulting second area image and third area image can include more information about the surrounding objects related to the preset behavior; recognizing the driver's behavior based on the three area images can therefore improve the recognition accuracy.


Description

驾驶员的行为识别方法、装置、设备及存储介质Driver's behavior recognition method, device, equipment and storage medium

技术领域technical field

本申请涉及智能交通技术领域,特别涉及一种驾驶员的行为识别方法、装置、设备及存储介质。The present application relates to the technical field of intelligent transportation, and in particular, to a method, device, device, and storage medium for identifying a driver's behavior.

背景技术Background Art

随着智能交通技术的快速发展,人们越来越多关注如何通过智能手段解决由于驾驶员分心导致的行车安全问题。在一些场景中,可以对驾驶员的人脸部分进行拍摄,基于拍摄得到的图像对驾驶员的行为进行分析,当确定存在违规行为时,进行报警提示,从而提醒驾驶员安全驾驶。With the rapid development of intelligent transportation technology, people pay more and more attention to how to solve the driving safety problem caused by driver distraction through intelligent means. In some scenarios, the driver's face can be photographed, and the driver's behavior can be analyzed based on the photographed images. When it is determined that there is a violation, an alarm will be issued to remind the driver to drive safely.

目前，可以采用经过深度学习后的网络模型对驾驶员的行为进行检测。也就是说，可以预先基于一些训练样本对待训练的网络模型进行深度训练，该训练样本可以包括图像样本以及图像样本中的行为所在区域的位置信息和行为类别，使得训练后的网络模型可以基于拍摄的图像识别出一些诸如打电话、抽烟之类的手势。At present, a network model obtained through deep learning can be used to detect the driver's behavior. That is to say, the network model to be trained can be trained in advance based on some training samples, where each training sample can include an image sample as well as the location information and behavior category of the region where the behavior in the image sample is located, so that the trained network model can recognize gestures such as making a phone call or smoking from the captured images.

然而,由于驾驶员在驾驶过程中可能存在与违规行为类似的手势,譬如,摸耳朵,摸下巴、捂嘴等,在该种情况下,上述提供的驾驶员的行为识别方法可能导致误判。However, since the driver may have gestures similar to the illegal behavior during driving, for example, touching the ear, touching the chin, covering the mouth, etc., in this case, the driver's behavior recognition method provided above may lead to misjudgment.

发明内容SUMMARY OF THE INVENTION

本申请实施例提供了一种驾驶员的行为识别方法、装置、设备及存储介质,可以解决相关技术中驾驶员的行为识别方法可能导致误判的问题。所述技术方案如下:Embodiments of the present application provide a method, device, device, and storage medium for a driver's behavior identification, which can solve the problem that the driver's behavior identification method in the related art may lead to misjudgment. The technical solution is as follows:

第一方面,提供了一种驾驶员的行为识别方法,所述方法包括:In a first aspect, a method for identifying a driver's behavior is provided, the method comprising:

获取目标图像,所述目标图像包括驾驶员的人脸;acquiring a target image, the target image including the driver's face;

从所述目标图像中获取第一区域图像,所述第一区域图像包括所述驾驶员的预设行为,所述预设行为是指与违规行为之间的相似度大于预设阈值的行为;acquiring a first area image from the target image, where the first area image includes a preset behavior of the driver, and the preset behavior refers to a behavior whose similarity to the illegal behavior is greater than a preset threshold;

在所述目标图像中，按照第一比例阈值对所述第一区域图像的四周区域进行缩减处理，得到第二区域图像，以及按照第二比例阈值对所述第一区域图像的四周区域进行扩充处理，得到第三区域图像；In the target image, reducing the surrounding area of the first area image according to a first proportional threshold to obtain a second area image, and expanding the surrounding area of the first area image according to a second proportional threshold to obtain a third area image;

基于所述第一区域图像、所述第二区域图像和所述第三区域图像,识别出所述驾驶员的行为。Based on the first area image, the second area image and the third area image, the behavior of the driver is identified.

可选地,所述基于所述第一区域图像、所述第二区域图像和所述第三区域图像,识别出所述驾驶员的行为之前,还包括:Optionally, before recognizing the behavior of the driver based on the first area image, the second area image and the third area image, the method further includes:

将所述第一区域图像、所述第二区域图像和所述第三区域图像调整为相同尺寸的区域图像;Adjusting the first area image, the second area image and the third area image into area images of the same size;

相应地,所述基于所述第一区域图像、所述第二区域图像和所述第三区域图像,检测出所述驾驶员的行为,包括:Correspondingly, the detecting the driver's behavior based on the first area image, the second area image and the third area image includes:

调用目标网络模型,所述目标网络模型用于基于任一行为对应的一组区域图像确定所述行为的行为类别;Calling a target network model, the target network model is used to determine the behavior category of the behavior based on a group of regional images corresponding to any behavior;

基于尺寸调整后的第一区域图像、第二区域图像和第三区域图像,通过所述目标网络模型识别出所述驾驶员的行为。Based on the resized first area image, second area image and third area image, the behavior of the driver is identified through the target network model.

可选地,所述目标网络模型包括输入层、中间层、拼接层、全连接层和输出层;Optionally, the target network model includes an input layer, an intermediate layer, a splicing layer, a fully connected layer and an output layer;

所述基于尺寸调整后的第一区域图像、第二区域图像和第三区域图像,通过所述目标网络模型识别出所述驾驶员的行为,包括:The behavior of the driver is identified through the target network model based on the size-adjusted first area image, second area image and third area image, including:

通过所述输入层基于尺寸调整后的区域图像的分辨率和通道数量,将所述尺寸调整后的第一区域图像、第二区域图像和第三区域图像中的图像数据进行通道叠加处理,得到第一特征图;Based on the resolution and the number of channels of the resized area image, the input layer performs channel superposition processing on the image data in the resized first area image, second area image and third area image, to obtain the first feature map;

通过所述中间层对所述第一特征图进行卷积采样处理,得到多个第二特征图,所述多个第二特征图的尺寸相同且通道数不同;Performing convolution sampling processing on the first feature map by the intermediate layer to obtain a plurality of second feature maps, the plurality of second feature maps having the same size and different channel numbers;

通过所述拼接层将所述多个第二特征图进行通道叠加处理,并通过网络深层的卷积层对通道叠加后的特征图进行特征融合,得到第三特征图;Perform channel stacking processing on the plurality of second feature maps through the splicing layer, and perform feature fusion on the feature maps after channel stacking through the convolution layer in the deep layer of the network to obtain a third feature map;

通过所述全连接层基于所述第三特征图,确定所述驾驶员的行为;determining the driver's behavior based on the third feature map through the fully connected layer;

通过所述输出层输出所述行为。The behavior is output through the output layer.
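The two channel-stacking steps above (the input layer stacking the three resized region images, and the splicing layer stacking the second feature maps before a deeper convolution fuses them) can be illustrated with the following minimal Python sketch. The spatial sizes, channel counts and the 1x1 fusion convolution are illustrative assumptions, not values fixed by the text.

```python
import torch
import torch.nn as nn

# Input layer: three resized 3-channel region images stacked along the channel axis.
region1 = torch.rand(1, 3, 224, 224)   # first area image (resized)
region2 = torch.rand(1, 3, 224, 224)   # second area image (inward crop, resized)
region3 = torch.rand(1, 3, 224, 224)   # third area image (outward crop, resized)
first_feature_map = torch.cat([region1, region2, region3], dim=1)  # (1, 9, 224, 224)

# Splicing layer: second feature maps of equal size but different channel counts
# are channel-stacked, then a deeper conv layer fuses them into a third feature map.
second_maps = [torch.rand(1, c, 28, 28) for c in (32, 64, 128)]
stacked = torch.cat(second_maps, dim=1)                 # (1, 224, 28, 28)
fuse = nn.Conv2d(stacked.shape[1], 256, kernel_size=1)  # assumed 1x1 fusion convolution
third_feature_map = fuse(stacked)                       # (1, 256, 28, 28)
```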

可选地,所述中间层包括N组卷积层和N组采样层,每组卷积层与每组采样层一一对应;Optionally, the intermediate layer includes N groups of convolution layers and N groups of sampling layers, and each group of convolution layers corresponds to each group of sampling layers one-to-one;

所述通过所述中间层对所述第一特征图进行卷积采样处理,得到多个第二特征图,包括:Performing convolution sampling processing on the first feature map through the intermediate layer to obtain a plurality of second feature maps, including:

令i=1，将所述第一特征图确定为目标特征图；通过第i组卷积层对所述目标特征图进行卷积处理，通过第i组采样层分别按照两个不同倍数对得到的特征图进行采样处理，得到2^i倍特征图和第i个参考尺寸的第二特征图，将所述2^i倍特征图获取为所述目标特征图，所述参考尺寸大于或等于所述2^i倍特征图的尺寸；Let i = 1 and determine the first feature map as the target feature map; perform convolution processing on the target feature map through the i-th group of convolution layers, and sample the resulting feature map through the i-th group of sampling layers at two different rates to obtain a 2^i-fold feature map and a second feature map of the i-th reference size; take the 2^i-fold feature map as the new target feature map, where the reference size is greater than or equal to the size of the 2^i-fold feature map;

当i小于所述N时，令i=i+1，返回所述通过第i组卷积层对所述目标特征图进行卷积处理，通过第i组采样层分别按照两个不同倍数对得到的特征图进行采样处理，得到2^i倍特征图和第i个参考尺寸的第二特征图，将所述2^i倍特征图获取为所述目标特征图的操作；When i is less than N, let i = i + 1 and return to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, sampling the resulting feature map through the i-th group of sampling layers at two different rates to obtain a 2^i-fold feature map and a second feature map of the i-th reference size, and taking the 2^i-fold feature map as the target feature map;

当i等于所述N时，通过第N组卷积层对所述目标特征图进行卷积处理，通过第N组采样层对得到的特征图进行一次采样处理，得到第N个参考尺寸的第二特征图，结束操作。When i is equal to N, perform convolution processing on the target feature map through the N-th group of convolution layers, and sample the resulting feature map once through the N-th group of sampling layers to obtain a second feature map of the N-th reference size, and end the operation.
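The iterative convolution/sampling procedure of the intermediate layer described above might look roughly like the following sketch. It assumes N = 3, 3x3 convolutions, 2x average pooling for the branch that becomes the next target feature map (so that branch is downsampled by 2^i after group i), and one shared reference size for all second feature maps; none of these choices is specified by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiddleLayer(nn.Module):
    """Sketch of the N-group convolution/sampling loop (illustrative assumptions only)."""

    def __init__(self, in_channels=9, n_groups=3, ref_size=112):
        super().__init__()
        channels = [in_channels, 32, 64, 128]
        self.n_groups = n_groups
        self.ref_size = ref_size
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1)
             for i in range(n_groups)])

    def forward(self, x):                      # x: (batch, 9, 224, 224)
        second_maps = []
        target = x
        for i in range(1, self.n_groups + 1):
            feat = F.relu(self.convs[i - 1](target))
            if i < self.n_groups:
                # branch 1: halve the resolution; this becomes the next target,
                # so after group i it is downsampled by 2^i overall
                target = F.avg_pool2d(feat, kernel_size=2)
            # branch 2: the i-th "second feature map", resampled to the shared
            # reference size (same spatial size, different channel counts)
            second_maps.append(F.interpolate(feat, size=(self.ref_size, self.ref_size)))
        return second_maps

maps = MiddleLayer()(torch.rand(1, 9, 224, 224))
print([m.shape for m in maps])  # equal sizes, channel counts 32 / 64 / 128
```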

可选地,所述调用目标网络模型之前,还包括:Optionally, before invoking the target network model, the method further includes:

获取多个训练样本，所述多个训练样本包括多组图像和每组图像中的行为类别，每组图像包括行为的区域图像、对所述区域图像进行缩减处理后确定的缩减区域图像，以及对所述区域图像进行扩充处理后确定的扩充区域图像；Acquiring a plurality of training samples, where the plurality of training samples include multiple groups of images and the behavior category in each group of images, and each group of images includes a region image of a behavior, a reduced region image determined after reducing the region image, and an expanded region image determined after expanding the region image;

基于所述多个训练样本对待训练的网络模型进行训练后得到所述目标网络模型。The target network model is obtained after training the network model to be trained based on the plurality of training samples.

可选地,所述从所述目标图像中获取第一区域图像,包括:Optionally, the obtaining the first region image from the target image includes:

调用目标检测模型，将所述目标图像输入至所述目标检测模型中，输出人脸检测框和预设行为检测框，所述目标检测模型用于基于任一图像对所述图像中的人脸和预设行为进行识别；Invoking a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, where the target detection model is used to recognize the face and the preset behavior in any given image;

当所述人脸检测框的数量为一个时，将所述人脸检测框确定为目标人脸检测框；当所述人脸检测框的数量为多个时，从多个人脸检测框中获取最大面积的人脸检测框，将获取的人脸检测框确定为目标人脸检测框；When there is one face detection frame, determining the face detection frame as the target face detection frame; when there are multiple face detection frames, acquiring the face detection frame with the largest area from the multiple face detection frames and determining the acquired face detection frame as the target face detection frame;

基于所述目标人脸检测框,确定所述第一区域图像。The first region image is determined based on the target face detection frame.

可选地,所述基于所述目标人脸检测框,确定所述第一区域图像,包括:Optionally, the determining the first region image based on the target face detection frame includes:

将所述预设行为检测框中未与所述目标人脸检测框重叠,或与所述目标人脸检测框之间的距离大于预设距离阈值的预设行为检测框过滤掉;Filter out the preset behavior detection frame that does not overlap with the target face detection frame in the preset behavior detection frame, or whose distance from the target face detection frame is greater than a preset distance threshold;

从所述目标图像中切割出过滤后剩余的预设行为检测框对应的区域,得到所述第一区域图像。The area corresponding to the preset behavior detection frame remaining after filtering is cut out from the target image to obtain the first area image.

可选地,所述获取目标图像,包括:Optionally, the acquiring the target image includes:

检测用于对所述驾驶员的人脸进行拍摄的摄像头的工作模式;detecting the working mode of the camera for photographing the driver's face;

当所述摄像头的工作模式为红外拍摄模式时,获取拍摄得到的灰度图像;When the working mode of the camera is an infrared shooting mode, acquiring a grayscale image obtained by shooting;

调用图像伪彩转换模型，将所述灰度图像输入至所述图像伪彩转换模型中，输出所述灰度图像对应的三通道彩色图像，所述图像伪彩转换模型用于将任一灰度图像转换为所述灰度图像对应的三通道彩色图像；Invoking an image pseudo-color conversion model, inputting the grayscale image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the grayscale image, where the image pseudo-color conversion model is used to convert any grayscale image into its corresponding three-channel color image;

将输出的三通道彩色图像获取为所述目标图像。The output three-channel color image is acquired as the target image.

可选地,所述检测用于对所述驾驶员的人脸进行拍摄的摄像头的工作模式之前,还包括:Optionally, before the detection of the working mode of the camera for photographing the driver's face, the method further includes:

获取当前的车速和光照强度;Get the current speed and light intensity;

当所述车速大于预设车速阈值且所述光照强度低于光照强度阈值时,将所述摄像头的工作模式切换为所述红外拍摄模式。When the vehicle speed is greater than the preset vehicle speed threshold and the light intensity is lower than the light intensity threshold, the working mode of the camera is switched to the infrared shooting mode.

可选地,所述基于所述第一区域图像、所述第二区域图像和所述第三区域图像,检测出所述驾驶员的行为之后,还包括:Optionally, after the driver's behavior is detected based on the first area image, the second area image and the third area image, the method further includes:

当所述驾驶员的行为属于违规行为时,统计所述驾驶员在预设时长内的违规行为次数;When the driver's behavior is a violation, count the number of violations of the driver within a preset period of time;

当在所述预设时长内所述驾驶员的违规行为次数达到预设次数阈值时,进行违规驾驶报警提示。When the number of illegal behaviors of the driver reaches a preset number of times threshold within the preset time period, an illegal driving alarm prompt is issued.
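As a rough illustration of the counting and alarm logic in this clause, the following sketch keeps the timestamps of detected violations and raises an alert once the count within a preset window reaches a threshold; the window length, threshold and alert action are placeholder assumptions.

```python
import time
from collections import deque

class ViolationAlarm:
    """Count violations inside a sliding time window and alert at a threshold."""

    def __init__(self, window_seconds=60.0, count_threshold=3):
        self.window = window_seconds        # assumed "preset duration"
        self.threshold = count_threshold    # assumed "preset number of times"
        self.timestamps = deque()

    def report_violation(self, now=None):
        now = time.time() if now is None else now
        self.timestamps.append(now)
        # drop violations that fall outside the preset time window
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.threshold:
            print("illegal-driving alarm")  # placeholder for the real alert prompt
            self.timestamps.clear()
```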

第二方面,提供了一种驾驶员的行为识别装置,所述装置包括:In a second aspect, a driver's behavior recognition device is provided, the device comprising:

第一获取模块,用于获取目标图像,所述目标图像包括驾驶员的人脸;a first acquisition module, for acquiring a target image, the target image including the driver's face;

第二获取模块，用于从所述目标图像中获取第一区域图像，所述第一区域图像包括所述驾驶员的预设行为，所述预设行为是指与违规行为之间的相似度大于预设阈值的行为；a second acquisition module, configured to acquire a first area image from the target image, where the first area image includes a preset behavior of the driver, and the preset behavior refers to a behavior whose similarity to a violation behavior is greater than a preset threshold;

图像处理模块，用于在所述目标图像中，按照第一比例阈值对所述第一区域图像的四周区域进行缩减处理，得到第二区域图像，以及按照第二比例阈值对所述第一区域图像的四周区域进行扩充处理，得到第三区域图像；an image processing module, configured to, in the target image, reduce the surrounding area of the first area image according to a first proportional threshold to obtain a second area image, and expand the surrounding area of the first area image according to a second proportional threshold to obtain a third area image;

识别模块,用于基于所述第一区域图像、所述第二区域图像和所述第三区域图像,识别出所述驾驶员的行为。An identification module, configured to identify the driver's behavior based on the first area image, the second area image and the third area image.

可选地,所述装置还包括:Optionally, the device further includes:

尺寸调整模块,用于将所述第一区域图像、所述第二区域图像和所述第三区域图像调整为相同尺寸的区域图像;a size adjustment module, configured to adjust the first area image, the second area image and the third area image to area images of the same size;

所述识别模块,用于调用目标网络模型,所述目标网络模型用于基于任一行为对应的一组区域图像确定所述行为的行为类别;The identification module is used to call a target network model, and the target network model is used to determine the behavior category of the behavior based on a group of regional images corresponding to any behavior;

基于尺寸调整后的第一区域图像、第二区域图像和第三区域图像,通过所述目标网络模型识别出所述驾驶员的行为。Based on the resized first area image, second area image and third area image, the behavior of the driver is identified through the target network model.

可选地,所述识别模块用于:Optionally, the identification module is used for:

所述目标网络模型包括输入层、中间层、拼接层、全连接层和输出层；通过所述输入层基于尺寸调整后的区域图像的分辨率和通道数量，将所述尺寸调整后的第一区域图像、第二区域图像和第三区域图像中的图像数据进行通道叠加处理，得到第一特征图；The target network model includes an input layer, an intermediate layer, a splicing layer, a fully connected layer and an output layer; the input layer performs channel stacking on the image data of the resized first area image, second area image and third area image, based on the resolution and channel number of the resized area images, to obtain a first feature map;

通过所述中间层对所述第一特征图进行卷积采样处理,得到多个第二特征图,所述多个第二特征图的尺寸相同且通道数不同;Performing convolution sampling processing on the first feature map by the intermediate layer to obtain a plurality of second feature maps, the plurality of second feature maps having the same size and different channel numbers;

通过所述拼接层将所述多个第二特征图进行通道叠加处理,并通过网络深层的卷积层对通道叠加后的特征图进行特征融合,得到第三特征图;Perform channel stacking processing on the plurality of second feature maps through the splicing layer, and perform feature fusion on the feature maps after channel stacking through the convolution layer in the deep layer of the network to obtain a third feature map;

通过所述全连接层基于所述第三特征图,确定所述驾驶员的行为;determining the driver's behavior based on the third feature map through the fully connected layer;

通过所述输出层输出所述行为。The behavior is output through the output layer.

可选地,所述识别模块用于:Optionally, the identification module is used for:

所述中间层包括N组卷积层和N组采样层，每组卷积层与每组采样层一一对应；令i=1，将所述第一特征图确定为目标特征图；通过第i组卷积层对所述目标特征图进行卷积处理，通过第i组采样层分别按照两个不同倍数对得到的特征图进行采样处理，得到2^i倍特征图和第i个参考尺寸的第二特征图，将所述2^i倍特征图获取为所述目标特征图，所述参考尺寸大于或等于所述2^i倍特征图的尺寸；The intermediate layer includes N groups of convolution layers and N groups of sampling layers, and the groups of convolution layers correspond one-to-one to the groups of sampling layers; let i = 1 and determine the first feature map as the target feature map; perform convolution processing on the target feature map through the i-th group of convolution layers, and sample the resulting feature map through the i-th group of sampling layers at two different rates to obtain a 2^i-fold feature map and a second feature map of the i-th reference size; take the 2^i-fold feature map as the target feature map, where the reference size is greater than or equal to the size of the 2^i-fold feature map;

当i小于所述N时，令i=i+1，返回所述通过第i组卷积层对所述目标特征图进行卷积处理，通过第i组采样层分别按照两个不同倍数对得到的特征图进行采样处理，得到2^i倍特征图和第i个参考尺寸的第二特征图，将所述2^i倍特征图获取为所述目标特征图的操作；When i is less than N, let i = i + 1 and return to the operation of performing convolution processing on the target feature map through the i-th group of convolution layers, sampling the resulting feature map through the i-th group of sampling layers at two different rates to obtain a 2^i-fold feature map and a second feature map of the i-th reference size, and taking the 2^i-fold feature map as the target feature map;

当i等于所述N时，通过第N组卷积层对所述目标特征图进行卷积处理，通过第N组采样层对得到的特征图进行一次采样处理，得到第N个参考尺寸的第二特征图，结束操作。When i is equal to N, perform convolution processing on the target feature map through the N-th group of convolution layers, and sample the resulting feature map once through the N-th group of sampling layers to obtain a second feature map of the N-th reference size, and end the operation.

可选地,所述装置还包括训练模块,所述训练模块用于:Optionally, the device further includes a training module, the training module is used for:

获取多个训练样本，所述多个训练样本包括多组图像和每组图像中的行为类别，每组图像包括行为的区域图像、对所述区域图像进行缩减处理后确定的缩减区域图像，以及对所述区域图像进行扩充处理后确定的扩充区域图像；Acquiring a plurality of training samples, where the plurality of training samples include multiple groups of images and the behavior category in each group of images, and each group of images includes a region image of a behavior, a reduced region image determined after reducing the region image, and an expanded region image determined after expanding the region image;

基于所述多个训练样本对待训练的网络模型进行训练后得到所述目标网络模型。The target network model is obtained after training the network model to be trained based on the plurality of training samples.

可选地,所述第二获取模块用于:Optionally, the second obtaining module is used for:

调用目标检测模型，将所述目标图像输入至所述目标检测模型中，输出人脸检测框和预设行为检测框，所述目标检测模型用于基于任一图像对所述图像中的人脸和预设行为进行识别；Invoking a target detection model, inputting the target image into the target detection model, and outputting a face detection frame and a preset behavior detection frame, where the target detection model is used to recognize the face and the preset behavior in any given image;

当所述人脸检测框的数量为一个时，将所述人脸检测框确定为目标人脸检测框；当所述人脸检测框的数量为多个时，从多个人脸检测框中获取最大面积的人脸检测框，将获取的人脸检测框确定为目标人脸检测框；When there is one face detection frame, determining the face detection frame as the target face detection frame; when there are multiple face detection frames, acquiring the face detection frame with the largest area from the multiple face detection frames and determining the acquired face detection frame as the target face detection frame;

基于所述目标人脸检测框,确定所述第一区域图像。The first region image is determined based on the target face detection frame.

可选地,所述第二获取模块用于:Optionally, the second obtaining module is used for:

将所述预设行为检测框中未与所述目标人脸检测框重叠,或与所述目标人脸检测框之间的距离大于预设距离阈值的预设行为检测框过滤掉;Filter out the preset behavior detection frame that does not overlap with the target face detection frame in the preset behavior detection frame, or whose distance from the target face detection frame is greater than a preset distance threshold;

从所述目标图像中切割出过滤后剩余的预设行为检测框对应的区域,得到所述第一区域图像。The area corresponding to the preset behavior detection frame remaining after filtering is cut out from the target image to obtain the first area image.

可选地,所述第一获取模块用于:Optionally, the first acquisition module is used for:

检测用于对所述驾驶员的人脸进行拍摄的摄像头的工作模式;detecting the working mode of the camera for photographing the driver's face;

当所述摄像头的工作模式为红外拍摄模式时,获取拍摄得到的灰度图像;When the working mode of the camera is an infrared shooting mode, acquiring a grayscale image obtained by shooting;

调用图像伪彩转换模型，将所述灰度图像输入至所述图像伪彩转换模型中，输出所述灰度图像对应的三通道彩色图像，所述图像伪彩转换模型用于将任一灰度图像转换为所述灰度图像对应的三通道彩色图像；Invoking an image pseudo-color conversion model, inputting the grayscale image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the grayscale image, where the image pseudo-color conversion model is used to convert any grayscale image into its corresponding three-channel color image;

将输出的三通道彩色图像获取为所述目标图像。The output three-channel color image is acquired as the target image.

可选地,所述装置还包括:Optionally, the device further includes:

第三获取模块,用于获取当前的车速和光照强度;The third acquisition module is used to acquire the current vehicle speed and light intensity;

切换模块,用于当所述车速大于预设车速阈值且所述光照强度低于光照强度阈值时,将所述摄像头的工作模式切换为所述红外拍摄模式。A switching module, configured to switch the working mode of the camera to the infrared shooting mode when the vehicle speed is greater than a preset vehicle speed threshold and the illumination intensity is lower than the illumination intensity threshold.

可选地,所述装置还包括:Optionally, the device further includes:

统计模块,用于当所述驾驶员的行为属于违规行为时,统计所述驾驶员在预设时长内的违规行为次数;A statistics module, configured to count the number of violations of the driver within a preset time period when the driver's behavior is a violation;

报警模块,用于当在所述预设时长内所述驾驶员的违规行为次数达到预设次数阈值时,进行违规驾驶报警提示。An alarm module, configured to issue an alarm for illegal driving when the number of illegal behaviors of the driver reaches a preset number of thresholds within the preset time period.

第三方面,一种智能设备,包括:A third aspect, a smart device, comprising:

处理器;processor;

用于存储处理器可执行指令的存储器;memory for storing processor-executable instructions;

其中,所述处理器被配置为实现上述第一方面所述的驾驶员的行为识别方法。Wherein, the processor is configured to implement the method for recognizing the behavior of the driver described in the first aspect above.

第四方面,提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有指令,所述指令被处理器执行时实现上述第一方面所述的驾驶员的行为识别方法。In a fourth aspect, a computer-readable storage medium is provided, and instructions are stored on the computer-readable storage medium, and when the instructions are executed by a processor, the method for recognizing the driver's behavior described in the first aspect above is implemented.

第五方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面所述的驾驶员的行为识别方法。In a fifth aspect, there is provided a computer program product containing instructions, which, when executed on a computer, cause the computer to execute the method for recognizing the driver's behavior as described in the first aspect above.

本申请实施例提供的技术方案带来的有益效果是:The beneficial effects brought by the technical solutions provided in the embodiments of the present application are:

获取包括驾驶员的人脸的目标图像，从该目标图像中获取包括驾驶员的预设行为的第一区域图像，该预设行为相似于违规行为，也就是说，从该目标图像中获取与违规行为相近的预设行为的第一区域图像。之后，在该目标图像中对第一区域图像进行不同尺度的内截和外扩，得到第二区域图像和第三区域图像。由于第一区域图像可能只包括预设行为，但不包括或包括极小部分与预设行为相关的周边物体，基于该第一区域图像无法准确确定该预设行为是否真正为违规行为，因此，本申请对该第一区域图像进行内截和外扩后，使得得到的第二区域图像和第三区域图像能够包括与预设行为相关的周边物体的更多信息，因此，基于该三个区域图像对驾驶员的行为进行检测，可以提高检测的准确性。A target image including the driver's face is acquired, and a first area image including a preset behavior of the driver is acquired from the target image, where the preset behavior is similar to a violation behavior; that is, a first area image containing a preset behavior resembling a violation is obtained from the target image. Afterwards, the first area image is cropped inward and expanded outward at different scales within the target image to obtain a second area image and a third area image. Since the first area image may only include the preset behavior and may contain no, or only a very small part of, the surrounding objects related to the preset behavior, it is impossible to accurately determine from the first area image alone whether the preset behavior is truly a violation. Therefore, after the first area image is cropped inward and expanded outward in the present application, the resulting second area image and third area image can include more information about the surrounding objects related to the preset behavior, so detecting the driver's behavior based on the three area images can improve the detection accuracy.

附图说明Description of drawings

为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

图1是根据一示例性实施例示出的一种驾驶员的人脸拍摄图像的示意图;1 is a schematic diagram of a captured image of a driver's face according to an exemplary embodiment;

图2是根据一示例性实施例示出的一种驾驶员的行为识别方法的流程图;2 is a flowchart of a method for recognizing a driver's behavior according to an exemplary embodiment;

图3是根据一示例性实施例示出的一种驾驶员的人脸检测框的示意图;3 is a schematic diagram of a driver's face detection frame according to an exemplary embodiment;

图4是根据一示例性实施例示出的一种区域图像的示意图;FIG. 4 is a schematic diagram of a region image according to an exemplary embodiment;

图5是根据一示例性实施例示出的一种中间层的卷积采样处理流程示意图;FIG. 5 is a schematic diagram of a convolution sampling processing flow of an intermediate layer according to an exemplary embodiment;

图6是根据一示例性实施例示出的一种驾驶员的行为识别装置的结构示意图;6 is a schematic structural diagram of a driver's behavior recognition device according to an exemplary embodiment;

图7是根据另一示例性实施例示出的一种驾驶员的行为识别装置的结构示意图;7 is a schematic structural diagram of a device for recognizing a driver's behavior according to another exemplary embodiment;

图8是根据另一示例性实施例示出的一种驾驶员的行为识别装置的结构示意图;8 is a schematic structural diagram of a device for recognizing driver behavior according to another exemplary embodiment;

图9是根据另一示例性实施例示出的一种驾驶员的行为识别装置的结构示意图;9 is a schematic structural diagram of a driver's behavior recognition device according to another exemplary embodiment;

图10是根据另一示例性实施例示出的一种驾驶员的行为识别装置的结构示意图;10 is a schematic structural diagram of a device for recognizing driver behavior according to another exemplary embodiment;

图11是根据一示例性实施例示出的一种终端1000的结构示意图。FIG. 11 is a schematic structural diagram of a terminal 1000 according to an exemplary embodiment.

具体实施方式Detailed Description of the Embodiments

为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。In order to make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

在对本申请实施例提供的驾驶员的行为识别方法进行详细介绍之前,先对本申请实施例涉及的应用场景和实施环境进行简单介绍。Before the detailed introduction of the driver's behavior identification method provided by the embodiments of the present application, the application scenarios and implementation environments involved in the embodiments of the present application are briefly introduced.

首先,对本申请实施例涉及的应用场景进行简单介绍。First, the application scenarios involved in the embodiments of the present application are briefly introduced.

目前，图像识别在智能交通领域得到广泛应用，在一些实施例中，可以用于检测驾驶员的行为。在驾驶员驾驶过程中，可以对驾驶员的人脸部分进行拍摄，然后通过已训练的网络模型对拍摄图像进行检测分析，以检测驾驶员是否存在违规行为。然而，驾驶员的某些行为可能与违规行为相似，譬如，请参考图1，驾驶员的摸脸手势与打电话手势相似，在该种情况下，通过已训练的网络模型进行检测分析时，容易将该摸脸行为误判为违规行为，导致错误报警，如此不仅会影响到乘客，还可能会影响驾驶员分心。为此，本申请实施例提供了一种驾驶员的行为识别方法，该方法可以准确确定出驾驶员的行为是否为违规行为，其具体实现请参见如下图2所示的实施例。Currently, image recognition is widely used in the field of intelligent transportation and, in some embodiments, can be used to detect driver behavior. While the driver is driving, the driver's face can be photographed, and the captured images can then be detected and analyzed by a trained network model to determine whether the driver has committed a violation. However, some of the driver's behaviors may be similar to violations. For example, referring to Fig. 1, the driver's face-touching gesture is similar to a phone-call gesture. In this case, when the trained network model performs detection and analysis, the face-touching behavior is easily misjudged as a violation, resulting in a false alarm, which not only affects passengers but may also distract the driver. To this end, an embodiment of the present application provides a driver behavior recognition method that can accurately determine whether the driver's behavior is a violation; for its specific implementation, please refer to the embodiment shown in Fig. 2 below.

接下来,对本申请实施例涉及的实施环境进行简单介绍。Next, the implementation environment involved in the embodiments of the present application is briefly introduced.

本申请实施例提供的驾驶员的行为识别方法可以由智能设备来执行，该智能设备可以配置或连接有摄像头，该摄像头可以安装于车辆前方中控台、仪表盘或A柱子等区域，以通过该摄像头对驾驶员的人脸部分进行拍摄，从而实时获得驾驶员周围清晰的图像。在一些实施例中，该智能设备可以为手机、平板电脑、计算机设备等终端，或者，该智能设备还可以为智能摄像设备等，本申请实施例对此不做限定。The driver behavior recognition method provided in the embodiments of the present application may be performed by a smart device, which may be configured with or connected to a camera. The camera may be installed in an area such as the front center console, the dashboard, or the A-pillar of the vehicle, so that the camera photographs the driver's face and clear images around the driver are obtained in real time. In some embodiments, the smart device may be a terminal such as a mobile phone, a tablet computer, or a computer device, or the smart device may also be a smart camera device, etc., which is not limited in this embodiment of the present application.

在介绍完本申请实施例涉及的应用场景和实施环境后,接下来将结合附图对本申请实施例提供的驾驶员的行为识别方法进行详细介绍。After introducing the application scenarios and implementation environments involved in the embodiments of the present application, the method for recognizing the driver's behavior provided by the embodiments of the present application will be described in detail next with reference to the accompanying drawings.

请参见图2,该图2是根据一示例性实施例示出的一种驾驶员的行为识别方法的流程图,本申请实施例以该方法由上述智能设备执行为例进行说明,该驾驶员的行为识别方法可以包括如下几个实现步骤:Please refer to FIG. 2. FIG. 2 is a flowchart of a method for recognizing a driver's behavior according to an exemplary embodiment. This embodiment of the present application is described by taking the method executed by the above-mentioned smart device as an example. The behavior recognition method can include the following implementation steps:

步骤201:获取目标图像,该目标图像包括驾驶员的人脸。Step 201: Acquire a target image, where the target image includes the driver's face.

在一种可能的实现方式中,该智能设备可以每隔时长阈值,获取摄像头拍摄的视频图像,得到该目标图像。或者可以理解为,在摄像头拍摄视频的过程中,该智能设备可以每隔预设数量个视频图像帧,获取一帧视频图像,得到该目标图像。In a possible implementation manner, the smart device may acquire a video image captured by a camera at intervals of a time threshold to obtain the target image. Alternatively, it can be understood that, during the process of video shooting by the camera, the smart device may acquire a frame of video image every preset number of video image frames to obtain the target image.

其中,上述时长阈值可以由用户根据实际需求自定义设置,也可以由该智能设备默认设置,本申请实施例对此不做限定。The above-mentioned duration threshold may be set by the user according to actual needs, or may be set by default by the smart device, which is not limited in this embodiment of the present application.

另外,该预设数量可以由用户根据实际需求自定义设置,也可以由该智能设备默认设置,本申请实施例对此不做限定。譬如,该预设数量可以为5,此时该目标图像可以为第一个视频图像帧、第六个视频图像帧、第十一个视频图像帧等。In addition, the preset number may be set by the user according to actual needs, or may be set by default by the smart device, which is not limited in this embodiment of the present application. For example, the preset number may be 5, and the target image may be the first video image frame, the sixth video image frame, the eleventh video image frame, and the like.
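A minimal illustration of picking one frame every preset number of frames (the helper name and default value are assumptions, not part of the original text):

```python
def sample_frames(frames, preset_number=5):
    """Return every `preset_number`-th frame, e.g. the 1st, 6th, 11th ... frames."""
    return frames[::preset_number]
```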

进一步地，该目标图像可以为三通道彩色图像，该三通道是指R、G和B三个通道，分别代表像素点的红色值、绿色值和蓝色值。在实施中，智能设备检测用于对该驾驶员的人脸进行拍摄的摄像头的工作模式，当该摄像头的工作模式为红外拍摄模式时，获取拍摄得到的灰度图像，调用图像伪彩转换模型，将该灰度图像输入至该图像伪彩转换模型中，输出该灰度图像对应的三通道彩色图像，该图像伪彩转换模型用于将任一灰度图像转换为该灰度图像对应的三通道彩色图像，将输出的三通道彩色图像获取为该目标图像。Further, the target image may be a three-channel color image, where the three channels refer to the R, G and B channels, representing the red, green and blue values of a pixel, respectively. In implementation, the smart device detects the working mode of the camera used for photographing the driver's face; when the working mode of the camera is the infrared shooting mode, it acquires the captured grayscale image, invokes an image pseudo-color conversion model, inputs the grayscale image into the image pseudo-color conversion model, and outputs a three-channel color image corresponding to the grayscale image, where the image pseudo-color conversion model is used to convert any grayscale image into its corresponding three-channel color image; the output three-channel color image is acquired as the target image.

也就是说，摄像头可以根据日夜光线强度自主切换工作模式，该工作模式可以包括红外拍摄模式和非红外拍摄模式。在非红外拍摄模式下，摄像头拍摄的图像为三通道彩色图像，此时，可以直接将摄像头采集的图像获取为目标图像。然而，在红外拍摄模式下，摄像头拍摄的图像为灰度图像，由于灰度图像损失了颜色信息，对于智能设备来说，可能增加了后续处理难度。为此，当检测到摄像头处于红外拍摄模式时，可以将该灰度图像转换为三通道彩色图像，即还原出灰度图像的伪彩色信息，并将转换后得到的三通道彩色图像获取为目标图像。That is to say, the camera can switch its working mode autonomously according to the day/night light intensity, and the working modes may include an infrared shooting mode and a non-infrared shooting mode. In the non-infrared shooting mode, the image captured by the camera is a three-channel color image; in this case, the captured image can be directly acquired as the target image. However, in the infrared shooting mode, the image captured by the camera is a grayscale image. Since a grayscale image loses color information, it may increase the difficulty of subsequent processing for the smart device. Therefore, when it is detected that the camera is in the infrared shooting mode, the grayscale image can be converted into a three-channel color image, that is, the pseudo-color information of the grayscale image is restored, and the converted three-channel color image is acquired as the target image.

在一些实施例中,可以通过图像伪彩转换模型将灰度图像转换为对应的三通道彩色图像。也即是,可以将该灰度图像输入至伪彩转换模型中,该伪彩转换模型输出与该灰度图像同等尺寸的三通道彩色图像。In some embodiments, a grayscale image can be converted to a corresponding three-channel color image by an image pseudocolor conversion model. That is, the grayscale image can be input into a pseudo-color conversion model that outputs a three-channel color image of the same size as the grayscale image.

进一步地，在调用图像伪彩转换模型之前，还可以获取多个灰度图像样本和每个灰度图像样本对应的彩色图像样本，基于该多个灰度图像样本和每个灰度图像样本对应的彩色图像样本，对待训练的网络进行训练后得到该图像伪彩转换模型。Further, before the image pseudo-color conversion model is invoked, a plurality of grayscale image samples and the color image sample corresponding to each grayscale image sample may be acquired, and the image pseudo-color conversion model is obtained by training the network to be trained based on the plurality of grayscale image samples and their corresponding color image samples.

譬如，可以获取50万余张从自然场景中截取的彩色图像样本，将他们通过如下公式(1)转变成对应的灰度图像样本，之后，将该50万余张的彩色图像样本和灰度图像样本输入至待训练的网络中进行训练。其中，所述公式(1)为：For example, more than 500,000 color image samples captured from natural scenes may be acquired and converted into corresponding grayscale image samples through the following formula (1); afterwards, the more than 500,000 color image samples and grayscale image samples are input into the network to be trained for training. The formula (1) is:

H=R*0.299+G*0.587+B*0.114 (1)H=R*0.299+G*0.587+B*0.114 (1)

其中,H表示像素点的灰度值,R表示像素点的红色值,G表示像素点的绿色值,B表示像素点的蓝色值。Among them, H represents the gray value of the pixel point, R represents the red value of the pixel point, G represents the green value of the pixel point, and B represents the blue value of the pixel point.
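Formula (1) can be applied directly to an RGB image array, for example as in the following sketch (the array layout and value range are assumptions used only for illustration):

```python
import numpy as np

def rgb_to_gray(rgb):
    """Apply formula (1): H = R*0.299 + G*0.587 + B*0.114.

    `rgb` is an (H, W, 3) array in R, G, B channel order."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return r * 0.299 + g * 0.587 + b * 0.114

gray = rgb_to_gray(np.random.randint(0, 256, (480, 640, 3)).astype(np.float32))
print(gray.shape)  # (480, 640)
```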

在一种可能的实现方式中，该待训练的网络可以使用但不限于条件-生成对抗网络，此时在训练过程中，该网络包括生成器和判别器，该生成器可以通过将该灰度图像样本进行参数叠加生成同样尺寸的彩色图像，该判别器通过判别生成的彩色图像与该灰度图像样本对应的彩色图像样本之间的差异值来产生训练的损失，从而迭代训练该网络。In a possible implementation, the network to be trained may use, but is not limited to, a conditional generative adversarial network. In this case, during training, the network includes a generator and a discriminator: the generator generates a color image of the same size from the grayscale image sample through parameter superposition, and the discriminator produces the training loss by judging the difference between the generated color image and the color image sample corresponding to the grayscale image sample, so that the network is trained iteratively.

进一步地，检测用于对该驾驶员的人脸进行拍摄的摄像头的工作模式之前，智能设备获取当前的车速和光照强度，当该车速大于预设车速阈值且该光照强度低于光照强度阈值时，将该摄像头的工作模式切换为该红外拍摄模式。Further, before detecting the working mode of the camera used for photographing the driver's face, the smart device acquires the current vehicle speed and light intensity; when the vehicle speed is greater than the preset vehicle speed threshold and the light intensity is lower than the light intensity threshold, the working mode of the camera is switched to the infrared shooting mode.

也就是说，可以只有在车速大于一定预设车速阈值，并且光照强度较低的情况下，摄像头才自动切换为红外拍摄模式，否则，该摄像头以非红外拍摄模式进行视频拍摄，即以常规模式进行视频拍摄。That is to say, the camera may automatically switch to the infrared shooting mode only when the vehicle speed is greater than a certain preset vehicle speed threshold and the light intensity is low; otherwise, the camera shoots video in the non-infrared shooting mode, that is, in the regular mode.

其中,该预设车速阈值可以由用户根据实际需求进行设置,也可以由该智能设备默认设置,本申请实施例对此不作限定。The preset vehicle speed threshold may be set by the user according to actual needs, or may be set by default by the smart device, which is not limited in this embodiment of the present application.

另外,该光照强度阈值可以由用户根据实际需求进行设置,也可以由该智能设备默认设置,本申请实施例对此不作限定。In addition, the light intensity threshold may be set by the user according to actual needs, or may be set by default by the smart device, which is not limited in this embodiment of the present application.

进一步地,上述仅是以将光照强度作为条件之一控制摄像头自动切换为红外拍摄模式为例进行说明。在另一实施例中,为了能够提醒驾驶员不要疲劳驾驶,可能有检测驾驶员是否闭眼的需求,在该种情况下,如果驾驶员配带墨镜,则同样需要摄像头切换为红外拍摄模式,此时可以由用户进行手动切换以触发红外拍摄切换指令,使得智能设备基于该红外拍摄切换指令,将摄像头切换为红外拍摄模式。Further, the above description is only described by taking the light intensity as one of the conditions to control the camera to automatically switch to the infrared shooting mode as an example. In another embodiment, in order to remind the driver not to drive fatigued, there may be a need to detect whether the driver closes his eyes. In this case, if the driver wears sunglasses, the camera also needs to be switched to the infrared shooting mode. At this time, the user can manually switch to trigger the infrared shooting switching instruction, so that the smart device switches the camera to the infrared shooting mode based on the infrared shooting switching instruction.

需要说明的是,上述仅是以该目标图像为三通道彩色图像为例进行说明,在另一实施例中,该目标图像也可以为灰度图像,本申请实施例对此不做限定。It should be noted that the above description only takes the target image as a three-channel color image as an example for description. In another embodiment, the target image may also be a grayscale image, which is not limited in this embodiment of the present application.

还需要说明的是，上述仅是以当车速大于预设车速阈值且光照强度较低的情况下，摄像头自动切换为红外拍摄模式，否则以非红外拍摄模式进行拍摄为例进行说明。在另一实施例中，当车速低于该预设阈值时，可以不开启摄像头，也就是说，如果车速低于预设阈值，可以不对驾驶员进行行为检测。因此，获取目标图像之前，可以先判断车速是否大于预设车速阈值，当车速大于该预设车速阈值时，执行获取目标图像的操作，否则，不启动摄像头，也即是，不执行获取目标图像的操作。It should also be noted that the above description only takes as an example the case where the camera automatically switches to the infrared shooting mode when the vehicle speed is greater than the preset vehicle speed threshold and the light intensity is low, and otherwise shoots in the non-infrared shooting mode. In another embodiment, when the vehicle speed is lower than the preset threshold, the camera may not be turned on; that is, if the vehicle speed is lower than the preset threshold, behavior detection may not be performed on the driver. Therefore, before acquiring the target image, it may first be determined whether the vehicle speed is greater than the preset vehicle speed threshold; when the vehicle speed is greater than the preset vehicle speed threshold, the operation of acquiring the target image is performed; otherwise, the camera is not started, that is, the operation of acquiring the target image is not performed.
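The speed and light gating described in the last few paragraphs could be sketched as a simple decision function; the threshold values and mode names below are illustrative assumptions only.

```python
def choose_camera_mode(speed_kmh, light_lux,
                       speed_threshold=30.0, light_threshold=50.0):
    """Return the camera working mode to use, or None if the vehicle is too slow
    for behavior detection. Threshold values are illustrative assumptions."""
    if speed_kmh <= speed_threshold:
        return None            # do not acquire target images at low speed
    if light_lux < light_threshold:
        return "infrared"      # fast and dark: switch to the infrared shooting mode
    return "non-infrared"      # otherwise keep the regular shooting mode
```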

步骤202:从该目标图像中获取第一区域图像,该第一区域图像包括该驾驶员的预设行为,该预设行为是指与违规行为之间的相似度大于预设阈值的行为。Step 202: Acquire a first area image from the target image, where the first area image includes a preset behavior of the driver, where the preset behavior refers to behaviors whose similarity to the illegal behavior is greater than a preset threshold.

其中,该预设行为可以由用户根据实际需求自定义设置,也可以由该智能设备默认设置,本申请实施例对此不作限定。The preset behavior may be set by the user according to actual needs, or may be set by default by the smart device, which is not limited in this embodiment of the present application.

譬如,该预设行为可以包括打电话、抽烟、摸下巴、捂嘴、摸侧脸、摸耳朵等。其中,这里所述的打电话是指手拿着手机贴近耳朵的姿势。For example, the preset behavior may include making a phone call, smoking a cigarette, touching the chin, covering the mouth, touching the side of the face, touching the ear, and the like. Wherein, the making a phone call here refers to the posture of holding the mobile phone close to the ear.

另外,上述预设阈值可以由用户根据实际需求自定义设置,也可以由该计算机设备默认设置,本申请实施例对此不做限定。In addition, the above-mentioned preset threshold may be set by the user according to actual needs, or may be set by default by the computer device, which is not limited in this embodiment of the present application.

也就是说，智能设备从该目标图像中确定驾驶员的与违规行为相似的预设行为所在区域，之后从该目标图像中获取该所确定的区域，以便于后续可以针对该预设行为做进一步细分析，从而确定该预设行为是否真正为违规行为。That is to say, the smart device determines, from the target image, the region where the driver's preset behavior similar to a violation is located, and then acquires the determined region from the target image, so that further detailed analysis can subsequently be performed on the preset behavior to determine whether the preset behavior is truly a violation.

在一些实施例中，从该目标图像中获取第一区域图像的具体实现可以包括：调用目标检测模型，将该目标图像输入至该目标检测模型中，输出人脸检测框和预设行为检测框，该目标检测模型用于基于任一图像对该图像中的人脸和预设行为进行识别。当该人脸检测框的数量为一个时，将该人脸检测框确定为目标人脸检测框；当该人脸检测框的数量为多个时，从多个人脸检测框中获取最大面积的人脸检测框，将获取的人脸检测框确定为目标人脸检测框，基于该目标人脸检测框，确定该第一区域图像。In some embodiments, the specific implementation of acquiring the first area image from the target image may include: invoking a target detection model, inputting the target image into the target detection model, and outputting face detection frames and preset behavior detection frames, where the target detection model is used to recognize the face and the preset behavior in any given image. When there is one face detection frame, the face detection frame is determined as the target face detection frame; when there are multiple face detection frames, the face detection frame with the largest area is acquired from the multiple face detection frames and determined as the target face detection frame, and the first area image is determined based on the target face detection frame.

也就是说，该目标检测模型不仅可以检测出人脸，还可以检测出预设行为对应的区域，譬如打电话、摸脸、抽烟等行为对应的区域。通常情况下，检测出的预设行为检测框的数量为多个，而其中的某个预设行为检测框对应的预设行为可能不是驾驶员的，譬如，可能是站在驾驶员旁边的某乘客的，智能设备可以过滤掉与驾驶员无关的预设行为检测框。在实施中，可以确定驾驶员的人脸检测框，之后，基于驾驶员的人脸检测框进行过滤操作。That is to say, the target detection model can detect not only faces but also the regions corresponding to preset behaviors, such as the regions corresponding to behaviors like making a phone call, touching the face, or smoking. Usually, multiple preset behavior detection frames are detected, and the preset behavior corresponding to one of them may not belong to the driver; for example, it may belong to a passenger standing next to the driver. The smart device can therefore filter out the preset behavior detection frames that are irrelevant to the driver. In implementation, the driver's face detection frame may be determined first, and the filtering operation is then performed based on the driver's face detection frame.

由于摄像头是对着驾驶员的人脸部分进行拍摄的，因此，当输出的人脸检测框的数量为一个时，可以确定该人脸检测框对应的是驾驶员的人脸，此时将该人脸检测框确定为目标人脸检测框。在另一些实施例中，乘客可能站在驾驶员附近，譬如，站在驾驶员侧后方且面朝向摄像头，此时，该目标检测模型可能会检测到多个人脸检测框。由于摄像头是对着驾驶员的人脸进行拍摄的，因此可以确定面积最大的人脸检测框对应的是驾驶员的人脸，譬如，如图3所示，所以将面积最大的人脸检测框确定为目标人脸检测框。Since the camera is aimed at the driver's face, when only one face detection frame is output, it can be determined that this face detection frame corresponds to the driver's face, and this face detection frame is then determined as the target face detection frame. In other embodiments, a passenger may stand near the driver, for example, behind the driver's side and facing the camera; in this case, the target detection model may detect multiple face detection frames. Since the camera is aimed at the driver's face, it can be determined that the face detection frame with the largest area corresponds to the driver's face, as shown in Fig. 3 for example, so the face detection frame with the largest area is determined as the target face detection frame.
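Selecting the driver's face as the largest-area detection frame can be sketched as follows (the (x1, y1, x2, y2) box format is an assumption):

```python
def pick_driver_face(face_boxes):
    """face_boxes: list of (x1, y1, x2, y2); return the largest-area box,
    taken to be the driver's face, or None if no face was detected."""
    if not face_boxes:
        return None
    return max(face_boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
```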

进一步地，基于该目标人脸检测框，确定该第一区域图像的具体实现可以包括：将该预设行为检测框中未与该目标人脸检测框重叠，或与该目标人脸检测框之间的距离大于预设距离阈值的预设行为检测框过滤掉，从该目标图像中切割出过滤后剩余的预设行为检测框对应的区域，得到该第一区域图像。Further, the specific implementation of determining the first area image based on the target face detection frame may include: filtering out, from the preset behavior detection frames, any preset behavior detection frame that does not overlap the target face detection frame or whose distance from the target face detection frame is greater than a preset distance threshold, and cutting out, from the target image, the region corresponding to the preset behavior detection frame remaining after filtering, to obtain the first area image.

智能设备可以基于目标人脸检测框，对不属于驾驶员的预设行为的预设行为检测框进行过滤。不难理解，如果预设行为检测框未与该目标人脸检测框重叠，说明该预设行为检测框距离驾驶员的人脸较远，因此可以确定该预设行为检测框对应的预设行为不是驾驶员的行为。另外，也可以根据预设行为检测框与目标人脸检测框之间的距离来确定，当预设行为检测框与目标人脸检测框之间的距离大于预设距离阈值时，也说明该预设行为检测框距离驾驶员的人脸较远，从而可以确定该预设行为检测框对应的预设行为不是驾驶员的行为。Based on the target face detection frame, the smart device can filter out the preset behavior detection frames that do not correspond to the driver's preset behavior. It is easy to see that if a preset behavior detection frame does not overlap the target face detection frame, the preset behavior detection frame is far from the driver's face, so it can be determined that the preset behavior corresponding to this frame is not the driver's behavior. Alternatively, the determination can be made according to the distance between the preset behavior detection frame and the target face detection frame: when this distance is greater than the preset distance threshold, it also indicates that the preset behavior detection frame is far from the driver's face, so it can be determined that the corresponding preset behavior is not the driver's behavior.
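A possible sketch of this filtering step is given below; the box format, the use of center-to-center distance, and the threshold value are assumptions made for illustration only.

```python
def filter_behavior_boxes(behavior_boxes, driver_face_box, max_distance=100.0):
    """Keep only preset-behavior boxes that overlap the driver's face box or lie
    within `max_distance` pixels of it (boxes are (x1, y1, x2, y2))."""

    def overlaps(a, b):
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def center_distance(a, b):
        ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

    return [box for box in behavior_boxes
            if overlaps(box, driver_face_box)
            or center_distance(box, driver_face_box) <= max_distance]
```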

其中,该预设距离阈值可以由用户根据实际需求自定义设置,也可以由该智能设备默认设置,本申请实施例对此不做限定。The preset distance threshold may be set by the user according to actual needs, or may be set by default by the smart device, which is not limited in this embodiment of the present application.

智能设备对预设行为检测框进行过滤后，可以确定剩余的预设行为检测框为驾驶员的预设行为对应的检测框，智能设备从目标图像中切割出其对应的区域，得到第一区域图像。After the smart device filters the preset behavior detection frames, it can determine that the remaining preset behavior detection frame corresponds to the driver's preset behavior, and the smart device cuts the corresponding region out of the target image to obtain the first area image.

需要说明的是，如果当前输出的预设行为检测框没有与目标人脸检测框重叠的，或者，与该目标人脸检测框之间的距离均大于预设距离阈值，可以确定在该目标图像中该驾驶员当前不存在违规行为。It should be noted that if none of the currently output preset behavior detection frames overlaps the target face detection frame, or the distances between all of them and the target face detection frame are greater than the preset distance threshold, it can be determined that the driver currently has no violation in the target image.

进一步地,当通过目标检测模型检测出驾驶员配带墨镜时,可以自动控制摄像头切换为红外拍摄模式。也就是说,如果在白天通过目标检测模型检测到驾驶员配带了墨镜,为了能够检测出驾驶员是否存在闭眼行为,可以将摄像头切换为红外拍摄模式。Further, when it is detected through the target detection model that the driver is wearing sunglasses, the camera can be automatically controlled to switch to the infrared shooting mode. That is to say, if the target detection model detects that the driver is wearing sunglasses during the day, in order to detect whether the driver has closed eyes, the camera can be switched to infrared shooting mode.

进一步地，调用该目标检测模型之前，可以获取多个图像样本和每个图像样本中的人脸框和预设行为框，将该多个图像样本和每个图像样本中的人脸框和预设行为框输入至待训练的检测模型进行训练，得到该目标检测模型。Further, before the target detection model is invoked, a plurality of image samples and the face frame and preset behavior frame in each image sample may be acquired, and the plurality of image samples together with the face frame and preset behavior frame in each image sample are input into the detection model to be trained for training, so as to obtain the target detection model.

其中，每个图像样本中的人脸框和预设行为框可以是预先标定的，也就是说，基于多个图像样本和该多个图像样本中已标定的人脸框和预设行为框，对待训练的检测模型进行深度学习和训练，从而使得得到的目标检测模型可以自动检测出任一图像中的人脸和预设行为。The face frame and preset behavior frame in each image sample may be pre-annotated; that is, based on the multiple image samples and the annotated face frames and preset behavior frames in these image samples, deep learning and training are performed on the detection model to be trained, so that the resulting target detection model can automatically detect the face and the preset behavior in any image.

在一种可能的实现方式中，该待训练的检测模型可以为YOLO(You Only Look Once，你只看一次)网络，SSD(Single Shot Detector，一次性探测器)等，本申请实施例对此不做限定。In a possible implementation, the detection model to be trained may be a YOLO (You Only Look Once) network, an SSD (Single Shot Detector), or the like, which is not limited in the embodiments of the present application.

Step 203: in the target image, reduce the surrounding area of the first area image according to a first ratio threshold to obtain a second area image, and expand the surrounding area of the first area image according to a second ratio threshold to obtain a third area image.

The first area image may include only the preset behavior, but not include, or include only a small part of, the object related to the preset behavior, for example a mobile phone or a cigarette. That is, the phone or cigarette may not be completely contained in the first area image, or may occupy only a very small part of it, and such objects may appear at different sizes depending on the camera mounting position and their own dimensions. To further determine whether the preset behavior is truly a violation, the smart device crops inward and expands outward the area corresponding to the first area image at different scales; that is, in the target image, the surrounding area of the first area image is reduced according to the first ratio threshold and expanded according to the second ratio threshold, yielding the second area image and the third area image.

For example, referring to FIG. 4, assuming that the first ratio threshold is 0.8 and the second ratio threshold is 1.2, after the first area image is cropped inward and expanded outward in the target image, the second area image and the third area image shown as 41 and 42 in FIG. 4 can be obtained, where 43 in FIG. 4 is the first area image.
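A hedged sketch of this inward cropping and outward expansion is given below, scaling the first region about its center by the two ratio thresholds (0.8 and 1.2 in the example above). NumPy arrays in (height, width, channel) layout and the helper names are assumptions for illustration.

```python
import numpy as np

def scale_box(box, ratio, img_w, img_h):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio
    nx1, ny1 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
    nx2, ny2 = min(img_w, int(cx + w / 2)), min(img_h, int(cy + h / 2))
    return nx1, ny1, nx2, ny2

def crop_regions(target_image: np.ndarray, first_box, shrink=0.8, expand=1.2):
    h, w = target_image.shape[:2]
    second_box = scale_box(first_box, shrink, w, h)   # reduced region
    third_box = scale_box(first_box, expand, w, h)    # expanded region
    crop = lambda b: target_image[b[1]:b[3], b[0]:b[2]]
    return crop(first_box), crop(second_box), crop(third_box)
```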

It is worth mentioning that obtaining the first, second, and third area images from the target image removes the redundant information in the whole target image. Compared with behavior detection based on the entire target image, performing the subsequent behavior discrimination based on these three area images can improve detection accuracy.

The first ratio threshold may be customized according to actual needs, or set by default by the smart device, which is not limited in the embodiments of this application.

The second ratio threshold may be customized according to actual needs, or set by default by the smart device, which is not limited in the embodiments of this application.

Step 204: adjust the first area image, the second area image, and the third area image to area images of the same size.

In the embodiments of this application, to facilitate the subsequent recognition of the driver's behavior based on the first, second, and third area images, the smart device resizes the three area images, that is, scales the area images at different scales to the same size.
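For example, the resizing in step 204 could be done with OpenCV as below; the 256x256 target size is borrowed from the example resolution mentioned in step 206 and is only an assumption here.

```python
import cv2

def resize_regions(regions, size=(256, 256)):
    # Scale the first, second, and third region crops to one common size.
    return [cv2.resize(region, size) for region in regions]
```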

Step 205: call a target network model, where the target network model is used to determine the behavior category of any behavior based on a group of area images corresponding to that behavior.

The group of area images corresponding to any behavior includes the area image of the behavior, a reduced area image, and an expanded area image. The reduced area image is determined by reducing the surroundings of the area where the behavior is located in the captured image, and the expanded area image is determined by expanding the surroundings of that area in the captured image. Further, the area image, the reduced area image, and the expanded area image of the behavior have the same size.

Further, before the target network model is called, multiple training samples can be obtained. The training samples include multiple groups of images and the behavior category of each group, where each group includes an area image of a behavior, a reduced area image determined by reducing that area image, and an expanded area image determined by expanding that area image. The target network model is obtained by training the network model to be trained based on these training samples.

That is, the target network model can be obtained through training in advance. During training, multiple groups of images and the behavior category of each group are input into the network model to be trained. Each group includes the area image, reduced area image, and expanded area image of a behavior: the area image can be cropped from the captured image around the area where the behavior is located, the reduced area image can be obtained by reducing the surrounding area of that region in the captured image, and the expanded area image can be obtained by expanding the surrounding area of that region in the captured image. Further, the three area images in each group have the same size; that is, after the reduction and expansion processing, the resulting area images and the behavior's area image can be resized so that the three area images corresponding to the behavior are of equal size.

In addition, since different channels carry different information, a compression-activation module can be added to the network model to be trained so that it can learn the weight of each channel for the recognition task. Weighting the channels of the network model makes its learning focus more on the region of interest and improves the detection accuracy of the target network model.
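A PyTorch sketch of such a compression-activation (squeeze-and-excitation style) channel weighting module is shown below; the reduction ratio of 4 is an assumption, not a value given in this embodiment.

```python
import torch
import torch.nn as nn

class ChannelWeighting(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                             # squeeze: per-channel average
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)         # excitation: per-channel weights
        return x * w                                       # re-weight the feature channels
```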

Step 206: based on the resized first area image, second area image, and third area image, recognize the driver's behavior through the target network model.

Further, the target network model includes an input layer, an intermediate layer, a splicing layer, a fully connected layer, and an output layer. Recognizing the driver's behavior through the target network model based on the resized first, second, and third area images may specifically include: through the input layer, based on the resolution and number of channels of the resized area images, performing channel stacking on the image data of the resized first, second, and third area images to obtain a first feature map; performing convolution and sampling on the first feature map through the intermediate layer to obtain multiple second feature maps, which have the same size but different numbers of channels; stacking the channels of the multiple second feature maps through the splicing layer and fusing the stacked feature maps through a convolution layer deep in the network to obtain a third feature map; determining the driver's behavior based on the third feature map through the fully connected layer; and outputting the behavior through the output layer.

For example, assume that the resolution of each area image is 256*256 and each area image has 3 channels. In this case, the resized first, second, and third area images are input into the target network model, and the input layer stacks the three area images along the channel dimension to obtain a 256*256*9 three-dimensional matrix, where 9 is the number of channels after stacking; this three-dimensional matrix represents the first feature map.
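The channel stacking of the input layer can be pictured with NumPy as follows, matching the 256*256*9 example above; (height, width, channel) array layout is assumed.

```python
import numpy as np

def stack_regions(first, second, third):
    # Three 256x256x3 region images -> one 256x256x9 first feature map.
    return np.concatenate([first, second, third], axis=-1)
```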

Then, convolution and sampling are performed on the first feature map through the intermediate layer of the target network model. In a possible implementation, the intermediate layer includes N groups of convolution layers and N groups of sampling layers, with a one-to-one correspondence between them, where N is an integer greater than 1. Further, each group of convolution layers may include at least one convolution layer, and each group of sampling layers may include at least one sampling layer. In this case, performing convolution and sampling on the first feature map through the intermediate layer to obtain multiple second feature maps may specifically include: let i = 1 and take the first feature map as the target feature map; perform convolution on the target feature map through the i-th group of convolution layers, and sample the resulting feature map through the i-th group of sampling layers at two different rates to obtain a 2^i-times feature map and the i-th second feature map of the reference size, and take the 2^i-times feature map as the new target feature map, where the reference size is greater than or equal to the size of the 2^i-times feature map; when i is less than N, let i = i + 1 and return to the operation of convolving the target feature map through the i-th group of convolution layers, sampling the result at two different rates through the i-th group of sampling layers to obtain the 2^i-times feature map and the i-th second feature map of the reference size, and taking the 2^i-times feature map as the target feature map; when i equals N, convolve the target feature map through the N-th group of convolution layers, perform a single sampling on the result through the N-th group of sampling layers to obtain the N-th second feature map of the reference size, and end the operation.

The reference size may be set according to actual needs or by default by the smart device, which is not limited in the embodiments of this application.

Image features that help identify the behavior category can be extracted through the convolution processing of the convolution layers, and multi-scale feature maps can be obtained through the sampling processing of the sampling layers. Referring to FIG. 5, in implementation, the first feature map produced by the input layer is fed into the first group of convolution layers for convolution to obtain a multi-channel feature map, which is then downsampled at different rates by the 2*2 sampling layer and the 8*8 sampling layer in the first group of sampling layers to obtain the 2-times feature map and a second feature map of the reference size. The 2-times feature map is then fed into the second group of convolution layers for convolution to obtain a multi-channel feature map, which is downsampled at different rates by the 2*2 and 4*4 sampling layers in the second group of sampling layers to obtain the 4-times feature map and another second feature map of the reference size. Convolution and sampling continue in this manner until, after the N-th group of convolution layers performs convolution, the N-th group of sampling layers performs 2*2 sampling to obtain the N-th second feature map of the reference size, and the convolution-sampling operation ends.

Continuing with FIG. 5, after the multiple second feature maps are obtained, their channels are stacked through the splicing layer and then fused through a convolution layer to obtain the third feature map. That is, before the final prediction, the 1-times, 2-times, and 4-times feature maps from the shallow layers of the network are downsampled at different rates to obtain feature maps of the same size, which are then stacked along the channel dimension; after concatenation, a convolution layer performs feature fusion, and the fused feature map is passed to the fully connected layer.
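A PyTorch-style sketch that puts the intermediate layer, the splicing layer, and the fusion convolution together for N = 3 groups is given below. The channel widths, the use of max pooling for the sampling layers, and the ReLU activations are illustrative assumptions, not the exact configuration of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBackbone(nn.Module):
    def __init__(self, in_channels=9, width=32):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_channels, width, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(width, width * 2, 3, padding=1), nn.ReLU())
        self.conv3 = nn.Sequential(nn.Conv2d(width * 2, width * 4, 3, padding=1), nn.ReLU())
        # fusion convolution applied after channel concatenation of the reference-size maps
        self.fuse = nn.Sequential(nn.Conv2d(width * 7, width * 4, 1), nn.ReLU())

    def forward(self, x):                       # x: (B, 9, 256, 256) stacked regions
        f1 = self.conv1(x)                      # first group of convolution layers
        ref1 = F.max_pool2d(f1, 8)              # 8x8 sampling -> reference-size map
        f2 = self.conv2(F.max_pool2d(f1, 2))    # 2x2 sampling -> 2-times map, then conv
        ref2 = F.max_pool2d(f2, 4)              # 4x4 sampling -> reference-size map
        f3 = self.conv3(F.max_pool2d(f2, 2))    # 2x2 sampling -> 4-times map, then conv
        ref3 = F.max_pool2d(f3, 2)              # final 2x2 sampling -> reference-size map
        fused = self.fuse(torch.cat([ref1, ref2, ref3], dim=1))  # splice layer + fusion conv
        return fused                            # third feature map
```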

It is worth mentioning that fusing multi-scale feature maps allows the target network model to learn the feature distributions of objects at different scales, thereby improving the accuracy of behavior detection.

It should be noted that the size of the convolution kernel of each convolution layer can be preset according to actual needs, and the parameters in each convolution kernel are determined during training. In addition, in implementation, parameters such as the image size, feature map size, number of channels, number of downsampling operations, and network depth may differ among the target network models for different behaviors.

After the third feature map is obtained, the driver's behavior is determined based on it through the fully connected layer and then output through the output layer, for example, making a phone call or not making a phone call. Further, the behavior can be identified by a behavior identifier: for example, an output identifier of "1" indicates making a phone call, and "0" indicates not making a phone call. In some embodiments, a confidence value in the interval [0, 1] can be given for the output result; when the confidence is greater than 0.5, the behavior category is determined to be a violation, otherwise the behavior is determined not to be a violation.
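A sketch of this final decision step is shown below. Only the [0, 1] confidence interval and the 0.5 cut-off come from the text above; the sigmoid mapping and the label names are assumptions.

```python
import torch

def decide_behavior(logit: torch.Tensor):
    confidence = torch.sigmoid(logit).item()        # confidence in [0, 1]
    label = "1 (phone call)" if confidence > 0.5 else "0 (no phone call)"
    return label, confidence
```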

It should be noted that the above description only takes the target network model including an input layer, an intermediate layer, a splicing layer, a fully connected layer, and an output layer as an example. In another embodiment, the target network model may further include other layers to perform other operations on the feature maps; for example, it may also include a BN (Batch Normalization) layer, which is not limited in the embodiments of this application.

Further, when the driver's behavior is a violation, the number of violations of the driver within a preset duration is counted, and when the number of violations within the preset duration reaches a preset count threshold, an illegal-driving alarm is issued.

When it is determined that the driver's behavior is a violation, the number of consecutive violations of the driver within a certain period is counted; that is, it is counted how many of all the video frames within the preset duration contain detected driver behavior that is a violation. If the number of violations reaches the preset count threshold, it can be determined that the driver is engaged in dangerous driving. For example, if the detection results of 75% of all video frames within 3 seconds indicate that the driver's behavior is a violation, an illegal-driving alarm can be issued.
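A minimal sketch of this sliding-window counting logic follows. The 3-second window and the 75% ratio come from the example above; the 25 fps frame rate is an assumption.

```python
from collections import deque

class ViolationMonitor:
    def __init__(self, fps=25, window_s=3.0, ratio_threshold=0.75):
        self.window = deque(maxlen=int(fps * window_s))
        self.ratio_threshold = ratio_threshold

    def update(self, frame_is_violation: bool) -> bool:
        """Returns True when an illegal-driving alarm should be raised."""
        self.window.append(frame_is_violation)
        if len(self.window) < self.window.maxlen:
            return False
        return sum(self.window) / len(self.window) >= self.ratio_threshold
```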

In some embodiments, the illegal-driving alarm can be issued through a buzzer or through a voice broadcast, so as to remind the driver to drive safely and also allow passengers to supervise the driver's behavior.

In this way, issuing the illegal-driving alarm only when the detected number of violations reaches a certain preset count threshold avoids false alarms caused by detection errors on individual video frames and improves alarm accuracy.

The preset duration may be customized by the user according to actual needs, or set by default by the smart device, which is not limited in the embodiments of this application.

The preset count threshold may be customized by the user according to actual needs, or set by default by the smart device, which is not limited in the embodiments of this application.

In the embodiments of this application, a target image including the driver's face is obtained, and a first area image including a preset behavior of the driver is obtained from the target image, where the preset behavior is similar to a violation; that is, a first area image of a preset behavior close to a violation is obtained from the target image. Then the first area image is cropped inward and expanded outward at different scales in the target image to obtain a second area image and a third area image. Since the first area image may include only a small part of the preset behavior, but not include, or include only a very small part of, the surrounding objects related to the preset behavior, whether the preset behavior is truly a violation cannot be accurately determined from the first area image alone. Therefore, after the first area image is cropped inward and expanded outward, the resulting second and third area images can include more information about the surrounding objects related to the preset behavior, so detecting the driver's behavior based on the three area images can improve detection accuracy.

In addition, performing channel fusion on the three area images enables the target network model to pay attention to the information within a certain range around the object, so that the model can accurately determine the driver's behavior category, improving the robustness of the target network model.

FIG. 6 is a schematic structural diagram of a driver behavior recognition apparatus according to an exemplary embodiment. The driver behavior recognition apparatus may be implemented by software, hardware, or a combination of the two, and may include:

a first acquisition module 501, configured to acquire a target image, where the target image includes the driver's face;

a second acquisition module 502, configured to acquire a first area image from the target image, where the first area image includes a preset behavior of the driver, and the preset behavior refers to a behavior whose similarity to a violation is greater than a preset threshold;

an image processing module 503, configured to, in the target image, reduce the surrounding area of the first area image according to a first ratio threshold to obtain a second area image, and expand the surrounding area of the first area image according to a second ratio threshold to obtain a third area image;

a recognition module 504, configured to recognize the driver's behavior based on the first area image, the second area image, and the third area image.

Optionally, referring to FIG. 7, the apparatus further includes:

a size adjustment module 505, configured to adjust the first area image, the second area image, and the third area image to area images of the same size;

the recognition module 504 is configured to call a target network model, where the target network model is used to determine the behavior category of any behavior based on a group of area images corresponding to that behavior;

and to recognize the driver's behavior through the target network model based on the resized first area image, second area image, and third area image.

Optionally, the recognition module 504 is configured to:

where the target network model includes an input layer, an intermediate layer, a splicing layer, a fully connected layer, and an output layer, perform, through the input layer, based on the resolution and number of channels of the resized area images, channel stacking on the image data of the resized first, second, and third area images to obtain a first feature map;

perform convolution and sampling on the first feature map through the intermediate layer to obtain multiple second feature maps, which have the same size but different numbers of channels;

stack the channels of the multiple second feature maps through the splicing layer, and fuse the stacked feature maps through a convolution layer deep in the network to obtain a third feature map;

determine the driver's behavior based on the third feature map through the fully connected layer;

output the behavior through the output layer.

Optionally, the recognition module 504 is configured to:

where the intermediate layer includes N groups of convolution layers and N groups of sampling layers with a one-to-one correspondence between them, let i = 1 and determine the first feature map as the target feature map; perform convolution on the target feature map through the i-th group of convolution layers, and sample the resulting feature map through the i-th group of sampling layers at two different rates to obtain a 2^i-times feature map and the i-th second feature map of the reference size, and take the 2^i-times feature map as the target feature map, where the reference size is greater than or equal to the size of the 2^i-times feature map;

when i is less than N, let i = i + 1 and return to the operation of convolving the target feature map through the i-th group of convolution layers, sampling the resulting feature map at two different rates through the i-th group of sampling layers to obtain the 2^i-times feature map and the i-th second feature map of the reference size, and taking the 2^i-times feature map as the target feature map;

when i equals N, convolve the target feature map through the N-th group of convolution layers, perform a single sampling on the resulting feature map through the N-th group of sampling layers to obtain the N-th second feature map of the reference size, and end the operation.

Optionally, referring to FIG. 8, the apparatus further includes a training module 506, configured to:

obtain multiple training samples, where the training samples include multiple groups of images and the behavior category of each group, and each group includes an area image of a behavior, a reduced area image determined by reducing that area image, and an expanded area image determined by expanding that area image;

train the network model to be trained based on the multiple training samples to obtain the target network model.

Optionally, the second acquisition module 502 is configured to:

call a target detection model, input the target image into the target detection model, and output a face detection frame and a preset behavior detection frame, where the target detection model is used to recognize the face and the preset behavior in any input image;

when there is one face detection frame, determine the face detection frame as the target face detection frame; when there are multiple face detection frames, obtain the face detection frame with the largest area and determine it as the target face detection frame;

determine the first area image based on the target face detection frame.

Optionally, the second acquisition module 502 is configured to:

filter out the preset behavior detection frames that do not overlap with the target face detection frame, or whose distance from the target face detection frame is greater than a preset distance threshold;

crop the areas corresponding to the preset behavior detection frames remaining after filtering from the target image to obtain the first area image.

Optionally, the first acquisition module 501 is configured to:

detect the working mode of the camera used to photograph the driver's face;

when the working mode of the camera is the infrared shooting mode, acquire the captured grayscale image;

call an image pseudo-color conversion model, input the grayscale image into the image pseudo-color conversion model, and output a three-channel color image corresponding to the grayscale image, where the image pseudo-color conversion model is used to convert any grayscale image into a corresponding three-channel color image;

acquire the output three-channel color image as the target image.

Optionally, referring to FIG. 9, the apparatus further includes:

a third acquisition module 507, configured to acquire the current vehicle speed and illumination intensity;

a switching module 508, configured to switch the working mode of the camera to the infrared shooting mode when the vehicle speed is greater than a preset speed threshold and the illumination intensity is lower than an illumination intensity threshold.

Optionally, referring to FIG. 10, the apparatus further includes:

a statistics module 509, configured to count the number of violations of the driver within a preset duration when the driver's behavior is a violation;

an alarm module 510, configured to issue an illegal-driving alarm when the number of violations of the driver within the preset duration reaches a preset count threshold.

In the embodiments of this application, a target image including the driver's face is obtained, and a first area image including a preset behavior of the driver is obtained from the target image, where the preset behavior is similar to a violation; that is, a first area image of a preset behavior close to a violation is obtained from the target image. Then the first area image is cropped inward and expanded outward at different scales in the target image to obtain a second area image and a third area image. Since the first area image may include only a small part of the preset behavior, but not include, or include only a very small part of, the surrounding objects related to the preset behavior, whether the preset behavior is truly a violation cannot be accurately determined from the first area image alone. Therefore, after the first area image is cropped inward and expanded outward, the resulting second and third area images can include more information about the surrounding objects related to the preset behavior, so detecting the driver's behavior based on the three area images can improve detection accuracy.

It should be noted that when the driver behavior recognition apparatus provided in the above embodiments implements the driver behavior recognition method, the division into the above functional modules is only used as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the driver behavior recognition apparatus provided in the above embodiments and the embodiments of the driver behavior recognition method belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.

FIG. 11 shows a structural block diagram of a terminal 1000 provided by an exemplary embodiment of this application. The terminal 1000 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer. The terminal 1000 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.

Generally, the terminal 1000 includes a processor 1001 and a memory 1002.

The processor 1001 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 1001 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1002 is used to store at least one instruction, which is executed by the processor 1001 to implement the driver behavior recognition method provided by the method embodiments of this application.

In some embodiments, the terminal 1000 may optionally further include a peripheral device interface 1003 and at least one peripheral device. The processor 1001, the memory 1002, and the peripheral device interface 1003 may be connected through a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 1003 through a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1004, a touch display screen 1005, a camera 1006, an audio circuit 1007, a positioning component 1008, and a power supply 1009.

The peripheral device interface 1003 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002, and the peripheral device interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral device interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1004 communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1004 can communicate with other terminals through at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.

The display screen 1005 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, it also has the ability to collect touch signals on or above its surface; the touch signal may be input to the processor 1001 as a control signal for processing. In this case, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005 arranged on the front panel of the terminal 1000; in other embodiments, there may be at least two display screens 1005 arranged on different surfaces of the terminal 1000 or in a folded design; in still other embodiments, the display screen 1005 may be a flexible display screen arranged on a curved or folded surface of the terminal 1000. The display screen 1005 may even be set as a non-rectangular irregular shape, that is, a special-shaped screen, and may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera; usually, the front camera is arranged on the front panel of the terminal and the rear camera on the back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so as to realize the background blur function by fusing the main camera and the depth-of-field camera, panoramic and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fused shooting functions. In some embodiments, the camera assembly 1006 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuit 1007 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment and converts them into electrical signals that are input to the processor 1001 for processing or to the radio frequency circuit 1004 for voice communication. For stereo collection or noise reduction, there may be multiple microphones arranged at different parts of the terminal 1000; the microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves; it may be a traditional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1007 may also include a headphone jack.

The positioning component 1008 is used to determine the current geographic location of the terminal 1000 for navigation or LBS (Location Based Service). The positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, China's BeiDou system, or Russia's Galileo system.

The power supply 1009 is used to supply power to the components in the terminal 1000. The power supply 1009 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the battery may be a wired rechargeable battery charged through a wired line or a wireless rechargeable battery charged through a wireless coil, and may also support fast-charging technology.

In some embodiments, the terminal 1000 further includes one or more sensors 1010, including but not limited to an acceleration sensor 1011, a gyroscope sensor 1012, a pressure sensor 1013, a fingerprint sensor 1014, an optical sensor 1015, and a proximity sensor 1016.

The acceleration sensor 1011 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1000; for example, it can be used to detect the components of gravitational acceleration on the three axes. The processor 1001 may control the touch display screen 1005 to display the user interface in landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used to collect motion data for games or of the user.

The gyroscope sensor 1012 can detect the body orientation and rotation angle of the terminal 1000 and may cooperate with the acceleration sensor 1011 to collect the user's 3D actions on the terminal 1000. Based on the data collected by the gyroscope sensor 1012, the processor 1001 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 1013 may be arranged on the side frame of the terminal 1000 and/or the lower layer of the touch display screen 1005. When arranged on the side frame, it can detect the user's grip signal on the terminal 1000, and the processor 1001 performs left/right-hand recognition or shortcut operations according to the grip signal. When arranged on the lower layer of the touch display screen 1005, the processor 1001 controls the operable controls on the UI according to the user's pressure operation on the screen; the operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 1014 is used to collect the user's fingerprint; the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1014 may be arranged on the front, back, or side of the terminal 1000, and may be integrated with a physical button or the manufacturer's logo when the terminal has one.

The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the touch display screen 1005 according to the ambient light intensity collected by the optical sensor 1015: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.

The proximity sensor 1016, also called a distance sensor, is usually arranged on the front panel of the terminal 1000 and is used to measure the distance between the user and the front of the terminal 1000. In one embodiment, when the proximity sensor 1016 detects that this distance gradually decreases, the processor 1001 controls the touch display screen 1005 to switch from the screen-on state to the screen-off state; when the distance gradually increases, the processor 1001 controls the touch display screen 1005 to switch from the screen-off state to the screen-on state.

Those skilled in the art can understand that the structure shown in FIG. 11 does not constitute a limitation on the terminal 1000, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.

The embodiments of this application further provide a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a mobile terminal, the mobile terminal can perform the driver behavior recognition method provided by the above embodiments.

The embodiments of this application further provide a computer program product containing instructions which, when run on a computer, cause the computer to perform the driver behavior recognition method provided by the above embodiments.

Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments can be implemented by hardware or by instructing relevant hardware through a program, and the program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above descriptions are only preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included in the protection scope of this application.

Claims (13)

1. A method for identifying a driver's behavior, the method comprising:
acquiring a target image, wherein the target image comprises a face of a driver;
acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior whose similarity to an illegal behavior is larger than a preset threshold;
in the target image, carrying out reduction processing on the peripheral area of the first area image according to a first proportion threshold value to obtain a second area image, and carrying out expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
recognizing the behavior of the driver based on the first area image, the second area image, and the third area image.
2. The method of claim 1, wherein prior to identifying the behavior of the driver based on the first region image, the second region image, and the third region image, further comprising:
adjusting the first area image, the second area image and the third area image to be area images of the same size;
accordingly, the identifying the behavior of the driver based on the first area image, the second area image, and the third area image includes:
calling a target network model, wherein the target network model is used for determining the behavior category of any behavior based on a group of regional images corresponding to the behavior;
and identifying the behavior of the driver through the target network model based on the first area image, the second area image and the third area image after the size adjustment.
3. The method of claim 2, wherein the target network model comprises an input layer, an intermediate layer, a splice layer, a fully-connected layer, and an output layer;
the identifying the behavior of the driver through the target network model based on the resized first area image, second area image, and third area image includes:
performing, through the input layer and based on the resolution and the number of channels of the resized area images, channel superposition processing on the image data of the resized first area image, second area image and third area image to obtain a first feature map;
performing convolution and sampling processing on the first feature map through the intermediate layer to obtain a plurality of second feature maps, wherein the second feature maps have the same size and different numbers of channels;
performing channel superposition processing on the plurality of second feature maps through the splicing layer, and performing feature fusion on the channel-superposed feature maps through the convolutional layer in the deep network layer to obtain a third feature map;
determining, by the fully-connected layer, the behavior of the driver based on the third feature map;
outputting the behavior through the output layer.
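
A hedged PyTorch sketch of the kind of network claim 3 describes: channel superposition of the three resized crops at the input, an intermediate convolutional stage producing feature maps with different channel counts, a splice (concatenation) layer followed by a fusing convolution, and a fully connected classifier over the behavior categories. Layer sizes, channel counts and the number of classes are invented for illustration and are not taken from the patent.

```python
import torch
import torch.nn as nn

class BehaviorNet(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        # Intermediate layer: two conv/pool blocks with different channel counts.
        self.block1 = nn.Sequential(nn.Conv2d(9, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))        # 128 -> 64
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))        # 64 -> 32
        self.pool_to_32 = nn.AdaptiveMaxPool2d(32)          # bring block1 output to 32 x 32
        self.fuse = nn.Sequential(nn.Conv2d(32 + 64, 64, 1), nn.ReLU())  # feature fusion
        self.fc = nn.Linear(64 * 32 * 32, num_classes)      # fully connected + output

    def forward(self, first, second, third):
        x = torch.cat([first, second, third], dim=1)        # input layer: channel superposition
        f1 = self.block1(x)
        f2 = self.block2(f1)
        spliced = torch.cat([self.pool_to_32(f1), f2], dim=1)  # splice layer
        fused = self.fuse(spliced)                          # third feature map
        return self.fc(fused.flatten(1))

# Usage with 128 x 128 crops: each input tensor is shaped (batch, 3, 128, 128).
# model = BehaviorNet(); logits = model(first_t, second_t, third_t)
```
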
4. The method of claim 3, wherein the intermediate layer comprises N groups of convolutional layers and N groups of sampling layers, the groups of convolutional layers corresponding one-to-one to the groups of sampling layers;
the obtaining a plurality of second feature maps by performing convolution sampling processing on the first feature map through the intermediate layer includes:
determining, with i = 1, the first feature map as a target feature map; performing convolution processing on the target feature map through the i-th group of convolutional layers, and sampling the obtained feature map at two different multiples through the i-th group of sampling layers to obtain a 2^i-multiple feature map and an i-th second feature map of a reference size, and taking the 2^i-multiple feature map as the target feature map, wherein the reference size is greater than or equal to the size of the 2^i-multiple feature map;
when i is smaller than N, letting i = i + 1 and returning to the operation of performing convolution processing on the target feature map through the i-th group of convolutional layers, sampling the obtained feature map at two different multiples through the i-th group of sampling layers to obtain the 2^i-multiple feature map and the i-th second feature map of the reference size, and taking the 2^i-multiple feature map as the target feature map;
and when i is equal to N, performing convolution processing on the target feature map through the N-th group of convolutional layers, performing a single sampling operation on the obtained feature map through the N-th group of sampling layers to obtain an N-th second feature map of the reference size, and ending the operation.
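
The loop in claim 4 can be sketched as follows, again in PyTorch and purely as an illustration: each of the N groups applies a convolution, samples its output once to a fixed reference size (the i-th second feature map) and, except for the last group, once more by a factor of 2 to form the next target feature map. N, the channel counts and the reference size are assumed values.

```python
import torch
import torch.nn as nn

class IntermediateLayer(nn.Module):
    def __init__(self, in_ch=32, n_groups=3, ref_size=16):
        super().__init__()
        self.n_groups = n_groups
        chans = [in_ch * (2 ** i) for i in range(n_groups + 1)]
        self.convs = nn.ModuleList(
            [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1) for i in range(n_groups)])
        self.halve = nn.MaxPool2d(2)                  # 2x downsampling between groups
        self.to_ref = nn.AdaptiveMaxPool2d(ref_size)  # sampling to the reference size

    def forward(self, first_feature_map):
        target = first_feature_map
        second_maps = []
        for i, conv in enumerate(self.convs):
            feat = torch.relu(conv(target))
            second_maps.append(self.to_ref(feat))     # i-th second feature map (reference size)
            if i < self.n_groups - 1:
                target = self.halve(feat)             # becomes the next target feature map
        return second_maps                            # same size, different channel counts
```
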
5. The method of claim 2, wherein prior to invoking the target network model, further comprising:
obtaining a plurality of training samples, wherein the plurality of training samples comprise a plurality of groups of images and the behavior category in each group of images, and each group of images comprises an area image of a behavior, a reduced area image obtained by reducing the area image, and an expanded area image obtained by expanding the area image;
and training the network model to be trained based on the plurality of training samples to obtain the target network model.
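
One possible way to assemble such training samples, reusing the three_region_images helper from the claim 1 sketch; the annotation format below is hypothetical and not specified by the claim.

```python
def build_training_samples(annotations, images):
    # annotations: list of (image_id, behavior_box, behavior_label) tuples (assumed format).
    # images: dict mapping image_id -> H x W x 3 array.
    samples = []
    for image_id, box, label in annotations:
        group = three_region_images(images[image_id], box)  # area image + reduced + expanded
        samples.append((group, label))                      # one group of images and its category
    return samples
```
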
6. The method of claim 1, wherein said acquiring a first region image from said target image comprises:
calling a target detection model, inputting the target image into the target detection model, and outputting face detection frames and preset behavior detection frames, wherein the target detection model is used for identifying the face and the preset behavior in any input image;
when there is one face detection frame, determining that face detection frame as the target face detection frame; when there are multiple face detection frames, acquiring the face detection frame with the largest area among them and determining the acquired face detection frame as the target face detection frame;
and determining the first area image based on the target face detection frame.
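
A small sketch of the face-box selection rule in claim 6, assuming the detection model returns boxes as (x1, y1, x2, y2) tuples; the detector itself is not shown.

```python
def select_target_face_box(face_boxes):
    # One box: use it directly. Several boxes: keep the one with the largest area,
    # on the assumption that the driver's face is the one closest to the camera.
    def area(box):
        x1, y1, x2, y2 = box
        return max(0, x2 - x1) * max(0, y2 - y1)
    return max(face_boxes, key=area) if face_boxes else None
```
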
7. The method of claim 6, wherein the determining the first region image based on the target face detection box comprises:
filtering out the preset behavior detection frames that do not overlap the target face detection frame or whose distance from the target face detection frame is greater than a preset distance threshold;
and cropping, from the target image, the region corresponding to the preset behavior detection frame remaining after the filtering, to obtain the first area image.
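
The filtering in claim 7 can be sketched as below, reusing the crop helper from the claim 1 sketch; the overlap test, the center-distance measure and the distance threshold are illustrative choices rather than values disclosed by the patent.

```python
def boxes_overlap(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def center_distance(a, b):
    acx, acy = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bcx, bcy = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return ((acx - bcx) ** 2 + (acy - bcy) ** 2) ** 0.5

def first_area_images(target_image, face_box, behavior_boxes, max_distance=200.0):
    # Keep behavior boxes that overlap the target face box or lie within the
    # distance threshold, then crop the corresponding regions from the image.
    kept = [b for b in behavior_boxes
            if boxes_overlap(b, face_box) or center_distance(b, face_box) <= max_distance]
    return [crop(target_image, b) for b in kept]
```
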
8. The method of claim 1, wherein said acquiring a target image comprises:
detecting a working mode of a camera for shooting the face of the driver;
when the working mode of the camera is an infrared shooting mode, acquiring the grayscale image obtained by shooting;
calling an image pseudo-color conversion model, inputting the grayscale image into the image pseudo-color conversion model, and outputting a three-channel color image corresponding to the grayscale image, wherein the image pseudo-color conversion model is used for converting any grayscale image into a corresponding three-channel color image;
and acquiring the output three-channel color image as the target image.
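
Claim 8 relies on a learned image pseudo-color conversion model. Purely as a stand-in, the sketch below uses a fixed OpenCV colormap, which also yields a three-channel image but is not the trained model the claim refers to.

```python
import cv2

def gray_to_three_channel(gray_image):
    # gray_image: single-channel 8-bit H x W array captured in infrared mode.
    # cv2.applyColorMap maps intensities onto a BGR palette; a trained
    # pseudo-color conversion model would take the place of this call.
    return cv2.applyColorMap(gray_image, cv2.COLORMAP_JET)
```
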
9. The method of claim 8, wherein prior to detecting an operating mode of a camera used to capture the driver's face, further comprising:
acquiring the current vehicle speed and the current illumination intensity;
and when the vehicle speed is greater than a preset vehicle speed threshold value and the illumination intensity is lower than an illumination intensity threshold value, switching the working mode of the camera to the infrared shooting mode.
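
A tiny sketch of the switching condition in claim 9; the threshold values and the camera interface are hypothetical.

```python
def maybe_switch_to_infrared(camera, speed_kmh, illumination_lux,
                             speed_threshold=30.0, light_threshold=50.0):
    # Switch to infrared shooting mode only when the vehicle is moving fast
    # enough and the cabin is dark enough; `camera.set_mode` is a made-up API.
    if speed_kmh > speed_threshold and illumination_lux < light_threshold:
        camera.set_mode("infrared")
```
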
10. The method of claim 1, wherein after identifying the behavior of the driver based on the first region image, the second region image, and the third region image, further comprising:
when the behavior of the driver belongs to an illegal behavior, counting the number of illegal behaviors of the driver within a preset time length;
and when the number of illegal behaviors of the driver within the preset time length reaches a preset number threshold, giving an illegal-driving alarm.
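
The counting in claim 10 can be sketched with a sliding time window; the window length, the count threshold and the alarm hook below are placeholders.

```python
import time
from collections import deque

class ViolationMonitor:
    def __init__(self, window_seconds=60.0, count_threshold=3):
        self.window_seconds = window_seconds
        self.count_threshold = count_threshold
        self.timestamps = deque()

    def report(self, is_violation, now=None):
        # Record each recognized violation and raise an alarm once enough of
        # them fall inside the preset time window.
        now = time.time() if now is None else now
        if is_violation:
            self.timestamps.append(now)
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.count_threshold:
            print("illegal driving alarm")  # placeholder for the actual alarm action
            self.timestamps.clear()
```
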
11. A behavior recognition apparatus for a driver, characterized in that the apparatus comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a target image, and the target image comprises the face of a driver;
the second acquisition module is used for acquiring a first area image from the target image, wherein the first area image comprises a preset behavior of the driver, and the preset behavior refers to a behavior whose similarity to an illegal behavior is greater than a preset threshold;
the image processing module is used for carrying out reduction processing on the peripheral area of the first area image according to a first proportion threshold value in the target image to obtain a second area image, and carrying out expansion processing on the peripheral area of the first area image according to a second proportion threshold value to obtain a third area image;
an identification module to identify a behavior of the driver based on the first area image, the second area image, and the third area image.
12. A smart device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the steps of any of the methods of claims 1-10.
13. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of any of the methods of claims 1-10.
CN201910207840.8A 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior Active CN111723602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910207840.8A CN111723602B (en) 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior

Publications (2)

Publication Number Publication Date
CN111723602A true CN111723602A (en) 2020-09-29
CN111723602B CN111723602B (en) 2023-08-08

Family

ID=72562943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910207840.8A Active CN111723602B (en) 2019-03-19 2019-03-19 Method, device, equipment and storage medium for identifying driver behavior

Country Status (1)

Country Link
CN (1) CN111723602B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180012085A1 (en) * 2016-07-07 2018-01-11 Ants Technology (Hk) Limited. Computer Vision Based Driver Assistance Devices, Systems, Methods and Associated Computer Executable Code
CN108205649A (en) * 2016-12-20 2018-06-26 浙江宇视科技有限公司 Driver drives to take the state identification method and device of phone
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning
CN108009475A (en) * 2017-11-03 2018-05-08 东软集团股份有限公司 Driving behavior analysis method, apparatus, computer-readable recording medium and electronic equipment
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN109034102A (en) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 Human face in-vivo detection method, device, equipment and storage medium
CN109389068A (en) * 2018-09-28 2019-02-26 百度在线网络技术(北京)有限公司 The method and apparatus of driving behavior for identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SIDDHARTH, ET AL.: "Driver Hand Localization and Grasp Analysis: A Vision-based Real-time Approach" *
NI Zhiping et al.: "Real-time Release System of Vehicle-mounted Traffic Information Data Based on Driving Behavior Recognition", vol. 18, no. 08
ZHANG Wen; KANG Bing: "Research on Detection Methods of Driver Fatigue State", no. 03

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650379A (en) * 2020-12-29 2021-04-13 深圳市商汤科技有限公司 Activation method and device, electronic equipment and computer readable storage medium
CN112950505A (en) * 2021-03-03 2021-06-11 西安工业大学 Image processing method, system and medium based on generation countermeasure network
CN112950505B (en) * 2021-03-03 2024-01-23 西安工业大学 An image processing method, system and medium based on generative adversarial network
CN113052026B (en) * 2021-03-12 2023-07-18 北京经纬恒润科技股份有限公司 Method and device for locating smoking behavior in cockpit
CN113052026A (en) * 2021-03-12 2021-06-29 北京经纬恒润科技股份有限公司 Method and device for positioning smoking behavior in cabin
CN112926510A (en) * 2021-03-25 2021-06-08 深圳市商汤科技有限公司 Abnormal driving behavior recognition method and device, electronic equipment and storage medium
CN113313012A (en) * 2021-05-26 2021-08-27 北京航空航天大学 Dangerous driving behavior identification method based on convolution generation countermeasure network
CN113705427A (en) * 2021-08-26 2021-11-26 成都上富智感科技有限公司 Fatigue driving monitoring and early warning method and system based on vehicle gauge chip SoC
CN113705427B (en) * 2021-08-26 2024-06-11 成都上富智感科技有限公司 Fatigue driving monitoring and early warning method and system based on vehicle-mounted chip SoC
CN115346363A (en) * 2022-06-27 2022-11-15 西安电子科技大学 A Neural Network Based Driver Violation Prediction Method
CN115346363B (en) * 2022-06-27 2024-06-21 西安电子科技大学 Driver violation prediction method based on neural network
WO2024001617A1 (en) * 2022-06-30 2024-01-04 京东方科技集团股份有限公司 Method and apparatus for identifying behavior of playing with mobile phone
CN115203461A (en) * 2022-07-08 2022-10-18 北京明略昭辉科技有限公司 A method, device, electronic device and storage medium for determining a logo label
CN115205787A (en) * 2022-07-26 2022-10-18 软通动力信息技术(集团)股份有限公司 Pedestrian falling detection method and device, electronic equipment and storage medium
CN117808838A (en) * 2023-11-29 2024-04-02 爱泊车科技有限公司 Vehicle tracking method and system integrating camera and radar
CN117808838B (en) * 2023-11-29 2025-10-03 爱泊车科技有限公司 Vehicle tracking method and system integrating camera and radar
CN117872974A (en) * 2023-12-20 2024-04-12 枣庄市喜神科技有限公司 Production system, method and device based on digitalization and Internet of things

Also Published As

Publication number Publication date
CN111723602B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN111723602B (en) Method, device, equipment and storage medium for identifying driver behavior
CN108961681B (en) Fatigue driving reminding method and device and storage medium
CN111382624B (en) Action recognition method, device, equipment and readable storage medium
WO2021008456A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN110502954A (en) The method and apparatus of video analysis
CN111754386B (en) Image area shielding method, device, equipment and storage medium
CN111325701B (en) Image processing method, device and storage medium
CN110839128B (en) Photographic behavior detection method, device and storage medium
CN112084811A (en) Method, device and storage medium for determining identity information
CN108363982B (en) Method and device for determining number of objects
CN109886208B (en) Object detection method and device, computer equipment and storage medium
CN110490179A (en) License plate recognition method, device and storage medium
CN110874905A (en) Monitoring method and device
CN112749590A (en) Object detection method, device, computer equipment and computer readable storage medium
CN111027490A (en) Face attribute recognition method and device and storage medium
CN112882094B (en) First-arrival wave acquisition method and device, computer equipment and storage medium
CN110647881A (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN110675473B (en) Method, device, electronic equipment and medium for generating GIF dynamic images
CN111860064B (en) Video-based target detection methods, devices, equipment and storage media
CN111931712B (en) Face recognition method, device, snapshot machine and system
CN109977570A (en) Body noise determines method, apparatus and storage medium
CN112132222B (en) Type recognition method, device and storage medium of license plate
CN111723615B (en) Method and device for detecting object matching judgment on detected object image
CN112184802B (en) Calibration frame adjustment method, device and storage medium
CN111583669B (en) Overspeed detection method, overspeed detection device, control equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant