CN109426793A - Image behavior recognition method, device, and computer-readable storage medium - Google Patents
Image behavior recognition method, device, and computer-readable storage medium
- Publication number: CN109426793A
- Application number: CN201710780212.XA
- Authority: CN (China)
- Prior art keywords: sub, target, image, component, area
- Prior art date: 2017-09-01
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Abstract
The invention discloses an image behavior recognition method, a device, and a computer-readable storage medium. The method comprises: dividing the region containing a target in an image to be recognized into sub-components and determining the region where each sub-component is located; and extracting the features of each sub-component from its region and determining the behavior category of the target according to those features. By performing image recognition with a Local Region-based Fully Convolutional Network (LRFCN) model, the invention effectively improves recognition accuracy while adding only minimal computational overhead.
Description
Technical Field
The present invention relates to the technical field of image processing, and in particular to an image behavior recognition method, a device, and a computer-readable storage medium.
Background Art
In recent years, with the growing deployment of surveillance equipment across many fields, the need to extract valuable information from surveillance video more efficiently has become increasingly prominent. The traditional approach is manual monitoring, but manual review of video is inefficient and its accuracy is hard to guarantee. There is therefore an urgent need for a method that can intelligently discriminate behaviors in video and detect the behaviors of interest.
Summary of the Invention
The present invention provides an image behavior recognition method, a device, and a computer-readable storage medium for recognizing behaviors in images accurately and efficiently.
To achieve the above object, the present invention adopts the following technical solutions.
According to one aspect of the present invention, an image behavior recognition method is provided, the method comprising:
dividing the region containing the target in the image to be recognized into sub-components, and determining the region where each sub-component is located;
extracting the features of each sub-component from the region where it is located, and determining the behavior category to which the target belongs according to the features of each sub-component.
Optionally, dividing the region containing the target in the image to be recognized into sub-components and determining the region where each sub-component is located comprises:
dividing the region containing the target into sub-components according to preset average sub-component proportion values;
performing foreground-background segmentation on each divided sub-component region with a region segmentation algorithm to obtain the foreground segmentation result of each sub-component.
Optionally, before the region containing the target is divided into sub-components according to the preset average sub-component proportion values, the method further comprises:
labeling the sub-components of the images containing the target in a sample data set;
determining, from the labeled sub-component regions, the proportion of the image occupied by each sub-component;
summing, over the sample data set, the proportion values of each sub-component, and determining the average sub-component proportion values from these sums, the average proportion values being the ratios of the sums of the different sub-components.
Optionally, the region segmentation algorithm comprises at least one of the GrabCut algorithm, the GraphCut algorithm, and the Random Walker algorithm.
Optionally, extracting the features of each sub-component from the region where it is located and determining the behavior category of the target according to those features comprises:
performing feature extraction on the region of each sub-component and on the region containing the target, respectively;
concatenating the features extracted from the sub-components with the features extracted from the region containing the target, the concatenated features serving as the target features;
determining, according to the target features, the behavior category to which the target belongs using a preset classification model.
Optionally, determining the behavior category of the target from the preset classification model according to the target features comprises:
determining, from the preset classification model and according to the target features, the probability of the target belonging to each behavior category;
selecting the behavior category with the highest probability as the behavior category to which the target belongs.
Optionally, before the behavior category of the target is determined from the preset classification model according to the target features, the method further comprises:
obtaining a pre-trained classification model;
establishing a sample data set containing multiple categories of behavior, and labeling the target regions, behavior categories, and sub-component regions in the sample data set; and training the pre-trained classification model on the labeled sample data set to obtain the preset classification model.
Optionally, after the preset classification model is obtained, the method further comprises:
cropping the images in the sample data set to augment the sample data set;
optimizing the energy loss function on the augmented sample data set to obtain an optimized preset classification model.
According to another aspect of the present invention, an image behavior recognition device is provided, comprising a memory and a processor, the memory storing computer instructions which, when executed by the processor, implement all or some of the steps of the above image behavior recognition method.
According to another aspect of the present invention, a computer-readable storage medium is provided, storing one or more programs which, when executed by a processor, implement all or some of the steps of the above image behavior recognition method.
The beneficial effects of the present invention are as follows.
The image behavior recognition method, device, and computer-readable storage medium provided by the embodiments of the present invention improve the pooling process with a local region-based fully convolutional network: the region containing the target to be recognized is divided into sub-components, and the final behavior category is determined from the features obtained for each sub-component. By extracting local features, the present invention effectively improves recognition accuracy while adding only minimal computational overhead.
The above description is only an overview of the technical solutions of the present invention. So that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the present invention may be more readily apparent, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings
To explain the embodiments of the present invention or the prior-art solutions more clearly, the drawings required by the embodiments or the prior-art description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the image behavior recognition method provided in an embodiment of the present invention;
Fig. 2 is a network structure diagram of the image behavior recognition method provided in an embodiment of the present invention;
Fig. 3 is a schematic diagram of feature concatenation in an embodiment of the present invention;
Fig. 4 is a block diagram of the image behavior recognition device provided in an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present invention and do not limit it.
In the field of computer vision, many methods can be used for behavior recognition, but in many cases the real-time performance and accuracy of background modeling, foreground target detection, and tracking fall short of requirements. Deep learning, a newer branch of machine learning, has brought substantial improvements in both speed and accuracy. In the field of target detection, the typical deep learning models fall into two categories: regression-based methods such as YOLO (You Only Look Once) and SSD (Single Shot Multibox Detector), which are relatively efficient but of limited accuracy; and region-proposal-based methods such as Faster R-CNN (region-based convolutional neural networks) and R-FCN (Region-based Fully Convolutional Networks), which are more accurate but less efficient.
Behavior recognition bears a certain similarity to target detection but is more difficult. The present invention therefore builds on R-FCN, currently the most accurate of these detectors, and proposes a behavior recognition method based on a Local Region-based Fully Convolutional Network (LRFCN) for recognizing behaviors in video.
Method Embodiment
As shown in Fig. 1 and Fig. 2, the image behavior recognition method provided by the embodiment of the present invention specifically comprises the following steps.
Step 101: divide the region containing the target in the image to be recognized into sub-components, and determine the region where each sub-component is located.
Step 102: extract the features of each sub-component from the region where it is located, and determine the behavior category to which the target belongs according to the features of each sub-component.
The embodiment of the present invention divides the region containing the target into sub-components and determines the final behavior category of the target from the features extracted for each sub-component. By performing recognition with local features, the present invention effectively improves recognition accuracy while adding only minimal computational overhead.
In an optional embodiment of the present invention, the region containing the target in the image to be recognized (also called the region of interest, RoI) may be identified with a region proposal network (RPN). The RPN is a technique well known to those skilled in the art and is not described further here; other detection techniques may of course also be used to identify the region of interest, and no particular limitation is imposed. Before the region containing the target is identified, the image to be recognized is normalized so that a standard image of uniform form is obtained.
In an optional embodiment of the present invention, dividing the region containing the target in the image to be recognized into sub-components comprises: dividing the region containing the target into sub-components according to preset average sub-component proportion values, and performing foreground-background segmentation on each divided sub-component region with a region segmentation algorithm to obtain the foreground segmentation result of each sub-component.
Here, the region of each sub-component is first divided preliminarily using the average sub-component proportion values. These values are determined from the sample data set used to train the classification model (the LRFCN model), which ensures their accuracy.
Specifically, before the region containing the target is divided according to the preset average sub-component proportion values, these values must be obtained. They are determined as follows:
label the sub-components of the images containing the target in the sample data set;
determine, from the labeled sub-component regions, the proportion of the image occupied by each sub-component;
sum, over the sample data set, the proportion values of each sub-component, and determine the average sub-component proportion values from these sums, the average proportion values being the ratios of the sums of the different sub-components.
Specifically, the average sub-component proportion values are calculated as
avg(part_k) = (Σ_{i=1}^{n} (part_k)_i) / n,  subject to (part_1)_i + (part_2)_i + … + (part_k)_i = 1,
where (part_k)_i is the proportion of the region containing the i-th target that is occupied by sub-component k, k is the number of sub-components, and n is the number of targets contained in the sample data set of the training library.
That is, for all targets in the sample data set, the proportions of the RoI occupied by sub-component 1, sub-component 2, ..., sub-component K are each summed, and the average proportion values between the sub-components are then determined from these sums. For example, one specific embodiment divides the human body into three parts: head, body, and lower limbs. The head, body, and lower-limb proportions of all persons in the sample data set are averaged: if the normalized proportions of the i-th person are Head_i : BodyUp_i : BodyDown_i with Head_i + BodyUp_i + BodyDown_i = 1, and the sample data set contains n persons in total, the average proportion values are (Σ_{i=1}^{n} Head_i)/n : (Σ_{i=1}^{n} BodyUp_i)/n : (Σ_{i=1}^{n} BodyDown_i)/n.
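For illustration, a minimal numpy sketch of this averaging step follows; the array contents, variable names, and the three-part (head/body/lower-limb) split are assumptions for the example rather than values taken from the patent.

```python
import numpy as np

# Each row holds the normalized (Head, BodyUp, BodyDown) proportions of one
# labeled target, so every row sums to 1.  The numbers are illustrative.
proportions = np.array([
    [0.18, 0.47, 0.35],
    [0.15, 0.50, 0.35],
    [0.20, 0.45, 0.35],
])

# Average proportion per sub-component over the n targets: sum, then divide
# by n (equivalently, the ratio of the per-component sums).
avg = proportions.mean(axis=0)
print("Head : BodyUp : BodyDown =", " : ".join(f"{v:.3f}" for v in avg))
```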
Here, to ensure the accuracy of the sub-component division, the preliminarily divided regions must be segmented further and precisely. Specifically, a region segmentation algorithm is used to remove background interference, distinguishing background from foreground within each sub-component region and yielding the foreground segmentation result of each sub-component.
Preferably, the region segmentation algorithm is any one of the GrabCut, GraphCut, or Random Walker algorithms. Other algorithms may of course also be used; they are not described here, and implementations that do not depart from the core idea of the present invention all fall within its scope of protection. Taking the GrabCut algorithm as an example, the segmentation procedure is as follows.
First, an energy function E describing the optimization objective of the segmentation is defined as:
E(α, k, θ, z) = U(α, k, θ, z) + V(α, z)
where the function U is the region data term of the energy function and the function V is its smoothness (boundary) term; α is the initialization label of the image pixels (0 for background, 1 for foreground); k is the number of Gaussian components of the GMM (Gaussian mixture model); θ comprises the statistical parameters of the GMM (the weights, mean vectors, and covariance matrices of the Gaussian components); and z is the image data of the sub-component.
Then, by solving the min-cut of this energy function, the foreground and background pixel sets of the segmentation are obtained.
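The patent gives no implementation, but OpenCV ships a GrabCut routine, so a sketch of how one sub-component rectangle from the preliminary ratio-based division could be refined into a foreground result is shown below; the function name, rectangle layout, and iteration count are illustrative assumptions.

```python
import cv2
import numpy as np

def segment_subcomponent(image, rect):
    # `rect` = (x, y, w, h) is the preliminary box obtained from the average
    # proportion values; GrabCut refines it by solving the min-cut of the
    # energy E = U + V described above.
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GMM statistics, background
    fgd_model = np.zeros((1, 65), np.float64)  # GMM statistics, foreground
    cv2.grabCut(image, mask, rect, bgd_model, fgd_model,
                iterCount=5, mode=cv2.GC_INIT_WITH_RECT)
    # Definite or probable foreground pixels form the segmentation result.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return image * fg[:, :, None].astype(image.dtype)
```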
In an optional embodiment of the present invention, extracting the features of each sub-component from the region where it is located and determining the behavior category of the target according to those features comprises:
performing feature extraction on the region of each sub-component and on the region containing the target, respectively;
concatenating the features extracted from the sub-components with the features extracted from the region containing the target, the concatenated features serving as the target features;
determining, according to the target features, the behavior category of the target from the preset classification model.
Specifically, when the features of each sub-component are extracted, the pixels of the region where the sub-component is located are convolved with convolution kernels, and the convolved values are the sub-component's features. Because the LRFCN network usually has many layers, the convolution is iterated many times, so the receptive field actually corresponding to the original image is already larger than the region given by the segmentation result.
To keep recognition accurate, the overall features of the region containing the target are extracted at the same time as the features of each sub-component. For example, as shown in Fig. 3, the local features corresponding to the head, body, and lower-limb regions are concatenated in series with the global features corresponding to the whole human-body region, forming the features finally used to describe the entire region.
Accordingly, by concatenating the features of each sub-component (local pooling) with the features extracted from the entire RoI (global pooling), the number of features used for recognition is increased at the cost of only a small amount of extra computation, effectively improving recognition accuracy. Of course, in an optional embodiment of the present invention, the concatenation of the sub-component features alone may also serve as the target features for recognition; compared with recognition based only on features extracted from the whole RoI, this too effectively improves recognition accuracy.
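As a sketch of this feature cascade (the patent names no framework, so PyTorch is assumed here), the snippet below concatenates one global RoI descriptor with three part descriptors and classifies the result; all dimensions, the five-class head, and the random inputs are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Pooled descriptors for the whole RoI (global pooling) and for each of the
# three sub-component regions (local pooling); shapes are illustrative.
global_feat = torch.randn(1, 1024)                     # whole human body
part_feats = [torch.randn(1, 1024) for _ in range(3)]  # head, body, lower limbs

# Feature cascade: local features concatenated in series with the global ones.
target_feat = torch.cat([global_feat, *part_feats], dim=1)  # shape (1, 4096)

# A linear head over the cascaded feature; the category with the highest
# softmax probability is taken as the behavior category of the target.
classifier = nn.Linear(target_feat.shape[1], 5)   # 5 behavior classes
probs = F.softmax(classifier(target_feat), dim=1)
behavior = probs.argmax(dim=1)
```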
In an optional embodiment of the present invention, determining the behavior category of the target from the preset classification model according to the target features comprises: determining the probability of the target features belonging to each behavior category, and selecting the behavior category with the highest probability as the behavior category of the target.
Further, in an embodiment of the present invention, the preset classification model (the LRFCN model) must be determined before the behavior category of the target is determined from it according to the target features. The preset classification model is determined as follows:
obtain a pre-trained classification model;
establish a sample data set containing multiple categories of behavior, label the target regions, behavior categories, and sub-component regions in the sample data set, and train the pre-trained classification model on the labeled sample data set to obtain the preset classification model.
Here, the pre-trained classification model is obtained by training on a large database such as ImageNet. Specifically, when the multi-category behavior sample data set is established, all images in the data set must differ to some extent in background, shooting angle, illumination, and image scale. The target regions, behavior types, and sub-component regions in the images are then annotated manually, and the pre-trained model is trained on the labeled sample data set to adjust the parameters of the LRFCN model.
Further and optionally, after the pre-trained classification model has been trained on the labeled sample data set to obtain the preset classification model, the method further comprises:
augmenting the sample data set by randomly cropping its images, and optimizing the energy loss function on the augmented sample data set to obtain an optimized preset classification model.
Specifically, when the LRFCN model is trained, the energy loss function is the sum of the cross-entropy loss and the bounding-box regression loss:
L(s, t) = L_cls(s_{c*}) + λ·[c* > 0]·L_reg(t, t*)
where s is the softmax response for each class; t* is the offset of the ground truth relative to the preset box, and t is the offset of the prediction relative to the preset box; c* = 0 indicates that the label of the RoI is background, and [c* > 0] = 1 when c* > 0 and 0 otherwise; L_reg denotes the bounding-box loss; and r_c denotes the position-sensitive score average pooling of the RoI for class c. L_reg is computed by the following formulas:
L_reg(t, t*) = R(t − t*)
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)
t*_x = (x* − x_a)/w_a,  t*_y = (y* − y_a)/h_a,  t*_w = log(w*/w_a),  t*_h = log(h*/h_a)
where R is the smooth L1 loss function; x, y, w, and h are the center-point coordinates and the width and height of the predicted bounding box; the subscript a denotes the center-point coordinates and width and height of the preset box; and the superscript * denotes those of the ground truth.
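A minimal PyTorch rendering of this energy loss is sketched below, built from the standard cross-entropy and smooth-L1 primitives; the weighting constant `lam`, the tensor layouts, and the helper names are assumptions rather than details from the patent.

```python
import torch
import torch.nn.functional as F

def lrfcn_loss(cls_scores, labels, box_pred, box_target, lam=1.0):
    # Classification term L_cls over the softmax responses s.
    l_cls = F.cross_entropy(cls_scores, labels)
    # Regression term counted only for RoIs whose label c* > 0 (not background).
    fg = labels > 0
    if fg.any():
        l_reg = F.smooth_l1_loss(box_pred[fg], box_target[fg])
    else:
        l_reg = box_pred.sum() * 0.0  # keeps the graph valid with no foreground
    return l_cls + lam * l_reg

def encode_offsets(box, anchor):
    # t_x = (x-x_a)/w_a, t_y = (y-y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a)
    x, y, w, h = box.unbind(-1)
    xa, ya, wa, ha = anchor.unbind(-1)
    return torch.stack([(x - xa) / wa, (y - ya) / ha,
                        torch.log(w / wa), torch.log(h / ha)], dim=-1)
```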
As can be seen from the above, the energy loss function measures the degree of inconsistency between the model's predictions and the true values; the smaller the loss, the more accurate the model. Training against this energy loss function therefore ensures the accuracy of the LRFCN model and improves recognition precision.
The training process of the LRFCN model of the present invention is described by taking, as an example, the recognition of five categories of behavior in home surveillance video: eating, watching TV, playing with electronic devices, falling, and child abuse.
Step 201: establish a sample data set containing multiple categories of behavior.
First, for the stated problem, a database is built covering the five behavior categories of eating, watching TV, playing with electronic devices, falling, and child abuse, with roughly 2000 images per category, all sampled from home surveillance video.
Two-thirds of the images are then randomly selected as training samples and placed in the training library, and the remaining third is used as test samples. All image content comes from real home surveillance video.
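A short sketch of this 2/3 : 1/3 split follows; the file names and list size are placeholders, not paths from the patent.

```python
import random

images = [f"frame_{i:05d}.jpg" for i in range(10000)]  # placeholder sample list
random.shuffle(images)
cut = (2 * len(images)) // 3
train_set, test_set = images[:cut], images[cut:]       # 2/3 train, 1/3 test
```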
Step 202: manually annotate the target regions and behavior categories in the images, together with the local regions of the target's head, body, and lower limbs. Specifically:
Step 2021: manually annotate the ground truth of the targets in the images, marking both the target region and the behavior-category label (for example, category labels 0, 1, 2, 3, 4) with bounding boxes;
Step 2022: annotate the head, body, and lower-limb parts of the human targets in the sample images and, from these annotations, compute the average proportion occupied by the head, body, and lower limbs within each region;
Step 2023: annotate the exact pixel positions covered by the head, body, and lower limbs of the human targets in the sample images, recording these positions with image templates.
Step 203: obtain a pre-trained LRFCN network model.
Because the neural network in the LRFCN model contains a large number of parameters while the self-built sample data set contains relatively few samples, training directly on the sample data set is prone to overfitting. The LRFCN network model is therefore first obtained on the larger ImageNet database, and the pre-trained LRFCN network model is then trained on the training library.
Step 204: train the pre-trained LRFCN network model on the training library and fine-tune its parameters. This process can be broken down into the following sub-steps (a runnable sketch of these sub-steps is given after Step 2043):
Step 2041: normalize the size of the images in the training library so that the longest side of each image is less than 600 pixels;
Step 2042: randomly crop every image in the training library to augment the database;
because the network has many parameters and the samples are few, images randomly cropped from the originals are used to augment the training library for network training, increasing the number of samples and avoiding overfitting;
Step 2043: optimize the above energy loss function to obtain the final LRFCN network model. During training, the initial learning rate is set to 0.000001 and 50% of the parameters are randomly dropped at a dropout rate of 0.5. The optimization procedure itself is not described here; it may be carried out, for example, by least squares or gradient descent.
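A minimal sketch of sub-steps 2041 through 2043 follows, assuming PyTorch/torchvision; the crop size of 480, the stand-in `model`, and the momentum value are illustrative assumptions, while the learning rate of 1e-6 and the dropout rate of 0.5 come from the text above.

```python
import torch
import torchvision.transforms as T
from PIL import Image

def normalize_size(img: Image.Image, max_side: int = 600) -> Image.Image:
    # Step 2041: scale the image so that its longest side stays below max_side.
    scale = (max_side - 1) / max(img.size)
    if scale < 1.0:
        img = img.resize((max(1, int(img.width * scale)),
                          max(1, int(img.height * scale))))
    return img

# Step 2042: random cropping to augment the training library.
augment = T.Compose([T.RandomCrop(size=480, pad_if_needed=True), T.ToTensor()])

# Step 2043 (setup): initial learning rate 1e-6; the 0.5 dropout sits inside
# the network itself.  `model` is only a stand-in for the LRFCN network.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Dropout(p=0.5),
                            torch.nn.LazyLinear(out_features=5))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
```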
As the above shows, the behavior recognition method based on a local region-based fully convolutional network (LRFCN) proposed by the present invention can detect target behaviors in video very accurately and, to a certain extent, fills the current technical gap in intelligent security.
Device Embodiment
According to an embodiment of the present invention, an image behavior recognition device is provided for implementing the above image behavior recognition method, as shown in Fig. 4. The device comprises a processor 42 and a memory 41 storing instructions executable by the processor 42. Specifically, in the image behavior recognition device provided by the embodiment of the present invention, when the executable instructions in the memory 41 are executed by the processor 42, the image behavior recognition method provided in the method embodiment is implemented. The specific implementation is not repeated in this device embodiment; reference may be made to the detailed description in the method embodiment.
The processor 42 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 41 stores program code and transmits it to the CPU. The memory 41 may comprise volatile memory such as random access memory (RAM); it may also comprise non-volatile memory such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); and it may also comprise a combination of the above types of memory.
Storage Medium Embodiment
An embodiment of the present invention further provides a computer-readable storage medium storing one or more programs. The computer-readable storage medium may comprise volatile memory such as random access memory; it may also comprise non-volatile memory such as read-only memory, flash memory, a hard disk, or a solid-state drive; and it may also comprise a combination of the above types of memory. When the one or more programs in the computer-readable storage medium are executed by one or more processors, all or some of the steps of the image behavior recognition method provided in the method embodiment are implemented. For the specific implementation of the steps, reference may be made to the detailed description in the method embodiment, which is not repeated here.
Those of ordinary skill in the art will understand that all or part of the processes of the above embodiment methods may be accomplished by instructing the relevant hardware with a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above.
Although the present application has been described by way of embodiments, those skilled in the art will recognize that many modifications and variations are possible without departing from the spirit and scope of the present invention. Provided that such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to encompass them as well.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710780212.XA CN109426793A (en) | 2017-09-01 | 2017-09-01 | A kind of image behavior recognition methods, equipment and computer readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109426793A | 2019-03-05 |
Family
ID=65504993
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710780212.XA Pending CN109426793A (en) | 2017-09-01 | 2017-09-01 | A kind of image behavior recognition methods, equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109426793A (en) |
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105701467A (en) * | 2016-01-13 | 2016-06-22 | 河海大学常州校区 | Many-people abnormal behavior identification method based on human body shape characteristic |
| CN106570480A (en) * | 2016-11-07 | 2017-04-19 | 南京邮电大学 | Posture-recognition-based method for human movement classification |
Non-Patent Citations (2)
| Title |
|---|
| JIFENG DAI et al.: "R-FCN: Object Detection via Region-based Fully Convolutional Networks" * |
| TAO LING: "Research on Human Behavior Recognition Technology Combining Global and Local Features", China Master's Theses Full-text Database, Information Science and Technology, no. 3, p. 5 * |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111985269A (en) * | 2019-05-21 | 2020-11-24 | 顺丰科技有限公司 | Detection model construction method, detection method, device, server and medium |
| CN110688936A (en) * | 2019-09-24 | 2020-01-14 | 深圳市银星智能科技股份有限公司 | Method, machine and storage medium for representing characteristics of environment image |
| WO2021093344A1 (en) * | 2019-11-15 | 2021-05-20 | 五邑大学 | Semi-automatic image data labeling method, electronic apparatus, and storage medium |
| US12272113B2 (en) | 2019-11-15 | 2025-04-08 | Wuyi University | Semi-automatic image data labeling method, electronic apparatus, and storage medium |
| CN110929628A (en) * | 2019-11-18 | 2020-03-27 | 北京三快在线科技有限公司 | Human body identification method and device |
| CN112183666A (en) * | 2020-10-28 | 2021-01-05 | 阳光保险集团股份有限公司 | Image description method, device, electronic device and storage medium |
| CN112183666B (en) * | 2020-10-28 | 2025-02-28 | 阳光保险集团股份有限公司 | Image description method, device, electronic device and storage medium |
| CN112560767A (en) * | 2020-12-24 | 2021-03-26 | 南方电网深圳数字电网研究院有限公司 | Document signature identification method and device and computer readable storage medium |
| CN112712133A (en) * | 2021-01-15 | 2021-04-27 | 北京华捷艾米科技有限公司 | Deep learning network model training method, related device and storage medium |
| CN114842200A (en) * | 2022-03-30 | 2022-08-02 | 上海闪马智能科技有限公司 | Pedestrian attribute data set expansion method and attribute identification method and device |
| CN115797391A (en) * | 2022-12-09 | 2023-03-14 | 中国银联股份有限公司 | A method and device for identifying and locating card surface elements |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109426793A (en) | A kind of image behavior recognition methods, equipment and computer readable storage medium | |
| Luo et al. | A vision methodology for harvesting robot to detect cutting points on peduncles of double overlapping grape clusters in a vineyard | |
| JP6719457B2 (en) | Method and system for extracting main subject of image | |
| CN106778788B (en) | A multi-feature fusion method for aesthetic evaluation of images | |
| CN105574534B (en) | Conspicuousness object detection method based on sparse subspace clustering and low-rank representation | |
| US9898686B2 (en) | Object re-identification using self-dissimilarity | |
| Zhuo et al. | Cloud classification of ground-based images using texture–structure features | |
| CN107229917B (en) | A common salient target detection method for multiple remote sensing images based on iterative clustering | |
| US8655070B1 (en) | Tree detection form aerial imagery | |
| WO2020156361A1 (en) | Training sample obtaining method and apparatus, electronic device and storage medium | |
| CN104537647B (en) | A kind of object detection method and device | |
| US11055538B2 (en) | Object re-identification with temporal context | |
| CN107067413B (en) | A Moving Target Detection Method Based on Statistical Matching of Local Features in Spatio-temporal Domain | |
| Han et al. | A novel computer vision-based approach to automatic detection and severity assessment of crop diseases | |
| US20120076361A1 (en) | Object detection device | |
| CN107633226A (en) | A kind of human action Tracking Recognition method and system | |
| CN105512683A (en) | Target positioning method and device based on convolution neural network | |
| CN102722712A (en) | Multiple-scale high-resolution image object detection method based on continuity | |
| CN104680545B (en) | There is the detection method of well-marked target in optical imagery | |
| CN106557740B (en) | A Recognition Method of Oil Depot Targets in Remote Sensing Images | |
| CN111695373A (en) | Zebra crossing positioning method, system, medium and device | |
| CN104050460B (en) | Pedestrian detection method based on multi-feature fusion | |
| CN110458064B (en) | Combining data-driven and knowledge-driven low-altitude target detection and recognition methods | |
| CN108073940A (en) | A kind of method of 3D object instance object detections in unstructured moving grids | |
| CN115690775A (en) | Method, device and storage medium for image-based identification of refrigerator ingredients |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190305 |