CN111723667A - Method and device for crowd behavior recognition of smart light poles based on human body joint point coordinates - Google Patents
- Publication number
- CN111723667A (Application No. CN202010432727.2A)
- Authority
- CN
- China
- Prior art keywords
- behavior
- human
- joint point
- crowd
- human body
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The present invention relates to a method for crowd behavior recognition on smart light poles based on human body joint point coordinates, specifically comprising the following steps. Step S1: construct an image training set for human behavior recognition and a human joint point coordinate dataset corresponding to the human behaviors. Step S2: the camera of the smart light pole captures crowd behavior information and extracts human skeleton behaviors; the skeleton behaviors with missing joint points in the crowd behavior information are interpolated according to the joint point coordinate dataset. Step S3: a temporal attention layer is set for the skeleton behaviors; after extracting the skeleton behavior features, a deep learning network is constructed and the corresponding hyperparameters are set. Step S4: the deep learning network is trained with an open-source deep learning framework to obtain a crowd behavior recognition model, which is then used to recognize newly acquired crowd behavior information. Compared with the prior art, the present invention improves the human behavior recognition rate and reduces the influence of occlusions on the recognition result.
Description
Technical Field
The present invention relates to the field of computer vision, and in particular to a method and a device for crowd behavior recognition on smart light poles based on human body joint point coordinates.
Background
With the rapid development and application of computer vision and artificial intelligence, video analysis technology has flourished and is widely used in many areas of daily life, such as smart security, human-computer interaction, smart homes and smart healthcare. With the continuous improvement of video surveillance systems in urban communities in China, the smart community has become a new model of social management innovation. As the edge intelligence hardware of future "AI+5G" smart community cloud control services, smart light poles play a key role in promoting the construction and development of smart communities. Applying computer vision to the surveillance video of smart light poles to automatically detect crowd behaviors and automatically trigger alarms on abnormal behaviors will play an important supporting role in smart community security systems.
Current research on human behavior recognition mainly extracts feature information related to human motion changes from video sample data, fuses the static and dynamic features of the human body, constructs a deep spatio-temporal neural network with deep learning methods, and trains the network to recognize human behaviors in video. However, deep learning networks depend heavily on the amount of sample data, training takes a long time, and the required hardware cost is high. In addition, extracting features directly from the raw video inevitably introduces redundant information, so the resulting behavior descriptions are not precise enough.
The prior art discloses a video-based pedestrian and crowd behavior recognition method that learns point-level features of each joint and then treats each joint's features as a channel of a convolutional layer to learn hierarchical co-occurrence features. In the joint network structure for single-pedestrian behavior recognition, multi-part limb network features are fused into the motion features of a single pedestrian to strengthen the recognition of individual behavior. However, this method does not consider the timeliness of each skeleton action or the speed at which it is performed, so the recognition results contain a certain error and the accuracy is not high.
Summary of the Invention
The purpose of the present invention is to overcome the above defects of the prior art, namely that extracting features directly from the raw video introduces redundant information and leads to insufficiently precise behavior descriptions, by providing a method and a device for crowd behavior recognition on smart light poles based on human body joint point coordinates.
The object of the present invention can be achieved through the following technical solutions:
A method for crowd behavior recognition on smart light poles based on human body joint point coordinates, specifically comprising the following steps:
Step S1: construct an image training set for human behavior recognition and a human joint point coordinate dataset corresponding to the human behaviors;
Step S2: the camera of the smart light pole captures crowd behavior information and extracts human skeleton behaviors, and the skeleton behaviors with missing joint points in the crowd behavior information are interpolated according to the joint point coordinate dataset;
Step S3: a temporal attention layer is set for the skeleton behaviors; after extracting the skeleton behavior features, a deep learning network is constructed and its hyperparameters are set;
Step S4: the deep learning network is trained with an open-source deep learning framework to obtain a crowd behavior recognition model, which is then used to recognize newly acquired crowd behavior information.
The image training set is an open-source behavior recognition dataset or a dataset recorded frame by frame from video captured by the camera of the smart light pole.
From the image training set, a pose estimation framework extracts the coordinates of the 18 joint points of each human skeleton behavior; after noise processing by randomly dropping joint points with a preset probability, the extracted coordinates form the human joint point coordinate dataset.
Further, the process of interpolating the skeleton behaviors with missing joint points in the crowd behavior information in step S2 specifically includes:
Step S201: extract the 18 joint point coordinates of a skeleton behavior whose joint points are complete, and compute the Euclidean distance and the angle of each of the remaining 17 joint points relative to the neck joint point, as well as the height of the complete body;
Step S202: compute the body height corresponding to a skeleton behavior with missing joint points, scale proportionally according to the scale factor between the complete body height and the body height with missing joints, and interpolate the joint point coordinates of the skeleton behavior with missing joint points.
The human joint point coordinate dataset is divided into a recognition training set and a recognition test set; the division methods include the hold-out method and k-fold cross-validation.
The hold-out method directly divides the joint point coordinate dataset D into two uniformly distributed, mutually exclusive sets, one used as the recognition training set S and the other as the recognition test set T, i.e. D = S ∪ T, S ∩ T = ∅.
The k-fold cross-validation method first divides the joint point coordinate dataset D into k mutually exclusive subsets of similar size and uniform data distribution, i.e. D = D_1 ∪ D_2 ∪ … ∪ D_k, D_i ∩ D_j = ∅ (i ≠ j); then, each time, the union of k−1 subsets is used as the recognition training set and the remaining subset as the recognition test set.
The temporal attention layer comprises multiple time steps, and its activation value is computed as:
β_t = ReLU(w̃_x · [x_t, x_(t−1), h_(t−1)] + b̃)
where x_t is the input of the current time step, x_(t−1) is the input of the previous time step, w̃_x and b̃ are the weight and bias of the fully connected layer at the current time step, h_(t−1) is the output of the LSTM unit at the previous time step, β_t is the temporal attention activation at the current time step, and ReLU is the rectified linear function.
Further, the temporal attention activation is weighted and combined with the output of the deep learning network, and both jointly feed the classification layer of the crowd behavior recognition model. The specific computation is:
o = Σ_(t=1)^T β_t · z_t
where z_t is the output of the deep learning network at time step t, β_t is the temporal attention activation, T is the total video sequence length, and o is the weighted combination of the network output and the attention activation.
Further, the probability value of the prediction result of the classification layer of the crowd behavior recognition model is:
P(C_i | X) = exp(o_i) / Σ_(j=1)^C exp(o_j), i = 1, …, C
where C is the total number of skeleton behavior categories, o_i is the weighted combination of the network output and the attention activation for category i, and X is a given human behavior.
The skeleton behavior features include the joint collection distance (JCD) feature, the slow-motion global feature and the fast-motion global feature.
The joint point coordinates are two-dimensional, denoted J_i^k = (x_i^k, y_i^k) for joint i in frame k. The JCD feature is computed as:
d_(i,j)^k = ||J_i^k − J_j^k||_2, 1 ≤ j < i ≤ N
JCD_k = [d_(2,1)^k, d_(3,1)^k, …, d_(N,N−1)^k]
where d_(i,j)^k is the Euclidean distance between joint points J_i^k and J_j^k in frame k, N is the total number of human joint points, and JCD_k is the joint collection distance feature in frame k, a vector of the N(N−1)/2 pairwise distances.
The slow-motion and fast-motion global features are computed as:
M_k^slow = S_(k+1) − S_k, k = 1, …, K−1
M_k^fast = S_(k+2) − S_k, k = 1, …, K−2
where M_k^slow and M_k^fast denote the global motion features of slow and fast motion in frame k respectively, S_k is the set of human joint point coordinates in frame k, S_(k+1) and S_(k+2) are the joint coordinate sets one and two frames after frame k, and K is the total number of frames.
In the open-source deep learning framework, the JCD feature, the slow-motion global feature and the fast-motion global feature are flattened into one-dimensional vectors, connected and fused through convolutional layer functions and the deep learning network; the fused features are input into the multilayer perceptron classifier of the network, and softmax classification is performed directly in the last network layer.
The recognition result of the crowd behavior recognition model in step S4 is the classification label of the human skeleton behavior.
The hyperparameters include the number of network layers, the number of hidden-layer nodes, the convolution kernel parameters, the training batch size (batch_size), the learning rate (learning_rate), the Adam optimizer parameters (β_1 and β_2) and the number of training epochs.
A device using the above method for crowd behavior recognition on smart light poles based on human body joint point coordinates comprises a memory and a processor; the recognition method is stored in the memory in the form of a computer program and executed by the processor, implementing the following steps when executed:
Step S1: construct an image training set for human behavior recognition and a human joint point coordinate dataset corresponding to the human behaviors;
Step S2: the camera of the smart light pole captures crowd behavior information and extracts human skeleton behaviors, and the skeleton behaviors with missing joint points in the crowd behavior information are interpolated according to the joint point coordinate dataset;
Step S3: a temporal attention layer is set for the skeleton behaviors; after extracting the skeleton behavior features, a deep learning network is constructed and its hyperparameters are set;
Step S4: the deep learning network is trained with an open-source deep learning framework to obtain a crowd behavior recognition model, which is then used to recognize newly acquired crowd behavior information.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention performs human behavior recognition by extracting the joint collection distance (JCD) feature, the slow-motion global feature and the fast-motion global feature from the joint point coordinate information, fully taking into account both the static and the dynamic characteristics of human behavior changes.
2. The present invention adds a temporal attention layer that assigns an importance parameter to the skeleton joint points of each frame, thereby making the key-frame images more prominent. Following the principle of the soft attention mechanism, and unlike algorithms that simply extract key frames and discard the other frames, the weighted combination of the deep learning network output and the temporal attention result is fed into the classifier for prediction, so the network can perform inference faster and describe changes in human motion more accurately.
3. The present invention enhances the data samples by setting a random joint point dropout probability, giving better robustness to partial occlusion of joint points in everyday crowds.
Brief Description of the Drawings
Fig. 1 is a structural block diagram of the present invention;
Fig. 2 is a schematic flow chart of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawings and specific embodiments. This embodiment is implemented on the premise of the technical solution of the present invention and provides a detailed implementation and a specific operation process, but the protection scope of the present invention is not limited to the following embodiments.
As shown in Fig. 2, a method for crowd behavior recognition on smart light poles based on human body joint point coordinates specifically comprises the following steps:
Step S1: construct an image training set for human behavior recognition and a human joint point coordinate dataset corresponding to the human behaviors;
Step S2: the camera of the smart light pole captures crowd behavior information and extracts human skeleton behaviors, and the skeleton behaviors with missing joint points in the crowd behavior information are interpolated according to the joint point coordinate dataset;
Step S3: a temporal attention layer is set for the skeleton behaviors; after extracting the skeleton behavior features, a deep learning network is constructed and its hyperparameters are set;
Step S4: the deep learning network is trained with an open-source deep learning framework to obtain a crowd behavior recognition model, which is then used to recognize newly acquired crowd behavior information.
The image training set is an open-source behavior recognition dataset or a dataset recorded frame by frame from video captured by the camera of the smart light pole.
The behavior recognition datasets include the UCF, CUHK, UMN, JHMDB and SHREC datasets.
From the image training set, a pose estimation framework extracts the coordinates of the 18 joint points of each human skeleton behavior; after noise processing by randomly dropping joint points with a preset probability, the extracted coordinates form the human joint point coordinate dataset.
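The joint-dropout noise processing described above can be sketched as follows (a minimal illustration, assuming an 18-joint, 2-D skeleton array; the dropout probability and the zero-coordinate convention for missing joints are illustrative assumptions, not details from the patent):

```python
import numpy as np

def drop_joints(skeleton, drop_prob=0.1, rng=None):
    """Randomly zero out joints of an (18, 2) skeleton array to simulate
    missing/occluded joint points, as a data-augmentation step."""
    rng = np.random.default_rng() if rng is None else rng
    skeleton = np.asarray(skeleton, dtype=float).copy()
    mask = rng.random(skeleton.shape[0]) < drop_prob  # True = dropped joint
    skeleton[mask] = 0.0  # missing joints marked with zero coordinates
    return skeleton

# Example: augment an 18-joint skeleton with a 30% dropout probability
aug = drop_joints(np.ones((18, 2)), drop_prob=0.3, rng=np.random.default_rng(0))
```

Applied to every sample in the training set, this yields the noise-enhanced joint point coordinate dataset.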
The process of interpolating the skeleton behaviors with missing joint points in the crowd behavior information in step S2 specifically includes:
Step S201: extract the 18 joint point coordinates of a skeleton behavior whose joint points are complete, and compute the Euclidean distance and the angle of each of the remaining 17 joint points relative to the neck joint point, as well as the height of the complete body;
Step S202: compute the body height corresponding to a skeleton behavior with missing joint points, scale proportionally according to the scale factor between the complete body height and the body height with missing joints, and interpolate the joint point coordinates of the skeleton behavior with missing joint points.
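Steps S201 and S202 can be sketched as follows (a minimal illustration; the neck joint index, the height approximation from visible joints, and the zero-row convention for missing joints are assumptions, not details from the patent):

```python
import numpy as np

NECK = 1  # assumed index of the neck joint in the 18-joint layout

def body_height(skel):
    """Body height approximated as the vertical extent of the visible joints."""
    visible = skel[~np.all(skel == 0.0, axis=1)]
    return visible[:, 1].max() - visible[:, 1].min()

def interpolate_missing(skel, reference):
    """Fill missing joints (rows of zeros) of `skel` using a complete
    `reference` skeleton: each reference joint's offset from the neck is
    rescaled by the ratio of the two body heights (steps S201 and S202)."""
    skel = skel.copy()
    scale = body_height(skel) / body_height(reference)
    ref_offsets = reference - reference[NECK]   # offsets relative to the neck
    for j in range(skel.shape[0]):
        if np.all(skel[j] == 0.0):              # joint j is missing
            skel[j] = skel[NECK] + scale * ref_offsets[j]
    return skel
```

The offset from the neck encodes the same information as the Euclidean distance and angle of step S201, and the height ratio is the scale factor of step S202.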
The human joint point coordinate dataset is divided into a recognition training set and a recognition test set; the division methods include the hold-out method and k-fold cross-validation.
The hold-out method directly divides the joint point coordinate dataset D into two uniformly distributed, mutually exclusive sets, one used as the recognition training set S and the other as the recognition test set T, i.e. D = S ∪ T, S ∩ T = ∅. The hold-out method needs to keep the data distribution consistent, for example by stratified sampling of the dataset: 70% of the samples of the joint point coordinate dataset for each behavior category are used as the recognition training set, and the remaining 30% as the recognition test set.
The k-fold cross-validation method first divides the joint point coordinate dataset D into k mutually exclusive subsets of similar size and uniform data distribution, i.e. D = D_1 ∪ D_2 ∪ … ∪ D_k, D_i ∩ D_j = ∅ (i ≠ j); then, each time, the union of k−1 subsets is used as the recognition training set and the remaining subset as the recognition test set. This yields k pairs of recognition training and test sets, so training and testing can be performed k times, and finally the mean of the k test results is returned.
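The stratified 70/30 hold-out split can be sketched as follows (a minimal NumPy illustration; the function name and rounding behavior are assumptions):

```python
import numpy as np

def stratified_holdout(labels, train_frac=0.7, rng=None):
    """Stratified hold-out split: for each behavior category, train_frac of
    the samples go to the recognition training set and the rest to the
    recognition test set. Returns two index arrays."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # samples of category c
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```

Per-category shuffling keeps the class distribution of the two sets consistent with that of the whole dataset, as the hold-out method requires.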
The temporal attention layer comprises multiple time steps, and its activation value is computed as:
β_t = ReLU(w̃_x · [x_t, x_(t−1), h_(t−1)] + b̃)
where x_t is the input of the current time step, x_(t−1) is the input of the previous time step, w̃_x and b̃ are the weight and bias of the fully connected layer at the current time step, h_(t−1) is the output of the LSTM unit at the previous time step, β_t is the temporal attention activation at the current time step, and ReLU is the rectified linear function.
The temporal attention activation is weighted and combined with the output of the deep learning network, and both jointly feed the classification layer of the crowd behavior recognition model. The specific computation is:
o = Σ_(t=1)^T β_t · z_t
where z_t is the output of the deep learning network at time step t, β_t is the temporal attention activation, T is the total video sequence length, and o is the weighted combination of the network output and the attention activation.
The probability value of the prediction result of the classification layer of the crowd behavior recognition model is:
P(C_i | X) = exp(o_i) / Σ_(j=1)^C exp(o_j), i = 1, …, C
where C is the total number of skeleton behavior categories, o_i is the weighted combination of the network output and the attention activation for category i, and X is a given human behavior.
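The attention activation, the weighted combination and the softmax prediction can be sketched together as follows (a plausible NumPy reading of the formulas above; the exact concatenation order of the inputs to the fully connected layer is an assumption):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def temporal_attention(x, h, w, b):
    """Per-time-step attention activation beta_t: ReLU of a fully connected
    layer over the current input, the previous input and the previous LSTM
    output (zeros are used at t = 0 where no previous step exists)."""
    T = x.shape[0]
    beta = np.zeros(T)
    for t in range(T):
        x_prev = x[t - 1] if t > 0 else np.zeros_like(x[t])
        h_prev = h[t - 1] if t > 0 else np.zeros_like(h[t])
        feat = np.concatenate([x[t], x_prev, h_prev])
        beta[t] = relu(w @ feat + b)
    return beta

def attended_logits(z, beta):
    """Weighted combination o = sum over t of beta_t * z_t."""
    return (beta[:, None] * z).sum(axis=0)

def softmax(o):
    """Classification-layer probabilities P(C_i | X)."""
    e = np.exp(o - o.max())
    return e / e.sum()
```

The classifier then predicts the category with the largest probability.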
The skeleton behavior features include the joint collection distance (JCD) feature, the slow-motion global feature and the fast-motion global feature.
The joint point coordinates are two-dimensional, denoted J_i^k = (x_i^k, y_i^k) for joint i in frame k. The JCD feature is computed as:
d_(i,j)^k = ||J_i^k − J_j^k||_2, 1 ≤ j < i ≤ N
JCD_k = [d_(2,1)^k, d_(3,1)^k, …, d_(N,N−1)^k]
where d_(i,j)^k is the Euclidean distance between joint points J_i^k and J_j^k in frame k, N is the total number of human joint points, and JCD_k is the joint collection distance feature in frame k, a vector of the N(N−1)/2 pairwise distances.
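The JCD feature for a single frame can be computed as follows (a minimal NumPy sketch; the upper-triangle flattening order is an assumption):

```python
import numpy as np

def jcd(frame):
    """Joint Collection Distance feature for one (N, 2) frame: the flattened
    upper triangle of the pairwise Euclidean distance matrix between the
    N joint points, a vector of length N*(N-1)/2."""
    diff = frame[:, None, :] - frame[None, :, :]   # (N, N, 2) coordinate differences
    dist = np.sqrt((diff ** 2).sum(-1))            # (N, N) Euclidean distances
    i, j = np.triu_indices(frame.shape[0], k=1)    # index pairs with i < j
    return dist[i, j]
```

For the 18-joint skeleton used in this method, each frame therefore yields an 18·17/2 = 153-dimensional JCD vector.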
The slow-motion and fast-motion global features are computed as:
M_k^slow = S_(k+1) − S_k, k = 1, …, K−1
M_k^fast = S_(k+2) − S_k, k = 1, …, K−2
where M_k^slow and M_k^fast denote the global motion features of slow and fast motion in frame k respectively, S_k is the set of human joint point coordinates in frame k, S_(k+1) and S_(k+2) are the joint coordinate sets one and two frames after frame k, and K is the total number of frames.
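The two-scale global motion features can be sketched as follows (a minimal NumPy illustration over a (K, N, 2) joint coordinate sequence):

```python
import numpy as np

def motion_features(seq):
    """Two-scale global motion features for a (K, N, 2) joint sequence:
    slow motion is the one-frame difference S_{k+1} - S_k, fast motion is
    the two-frame difference S_{k+2} - S_k."""
    slow = seq[1:] - seq[:-1]    # shape (K-1, N, 2)
    fast = seq[2:] - seq[:-2]    # shape (K-2, N, 2)
    return slow, fast
```

The two difference scales capture the speed at which a skeleton action is performed, which the prior art discussed in the background does not model.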
As shown in Fig. 1, the open-source deep learning framework is TensorFlow, with Keras as the API. In the TensorFlow framework, the JCD feature, the slow-motion global feature and the fast-motion global feature are flattened into one-dimensional vectors, and the JCD feature is dimension-matched with the two-scale global motion features of slow and fast motion: using the idea of linear interpolation, the two-scale global motion features are resampled to the same temporal dimension as the JCD feature.
The features are connected and fused through convolutional layer functions and the deep learning network. The specific fusion process is:
ε_k = f_k^JCD ⊕ f_k^slow ⊕ f_k^fast
where f_k^JCD, f_k^slow and f_k^fast are the feature values of the JCD feature, the slow-motion global feature and the fast-motion global feature, ε_k is the fused feature value, and ⊕ denotes the fusion operation.
The fused features are input into the multilayer perceptron classifier of the deep learning network, and softmax classification is performed directly in the last network layer.
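A minimal Keras sketch of this fuse-and-classify stage might look as follows (the layer sizes, kernel widths and input dimensions are illustrative assumptions, not the patent's actual hyperparameters; the motion streams are assumed to have already been resampled to T frames by linear interpolation):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_fusion_model(T=32, jcd_dim=153, motion_dim=36, num_classes=5):
    """Three 1-D feature streams (JCD, slow motion, fast motion) are each
    embedded by a Conv1D layer, concatenated (the fusion operation), then
    classified by an MLP whose last layer applies softmax."""
    jcd_in = keras.Input((T, jcd_dim))
    slow_in = keras.Input((T, motion_dim))
    fast_in = keras.Input((T, motion_dim))
    branches = [layers.Conv1D(64, 3, padding="same", activation="relu")(x)
                for x in (jcd_in, slow_in, fast_in)]
    fused = layers.Concatenate()(branches)          # feature fusion
    x = layers.GlobalAveragePooling1D()(fused)
    x = layers.Dense(128, activation="relu")(x)     # MLP classifier
    out = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model([jcd_in, slow_in, fast_in], out)

model = build_fusion_model()
```

Here jcd_dim = 153 matches the 18-joint JCD vector and motion_dim = 36 the flattened (18, 2) motion frames; both are assumptions about the flattening layout.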
The training optimizer of the crowd behavior recognition model is Adam, and the loss function is the cross-entropy loss with regularization terms. The specific calculation formula is:
L = −Σ_(i=1)^n y_i · log(ŷ_i) + λ_1 · Σ_(t=1)^T β_t² + λ_2 · ||W||²
where y_i is the target value, ŷ_i is the actual output value, n is the total number of behavior categories, λ_1 and λ_2 are the regularization weights, T is the total number of video frames, β_t is the temporal attention activation, and W is the weight of the multilayer perceptron classifier.
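The regularized loss can be sketched as follows (a minimal NumPy illustration; the exact form of the two penalty terms, here squared sums, is an assumption reconstructed from the variable definitions):

```python
import numpy as np

def regularized_cross_entropy(y_true, y_pred, beta, W, lam1=1e-4, lam2=1e-4):
    """Cross-entropy over the n behavior categories plus two penalties:
    lam1 on the temporal attention activations beta_t, and lam2 on the
    multilayer perceptron classifier weights W."""
    ce = -np.sum(y_true * np.log(y_pred + 1e-12))   # epsilon avoids log(0)
    return ce + lam1 * np.sum(beta ** 2) + lam2 * np.sum(W ** 2)
```

The attention penalty discourages the attention layer from saturating on a few frames, while the weight penalty is ordinary L2 regularization of the classifier.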
In step S4, the process of training the deep learning network with the open-source framework includes computing the behavior classification error from the cross-entropy loss function, starting the Adam optimizer, and back-propagating the classification error to update the weights until the cross-entropy loss function converges or all training epochs are completed, after which the corresponding crowd behavior recognition model is output.
步骤S4中人群行为识别模型的识别结果为人体骨骼行为的分类标签。The recognition result of the crowd behavior recognition model in step S4 is the classification label of human skeleton behavior.
The hyperparameters include the number of network layers, the number of hidden-layer nodes, the convolution kernel parameters, the training batch size (Batch_size), the learning rate (learning_rate), the Adam optimizer parameters (β1 and β2), and the number of training epochs (Epochs).
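These hyperparameters might be collected into a single configuration, for example as below. The concrete values are placeholders for illustration; the patent lists which hyperparameters are tuned, not their values.

```python
# Illustrative hyperparameter configuration (all values assumed).
hyperparams = {
    "num_layers": 4,              # number of network layers
    "hidden_units": 128,          # hidden-layer nodes
    "kernel_size": 3,             # convolution kernel parameter
    "batch_size": 32,             # Batch_size
    "learning_rate": 1e-3,        # learning_rate
    "adam_betas": (0.9, 0.999),   # Adam parameters beta1, beta2
    "epochs": 100,                # Epochs
}
```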
Analysis of the crowd behavior recognition model includes plotting the training accuracy curve and the test accuracy curve to judge network performance, and plotting the test ROC curve or confusion matrix to evaluate the model's generalization ability. Network performance covers computation speed, recognition accuracy, and whether the model overfits or underfits.
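The confusion matrix mentioned above can be computed directly from predicted and true labels; a minimal sketch in NumPy (the label encoding as integers 0..n-1 is an assumption):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Confusion matrix for evaluating generalization: rows are true
    behavior classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    # Overall recognition accuracy: correct predictions (diagonal)
    # over all predictions.
    return np.trace(cm) / cm.sum()

# Example with 3 behavior classes and 6 test samples.
cm = confusion_matrix([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], 3)
```

Off-diagonal entries show which behavior classes the model confuses, which is what makes the matrix useful for judging generalization beyond a single accuracy number.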
A device using the method for crowd behavior recognition of smart light poles based on human body joint point coordinates comprises a memory and a processor; the recognition method is stored in the memory in the form of a computer program and executed by the processor, implementing the following steps when executed:
Step S1: construct an image training set for human behavior recognition and a human joint point coordinate dataset corresponding to the human behaviors;
Step S2: the camera of the smart light pole acquires crowd behavior information and extracts human skeleton behaviors, and human skeleton behaviors with missing joint points in the crowd behavior information are interpolated according to the human joint point coordinate dataset;
Step S3: set a temporal attention layer for the human skeleton behaviors, extract features of the human skeleton behaviors, construct a deep learning network, and set the hyperparameters of the deep learning network;
Step S4: train the deep learning network through the deep learning open-source framework to obtain a crowd behavior recognition model, and recognize newly acquired crowd behavior information through the crowd behavior recognition model.
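The joint-point interpolation in step S2 can be sketched as linear interpolation along the time axis. The representation of missing joints as NaN entries in a (frames, joints, 2) coordinate array is an assumption; the patent does not specify the interpolation scheme.

```python
import numpy as np

def interpolate_missing_joints(frames):
    """Fill missing joint coordinates (marked NaN) by linear
    interpolation over time, one coordinate channel at a time."""
    frames = frames.astype(float).copy()
    T = frames.shape[0]
    t = np.arange(T)
    flat = frames.reshape(T, -1)          # (T, joints * 2), a view
    for c in range(flat.shape[1]):
        col = flat[:, c]
        ok = ~np.isnan(col)
        if ok.any() and not ok.all():
            # Interpolate missing frames from the observed ones.
            col[~ok] = np.interp(t[~ok], t[ok], col[ok])
    return flat.reshape(frames.shape)

# One joint observed at frames 0 and 2, missing at frame 1.
seq = np.array([[[0.0, 0.0]], [[np.nan, np.nan]], [[2.0, 4.0]]])
filled = interpolate_missing_joints(seq)
```

The missing middle frame is filled with the midpoint of its temporal neighbors, which keeps the skeleton sequence continuous for the downstream feature extraction.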
In addition, it should be noted that the specific embodiments described in this specification may be given different names, and the above content is merely an illustration of the structure of the present invention. All equivalent or simple changes made according to the construction, features, and principles of the present invention are included within its protection scope. Those skilled in the art may make various modifications or additions to the described embodiments, or adopt similar methods; as long as these do not depart from the structure of the present invention or exceed the scope defined by the claims, they fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010432727.2A CN111723667A (en) | 2020-05-20 | 2020-05-20 | Method and device for crowd behavior recognition of smart light poles based on human body joint point coordinates |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111723667A (en) | 2020-09-29 |
Family
ID=72564764
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010432727.2A Pending CN111723667A (en) | 2020-05-20 | 2020-05-20 | Method and device for crowd behavior recognition of smart light poles based on human body joint point coordinates |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111723667A (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110135249A (en) * | 2019-04-04 | 2019-08-16 | 华南理工大学 | Human bodys' response method based on time attention mechanism and LSTM |
| CN110826453A (en) * | 2019-10-30 | 2020-02-21 | 西安工程大学 | Behavior identification method by extracting coordinates of human body joint points |
Non-Patent Citations (2)
| Title |
|---|
| FAN YANG et al.: "Make Skeleton-based Action Recognition Model Smaller, Faster and Better", Proceedings of the ACM Multimedia Asia * |
| SIJIE SONG et al.: "An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data", arXiv:1611.06067v1 [cs.CV] * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112329541A (en) * | 2020-10-10 | 2021-02-05 | 南京理工大学 | Crowd behavior identification method based on storyboard relation model |
| CN114792401A (en) * | 2021-01-26 | 2022-07-26 | 中国移动通信有限公司研究院 | Training method, device and equipment of behavior recognition model and storage medium |
| CN112836824A (en) * | 2021-03-04 | 2021-05-25 | 上海交通大学 | Method, system and medium for unsupervised learning of monocular 3D human pose |
| CN113158861A (en) * | 2021-04-12 | 2021-07-23 | 杭州电子科技大学 | A Motion Analysis Method Based on Prototype Contrastive Learning |
| CN113158861B (en) * | 2021-04-12 | 2024-02-13 | 杭州电子科技大学 | Motion analysis method based on prototype comparison learning |
| CN113283373A (en) * | 2021-06-09 | 2021-08-20 | 重庆大学 | Method for enhancing detection of limb motion parameters by depth camera |
| CN114842401A (en) * | 2022-05-25 | 2022-08-02 | 广州智能科技发展有限公司 | Method and system for capturing and classifying human body actions |
| CN115713721A (en) * | 2022-11-29 | 2023-02-24 | 同济大学 | Behavior posture recognition method based on dual-channel video collaborative perception |
| CN118019188A (en) * | 2024-01-30 | 2024-05-10 | 深圳联恒智控科技有限公司 | Human behavior recognition method and system based on intelligent spotlight |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111723667A (en) | Method and device for crowd behavior recognition of smart light poles based on human body joint point coordinates | |
| Defard et al. | Padim: a patch distribution modeling framework for anomaly detection and localization | |
| CN109858390B (en) | Human skeleton behavior recognition method based on end-to-end spatiotemporal graph learning neural network | |
| CN111259786B (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
| CN112307995B (en) | Semi-supervised pedestrian re-identification method based on feature decoupling learning | |
| CN113221641B (en) | Video pedestrian re-identification method based on generation of antagonism network and attention mechanism | |
| Pan et al. | A deep spatial and temporal aggregation framework for video-based facial expression recognition | |
| CN111783540B (en) | A method and system for human action recognition in video | |
| Asadi et al. | A convolution recurrent autoencoder for spatio-temporal missing data imputation | |
| CN114049381A (en) | A Siamese Cross-Target Tracking Method Fusing Multi-layer Semantic Information | |
| CN110097000A (en) | Video behavior recognition methods based on local feature Aggregation Descriptor and sequential relationship network | |
| CN109753897B (en) | Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning | |
| CN106909938B (en) | Perspective-independent behavior recognition method based on deep learning network | |
| CN109743642B (en) | Video abstract generation method based on hierarchical recurrent neural network | |
| Ma et al. | Multi-view time-series hypergraph neural network for action recognition | |
| CN111597929A (en) | Group Behavior Recognition Method Based on Channel Information Fusion and Group Relationship Spatial Structured Modeling | |
| CN111931549A (en) | Human skeleton action prediction method based on multitask non-autoregressive decoding | |
| Cai et al. | Video based emotion recognition using CNN and BRNN | |
| CN107609509A (en) | A kind of action identification method based on motion salient region detection | |
| CN112668438A (en) | Infrared video time sequence behavior positioning method, device, equipment and storage medium | |
| CN116665276A (en) | A face alignment method and system based on local-global fusion attention | |
| CN113762082B (en) | Unsupervised skeleton action recognition method based on cyclic graph convolution automatic encoder | |
| Aikyn et al. | Efficient facial expression recognition framework based on edge computing. | |
| Li et al. | A weakly-supervised crowd density estimation method based on two-stage linear feature calibration | |
| CN114092746A (en) | Multi-attribute identification method and device, storage medium and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200929 |