CN112101176B - User identity recognition method and system combining user gait information - Google Patents
- Publication number: CN112101176B
- Application number: CN202010943184.0A
- Authority
- CN
- China
- Prior art keywords
- user
- space
- time
- gait
- skeleton
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a user identity recognition method and a system combining user gait information, wherein the method comprises the following steps: performing pose detection on the pedestrian object in each frame of the video sequences of the original data set with a two-dimensional pose estimation system, and extracting the pose information; preprocessing the extracted joint coordinate sequences to generate a human skeleton data set; and finally constructing a spatio-temporal graph convolutional network model, dividing the skeleton graph into six subgraphs that share joints, learning the recognition model with a graph convolutional network, training on the constructed data set, adopting a multi-loss strategy combining classification loss and contrastive loss, optimizing the network parameters by stochastic gradient descent, and evaluating the accuracy of the trained model on a validation set. The invention makes full use of the effective information of the joint points, preserves the motion state in the time dimension as far as possible, is highly robust to clothing changes and carrying conditions, and generalizes well in cross-view tasks.
Description
Technical Field
The invention belongs to the field of gait recognition in computer vision, and particularly relates to a user identity recognition method and system that combine user gait information.
Background
In the task of human identification there are many biometric features, such as the iris, fingerprints and the face; gait is a behavioral biometric feature. Compared with other biometrics, gait is hard to steal or imitate owing to its unique non-contact nature, is particularly suitable for long-distance human identification, and has attracted increasing attention in the field of video surveillance. Gait recognition nevertheless remains very challenging: it relies on video sequences captured in controlled or uncontrolled environments, the appearance of a pedestrian changes over time, a change of capture viewpoint can greatly alter a person's appearance while walking, and recognition is further affected by factors such as clothing and footwear changes, the walking surface, walking speed and emotional state.
Existing gait recognition methods fall into two main categories. The first is model-based methods, which study and analyse gait videos or gait silhouettes, manually fit human-body features to each frame, and model the structure of the human body and the local motion patterns of its different parts. Such methods were used in early gait recognition research to extract gait-related dynamic or static information, but they place high demands on the raw data set, have huge numbers of model parameters, are computationally expensive, and their complexity leads to poor results. The second category is appearance-based methods, which extract gait representations directly from video without explicitly considering the body structure. In such methods the Gait Energy Image (GEI) is the most common input, because it achieves a good compromise between recognition rate and computational simplicity. A gait energy image is formed from all silhouettes within one gait cycle according to a fixed rule; it mixes the dynamic and static information of the silhouette sequence, and the energy of each pixel is obtained by averaging that pixel over the silhouettes of one gait cycle. However, a person's silhouette is easily deformed by covariates such as clothing and carried objects, which directly reduces the recognition rate.
In view of these problems, several variants of the GEI have concentrated on improving its dynamic parts to mitigate the appearance changes caused by clothing and carrying conditions. Meanwhile, with the recent explosive development of deep learning in computer vision, where deep convolutional neural networks in particular have achieved excellent performance on a wide range of tasks, deep-learning-based gait recognition has been studied intensively. Appearance-based (model-free) methods have much lower algorithmic complexity and higher computational efficiency than model-based ones, yet they are still severely challenged by the covariates that affect gait recognition performance, including viewing angle, clothing changes, walking speed, carrying condition and resolution. Skeletons have been used successfully in object recognition, human action recognition, video tracking and pedestrian re-identification, with excellent results. Existing methods based on pose keypoints typically model the skeleton data as vector sequences or pseudo-images that are fed to a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN); such methods merely concatenate the keypoints into a feature vector at each time step and fail to exploit fully the effective information of the human joints.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a user identity recognition method and a system combining user gait information, in order to solve two problems: in existing gait tasks, a person's silhouette is easily deformed by covariates such as clothing and carried objects, which directly reduces the recognition rate; and conventional algorithms based on pose keypoints fail to exploit fully the effective information of the human joint points.
In order to achieve the above object, in a first aspect, the present invention provides a user identification method combined with gait information of a user, including the following steps:
determining a gait data set of the user; performing pose estimation on each video frame in the gait data set, and determining the pose keypoint coordinates of each video frame; the gait data set is a plurality of video frames containing the gait information of the user;
determining a user skeleton data set based on the user pose keypoint coordinates; the user skeleton data set comprises the coordinates of each joint point of the user;
connecting the coordinates of the joint points of the user according to the skeleton structure of the user, based on the user skeleton data set, to construct a user skeleton spatio-temporal topological graph;
inputting the user skeleton spatio-temporal topological graph into a spatio-temporal graph convolutional network model, and identifying the identity information of the user by combining the gait information of different users stored in advance in the model.
In an alternative embodiment, the pose keypoints correspond to a plurality of joint points of the user; the user skeleton data set is determined based on the user pose keypoint coordinates, specifically:
normalizing the pose keypoint coordinates of each video frame based on the centre position of the two joint points of the user's neck and hip, to obtain the normalized coordinates of each joint point of the user, and taking the normalized coordinates as the user skeleton data set.
In an optional embodiment, the coordinates of the joint points of the user are connected according to the skeleton structure of the user, based on the user skeleton data set, to construct a user skeleton spatio-temporal topological graph, specifically:
connecting the joint points within a video frame in the spatial domain of that single frame according to the user's skeleton structure; dividing the connected joint points of the user in the spatial graph of the single frame into six parts (head, trunk, left arm, right arm, left leg and right leg), forming six subgraphs with shared vertices and shared edges; connecting the same joint point in adjacent video frames to form the temporal edges of the spatio-temporal graph; and repeating these two steps for all video frames, so that all temporal edges and the joint points of all video frames jointly form the user skeleton spatio-temporal graph.
In an alternative embodiment, the spatio-temporal graph convolutional network model comprises a multi-layer backbone network; each backbone layer comprises a spatial graph convolutional network (SGCN) and a temporal convolutional network (TCN), and adjacent SGCN and TCN blocks are connected in series and pass features to one another;
in the SGCN, after the input features pass through a convolution layer, high-dimensional discriminative spatial features are extracted, with an attention mechanism, from the interaction of the joint points within the first-order neighbourhood of the input features, and are input to the TCN;
in the TCN, the temporal feature distribution of the high-dimensional discriminative spatial features is normalized by a batch normalization (BN) layer, activated by a rectified linear unit, and then used as the input of a further convolution layer, finally achieving an effective expression of the user's joint features over several consecutive time steps and extracting high-dimensional features in both the spatial and temporal dimensions;
the nonlinear mapping from the input feature space to the high-dimensional feature space is completed by stacking and cascading the multi-layer backbone network, yielding high-dimensional discriminative features;
the high-dimensional discriminative features are output through a pooling layer and a fully connected layer; they are used to identify the identity information of the user.
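The SGCN and TCN operations of one backbone layer can be illustrated with a minimal NumPy sketch. The tensor shapes, the degree-normalized aggregation over the first-order neighbourhood, and the temporal kernel size are illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sgcn_layer(X, A, W):
    """Spatial graph convolution on a skeleton sequence.
    X: (T, V, C_in) joint features, A: (V, V) adjacency with self-loops,
    W: (C_in, C_out). Aggregates each joint's 1-hop neighbourhood."""
    D_inv = np.diag(1.0 / A.sum(axis=1))          # degree normalization
    return relu(np.einsum('uv,tvc,cd->tud', D_inv @ A, X, W))

def tcn_layer(H, Wt):
    """Temporal convolution: a 1D conv along the frame axis, per joint.
    H: (T, V, C), Wt: (K, C, C) with odd kernel size K."""
    K = Wt.shape[0]
    pad = K // 2
    Hp = np.pad(H, ((pad, pad), (0, 0), (0, 0)))  # zero-pad in time
    out = np.empty_like(H)
    for t in range(H.shape[0]):
        out[t] = relu(np.einsum('kvc,kcd->vd', Hp[t:t + K], Wt))
    return out
```

Stacking several such SGCN-then-TCN pairs gives the cascaded backbone the embodiment describes.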
In an alternative embodiment, the attention mechanism is used to enhance the saliency and distinguishability of the extracted spatio-temporal features;
the attention mechanism assigns a different weight to each joint point of the user, focusing on the joint points that contribute most, ignoring those that contribute little, and thereby selecting the joint points effective for the gait feature; the attention mechanism is implemented by adding a learnable mask to the input of the spatio-temporal graph convolutional network model.
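How a learnable mask re-weights the skeleton adjacency can be sketched as follows. The matrices are purely illustrative; in the actual model the mask M would be a trained parameter updated by gradient descent:

```python
import numpy as np

# A: skeleton adjacency with self-loops (a toy 3-joint graph).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

# M: learnable mask, initialised to all ones so training starts from the
# plain skeleton graph and then learns per-edge importance.
M = np.ones_like(A)
M[1, 2] = M[2, 1] = 1.5   # e.g. an edge training has up-weighted

A_att = A * M             # element-wise: absent edges (A == 0) stay zero
```

Because the product is element-wise, the mask can only re-weight existing skeleton edges, never invent new ones.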
In an alternative embodiment, the spatio-temporal graph convolutional network model combines classification loss and contrastive loss to reduce the feature distance within one user's gait and to increase the difference between the gaits of different users;
the classification loss function uses the Softmax loss as a supervision signal to provide class-centre information to the model, while the contrastive loss is used to constrain the relations between classes.
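A minimal sketch of this multi-loss strategy, assuming the standard Softmax cross-entropy and the standard margin-based contrastive loss; the weighting factor `lam` and the margin value are illustrative assumptions:

```python
import math

def softmax_ce(logits, label):
    """Classification (Softmax cross-entropy) loss for one sample."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

def contrastive(f1, f2, same_identity, margin=1.0):
    """Pull same-identity gait features together; push different
    identities at least `margin` apart."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2

def multi_loss(logits, label, f1, f2, same_identity, lam=0.5):
    # lam balances the two supervision signals; its value is an assumption.
    return softmax_ce(logits, label) + lam * contrastive(f1, f2, same_identity)
```

The classification term supplies class-centre supervision, while the pairwise contrastive term shapes the distances between classes, matching the intra-gait/inter-gait objective stated above.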
In a second aspect, the invention provides a user identity recognition system combining user gait information, comprising:
a gait data determining unit, for determining a gait data set of the user, performing pose estimation on each video frame in the gait data set, and determining the pose keypoint coordinates of each video frame, the gait data set being a plurality of video frames containing the gait information of the user;
a skeleton data determining unit, for determining a user skeleton data set based on the user pose keypoint coordinates, the user skeleton data set comprising the coordinates of each joint point of the user;
a skeleton topology construction unit, for connecting the coordinates of the joint points of the user according to the skeleton structure of the user, based on the user skeleton data set, to construct a user skeleton spatio-temporal topological graph;
a user identity recognition unit, for inputting the user skeleton spatio-temporal topological graph into a spatio-temporal graph convolutional network model and identifying the identity information of the user by combining the gait information of different users stored in advance in the model.
In an optional embodiment, the pose keypoints determined by the gait data determining unit correspond to a plurality of joint points of the user;
the skeleton data determining unit normalizes the pose keypoint coordinates of each video frame based on the centre position of the two joint points of the user's neck and hip, obtains the normalized coordinates of each joint point of the user, and takes the normalized coordinates as the user skeleton data set.
In an optional embodiment, the skeleton topology construction unit connects the joint points within a video frame in the spatial domain of that single frame according to the user's skeleton structure, divides the connected joint points in the spatial graph of the single frame into six parts (head, trunk, left arm, right arm, left leg and right leg) to form six subgraphs with shared vertices and shared edges, connects the same joint point in adjacent video frames to form the temporal edges of the spatio-temporal graph, and repeats these two steps for all video frames, so that all temporal edges and the joint points of all video frames jointly form the user skeleton spatio-temporal graph.
In an alternative embodiment, the spatio-temporal graph convolutional network model comprises a multi-layer backbone network; each backbone layer comprises a spatial graph convolutional network (SGCN) and a temporal convolutional network (TCN), and adjacent SGCN and TCN blocks are connected in series and pass features to one another. In the SGCN, after the input features pass through a convolution layer, high-dimensional discriminative spatial features are extracted, with an attention mechanism, from the interaction of the joint points within the first-order neighbourhood of the input features and are input to the TCN. In the TCN, the temporal feature distribution of the high-dimensional discriminative spatial features is normalized by a batch normalization layer, activated by a rectified linear unit, and used as the input of a further convolution layer, finally achieving an effective expression of the user's joint features over several consecutive time steps and extracting high-dimensional features in both the spatial and temporal dimensions. The nonlinear mapping from the input feature space to the high-dimensional feature space is completed by stacking and cascading the multi-layer backbone network, yielding high-dimensional discriminative features, which are output through a pooling layer and a fully connected layer and used to identify the identity information of the user.
In general, compared with the prior art, the technical solutions conceived by the invention have the following beneficial effects:
The invention provides a user identity recognition method and a system combining user gait information. By combining graph convolution with pose keypoints, the spatial and temporal information of the joint points can be extracted more effectively, yielding gait features with greater discriminative power. The invention optimizes the network with an attention mechanism and a multi-loss strategy: the attention mechanism strengthens the saliency of the extracted spatio-temporal features, and the combination of classification loss and contrastive loss reduces intra-gait distance and increases inter-gait difference.
The invention studies an effective method that models dynamic skeletons to solve the gait recognition task, capturing information from the graph nodes and their links, which gives the system generalization ability and improves its fault tolerance. A spatio-temporal graph is constructed from the joint-point sequence to model the skeleton sequence dynamically, so that the spatio-temporal graph convolutional network can automatically learn the spatial features and temporal information of the human skeleton during walking, focusing on modelling gait dynamics and eliminating the influence of pedestrian appearance on recognition. The graph is partitioned (i.e., divided into several subgraphs sharing joint points) to learn high-level attributes of the different body parts and the relations among them, realizing the fusion of local and global information. The method combines classification loss and contrastive verification loss organically, effectively exploiting both the identity information of the targets and the relations between different targets, and increasing the separability of the features.
Drawings
FIG. 1 is a flow chart of the user identity recognition method combining user gait information provided by the invention;
FIG. 2 is a flowchart of the partitioned graph convolution operation according to an embodiment of the invention;
FIG. 3 is a block diagram of the spatio-temporal graph convolutional network model provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of the user identity recognition system combining user gait information according to the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a user identity recognition method and a system combining user gait information, wherein the method comprises the following steps: first, an open-source two-dimensional pose estimation system is used to perform pose detection on the pedestrian object in each frame of the video sequences of the original data set and to extract the pose information. Then a series of preprocessing operations is applied to the extracted joint coordinate sequences to generate a human skeleton data set for gait recognition, in preparation for training the model for the subsequent gait recognition task. Finally, a spatio-temporal graph convolutional network model is constructed; to capture high-level semantic information, the skeleton graph is divided into six subgraphs that share joints, and a part-based graph convolutional network is used to learn the recognition model, effectively improving performance. The model is trained on the constructed data set with a multi-loss strategy combining classification loss and contrastive loss as the loss function, the network parameters are optimized by stochastic gradient descent, and the accuracy of the trained model is evaluated on a validation set. The method makes full use of the effective information of the joint points, preserves the motion state in the time dimension to a large extent, is highly robust to clothing changes and carrying conditions, and generalizes well in cross-view tasks.
The method of the invention is grounded in early studies of gait perception, which showed that the motion of the joints over time is sufficient for a human to identify a familiar person. Deep-learning-based pose estimation algorithms are highly robust to self-occlusion, clothing changes and carried objects; and compared with gait information extracted from gait images, the joints of the body do not depend on appearance and do not change with clothing or carrying conditions. The application therefore holds that using pose keypoints for gait recognition helps mitigate the influence of covariate changes on recognition performance.
The invention captures the spatial and temporal information in gait with a method based on human pose joint points, combining the concept of a topological graph with pose keypoints. It studies an effective way of modelling dynamic skeletons to solve the gait recognition task, focusing on modelling gait dynamics and eliminating the influence of pedestrian appearance on recognition. It thereby solves the problems that, in existing gait tasks, a person's silhouette is easily deformed by covariates such as clothing and carried objects, directly reducing the recognition rate, and that conventional algorithms based on pose keypoints fail to exploit fully the effective information of the human joint points.
The technical scheme adopted by the invention is a gait recognition method combining a graph convolutional network with human pose keypoints, implemented in the following steps:
step 1, acquiring the video sequences in the original gait data set and estimating the pose of the pedestrian object in each frame of each sequence;
step 2, applying a series of preprocessing operations to the obtained human pose joint-point data to obtain a human skeleton data set for the recognition task;
step 3, constructing the corresponding human skeleton spatio-temporal topological graph on the obtained human skeleton data set;
step 4, constructing a partitioned graph convolutional network, designing the loss function, training on the training set, and optimizing the network parameters with a stochastic gradient descent algorithm;
step 5, recognizing the unknown samples in the validation set with the trained model to obtain the estimated identity information and evaluate the accuracy.
Preferably, step 1 comprises the following specific steps:
step 11: the pedestrian gait process data is acquired using a video acquisition device as a simultaneous dataset or using a common gait dataset comprising video in the original dataset, such as a CASIA-B dataset, a USF dataset, etc.
Step 12: and carrying out gesture estimation on each frame in the video sequence of the data set by using a mature gesture estimation system to obtain a gesture key point coordinate set of each frame.
Preferably, step 2 comprises the following specific steps:
step 21: normalizing the joint point coordinates in each frame obtained in the step 1 based on the center positions of the two joints, namely the neck and the hip;
step 22: randomly dividing the data set into a training set and a verification set;
step 23: the samples are normalized and serialized to be compatible with the model input format. The purpose of normalization is to make the length of all samples uniform by repeating frames sequentially to completely fill the established fixed number of frames. Serialization involves preloading standardized samples in a subset to convert them into a physical Python file. For each subset, the present application generates two physical files, a sample and a tag.
Preferably, step 3 is specifically: connecting the joint points within a frame in the spatial domain of the single frame according to the natural skeleton structure of the human body; following the method of dividing a silhouette into fixed limbs and body, dividing the spatial graph within the single frame into six parts (head, trunk, left and right arms, and left and right legs), forming six subgraphs with shared vertices and shared edges; connecting the same keypoints of adjacent frames to form the temporal edges of the spatio-temporal graph; and repeating these two steps for all input frames, so that all edges and joint points of all input frames jointly form the human skeleton spatio-temporal graph.
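The graph construction of step 3 can be sketched as follows. The 14-joint layout, the bone list and the exact six-part partition are illustrative assumptions, since the patent does not fix the joint indices:

```python
# Hypothetical 14-joint skeleton (0=head, 1=neck, 2-4=right arm,
# 5-7=left arm, 8=hip centre, 9-11=right leg, 12-13=left leg).
BONES = [(0, 1),                      # head to neck
         (1, 2), (2, 3), (3, 4),     # right arm
         (1, 5), (5, 6), (6, 7),     # left arm
         (1, 8),                      # trunk: neck to hip centre
         (8, 9), (9, 10), (10, 11),  # right leg
         (8, 12), (12, 13)]          # left leg

def temporal_edges(num_joints, num_frames):
    """Each joint connected to the same joint in the next frame."""
    return [((t, v), (t + 1, v))
            for t in range(num_frames - 1) for v in range(num_joints)]

# Six body-part subgraphs with shared vertices (e.g. the neck belongs to
# the head, trunk and both arm subgraphs); the partition is an assumption.
PARTS = {'head':  {0, 1},
         'trunk': {1, 8},
         'r_arm': {1, 2, 3, 4},
         'l_arm': {1, 5, 6, 7},
         'r_leg': {8, 9, 10, 11},
         'l_leg': {8, 12, 13}}
```

The intra-frame bones plus the temporal self-edges over all frames together form the skeleton spatio-temporal graph described above.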
Preferably, step 4 comprises the following specific steps:
step 41: constructing a partitioned space-time graph convolution network model, firstly performing space convolution on each sub-graph in a space domain, then combining convolution sub-graphs by using a weighting and fusion strategy, and finally performing time convolution on the graph after the aggregation operation of the sub-graphs is realized. Wherein a learning mask is added to form an attention mechanism before the graph convolution of the spatial domain, a learning weight matrix is given to each adjacency matrix to learn the importance of the spatial edge, and different importance is given to all the nodes in the adjacency.
Step 42: the training set is used for training, the random gradient descent method is used for optimizing network parameters, a multi-loss strategy is adopted to combine the classifying loss Softmax loss and the contrast loss Contrasitive loss, the characteristic value distance in the gait is reduced, and the difference between the gaits is increased.
Preferably, step 5 is specifically: the evaluation index adopted is the average accuracy commonly used in gait recognition, treating the gait recognition task as a classification problem. Given a gait sequence sample of a pedestrian walking normally, the sample first passes through the deep-neural-network feature learning network, trained for object classification, to obtain a covariate-independent feature representation in a low-dimensional space. The test set is then divided into two parts, a gallery (registration) set and a probe set, and the features produced by the trained model are used to evaluate the similarity between gallery and probe. The accuracy for one covariate at a given angle is the number of correctly predicted video sequences under that condition divided by the number of all video sequences at that angle for that covariate; the average accuracy under the covariate condition is the mean of the accuracies over all angles.
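The accuracy computation described in step 5 can be sketched as:

```python
def rank1_accuracy(predicted_ids, true_ids):
    """Correctly predicted sequences / all sequences for one condition
    (one covariate at one viewing angle)."""
    hits = sum(p == t for p, t in zip(predicted_ids, true_ids))
    return hits / len(true_ids)

def mean_accuracy(per_angle):
    """per_angle: {angle: (predicted_ids, true_ids)} for one covariate.
    The average accuracy is the mean of the per-angle accuracies."""
    accs = [rank1_accuracy(p, t) for p, t in per_angle.values()]
    return sum(accs) / len(accs)
```

For instance, a covariate scored 100% at one angle and 50% at another averages to 75% under that covariate condition.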
Fig. 1 is a flowchart of the user identity recognition method combining user gait information; as shown in Fig. 1, the method comprises the following steps:
S110, determining a gait data set of a user; performing pose estimation on each video frame in the gait data set, and determining the pose keypoint coordinates of each video frame; the gait data set is a plurality of video frames containing the gait information of the user;
S120, determining a user skeleton data set based on the user pose keypoint coordinates; the user skeleton data set comprises the coordinates of each joint point of the user;
S130, connecting the coordinates of the joint points of the user according to the skeleton structure of the user, based on the user skeleton data set, to construct a user skeleton spatio-temporal topological graph;
S140, inputting the user skeleton spatio-temporal topological graph into a spatio-temporal graph convolutional network model, and identifying the identity information of the user by combining the gait information of different users stored in advance in the model.
In a specific embodiment, the method comprises three main steps. The first is to extract the pose information from the video sequences of the original data set: pose detection is performed on the pedestrian object in each frame of the video with an open-source two-dimensional pose estimation system, and the joint coordinate sequence of the pedestrian object is extracted and preprocessed. The second is to construct a spatio-temporal graph from the keypoint sequence to model the spatio-temporal domain of the skeleton sequence dynamically. Finally, graph convolution is extended to the spatio-temporal graph model to extract the spatial and temporal features of the human skeleton. To capture high-level semantic information, the skeleton graph is divided into six subgraphs that share joints, and a part-based graph convolutional network is used to learn the recognition model, effectively improving performance. To extract spatio-temporal features with large inter-gait differences and small intra-gait differences, a multi-loss strategy is adopted to optimize the network.
(A) Construction of human skeleton gait data set
Existing gait data sets all contain silhouette data of the walking subjects' appearance. To make the original data set compatible with the model input, a series of preprocessing operations is required, which generates a new data set containing skeleton estimates of all pedestrian objects in the gait data set.
First, walking videos of individual subjects are acquired from the original data set, and pose estimation is performed on the walking subject in each video to obtain the coordinates of all joint points in each frame for constructing the skeleton topological graph. Each video frame contains a "pose coordinates" section, in which the estimated X-axis and Y-axis coordinates of the human joints are stored, and a "confidence score" section containing the confidence of each joint. The file for each subject contains all frames of the video sequence and the subject label. The input to the model is a feature vector on each node, which consists of a coordinate vector and the estimated confidence.
The second step performs a normalization operation. Because the camera is fixed while the data set is captured, the distance between the person and the camera changes continuously during walking. To eliminate the effect of this distance variation, so that all walking subjects can be viewed and analyzed at a relatively uniform scale, a normalization operation is applied to each joint. The neck and the hip center are two relatively stable positions while a person walks; therefore, based on these two joint positions, the normalization equation is defined as follows:

P′_i = (P_i − P_neck) / H  (1.1)

wherein P_i represents the coordinates of a human joint point, P_neck represents the joint point coordinates of the neck, H represents the distance between the center points of the neck and hip joints, and P′_i represents the normalized result.
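The normalization step above can be sketched as follows, assuming the formula P′_i = (P_i − P_neck)/H and illustrative joint indices for the neck and hip center (the real indices depend on the pose estimator's keypoint layout):

```python
# Sketch of the joint-coordinate normalization: translate so the neck is the
# origin, then scale by H, the neck-to-hip-center distance. Joint indices
# passed in below are illustrative, not the estimator's actual layout.

def normalize_pose(joints, neck_idx, hip_center_idx):
    """joints: list of (x, y) tuples, one per joint, for a single frame."""
    nx, ny = joints[neck_idx]
    hx, hy = joints[hip_center_idx]
    h = ((nx - hx) ** 2 + (ny - hy) ** 2) ** 0.5  # distance H
    if h == 0:
        raise ValueError("degenerate pose: neck and hip center coincide")
    return [((x - nx) / h, (y - ny) / h) for (x, y) in joints]

frame = [(10.0, 20.0), (12.0, 18.0), (12.0, 28.0)]  # toy 3-joint frame
normalized = normalize_pose(frame, neck_idx=1, hip_center_idx=2)
```

After normalization the neck maps to the origin and all coordinates are expressed in units of H, so subjects at different distances from the camera become comparable.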
The third step divides the data set. The steps above generate a new data set containing skeleton key point estimates for all subjects. Finally, the samples are length-normalized and serialized to make them compatible with the model input format. The purpose of length normalization is to make all samples uniform in length by repeating frames sequentially until the established fixed number of frames is completely filled. Serialization preloads the standardized samples in each subset and converts them into a serialized Python file containing their in-memory representation, i.e. the format used by the model. For each subset, two such files are generated: one for the samples and one for the labels.
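The length-normalization described above (repeating frames sequentially until a fixed frame count is filled) can be sketched as follows; the default target length is an illustrative assumption:

```python
# Sketch of length normalization: cycle through the original frames until the
# sequence fills the fixed target length. The default of 300 frames is an
# illustrative assumption, not a value stated in the source.

def pad_sequence(frames, target_len=300):
    if not frames:
        raise ValueError("empty sequence")
    out = list(frames)
    i = 0
    while len(out) < target_len:
        out.append(frames[i % len(frames)])  # repeat frames sequentially
        i += 1
    return out[:target_len]

padded = pad_sequence(["f0", "f1", "f2"], target_len=8)
```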
(B) Construction of pedestrian skeleton topological graph
Any gait video sequence of a pedestrian can be expressed as T groups of skeleton frames with N joints each, where T is the number of frames in the video. A space-time graph is constructed on this skeleton sequence: the graph is formed by the skeletons of the T frames, with the same joint connected across frames. Thus an undirected graph G = (V, E) is obtained, where V is the set of all joint points within the skeleton sequence of the input frames and E is the set of edges. In this graph, identical joints of successive frames are connected to form the temporal edges, and the joints within each frame are connected according to the natural human skeletal structure to form the spatial edges. E may be expressed as:
E_s(t) = {v_ti v_tj | (i, j) ∈ N}  (1.2)

E_t = {v_ti v_(t+1)i}  (1.3)

wherein v_ti and v_tj are different joint points in the same frame, N is the set of joint pairs connected according to the natural body skeleton structure, and v_ti and v_(t+1)i are the same joint in different frames.
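The edge sets E_s and E_t can be built as follows; the bone list here is a small illustrative subset of a full skeleton rather than the actual 18-joint connectivity:

```python
# Sketch of building the spatial edges E_s (within each frame, following the
# natural skeleton) and the temporal edges E_t (same joint across consecutive
# frames). Nodes are (frame t, joint i) pairs; BONES is an illustrative subset.

BONES = [(0, 1), (1, 2), (1, 3)]  # (i, j) joint pairs joined by the skeleton
NUM_JOINTS = 4

def build_edges(num_frames):
    spatial = [((t, i), (t, j)) for t in range(num_frames) for (i, j) in BONES]
    temporal = [((t, i), (t + 1, i))
                for t in range(num_frames - 1) for i in range(NUM_JOINTS)]
    return spatial, temporal

E_s, E_t = build_edges(num_frames=3)
```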
Since the human body is an articulated structure that can be regarded as rigid parts connected to each other, the present application further divides the human skeleton graph into several parts, each subgraph representing a local region of the body. The graph G is divided into skeletal partitions representing the corresponding body parts. G may then be represented as a combination of subgraphs, each with its own motion-trajectory properties:
G = ∪_{i∈{1,...,k}} S_i  (1.4)
where k represents the number of partitions; here k is set to 6, i.e. the human skeleton is divided into 6 parts. S_i = (V_i, E_i) is a subgraph of G that shares vertices or edges with other subgraphs. For the specific division of the space-time graph, following earlier methods that divide the silhouette into fixed limb and body regions, the constructed space-time graph is divided into six parts: the head, the torso, the left and right arms, and the left and right legs.
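A minimal sketch of the six-part partition with shared joints follows; the joint indices assume the 18-joint OpenPose layout, so the exact grouping is an assumption rather than the patent's definitive assignment:

```python
# Sketch of the six body-part subgraphs with shared joints. Indices follow the
# 18-joint OpenPose layout (an assumption); e.g. joint 1 (neck) is shared by
# the head and torso subgraphs.

PARTS = {
    "head":      [0, 1, 14, 15, 16, 17],
    "torso":     [1, 2, 5, 8, 11],
    "left_arm":  [5, 6, 7],
    "right_arm": [2, 3, 4],
    "left_leg":  [11, 12, 13],
    "right_leg": [8, 9, 10],
}

def shared_joints(parts):
    """Return joints that belong to more than one subgraph."""
    seen, shared = set(), set()
    for joints in parts.values():
        for j in joints:
            (shared if j in seen else seen).add(j)
    return shared

overlap = shared_joints(PARTS)
```

The shared joints are exactly where the aggregation step later fuses feature information between subgraphs.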
(C) Space-time diagram convolution network construction and training stage
(C1) Partial-based space-time diagram convolutional network definition
First, the neighborhood of a vertex in the pedestrian skeleton graph is defined, using a sampling function that relates the vertex v_ti to its neighborhood set, as shown in equation 1.5:
N_s(v_ti) = {v_qj | d(v_tj, v_ti) ≤ D, v_ti, v_tj ∈ V_s, |q − t| ≤ ⌊τ/2⌋}  (1.5)
wherein v_ti and v_tj represent two joint points in the same frame, v_ti and v_qi represent the same joint in different frames, V_s is the set of joints in subgraph s, and d(v_tj, v_ti) represents the shortest-path distance from vertex v_ti to v_tj. D is set to 1, representing the set of neighbors at distance 1 in the spatial dimension.
Then, a weight function is defined, which assigns the weights used to compute inner products with the input feature vectors and assigns labels to the vertices in the neighborhood. In this work, the joints in a 1-neighborhood are divided into two subsets in the spatial domain — the root joint itself and its neighbors — to model the relative position change between joints. In the temporal domain, a partitioning rule similar to temporal convolution (TCN) is used directly. Finally, the Cartesian product of the spatial partition subsets and the temporal partition subsets constitutes the result of applying the partitioning rules on the space-time graph. The mapping is then given by equation 1.6:
L_st = d(v_tj, v_ti) + (q − t + ⌊τ/2⌋)  (1.6)
which maps the joints in the neighborhood to corresponding subset labels. In the spatial dimension, K labels (L_s: V → {0, ..., K−1}) are set to allocate all joints within each vertex's 1-neighborhood; in the temporal dimension, τ labels (L_t: V → {0, ..., τ−1}) assign different weights to vertices of different frames within the neighborhood. Thus a space-time neighborhood is generated around v_ti.
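The label mapping of equation 1.6 can be sketched as a small function; the window size τ = 9 is illustrative:

```python
# Sketch of the space-time label mapping of equation 1.6: a neighbor v_qj of
# root v_ti gets label d(v_tj, v_ti) + (q - t + tau//2), combining its spatial
# subset (0 = root itself, 1 = one-hop neighbor) with its frame offset.
# tau = 9 here is an illustrative temporal window size.

def st_label(spatial_dist, q, t, tau=9):
    """spatial_dist: graph distance d in {0, 1}; q, t: neighbor and root frames."""
    assert abs(q - t) <= tau // 2, "neighbor outside the temporal window"
    return spatial_dist + (q - t + tau // 2)

# One-hop neighbor in the same frame as the root (q = t = 5):
label = st_label(spatial_dist=1, q=5, t=5)
```

Each distinct label indexes its own weight vector in the weight function, so joints at different spatial and temporal offsets receive different learned weights.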
Graph convolution can then be performed on the defined partitioned space-time graph using the sampling and weight functions defined above. First, a spatial convolution is performed on each subgraph in the spatial domain, as shown in equation 1.7:
the convolutions subgraphs are then combined using a weighted sum fusion strategy as shown in equation 1.8:
where n is the number of partitions; the aggregation function fuses the feature information of two subgraphs through their shared vertices or connecting edges. Then, after the aggregation of the subgraphs, a temporal convolution is performed on the aggregated graph, as shown in equation 1.9:
That is, a convolution operation is performed independently on each partition of each frame in the spatial domain and aggregated within each single frame; a temporal convolution — the convolution over the time domain — is then performed on the aggregated graph, as shown in fig. 2, where (a) shows the partitioned spatial graph of a single frame, (b) shows the aggregated spatial graph of a single frame, and (c) shows the human skeleton space-time graph.
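Since equations 1.7–1.9 are rendered as images in the source, the sketch below uses the common normalized-adjacency form of spatial graph convolution, X′ = D^(−1/2)(A+I)D^(−1/2) X W, as an assumption about what the per-subgraph spatial step computes:

```python
# Assumed sketch of one spatial graph-convolution step on a single frame, in
# the common normalized-adjacency form (the patent's equations 1.7-1.9 are
# images, so this is an illustrative stand-in, not the exact formula).
import numpy as np

def spatial_gcn(X, A, W):
    """X: (V, C_in) joint features; A: (V, V) adjacency; W: (C_in, C_out)."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W      # propagate, then project

V, C_in, C_out = 4, 3, 8
A = np.zeros((V, V))
for i, j in [(0, 1), (1, 2), (1, 3)]:                   # toy 4-joint skeleton
    A[i, j] = A[j, i] = 1.0
out = spatial_gcn(np.random.randn(V, C_in), A, np.random.randn(C_in, C_out))
```

Applying this step per subgraph, summing shared-vertex features, and then convolving each joint's feature sequence along the frame axis mirrors the spatial-then-temporal order described above.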
(C2) Input to a graph rolling network
The input of the whole network is the preprocessed output of the two-dimensional pose estimation system. A batch of gait videos can be expressed as a five-dimensional tensor (N, C, T, V, M), where N is the number of videos in the batch; C is the number of channels, set to 3 to represent the three features x, y, and confidence of each joint; T is the number of video frames, with frames repeated sequentially to completely fill the established fixed frame count so that all samples have uniform length; and V is the number of extracted joints — the 18 human joints labeled by the two-dimensional pose estimation system OpenPose.
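The input tensor layout can be illustrated with numpy in place of a deep-learning framework; interpreting M as the number of persons per frame (set to 1 for single-subject gait clips) is an assumption, since the text leaves M undefined:

```python
# Sketch of the five-dimensional input tensor (N, C, T, V, M). Treating M as
# the number of persons per frame is an assumption; the other dimensions
# follow the text: batch, (x, y, confidence), frames, 18 OpenPose joints.
import numpy as np

N, C, T, V, M = 2, 3, 300, 18, 1
batch = np.zeros((N, C, T, V, M), dtype=np.float32)

# Fill one joint of one frame: x, y, confidence for joint 0, frame 0, video 0.
batch[0, :, 0, 0, 0] = [0.41, 0.13, 0.98]
```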
(C3) Structure of space-time diagram convolution network
Fig. 3 shows the proposed partitioned space-time graph convolutional network architecture, which takes the human gait space-time graph as input and outputs high-dimensional discriminative features characterizing identity information. The network takes a space-time feature network as its backbone and is composed of the following modules: first, the input space-time graph features are normalized by a BN layer; then the normalized features are fed into a partitioned space-time graph convolutional network composed of SGCN and TCN modules, which extracts high-dimensional features in the spatial dimension (joints) and the temporal dimension (key frames), combined with the joint-interaction weights produced by an attention mechanism; next, six cascaded backbone layers are stacked to complete the nonlinear mapping from the input space to the feature space; finally, a pooling layer and a fully connected layer output the high-dimensional discriminative features for classification.
The partitioned space-time feature network of the present invention includes an SGCN module for the spatial domain and a TCN module for the temporal domain, as shown in fig. 3. In the spatial graph convolutional network (SGCN), the input feature X passes through a 1×1 convolution layer; combined with an attention mechanism, the high-dimensional discriminative spatial feature F under first-order joint interaction is extracted and fed into the temporal convolutional network (TCN). In the TCN, F is normalized by a BN layer to obtain the temporal feature distribution, activated by a rectified linear unit (ReLU), and then used as the input of a convolution layer with a 9×1 kernel, finally achieving an effective expression of the human joint features over 9 consecutive time steps. To keep the feature distribution consistent before and after the space-time feature network, after the TCN completes the temporal feature expression, the features are batch-normalized again by a BN layer; at the same time, a Dropout layer deactivates neurons with probability 0.5 to avoid the overfitting caused by excessive network parameters and insufficient training data.
(C4) Multi-mechanism combination
To obtain a more discriminative gait representation, an attention mechanism is used to enhance the saliency and distinguishability of the extracted spatio-temporal features; at the same time, a joint classification loss and contrastive loss are proposed, reducing the distance between feature values within a gait and increasing the difference between gaits.
(C4-1) attention mechanism Module
In the constructed human skeleton topological graph, the number of joints in the neighborhood of each vertex can differ, so the feature values of vertices with more neighbors tend to be more prominent. During walking, not all joints effectively improve gait recognition performance — some joints can even degrade it. Prior knowledge shows that the motion trajectories of the legs and arms are the most important in gait, and gait recognition, more than other recognition tasks, focuses on subtle body differences in a subject's legs and arms. The idea of introducing an attention mechanism is therefore to assign a different weight to each joint point, focusing on the joints that play a larger role and neglecting those that play a smaller one, so as to select the joints carrying effective gait characteristics. The attention mechanism is realized here by adding a learnable mask before the graph convolution. In the present invention, each model unit has its own trainable weight parameters: each adjacency matrix is given a learnable weight matrix to learn the importance of spatial edges, assigning different importance to the joints within the matrix. This can be expressed by equation 1.10:
x′_i = ∑_{j∈neighbor(i)} a_learn(i, j) W x_j  (1.10)
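Equation 1.10 amounts to multiplying the adjacency matrix element-wise by a learnable mask; the sketch below shows the mechanism with a fixed mask standing in for the trained weights (the projection W is omitted for brevity):

```python
# Sketch of the learnable attention mask: a weight matrix the same shape as
# the adjacency matrix is multiplied element-wise onto it, so each spatial
# edge (i, j) gets its own importance a_learn(i, j). The mask values here are
# fixed stand-ins for what training would learn; W is omitted.
import numpy as np

V = 4
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
mask = np.ones_like(A)          # learnable during training; initialized to 1
mask[1, 3] = 2.5                # pretend training up-weighted edge (1, 3)

A_weighted = A * mask           # edges absent from A stay zero
X = np.random.randn(V, 3)
out = A_weighted @ X            # x'_i = sum_j a_learn(i, j) * x_j
```

Because the mask multiplies rather than replaces the adjacency, non-edges remain zero: the attention reweights existing skeletal connections instead of creating new ones.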
(C4-2) multiple loss strategy
The invention combines the advantages of classification loss and contrastive loss and proposes a multi-loss supervision scheme, which improves network classification performance and generalization capability, as shown in equation 1.11:
L = L_s + αL_c + λ‖W‖²  (1.11)

wherein L_s represents the Softmax loss; L_c represents the contrastive loss; α represents a balance weight; and λ‖W‖² represents a regularization term.
the loss function uses Softmax loss as a supervision signal to provide category center information for the network, and meanwhile, uses contrast loss to constrain the relationship between the categories, so that the characteristics of 'compactness in the category and separation between the categories' are shown. This brings two benefits: on one hand, the Softmax loss solves the problem of difficult convergence caused by unbalanced sample pair of the contrast loss, and simplifies the sampling and training process; on the other hand, the comparison loss optimizes the intra-class and inter-class relations, and the problem that the generalization capability of the network model is limited is solved. And finally, the gait recognition performance is improved.
The invention discloses a gait recognition method combining human pose key points and a graph convolutional network. The method is as follows. First, an open-source two-dimensional pose estimation system performs pose detection on the pedestrian object in each frame of the original dataset's video sequences and extracts the pose information. Then, a series of preprocessing operations on the extracted joint coordinate sequences generates a human skeleton data set for gait recognition, in preparation for model training of the subsequent gait recognition task. Finally, a space-time graph convolutional network model is constructed: to capture high-level semantic information, the skeleton graph is divided into six subgraphs that share joints, and a part-based graph convolutional network is used to learn the recognition model, effectively improving performance. The constructed data set is used for training, with a multi-loss strategy combining classification loss and contrastive loss as the loss function; the network parameters are optimized with stochastic gradient descent, and the accuracy of the trained model is evaluated on a validation set. The method makes full use of the effective information of the joint points, largely preserves the motion state in the time dimension, is robust to changes of clothing and carried objects, and generalizes well in cross-view tasks.
Fig. 4 is a schematic diagram of a user identification system combined with user gait information according to the present invention, as shown in fig. 4, including:
a gait data determination unit 410, for determining a gait data set of the user; performing pose estimation on each video frame in the gait data set, and determining the pose key point coordinates of each video frame; the gait data set is a plurality of video frames containing gait information of the user;
a skeleton data determination unit 420, for determining a user skeleton data set based on the user pose key point coordinates; the user skeleton data set comprises the coordinates of each joint point of the user;
a skeleton topology construction unit 430, configured to connect coordinates of each node of the user based on the user skeleton data set according to the skeleton structure of the user, so as to construct a user skeleton space-time topological graph;
the user identity identifying unit 440 is configured to input the user skeleton space-time topological graph into a space-time graph convolutional network model, and identify identity information of a user by combining gait information of different users stored in the space-time graph convolutional network model in advance.
It should be understood that the functions of the units in fig. 4 may be referred to in the foregoing detailed description of the method embodiment, and are not described herein.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (8)
1. A user identity recognition method combining user gait information, characterized by comprising the following steps:
determining a gait data set of the user; performing pose estimation on each video frame in the gait data set, and determining the pose key point coordinates of each video frame; the gait data set is a plurality of video frames containing gait information of the user;
constructing a user skeleton data set based on the user pose key point coordinates; the user skeleton data set comprises the coordinates of each joint point of the user;
connecting coordinates of all nodes of a user as space edges according to a skeleton structure of the user in each frame based on the user skeleton data set, and connecting the same nodes of two adjacent frames as time edges to construct a user skeleton space-time topological graph;
inputting the user skeleton space-time topological graph into a space-time graph convolution network model, and combining gait information of different users stored in the space-time graph convolution network model in advance to identify identity information of the users;
the space-time diagram convolutional network model comprises a plurality of layers of backbone networks, each layer of backbone network comprises a space domain convolutional network SGCN and a time domain convolutional network TCN, and the SGCN and the TCN transmit characteristics in an adjacent serial connection mode; in SGCN, after the input features pass through a convolution layer of a convolution kernel, extracting high-dimensional differentiation space features under the interaction of the joint points in a first-order neighborhood in the input features by combining an attention mechanism, and inputting the high-dimensional differentiation space features into TCN; in TCN, the high-dimensional differentiation space features normalize time domain feature distribution through a batch normalization layer, and are activated by utilizing a linear rectification function, so that the high-dimensional differentiation space features are used as input of a convolution layer of a convolution kernel, effective expression of user joint features in a plurality of continuous time domains is finally realized, and high-dimensional features in space dimension and time sequence dimension are extracted; the nonlinear mapping from the input feature space to the high-dimensional feature space is completed through stacking and cascading multi-layer backbone networks, and the high-dimensional differentiation feature is obtained; outputting the high-dimensional differentiation characteristics by using a pooling layer and a full-connection layer; the high-dimensional differentiation feature is used to identify identity information of the user.
2. The user identity recognition method according to claim 1, wherein the pose key points correspond to the joint points of the user; the user skeleton data set is determined based on the user pose key point coordinates, specifically:
normalizing the pose key point coordinates of each video frame based on the center position of the two joint points of the user's neck and hip to obtain the normalized coordinates of each joint point of the user, and taking the normalized coordinates as the user skeleton data set.
3. The method for identifying user identity according to claim 1, wherein the coordinates of each node of the user are connected according to the skeleton structure of the user based on the user skeleton data set to construct a user skeleton space-time topological graph, specifically:
connecting joint points in a video frame on a spatial domain of the single video frame according to a user skeleton structure, dividing the connected joint points of the user into six parts including a head, a trunk, a left arm, a right arm, a left leg and a right leg by using a spatial diagram in the single video frame, and forming six subgraphs with shared vertexes and shared edges; the same node of adjacent video frames are connected to form a time sequence edge of a time-space diagram; and repeating the two steps for all video frames to obtain all time sequence edges and joint points of all video frames to jointly form a user skeleton space-time diagram.
4. The user identification method of claim 1, wherein the attention mechanism is used to enhance the saliency and distinguishability of the extracted spatiotemporal features;
the attention mechanism distributes different weights to each joint point of the user, focuses on the joint point with relatively large effect, ignores the joint point with relatively small effect and selects the effective joint point of the gait feature; the attention mechanism is developed by adding a learnable mask before input to the space-time diagram convolutional network model.
5. The user identification method according to claim 1, wherein the space-time diagram convolution network model combines classification loss and contrast loss, reduces the distance of characteristic values in user gait, and increases the difference between user gaits;
the classification loss function uses Softmax loss as a supervision signal to provide class center information for the space-time diagram convolution network model, and meanwhile, uses contrast loss to constrain the relations between classes.
6. A user identification system incorporating user gait information, comprising:
a gait data determining unit, for determining a gait data set of the user; performing pose estimation on each video frame in the gait data set, and determining the pose key point coordinates of each video frame; the gait data set is a plurality of video frames containing gait information of the user;
a skeleton data determining unit, for determining a user skeleton data set based on the user pose key point coordinates; the user skeleton data set comprises the coordinates of each joint point of the user;
the skeleton topology construction unit is used for connecting the coordinates of each node of the user based on the user skeleton data set according to the skeleton structure of the user so as to construct a user skeleton space-time topological graph;
the user identity identification unit is used for inputting the user skeleton space-time topological graph into a pre-trained space-time graph convolution network model and identifying the identity information of the user by combining gait information of different users pre-stored in the space-time graph convolution network model; the space-time diagram convolutional network model comprises a plurality of layers of backbone networks, each layer of backbone network comprises a space domain convolutional network SGCN and a time domain convolutional network TCN, and the SGCN and the TCN transmit characteristics in an adjacent serial connection mode; in SGCN, after the input features pass through a convolution layer of a convolution kernel, extracting high-dimensional differentiation space features under the interaction of the joint points in a first-order neighborhood in the input features by combining an attention mechanism, and inputting the high-dimensional differentiation space features into TCN; in TCN, the high-dimensional differentiation space features normalize time domain feature distribution through a batch normalization layer, and are activated by utilizing a linear rectification function, so that the high-dimensional differentiation space features are used as input of a convolution layer of a convolution kernel, effective expression of user joint features in a plurality of continuous time domains is finally realized, and high-dimensional features in space dimension and time sequence dimension are extracted; the nonlinear mapping from the input feature space to the high-dimensional feature space is completed through stacking and cascading multi-layer backbone networks, and the high-dimensional differentiation feature is obtained; outputting the high-dimensional differentiation characteristics by using a pooling layer and a full-connection layer; the high-dimensional differentiation feature is used to identify identity information of the user.
7. The user identity recognition system according to claim 6, wherein the pose key points determined by the gait data determining unit correspond to the joint points of the user;
the skeleton data determining unit normalizes the pose key point coordinates of each video frame based on the center position of the two joint points of the user's neck and hip to obtain the normalized coordinates of each joint point of the user, which are taken as the user skeleton data set.
8. The system of claim 6, wherein the skeleton topology construction unit connects joints in the video frame according to the skeleton structure of the user in the spatial domain of the single video frame, and divides the connected joints in the spatial map in the single video frame into six parts including a head, a trunk, a left arm, a right arm, a left leg and a right leg, to form six subgraphs with shared vertices and shared edges; the same node of adjacent video frames are connected to form a time sequence edge of a time-space diagram; and repeating the two steps for all video frames to obtain all time sequence edges and joint points of all video frames to jointly form a user skeleton space-time diagram.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010943184.0A CN112101176B (en) | 2020-09-09 | 2020-09-09 | User identity recognition method and system combining user gait information |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112101176A CN112101176A (en) | 2020-12-18 |
| CN112101176B true CN112101176B (en) | 2024-04-05 |
Family
ID=73751238
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010943184.0A Active CN112101176B (en) | 2020-09-09 | 2020-09-09 | User identity recognition method and system combining user gait information |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112101176B (en) |
Families Citing this family (26)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112800836A (en) * | 2020-12-25 | 2021-05-14 | 富盛科技股份有限公司 | Pedestrian re-identification method, system, server and storage medium |
| CN113205060A (en) * | 2020-12-28 | 2021-08-03 | 武汉纺织大学 | Human body action detection method adopting circulatory neural network to judge according to bone morphology |
| CN112967427B (en) * | 2021-02-08 | 2022-12-27 | 深圳市机器时代科技有限公司 | Method and system for unlocking by using wearable device |
| CN112906599B (en) * | 2021-03-04 | 2024-07-09 | 杭州海康威视数字技术股份有限公司 | Gait-based personnel identity recognition method and device and electronic equipment |
| CN112926522B (en) * | 2021-03-30 | 2023-11-24 | 广东省科学院智能制造研究所 | Behavior recognition method based on skeleton gesture and space-time diagram convolution network |
| CN113128424B (en) * | 2021-04-23 | 2024-05-03 | 浙江理工大学 | Method for identifying action of graph convolution neural network based on attention mechanism |
| CN113361334B (en) * | 2021-05-18 | 2022-07-22 | 山东师范大学 | Method and system for person re-identification based on key point optimization and multi-hop attention graph convolution |
| CN113159007B (en) * | 2021-06-24 | 2021-10-29 | 之江实验室 | Gait emotion recognition method based on adaptive graph convolution |
| CN113537121B (en) * | 2021-07-28 | 2024-06-21 | 浙江大华技术股份有限公司 | Identity recognition method and device, storage medium and electronic equipment |
| CN113963201B (en) * | 2021-10-18 | 2022-06-14 | 郑州大学 | Bone action recognition method, device, electronic device and storage medium |
| CN113887516B (en) * | 2021-10-29 | 2024-05-24 | 北京邮电大学 | Feature extraction system and method for human motion recognition |
| CN114052726A (en) * | 2021-11-25 | 2022-02-18 | 湖南中科助英智能科技研究院有限公司 | A thermal infrared human gait recognition method and device in a dark environment |
| CN114373227B (en) * | 2022-01-05 | 2025-10-24 | 北京爱笔科技有限公司 | Skeleton key point encoding method, device, electronic device and storage medium |
| US12236688B2 (en) | 2022-01-27 | 2025-02-25 | Toyota Research Institute, Inc. | Systems and methods for tracking occluded objects |
| CN114267088B (en) * | 2022-03-02 | 2022-06-07 | 北京中科睿医信息科技有限公司 | Gait information processing method and device and electronic equipment |
| CN114613011A (en) * | 2022-03-17 | 2022-06-10 | 东华大学 | Human 3D Skeletal Behavior Recognition Method Based on Graph Attention Convolutional Neural Network |
| CN114783050A (en) * | 2022-03-21 | 2022-07-22 | 上海交通大学 | Gait identity recognition method and system based on dynamic vision sensor |
| CN114638064B (en) * | 2022-03-23 | 2024-12-13 | 昆明理工大学 | A vision-based method for a quadruped bionic robot to simulate animal gaits |
| CN114782992B (en) * | 2022-04-29 | 2025-05-06 | 常州大学 | A super joint and multimodal network and its application in behavior recognition method |
| CN115131876B (en) * | 2022-07-13 | 2024-10-29 | 中国科学技术大学 | Emotion recognition method and system based on human body movement gait and posture |
| CN115050101B (en) * | 2022-07-18 | 2024-03-22 | 四川大学 | A gait recognition method based on fusion of skeleton and contour features |
| CN115424339A (en) * | 2022-08-10 | 2022-12-02 | 一汽奔腾轿车有限公司 | Method, system, storage medium and electronic device for automatically opening rear tailgate |
| CN115830712B (en) * | 2022-12-06 | 2023-12-01 | 凯通科技股份有限公司 | Gait recognition method, device, equipment and storage medium |
| CN115953830B (en) * | 2022-12-06 | 2025-11-04 | 浪潮云信息技术股份公司 | A behavior recognition method, apparatus, device and medium |
| CN118587738B (en) * | 2024-06-03 | 2025-05-02 | 西藏大学 | Method and system for identifying Thangka image character based on human body posture estimation |
| CN120260135A (en) * | 2025-05-30 | 2025-07-04 | 泉州装备制造研究所 | A method, device, equipment and storage medium for counting repeated motion gestures |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110119703A (en) * | 2019-05-07 | 2019-08-13 | Fuzhou University | Human action recognition method fusing an attention mechanism and a spatio-temporal graph convolutional neural network in a security scene |
| CN110688898A (en) * | 2019-08-26 | 2020-01-14 | Donghua University | Cross-view gait recognition method based on a spatio-temporal two-stream convolutional neural network |
| CN110781765A (en) * | 2019-09-30 | 2020-02-11 | Tencent Technology (Shenzhen) Co., Ltd. | Human body posture recognition method, device, equipment and storage medium |
| KR20200014461A (en) * | 2018-07-31 | 2020-02-11 | Dongguk University Industry-Academic Cooperation Foundation | Apparatus and method for gait-based identification using a convolutional neural network |
| WO2020042419A1 (en) * | 2018-08-29 | 2020-03-05 | Hanvon Technology Co., Ltd. | Gait-based identity recognition method and apparatus, and electronic device |
| CN111160085A (en) * | 2019-11-19 | 2020-05-15 | Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co., Ltd. | Human image key point pose estimation method |
| CN111310668A (en) * | 2020-02-18 | 2020-06-19 | Dalian Maritime University | A gait recognition method based on skeleton information |
| CN111382679A (en) * | 2020-02-25 | 2020-07-07 | Shanghai Jiao Tong University | Method, system and equipment for evaluating severity of gait movement disorder in Parkinson's disease |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110054870A1 (en) * | 2009-09-02 | 2011-03-03 | Honda Motor Co., Ltd. | Vision Based Human Activity Recognition and Monitoring System for Guided Virtual Rehabilitation |
| WO2016065534A1 (en) * | 2014-10-28 | 2016-05-06 | 中国科学院自动化研究所 | Deep learning-based gait recognition method |
| US20190188533A1 (en) * | 2017-12-19 | 2019-06-20 | Massachusetts Institute Of Technology | Pose estimation |
- 2020-09-09: Application CN202010943184.0A filed in China (CN); granted as patent CN112101176B, legal status Active
Non-Patent Citations (5)
| Title |
|---|
| "GaitPart: Temporal Part-Based Model for Gait Recognition";C. Fan等;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》;20200805;第14213-14221页 * |
| "Graph Edge Convolutional Neural Networks for Skeleton-Based Action Recognition";X. Zhang等;《IEEE Transactions on Neural Networks and Learning Systems》;20190917;第31卷(第8期);第3047-3060页 * |
| " 基于多规则学习的康复姿势及动作识别算法研究";胡博;《中国优秀硕士学位论文全文数据库 医药卫生科技辑》;20200315(第3期);E060-479 * |
| "一种集成卷积神经网络和深信网的步态识别与模拟方法";何正义等;《山东大学学报:工学版》;20180813;第48卷(第3期);第88-95页 * |
| "基于频域注意力时空卷积网络的步态识别方法";赵国顺等;《信息技术与网络安全》;20200618;第39卷(第6期);第1-6页 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112101176A (en) | 2020-12-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112101176B (en) | User identity recognition method and system combining user gait information | |
| CN111274916B (en) | Face recognition method and face recognition device | |
| CN106682598B (en) | Multi-pose face feature point detection method based on cascade regression | |
| CN111160294B (en) | Gait recognition method based on graph convolutional network | |
| CN113221663A (en) | Real-time sign language intelligent identification method, device and system | |
| CN111310668A (en) | A gait recognition method based on skeleton information | |
| Mici et al. | A self-organizing neural network architecture for learning human-object interactions | |
| Hasan et al. | Multi-level feature fusion for robust pose-based gait recognition using RNN | |
| Aitkenhead et al. | A neural network face recognition system | |
| CN109447175 (en) | Pedestrian re-identification method combining deep learning and metric learning |
| CN114519899B (en) | An identity recognition method and system based on adaptive fusion of multiple biometric features | |
| Kovač et al. | Frame–based classification for cross-speed gait recognition | |
| Chan et al. | A 3-D-point-cloud system for human-pose estimation | |
| Wen et al. | Multi-view gait recognition based on generative adversarial network | |
| CN114782992B (en) | A super joint and multimodal network and its application in behavior recognition method | |
| Yang et al. | Self-supervised video pose representation learning for occlusion-robust action recognition | |
| Li et al. | A multi-modal dataset for gait recognition under occlusion | |
| Du | The computer vision simulation of athlete’s wrong actions recognition model based on artificial intelligence | |
| CN111626152A (en) | Few-shot-based spatio-temporal gaze direction estimation prototype design |
| Liu et al. | A deep learning based framework to detect and recognize humans using contactless palmprints in the wild | |
| Arnaud et al. | Tree-gated deep mixture-of-experts for pose-robust face alignment | |
| Batool et al. | Fundamental recognition of ADL assessments using machine learning engineering | |
| CN112818942A (en) | Pedestrian action recognition method and system in vehicle driving process | |
| Wei | Collar recognition and matching of clothing style drawings based on complex networks | |
| Raskin et al. | 3D Human Body-Part Tracking and Action Classification Using A Hierarchical Body Model. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |