CN116665245A - Multi-view gesture recognition method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN116665245A (application CN202310535438.9A)
- Authority
- CN
- China
- Prior art keywords
- hand
- key points
- coordinates
- gesture recognition
- key point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
- G06V40/11—Hand-related biometrics; Hand pose recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/145—Illumination specially adapted for pattern recognition, e.g. using gratings
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/11—Technique with transformation invariance effect
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of gesture recognition, and in particular to a multi-view gesture recognition method, device, computer equipment, and storage medium.
Background
In the field of human-computer interaction, gesture interaction is among the most widely used and efficient interaction methods, with applications in robot control, industrial production, sign language recognition, multimedia audio-visual control, and many other domains. The accuracy and efficiency of gesture interaction depend on reliable static and dynamic gesture recognition.
Current mainstream gesture recognition relies on 2D image recognition from a single viewpoint: an RGB camera captures a video stream, the hand is detected, and gesture recognition is then performed on the detected hand. Some approaches also model the video frames to recover 3D hand information and improve recognition accuracy. However, a single viewpoint is limited by factors such as the camera's field of view and the size of the environment, which makes accurate recognition difficult; it also cannot handle complex gestures or partial occlusions that leave the hand incomplete. The result is reduced recognition accuracy and an inability to recognize more complex gestures.
Against this background, the applicant, through beneficial exploration and research, has found a solution to the above problems; the technical solutions introduced below arose in this context.
Summary of the Invention
A first technical problem to be solved by the present invention is to address the deficiencies of the prior art by providing a multi-view gesture recognition method that improves recognition accuracy and can recognize more complex gestures.
A second technical problem to be solved by the present invention is to provide a multi-view gesture recognition device implementing the above multi-view gesture recognition method.
A third technical problem to be solved by the present invention is to provide a computer device implementing the above multi-view gesture recognition method.
A fourth technical problem to be solved by the present invention is to provide a computer-readable storage medium implementing the above multi-view gesture recognition method.
According to a first aspect of the present invention, a multi-view gesture recognition method includes:
installing at least two TOF sensors for collecting 3D hand information in the gesture interaction space, and calibrating each TOF sensor to obtain its intrinsic matrix and lens distortion parameters, as well as the relative position transformation matrices between the TOF sensors;
during operation, capturing in real time, through each TOF sensor, an infrared (IR) image and a depth image of the gesture interaction space from each viewpoint;
detecting the captured IR images to obtain the hand keypoint coordinates for each viewpoint;
correcting and computing the hand keypoint coordinates for each viewpoint according to the calibrated intrinsic matrix and lens distortion parameters together with the captured depth images, to obtain the set of 3D hand keypoint coordinates across all viewpoints;
calibrating and mapping the set of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrices between the TOF sensors, to obtain the set of 3D hand keypoint coordinates in a three-dimensional coordinate system whose origin is the wrist keypoint; and
computing a hand feature map from the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, and performing gesture recognition on the resulting hand feature map to obtain a gesture recognition result.
In a preferred embodiment of the present invention, detecting the captured IR images to obtain the hand keypoint coordinates for each viewpoint includes:
detecting the captured IR images in real time with an IR-image-based hand detection model;
when a human hand is detected in an IR image, cropping the hand region from the captured IR image to obtain a local IR image of the hand; and
feeding the cropped local IR image of the hand into an IR-image-based hand keypoint detection model for detection, to obtain the hand keypoint coordinates for each viewpoint.
In a preferred embodiment of the present invention, correcting and computing the hand keypoint coordinates for each viewpoint according to the calibrated intrinsic matrix and lens distortion parameters together with the captured depth images, to obtain the set of 3D hand keypoint coordinates across all viewpoints, includes:
undistorting the hand keypoint coordinates and the captured depth image for each viewpoint according to the calibrated intrinsic matrix and lens distortion parameters;
remapping the corrected hand keypoint coordinates onto the corrected depth image, to obtain the two-dimensional coordinates and depth value of each hand keypoint for each viewpoint;
computing, using the calibrated intrinsic matrix, the 3D coordinates of the hand keypoints for each viewpoint from their two-dimensional coordinates and depth values; and
aggregating the 3D hand keypoint coordinates from every viewpoint into the set of 3D hand keypoint coordinates across all viewpoints.
In a preferred embodiment of the present invention, calibrating and mapping the set of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrices between the TOF sensors, to obtain the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, includes:
fusing and calibrating the sets of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrices between the TOF sensors; and
remapping the fused and calibrated 3D hand keypoint coordinates into a three-dimensional coordinate system whose origin is the wrist keypoint, to obtain the set of 3D hand keypoint coordinates in the wrist-origin coordinate system.
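A minimal sketch of remapping fused keypoints into the wrist-origin coordinate system, assuming a pure translation (the embodiment may additionally re-orient the axes; the function name, keypoint layout, and wrist index are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def to_wrist_frame(keypoints_3d, wrist_index=0):
    """Re-express fused 3D hand keypoints in a coordinate system whose origin
    is the wrist keypoint by subtracting the wrist position (translation only).

    keypoints_3d: (N, 3) array of fused keypoint coordinates, e.g. N = 21.
    wrist_index:  row holding the wrist keypoint (index 0 is an assumption).
    """
    pts = np.asarray(keypoints_3d, dtype=float)
    return pts - pts[wrist_index]  # the wrist maps to (0, 0, 0)
```

With this convention, identical hand poses captured at different positions in the interaction space yield identical coordinate sets, which is what makes the downstream features translation-invariant.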
In a preferred embodiment of the present invention, computing the hand feature map from the set of 3D hand keypoint coordinates in the wrist-origin coordinate system includes:
normalizing the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, to obtain a normalized set of 3D hand keypoint coordinates;
computing the Euclidean distance between every pair of normalized hand keypoints, to generate a normalized inter-keypoint distance map;
for each keypoint, connecting it to its adjacent keypoint that is farthest from the wrist, to generate a node connection graph of keypoints farther from the wrist;
for each keypoint, connecting it to its adjacent keypoint that is closest to the wrist, to generate a node connection graph of keypoints closer to the wrist; and
combining the normalized inter-keypoint distance map, the graph of keypoints farther from the wrist, and the graph of keypoints closer to the wrist into the hand feature map.
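The normalized inter-keypoint distance map among the steps above can be sketched as follows. The normalization rule (dividing by the largest keypoint-to-wrist distance) and the function name are assumptions for illustration; the patent does not state the exact normalization:

```python
import numpy as np

def normalized_distance_map(keypoints_3d):
    """Normalize wrist-origin keypoints by the hand's overall scale and
    return the (N, N) matrix of pairwise Euclidean distances."""
    pts = np.asarray(keypoints_3d, dtype=float)
    scale = np.linalg.norm(pts, axis=1).max()  # farthest keypoint from the wrist
    if scale > 0:
        pts = pts / scale
    diff = pts[:, None, :] - pts[None, :, :]   # (N, N, 3) pairwise differences
    return np.linalg.norm(diff, axis=-1)       # (N, N) distance map
```

Dividing by the hand's own scale removes dependence on hand size and sensor distance, so together with the wrist-origin reprojection the distance map is invariant to translation and, being built from pairwise distances, to rotation as well.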
In a preferred embodiment of the present invention, gesture recognition is performed on the resulting hand feature map by a gesture recognition model that operates on feature maps fused from the 3D coordinate information of the hand keypoints, and a gesture recognition result is generated.
According to a second aspect of the present invention, a multi-view gesture recognition device implementing the above multi-view gesture recognition method includes:
at least two TOF sensors installed in the gesture interaction space for collecting 3D hand information;
a calibration module, configured to calibrate each TOF sensor to obtain its intrinsic matrix and lens distortion parameters, as well as the relative position transformation matrices between the TOF sensors;
an image acquisition module, configured to capture in real time, through each TOF sensor during operation, an IR image and a depth image of the gesture interaction space from each viewpoint;
a hand keypoint detection module, configured to detect the captured IR images to obtain the hand keypoint coordinates for each viewpoint;
a correction and computation module, configured to correct and compute the hand keypoint coordinates for each viewpoint according to the calibrated intrinsic matrix and lens distortion parameters together with the captured depth images, to obtain the set of 3D hand keypoint coordinates across all viewpoints;
a calibration mapping module, configured to calibrate and map the set of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrices between the TOF sensors, to obtain the set of 3D hand keypoint coordinates in a three-dimensional coordinate system whose origin is the wrist keypoint; and
a gesture recognition module, configured to compute a hand feature map from the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, and to perform gesture recognition on the resulting hand feature map to obtain a gesture recognition result.
According to a third aspect of the present invention, a computer device for implementing the multi-view gesture recognition method includes a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
installing at least two TOF sensors for collecting 3D hand information in the gesture interaction space, and calibrating each TOF sensor to obtain its intrinsic matrix and lens distortion parameters, as well as the relative position transformation matrices between the TOF sensors;
during operation, capturing in real time, through each TOF sensor, an infrared (IR) image and a depth image of the gesture interaction space from each viewpoint;
detecting the captured IR images to obtain the hand keypoint coordinates for each viewpoint;
correcting and computing the hand keypoint coordinates for each viewpoint according to the calibrated intrinsic matrix and lens distortion parameters together with the captured depth images, to obtain the set of 3D hand keypoint coordinates across all viewpoints;
calibrating and mapping the set of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrices between the TOF sensors, to obtain the set of 3D hand keypoint coordinates in a three-dimensional coordinate system whose origin is the wrist keypoint; and
computing a hand feature map from the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, and performing gesture recognition on the resulting hand feature map to obtain a gesture recognition result.
According to a fourth aspect of the present invention, a computer-readable storage medium for implementing the above multi-view gesture recognition method stores a computer program which, when executed by a processor, implements the following steps:
installing at least two TOF sensors for collecting 3D hand information in the gesture interaction space, and calibrating each TOF sensor to obtain its intrinsic matrix and lens distortion parameters, as well as the relative position transformation matrices between the TOF sensors;
during operation, capturing in real time, through each TOF sensor, an infrared (IR) image and a depth image of the gesture interaction space from each viewpoint;
detecting the captured IR images to obtain the hand keypoint coordinates for each viewpoint;
correcting and computing the hand keypoint coordinates for each viewpoint according to the calibrated intrinsic matrix and lens distortion parameters together with the captured depth images, to obtain the set of 3D hand keypoint coordinates across all viewpoints;
calibrating and mapping the set of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrices between the TOF sensors, to obtain the set of 3D hand keypoint coordinates in a three-dimensional coordinate system whose origin is the wrist keypoint; and
computing a hand feature map from the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, and performing gesture recognition on the resulting hand feature map to obtain a gesture recognition result.
Owing to the above technical solutions, the beneficial effects of the present invention are as follows:
1. The present invention reprojects the hand keypoint coordinates into a 3D coordinate system whose origin is the wrist keypoint, which restores the true state of the hand to the greatest extent; by extracting a hand feature map, the extracted gesture features become rotation- and translation-invariant, so gestures can be recognized in any pose.
2. Compared with traditional monocular or binocular RGB sensors, the present invention obtains both the depth map and the IR map from a single frame, requiring little computation and offering high real-time performance. Moreover, the present invention uses an active light source, which is less susceptible to ambient-light interference, works under complex ambient lighting conditions, places low demands on hardware, and is stable and reliable.
3. Compared with the single-viewpoint schemes commonly adopted today, the present invention adopts a multi-view scheme that captures more comprehensive hand state information and fully exploits the technical characteristics of TOF to restore the spatial geometric relationships among the fingers; it recognizes gestures more accurately and is more robust in scenarios such as occlusion.
Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the multi-view gesture recognition method of the present invention.
Fig. 2 is a schematic structural diagram of the multi-view gesture recognition device of the present invention.
Fig. 3 is an internal structure diagram of the computer device of the present invention.
Detailed Description of Embodiments
To make the technical means, creative features, objectives, and effects of the present invention easy to understand, the present invention is further described below with reference to the drawings.
Referring to Fig. 1, the multi-view gesture recognition method of the present invention includes the following steps:
Step S10: install two TOF sensors for collecting 3D hand information in the gesture interaction space, and calibrate each TOF sensor to obtain its intrinsic matrix I_M and lens distortion parameters D_M, as well as the relative position transformation matrix T_M between the TOF sensors. In this embodiment, the number of TOF sensors is not limited to two; it should be chosen according to the gesture interaction space and the required recognition accuracy, and the sensors should be installed so that the gesture interaction space is covered as fully as possible while ensuring that their fields of view partially overlap.
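As a hedged illustration of the calibration products of step S10 (not the patent's own procedure): in practice the intrinsic matrix I_M and distortion parameters D_M of each sensor would come from a standard tool such as OpenCV's calibrateCamera, while the relative position transformation matrix T_M between two sensors can be composed from their poses with respect to a shared calibration target. A minimal numpy sketch, in which the 4x4 homogeneous extrinsics and the function name are illustrative assumptions:

```python
import numpy as np

def relative_transform(T_a, T_b):
    """T_a and T_b are 4x4 homogeneous extrinsics mapping calibration-target
    coordinates into sensor A's and sensor B's frames respectively
    (p_a = T_a @ p_target). Returns the relative transform T_M mapping
    sensor B's coordinates into sensor A's frame: p_a = (T_a @ inv(T_b)) @ p_b."""
    return T_a @ np.linalg.inv(T_b)
```

Composing the relative pose through a shared target is what lets every viewpoint's keypoints later be fused into one common coordinate system in step S50.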
Step S20: during operation, capture in real time, through each TOF sensor, an infrared IR image IR_M and a depth image DEP_M of the gesture interaction space from each viewpoint.
Step S30: detect the captured IR image IR_M to obtain the hand keypoint coordinates for each viewpoint.
Step S40: correct and compute the hand keypoint coordinates for each viewpoint according to the calibrated intrinsic matrix I_M and lens distortion parameters D_M together with the captured depth image DEP_M, to obtain the set of 3D hand keypoint coordinates across all viewpoints.
Step S50: calibrate and map the set of 3D hand keypoint coordinates from all viewpoints according to the calibrated relative position transformation matrix T_M between the TOF sensors, to obtain the set of 3D hand keypoint coordinates in a three-dimensional coordinate system whose origin is the wrist keypoint.
Step S60: compute a hand feature map from the set of 3D hand keypoint coordinates in the wrist-origin coordinate system, and perform gesture recognition on the resulting hand feature map to obtain a gesture recognition result.
In step S30, detecting the captured IR image IR_M to obtain the hand keypoint coordinates for each viewpoint includes the following steps:
Step S31: detect the captured IR image IR_M in real time with the IR-image-based hand detection model HD_MODEL. HD_MODEL is a conventional detection model in this field based on the general-purpose yolov5s detector framework: hand positions in a large number of IR images from different viewpoints are first manually annotated, and the annotated data are then fed to the model for training, yielding HD_MODEL.
Step S32: when a human hand is detected in the IR image IR_M, crop the hand region from the captured IR image IR_M to obtain a local IR image HD_ROI of the hand.
Step S33: feed the cropped local IR image HD_ROI of the hand into the IR-image-based hand keypoint detection model KPD_MODEL for detection, to obtain the hand keypoint coordinates for each viewpoint. KPD_MODEL is likewise a conventional detection model based on the yolov5s framework: the local hand ROI regions detected by HD_MODEL are cropped and processed, 21 hand keypoints are annotated to obtain keypoint labels, and the labeled data are used to train KPD_MODEL.
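The text above does not spell out how keypoints predicted on the cropped HD_ROI patch are mapped back to full-image pixel coordinates; the following is one plausible sketch, where the square model input resolution, the function name, and its parameters are all assumptions for illustration:

```python
import numpy as np

def roi_keypoints_to_image(keypoints_roi, roi_x, roi_y, roi_w, roi_h, model_size=256):
    """Map keypoints predicted on a cropped, resized ROI patch back to pixel
    coordinates in the full IR image.

    keypoints_roi: (N, 2) keypoints in the resized patch's pixel space.
    roi_x, roi_y:  top-left corner of the crop in the full image.
    roi_w, roi_h:  crop size in the full image.
    model_size:    assumed square input resolution of the keypoint model.
    """
    kp = np.asarray(keypoints_roi, dtype=float)
    scale = np.array([roi_w / model_size, roi_h / model_size])  # undo the resize
    return kp * scale + np.array([roi_x, roi_y])                # undo the crop
```

Restoring full-image coordinates is a prerequisite for step S42, which looks up each keypoint's depth value in the (full-resolution) depth image DEP_M.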
In step S40, using the intrinsic matrix I_M and lens distortion parameters D_M obtained from calibration, together with the captured depth image DEP_M, the hand key-point coordinates at each viewing angle are rectified and converted to obtain the set of 3D hand key-point coordinates over all viewing angles. This comprises the following steps:

Step S41: the hand key-point coordinates at each viewing angle and the captured depth image DEP_M are rectified using the calibrated intrinsic matrix I_M and lens distortion parameters D_M.

Step S42: the rectified hand key-point coordinates are remapped onto the rectified depth image DEP_M, giving the 2D coordinates and depth value of each hand key point at each viewing angle.

Step S43: from the 2D coordinates and depth value of each hand key point at each viewing angle, the 3D coordinates of the hand key points at that viewing angle are computed using the calibrated intrinsic matrix I_M.

Step S44: the 3D hand key-point coordinates of each viewing angle are collected to obtain the set of 3D hand key-point coordinates over all viewing angles.
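The per-view computation of steps S42 and S43 — looking up the depth at each rectified key point and lifting it to 3D through the intrinsic matrix — can be sketched with the standard pinhole camera model. This is an illustrative sketch: the undistortion of step S41 would typically be done with a library such as OpenCV and is omitted, and the example intrinsics are made up.

```python
import numpy as np

def backproject_keypoints(kps_2d, depth_map, K):
    """Steps S42-S43 sketch: map rectified 2D key points onto the rectified
    depth image DEP_M and back-project them to 3D camera coordinates.

    kps_2d    : (N, 2) array of pixel coordinates (u, v)
    depth_map : (H, W) depth image in millimetres
    K         : 3x3 intrinsic matrix I_M = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    Returns an (N, 3) array of (X, Y, Z) camera-frame coordinates.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    pts = []
    for u, v in kps_2d:
        z = float(depth_map[int(round(v)), int(round(u))])  # depth lookup (S42)
        # pinhole back-projection (S43): X = (u-cx)*Z/fx, Y = (v-cy)*Z/fy
        pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.asarray(pts)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 1000.0)               # flat scene 1 m away
kps = np.array([[320.0, 240.0], [420.0, 240.0]])
xyz = backproject_keypoints(kps, depth, K)
print(xyz)  # principal point maps to (0, 0, 1000); (420, 240) to (200, 0, 1000)
```

Repeating this for every view and concatenating the results gives the per-view 3D key-point sets collected in step S44.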
In step S50, using the relative-pose transformation matrix T_M between the multiple TOF sensors obtained from calibration, the set of 3D hand key-point coordinates over all viewing angles is aligned and mapped to obtain the set of 3D hand key-point coordinates in a 3D coordinate system whose origin is the wrist key point. This comprises the following steps:

Step S51: using the calibrated relative-pose transformation matrix T_M between the TOF sensors, the 3D hand key-point sets from all viewing angles are fused and aligned.

Step S52: the fused and aligned 3D key-point sets from all viewing angles are remapped into a 3D coordinate system whose origin is the wrist key point, yielding the set of 3D hand key-point coordinates in that wrist-centered coordinate system.
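Steps S51 and S52 can be sketched as below. The fusion strategy (simple averaging of corresponding key points) and the wrist index (0, following the common 21-key-point convention) are assumptions; the patent specifies neither.

```python
import numpy as np

def fuse_and_recenter(kps_view1, kps_view2, T_M, wrist_idx=0):
    """Steps S51-S52 sketch: bring view 2 into view 1's frame with the 4x4
    relative-pose matrix T_M, fuse the two (21, 3) key-point sets, and shift
    the origin to the wrist key point."""
    homo = np.hstack([kps_view2, np.ones((len(kps_view2), 1))])  # (21, 4) homogeneous
    kps_view2_in_1 = (T_M @ homo.T).T[:, :3]    # align view 2 onto view 1 (S51)
    fused = 0.5 * (kps_view1 + kps_view2_in_1)  # fusion by averaging (an assumption)
    return fused - fused[wrist_idx]             # wrist-centered coordinates (S52)

# toy example: view 2 observes the same hand translated by +100 mm in X
kps1 = np.random.default_rng(0).uniform(0.0, 200.0, (21, 3))
kps2 = kps1 + np.array([100.0, 0.0, 0.0])
T = np.eye(4)
T[0, 3] = -100.0  # maps view-2 points back into view 1's frame
out = fuse_and_recenter(kps1, kps2, T)
print(np.allclose(out[0], [0.0, 0.0, 0.0]))  # True: wrist sits at the origin
```

In practice the fusion step would also need to handle key points visible in only one view (e.g. by taking whichever view provides a valid depth), a detail left open here.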
In step S60, the set of 3D hand key-point coordinates in the wrist-centered coordinate system is processed to obtain a hand feature map. This comprises the following steps:

Step S61: the set of 3D hand key-point coordinates in the wrist-centered coordinate system is normalized, giving the normalized set of 3D hand key-point coordinates.

Step S62: the Euclidean distance between every pair of normalized hand key points is computed, generating a normalized inter-key-point distance map DISTANCE_MAP. Specifically, from the coordinates of the 21 key points in the wrist-centered coordinate system obtained in step S61, the pairwise distances between the key points are computed and normalized with the largest distance as the maximum value, yielding a 21x21 feature map DISTANCE_MAP whose feature values are distances.

Step S63: for each key point, the adjacent key point farthest from the wrist key point is selected and connected, generating a node connection map F_DISTANCE_MAP of key points farther from the wrist. Specifically, from the 21 key-point coordinates obtained in step S61, a 21x21 matrix is generated; centered on each key point, the adjacent key node farthest from the wrist is selected, the corresponding matrix entry is set to 1 and all others to 0, producing the node connection map F_DISTANCE_MAP.

Step S64: for each key point, the adjacent key point closest to the wrist key point is selected and connected, generating a node connection map N_DISTANCE_MAP of key points closer to the wrist. Specifically, from the 21 key-point coordinates obtained in step S61, a 21x21 matrix is generated; centered on each key point, the adjacent key node closest to the wrist is selected, the corresponding matrix entry is set to 1 and all others to 0, producing the node connection map N_DISTANCE_MAP.

Step S65: the normalized inter-key-point distance map DISTANCE_MAP, the far-from-wrist node connection map F_DISTANCE_MAP, and the near-to-wrist node connection map N_DISTANCE_MAP are combined to obtain the hand feature map.
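Steps S62 through S65 can be sketched as follows. The 21-key-point skeleton adjacency used here follows the common MediaPipe-style hand topology (an assumption — the patent does not list the edges), and stacking the three 21x21 maps as channels is one plausible reading of the combination in step S65.

```python
import numpy as np

# Assumed 21-key-point hand skeleton (wrist = 0), MediaPipe-style topology.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4),         # thumb
         (0, 5), (5, 6), (6, 7), (7, 8),         # index finger
         (0, 9), (9, 10), (10, 11), (11, 12),    # middle finger
         (0, 13), (13, 14), (14, 15), (15, 16),  # ring finger
         (0, 17), (17, 18), (18, 19), (19, 20)]  # little finger

def hand_feature_map(kps, wrist_idx=0):
    """Steps S62-S65 sketch: build DISTANCE_MAP, F_DISTANCE_MAP and
    N_DISTANCE_MAP from wrist-centered (21, 3) key points and stack them
    into a (3, 21, 21) hand feature map."""
    n = len(kps)
    diff = kps[:, None, :] - kps[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # 21x21 pairwise Euclidean distances
    distance_map = dist / dist.max()          # S62: normalize by the largest distance

    neighbours = {i: [] for i in range(n)}
    for a, b in EDGES:
        neighbours[a].append(b)
        neighbours[b].append(a)

    to_wrist = np.linalg.norm(kps - kps[wrist_idx], axis=1)
    f_map = np.zeros((n, n))                  # S63: connect farthest-from-wrist neighbour
    n_map = np.zeros((n, n))                  # S64: connect nearest-to-wrist neighbour
    for i in range(n):
        f_map[i, max(neighbours[i], key=lambda j: to_wrist[j])] = 1.0
        n_map[i, min(neighbours[i], key=lambda j: to_wrist[j])] = 1.0

    return np.stack([distance_map, f_map, n_map])  # S65: 3-channel feature map

kps = np.random.default_rng(1).normal(size=(21, 3))
feat = hand_feature_map(kps)
print(feat.shape)  # (3, 21, 21)
```

A 3-channel 21x21 tensor of this shape is also a convenient input format for an image-classification backbone such as the ResNet18 model described below.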
By re-projecting the hand key-point coordinates into a 3D coordinate system whose origin is the wrist key point, the present invention restores the true state of the hand to the greatest possible extent, and by extracting the hand feature map, the extracted gesture features become invariant to rotation and translation, so that gestures can be recognized in any hand pose.
In step S60, gesture recognition is then performed on the obtained hand feature map by a gesture recognition model KPGR_MODEL based on feature maps fused from the 3D hand key-point coordinate information, and a gesture recognition result is generated. KPGR_MODEL is a conventional recognition model in the field, implemented on the ResNet18 framework: the fused hand feature maps for multiple gestures are labeled and fed into the network for training, after which the trained network produces the gesture recognition results.
Referring to Fig. 2, which shows the multi-view gesture recognition device of the present invention, the device comprises two TOF sensors 110, a calibration processing module 120, an image acquisition module 130, a hand key-point coordinate detection module 140, a rectification computation module 150, an alignment mapping module 160, and a gesture recognition computation module 170.

The two TOF sensors 110 are installed in the gesture interaction space and capture 3D information of the hand. The number of TOF sensors 110 is not limited to that of this embodiment; it should be chosen according to the gesture interaction space and the required gesture recognition accuracy, and the sensors should be installed so that their fields of view partially overlap while covering the gesture interaction space as completely as possible.

The calibration processing module 120 calibrates each TOF sensor 110 to obtain its intrinsic matrix and lens distortion parameters, as well as the relative-pose transformation matrix between the multiple TOF sensors.

The image acquisition module 130, in operation, captures in real time the infrared (IR) image IR_M and depth image DEP_M at each viewing angle in the gesture interaction space through each TOF sensor 110.

The hand key-point coordinate detection module 140 processes the captured IR image IR_M to obtain the hand key-point coordinates at each viewing angle.

The rectification computation module 150 rectifies and converts the hand key-point coordinates at each viewing angle, using the calibrated intrinsic matrix I_M and lens distortion parameters D_M together with the captured depth image DEP_M, to obtain the set of 3D hand key-point coordinates over all viewing angles.

The alignment mapping module 160 aligns and maps the set of 3D hand key-point coordinates over all viewing angles, using the calibrated relative-pose transformation matrix T_M between the TOF sensors, to obtain the set of 3D hand key-point coordinates in the wrist-centered coordinate system.

The gesture recognition computation module 170 processes the set of 3D hand key-point coordinates in the wrist-centered coordinate system to obtain the hand feature map, performs gesture recognition on the feature map, and outputs the gesture recognition result.
Each module of the multi-view gesture recognition device of the present invention may be implemented wholly or partly in software, in hardware, or in a combination thereof. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.

The present invention further provides a computer device for implementing the above multi-view gesture recognition method. The computer device may be a server, whose internal structure may be as shown in Fig. 3. The computer device comprises a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capability. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, while the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores data such as user information, records, and files. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements the multi-view gesture recognition method described above.

Those skilled in the art will understand that the structure shown in Fig. 3 is merely a block diagram of the parts relevant to the present technical solution and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
Specifically, the computer device of the present invention comprises a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, the following steps are implemented:

Step S10: install two TOF sensors for capturing 3D hand information in the gesture interaction space, and calibrate each TOF sensor to obtain its intrinsic matrix I_M and lens distortion parameters D_M, as well as the relative-pose transformation matrix T_M between the multiple TOF sensors.

Step S20: in operation, capture in real time, through each TOF sensor, the infrared (IR) image IR_M and depth image DEP_M at each viewing angle in the gesture interaction space.

Step S30: process the captured IR image IR_M to obtain the hand key-point coordinates at each viewing angle.

Step S40: using the calibrated intrinsic matrix I_M and lens distortion parameters D_M together with the captured depth image DEP_M, rectify and convert the hand key-point coordinates at each viewing angle to obtain the set of 3D hand key-point coordinates over all viewing angles.

Step S50: using the calibrated relative-pose transformation matrix T_M between the multiple TOF sensors, align and map the set of 3D hand key-point coordinates over all viewing angles to obtain the set of 3D hand key-point coordinates in a 3D coordinate system whose origin is the wrist key point.

Step S60: process the set of 3D hand key-point coordinates in the wrist-centered coordinate system to obtain a hand feature map, perform gesture recognition on the feature map, and obtain the gesture recognition result.
The present invention further provides a computer-readable storage medium for implementing the above multi-view gesture recognition method, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented:

Step S10: install two TOF sensors for capturing 3D hand information in the gesture interaction space, and calibrate each TOF sensor to obtain its intrinsic matrix I_M and lens distortion parameters D_M, as well as the relative-pose transformation matrix T_M between the multiple TOF sensors.

Step S20: in operation, capture in real time, through each TOF sensor, the infrared (IR) image IR_M and depth image DEP_M at each viewing angle in the gesture interaction space.

Step S30: process the captured IR image IR_M to obtain the hand key-point coordinates at each viewing angle.

Step S40: using the calibrated intrinsic matrix I_M and lens distortion parameters D_M together with the captured depth image DEP_M, rectify and convert the hand key-point coordinates at each viewing angle to obtain the set of 3D hand key-point coordinates over all viewing angles.

Step S50: using the calibrated relative-pose transformation matrix T_M between the multiple TOF sensors, align and map the set of 3D hand key-point coordinates over all viewing angles to obtain the set of 3D hand key-point coordinates in a 3D coordinate system whose origin is the wrist key point.

Step S60: process the set of 3D hand key-point coordinates in the wrist-centered coordinate system to obtain a hand feature map, perform gesture recognition on the feature map, and obtain the gesture recognition result.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art will understand that the present invention is not limited by the above embodiments, which, together with the description, merely illustrate its principles; various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection of the present invention is defined by the appended claims and their equivalents.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310535438.9A CN116665245A (en) | 2023-05-12 | 2023-05-12 | Multi-view gesture recognition method, device, computer equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116665245A true CN116665245A (en) | 2023-08-29 |
Family
ID=87714470
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119942629A (en) * | 2023-10-28 | 2025-05-06 | 荣耀终端股份有限公司 | A gesture classification method and related equipment |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190392205A1 (en) * | 2017-02-28 | 2019-12-26 | SZ DJI Technology Co., Ltd. | Recognition method and apparatus and mobile platform |
| CN111522446A (en) * | 2020-06-09 | 2020-08-11 | 宁波视睿迪光电有限公司 | Gesture recognition method and device based on multipoint TOF |
| US20210124425A1 (en) * | 2019-01-04 | 2021-04-29 | Beijing Dajia Internet Information Technology Co., Ltd. | Method and electronic device of gesture recognition |
| CN113850865A (en) * | 2021-09-26 | 2021-12-28 | 北京欧比邻科技有限公司 | Human body posture positioning method and system based on binocular vision and storage medium |
| CN115798031A (en) * | 2021-09-09 | 2023-03-14 | 广州视源电子科技股份有限公司 | Gesture recognition method and device based on lie group, electronic equipment and storage medium |
Non-Patent Citations (2)
| Title |
|---|
| Xu Yuechang: "3D Hand Pose Estimation Method Based on TOF Devices", China Master's Theses Full-text Database (Information Science and Technology Series), 15 February 2023 (2023-02-15) * |
| Luo Jian et al.: "3D Human Body Modeling and Variable-View Recognition of Abnormal Gait", Journal of Image and Graphics, no. 08, 12 August 2020 (2020-08-12) * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||