
CN116092167A - Face liveness detection method based on reading (Google Patents)

Info

Publication number: CN116092167A
Application number: CN202310206478.9A
Authority: CN (China)
Prior art keywords: reading, sound, video, face, data
Other languages: Chinese (zh)
Inventors: 谢华, 陈书东, 彭汉迎, 席锋, 陈晓念, 马宇翔
Current assignee: Weisi E Commerce Shenzhen Co ltd
Original assignee: Weisi E Commerce Shenzhen Co ltd
Filed: 2023-02-23 (priority date 2023-02-23), by Weisi E Commerce Shenzhen Co ltd
Published: 2023-05-09
Legal status: Pending (the listed status and assignees are assumptions; Google has not performed a legal analysis)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/40 Spoof detection, e.g. liveness detection
    • G06V 40/45 Detection of the body part being alive

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face liveness detection method based on reading, and relates to the field of medical devices. The method comprises the following steps: S1, face detection; S2, reading prompt; S3, reading data collection; S4, reading judgment; S5, liveness detection. By recognizing lip movements and voice content, the invention judges whether the movement and the voice are consistent; once this requirement is met, video-based action liveness detection and voice liveness detection are carried out. Because the invention uses both video image information and voice information, it has more information to work with than methods that use image information alone, which raises the upper limit of liveness detection and greatly improves liveness detection capability.

Description

A face liveness detection method based on reading

Technical Field

The invention relates to the technical field of face liveness detection, and in particular to a face liveness detection method based on reading.

Background Art

With the widespread use of face recognition technology in various identity authentication scenarios, black-market fraud and impersonation have become increasingly rampant, and the requirements placed on face liveness detection keep rising.

At first, face images were used for face liveness detection; later, action-based dynamic face liveness detection methods also came into wide use. Meanwhile, with the development of technologies such as speech recognition and voiceprint recognition, the human voice has also been used for identity verification, which in turn created a need for voice liveness detection.

However, existing detection methods have low accuracy and limited liveness detection capability, which degrades detection results.

Summary of the Invention

(1) Technical Problem

Addressing the deficiencies of the prior art, the present invention provides a face liveness detection method based on reading, which solves the problem that existing detection methods have low accuracy.

(2) Technical Solution

To achieve the above object, the present invention adopts the following technical solution: a face liveness detection method based on reading, comprising the following steps:

S1. Face detection

After the face authentication process starts, capture the frames from the camera and use a neural-network face detection model to detect whether a face is present in the frame;

S2. Reading prompt

After a face is detected, the process enters the reading stage;

S3. Reading data collection

After the prompt ends, the process enters the reading stage;

S4. Reading judgment

Because there is no guarantee that the user reads the specified number aloud as required, the device must judge whether the user has completed the "read aloud" action; if no spoken reading is detected, the device notifies the user that the operation failed and guides a re-capture;

S5. Liveness detection

The device collects the reading video data and passes it to the back end for liveness detection.

Preferably, step S1 further includes: if there is no face, or the image quality of the face is low, do not proceed to the next stage; when a face is present, enter the next stage.

Preferably, step S2 further includes teaching the user, through suitable interaction, how to complete the data collection of this stage.

Preferably, step S3 further includes: the user must read out the prompted number within a period of time while the device saves the video-sound data captured by the camera; the data is cached regardless of whether the user reads as required.

Preferably, step S4 further uses a voice activity detection (VAD) algorithm to detect whether a person is speaking. The VAD algorithm judges whether the sound has changed by detecting changes in sound energy; normally, reading aloud changes the energy of the sound signal, so the device can judge whether someone is speaking by detecting changes in that energy.

Preferably, step S5 further uses "multimodal-Transformer-based video-sound content recognition" and "video-based action liveness detection" for liveness detection; verification passes only when both modules pass.

Preferably, step S5 specifically comprises the following steps:

a. Video-sound content recognition based on a multimodal Transformer

In forged video-sound data, mismatches between the sound and the video are relatively common, so data whose sound and video do not match is rejected. The Transformer architecture can record the positional information of the data; this capability of the model is used to identify mismatched data. If the sound and the video do not match, verification fails; it passes only when they match;

b. Video-based action liveness detection

When a person reads aloud, the mouth moves. Optical flow represents the magnitude and direction of pixel changes between two consecutive frames of a moving object; when the mouth moves, its position shifts, so optical flow can represent the motion features between images.

(3) Beneficial Effects

The present invention provides a face liveness detection method based on reading, with the following beneficial effects:

By interacting with the user, the invention prompts the user to read numbers aloud; the user's lips make corresponding movements and produce sound. By recognizing the lip movements and the voice content, the method judges whether the movement and the voice are consistent; once this requirement is met, action liveness detection and voice liveness detection are performed. Because the method uses image and voice information together for face liveness detection, it improves the accuracy of liveness detection; in practical industrial applications the improvement is significant, and the method has been widely adopted.

Brief Description of the Drawings

Figure 1 is a flow chart of the face liveness detection of the present invention;

Figure 2 is a structural diagram of the multimodal-Transformer-based video-sound content recognition model of the present invention;

Figure 3 is a structural diagram of the video-based action liveness detection model of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Embodiment:

As shown in Figures 1-3, an embodiment of the present invention provides a face liveness detection method based on reading, comprising the following steps:

S1. Face detection

After the face authentication process starts, capture the frames from the camera and use a neural-network face detection model to detect whether a face is present. If there is no face, or the image quality of the face is low, do not proceed to the next stage; when a face is present, enter the next stage;
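
As a purely illustrative sketch of this gating stage, the snippet below uses OpenCV's bundled Haar-cascade detector as a stand-in for the neural-network face detection model named above; the minimum-face-size check is a hypothetical proxy for the image-quality test, which the patent does not specify.

```python
# Minimal sketch of stage S1 (face gating). The Haar cascade is only a
# readily available stand-in for the neural-network detector; the
# minimum-size check is a hypothetical proxy for "image quality too low".
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def has_usable_face(frame_bgr, min_side=80):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Require at least one face whose bounding box is reasonably large.
    return any(w >= min_side and h >= min_side for (x, y, w, h) in faces)

cap = cv2.VideoCapture(0)              # device camera
ok, frame = cap.read()
if ok and has_usable_face(frame):
    print("face detected, proceed to the reading prompt (S2)")
else:
    print("no usable face, stay in the face-detection loop (S1)")
cap.release()
```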

S2. Reading prompt

After a face is detected, the process enters the reading stage. Through suitable interaction, the user is taught how to complete the data collection of this stage. For example, the device screen shows an animated reading prompt while a voice prompt plays, showing the user how to operate correctly.

The user then reads the specified number aloud according to the prompt. This scheme uses Arabic numerals; the same numeral is pronounced differently in different languages, but for any given numeral the number of possible pronunciations is limited;

S3. Reading data collection

After the prompt ends, the process enters the reading stage. The user must read out the prompted number within a period of time. Meanwhile, the device saves the video-sound data captured by the camera. The data is cached regardless of whether the user reads as required.

When the device saves the sound and video data, the sound and the video must be aligned in time and kept synchronized;

S4. Reading judgment

Because there is no guarantee that the user reads the specified number aloud as required, the device must judge whether the user has completed the "read aloud" action. If no spoken reading is detected, the device notifies the user that the operation failed and guides a re-capture.

In this scheme, a Voice Activity Detection (VAD) algorithm is used to detect whether a person is speaking. The VAD algorithm judges whether the sound has changed by detecting changes in sound energy. Normally, reading aloud changes the energy of the sound signal, so the device can judge whether someone is speaking by detecting changes in that energy.
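
A minimal sketch of such an energy-based check, assuming the recorded audio is already available as 16 kHz mono PCM samples in a NumPy array; the frame length and both thresholds are illustrative choices, not values from the patent.

```python
# Minimal energy-based VAD sketch. Frame size and thresholds are
# illustrative assumptions, not values taken from the patent.
import numpy as np

def detect_speech(samples, sr=16000, frame_ms=30, energy_ratio=3.0):
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)              # short-time energy
    noise_floor = np.percentile(energy, 10) + 1e-10  # quietest frames ~ noise
    active = energy > energy_ratio * noise_floor     # frames well above floor
    return active.mean() > 0.1                       # >10% active => speech

audio = np.random.randn(16000).astype(np.float32) * 0.01  # stand-in signal
print("speech detected:", detect_speech(audio))
```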

In this scheme, a deep-learning-based action detection model is used to detect whether the person's mouth makes a reading action. Normally, when a person reads aloud, the mouth opens and closes, or its shape changes. By detecting whether there is mouth movement in the video, the device can judge whether the user is reading normally.
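
The patent leaves this action-detection model unspecified. As one hedged illustration, the sketch below assumes that some landmark model has already produced per-frame positions of the upper and lower inner lip, and flags reading-like motion when the lip gap fluctuates noticeably over the clip.

```python
# Hedged sketch of the mouth-movement check. We assume per-frame lip
# landmarks (from any face-landmark model) are already available; the
# variance threshold is an illustrative assumption.
import numpy as np

def mouth_is_moving(upper_lip_pts, lower_lip_pts, rel_threshold=0.15):
    """Both inputs: arrays of shape (n_frames, 2) with (x, y) positions."""
    gap = np.linalg.norm(upper_lip_pts - lower_lip_pts, axis=1)  # per frame
    # Talking opens and closes the mouth, so the lip gap should fluctuate
    # noticeably around its mean; a static photo would give a flat gap.
    return (gap.std() / (gap.mean() + 1e-6)) > rel_threshold

# Stand-in landmark tracks for a 50-frame clip of someone talking.
t = np.linspace(0, 6, 50)
upper = np.stack([np.full(50, 100.0), 200 + 5 * np.sin(t)], axis=1)
lower = np.stack([np.full(50, 100.0), 215 - 5 * np.sin(t)], axis=1)
print("mouth movement detected:", mouth_is_moving(upper, lower))
```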

To avoid disturbing normal users, satisfying either of the two conditions is taken to mean the user performed the correct action. The device then uploads the captured reading video-sound data.

To keep the device's judgment of reading behavior robust, the thresholds are kept as low as possible. The device-side check filters out some of the non-conforming data and gives feedback immediately, without waiting for results from the back end, mainly to guarantee a good experience for first-time users;

S5. Liveness detection

The device collects the reading video data and passes it to the back end for liveness detection. This scheme uses "multimodal-Transformer-based video-sound content recognition" and "video-based action liveness detection" for liveness detection; verification passes only when both modules pass.

Step S5 specifically comprises the following steps:

a. Video-sound content recognition based on a multimodal Transformer

In forged video-sound data, mismatches between the sound and the video are relatively common, so data whose sound and video do not match is rejected. The Transformer architecture can record the positional information of the data; this capability of the model is used to identify mismatched data. If the sound and the video do not match, verification fails; it passes only when they match.

At the same time, the model can recognize the actions in the sound and the video and judge whether the content the person reads matches the required reading. If they match, verification passes; otherwise it fails.

In this scheme, the structure of the multimodal-Transformer-based video-sound content recognition model is shown in Figure 2. The model takes sound data and video data as input and outputs the liveness class; the recognition model is obtained by training the Transformer on data;
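
Figure 2 itself is not reproduced here. As one plausible reading of such a model, the PyTorch sketch below projects audio and video features into a shared width, adds learned modality and position embeddings (the positional information the text relies on for catching audio/video mismatch), and classifies the fused token sequence with a Transformer encoder. All dimensions, the feature extractors, and the fusion scheme are assumptions rather than details from the patent.

```python
# Hedged PyTorch sketch of a multimodal Transformer for video-sound
# content recognition. Dimensions and fusion scheme are assumptions.
import torch
import torch.nn as nn

class AVTransformer(nn.Module):
    def __init__(self, audio_dim=40, video_dim=512, d_model=256,
                 n_heads=4, n_layers=4, max_len=512, n_classes=2):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)
        self.modality_emb = nn.Embedding(2, d_model)   # 0 = audio, 1 = video
        self.pos_emb = nn.Embedding(max_len, d_model)  # positional information
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)      # live vs. spoof/mismatch

    def forward(self, audio_feats, video_feats):
        # audio_feats: (B, Ta, audio_dim); video_feats: (B, Tv, video_dim)
        a = self.audio_proj(audio_feats) + self.modality_emb.weight[0]
        v = self.video_proj(video_feats) + self.modality_emb.weight[1]
        x = torch.cat([a, v], dim=1)                   # fuse the token streams
        pos = torch.arange(x.size(1), device=x.device)
        x = self.encoder(x + self.pos_emb(pos))        # shared position axis
        return self.head(x.mean(dim=1))                # pooled classification

model = AVTransformer()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 25, 512))
print(logits.shape)  # torch.Size([2, 2])
```

Because the audio and video tokens share one position axis, the encoder can in principle relate when something was said to when the lips moved, which is the mismatch cue the text describes.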

b. Video-based action liveness detection

When a person reads aloud, the mouth moves. Optical flow represents the magnitude and direction of pixel changes between two consecutive frames of a moving object; when the mouth moves, its position shifts, so optical flow can represent the motion features between images.

For a video consisting of many frames [I1, I2, ..., In], compute the optical flow between consecutive frames to obtain the optical-flow sequence [F1, F2, ..., Fn-1], then merge the two to obtain a sequence containing the moving face images and the optical flow between every two consecutive images: [I1, F1, I2, F2, ..., In-1, Fn-1, In].
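
A minimal sketch of building this interleaved sequence with OpenCV's dense Farneback optical flow; the frames are random stand-ins and the flow parameters are common defaults, not values from the patent.

```python
# Minimal sketch of the interleaved [I1, F1, I2, F2, ..., In] sequence
# using dense Farneback optical flow. Frames are random stand-ins.
import cv2
import numpy as np

frames = [np.random.randint(0, 256, (112, 112), dtype=np.uint8)
          for _ in range(5)]           # grayscale face crops I1..I5

sequence = []
for prev, nxt in zip(frames, frames[1:]):
    flow = cv2.calcOpticalFlowFarneback(
        prev, nxt, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)  # (H, W, 2) dx/dy
    sequence.extend([prev, flow])      # Ii, then the flow Fi to Ii+1
sequence.append(frames[-1])            # trailing In

print(len(sequence))                   # 2n-1 = 9 elements for n = 5 frames
```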

A threshold is set: if the liveness score is greater than the threshold and the model predicts a live subject, verification passes; otherwise it fails.

In this scheme, the structure of the video-based action liveness detection model is shown in Figure 3. The model takes the video frames and the optical-flow sequence as input and outputs the liveness class; the liveness detection model is obtained by training a neural network on data.
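
Tying the two back-end modules together, a hedged sketch of the final decision rule described above: the content-recognition module must pass and the action-liveness score must clear the threshold. The 0.5 value is illustrative; the patent only states that a threshold is set.

```python
# Hedged sketch of the back-end AND decision; the threshold is illustrative.
def backend_liveness_check(content_match_passed: bool,
                           liveness_score: float,
                           threshold: float = 0.5) -> bool:
    action_passed = liveness_score > threshold     # video-based action module
    return content_match_passed and action_passed  # both modules must pass

print(backend_liveness_check(True, 0.83))  # True  -> verification passes
print(backend_liveness_check(True, 0.21))  # False -> rejected
```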

The flow of the reading-based face liveness detection of the present invention is shown in Figure 1. After the face-scanning flow starts, the device captures camera data and performs face detection on the captured images; if a face is detected, the flow enters the next stage, otherwise face detection keeps looping. Once a face is detected, the user is prompted to read out the number, and the voice and video of the user's reading are cached. Action detection and voice activity detection are then performed on the saved voice and video; when there is mouth movement or sound, the flow enters the next stage, otherwise a new round of collection begins. Finally, the collected data is uploaded to the back end, where the liveness detection model judges whether the face to be verified belongs to a real person: if so, liveness detection passes; otherwise, it fails.

Although the embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the present invention is defined by the appended claims and their equivalents.

Claims (7)

1. A face liveness detection method based on reading, characterized in that it comprises the following steps:

S1. Face detection: after the face authentication process starts, capture the frames from the camera and use a neural-network face detection model to detect whether a face is present;

S2. Reading prompt: after a face is detected, enter the reading stage;

S3. Reading data collection: after the prompt ends, enter the reading stage;

S4. Reading judgment: because there is no guarantee that the user reads the specified number aloud as required, the device must judge whether the user has completed the "read aloud" action; if no spoken reading is detected, notify the user that the operation failed and guide a re-capture;

S5. Liveness detection: the device collects the reading video data and passes it to the back end for liveness detection.

2. The face liveness detection method based on reading according to claim 1, characterized in that step S1 further includes: if there is no face, or the image quality of the face is low, do not proceed to the next stage; when a face is present, enter the next stage.

3. The face liveness detection method based on reading according to claim 1, characterized in that step S2 further includes teaching the user, through suitable interaction, how to complete the data collection of this stage.

4. The face liveness detection method based on reading according to claim 1, characterized in that step S3 further includes: the user must read out the prompted number within a period of time while the device saves the video-sound data captured by the camera; the data is cached regardless of whether the user reads as required.

5. The face liveness detection method based on reading according to claim 1, characterized in that step S4 further uses a voice activity detection (VAD) algorithm to detect whether a person is speaking; the VAD algorithm judges whether the sound has changed by detecting changes in sound energy; normally, reading aloud changes the energy of the sound signal, so the device can judge whether someone is speaking by detecting changes in that energy.

6. The face liveness detection method based on reading according to claim 1, characterized in that step S5 further uses "multimodal-Transformer-based video-sound content recognition" and "video-based action liveness detection" for liveness detection; verification passes only when both modules pass.

7. The face liveness detection method based on reading according to claim 1, characterized in that step S5 specifically comprises the following steps:

a. Video-sound content recognition based on a multimodal Transformer: in forged video-sound data, mismatches between the sound and the video are relatively common, so data whose sound and video do not match is rejected; the Transformer architecture can record the positional information of the data, and this capability of the model is used to identify mismatched data; if the sound and the video do not match, verification fails, and it passes only when they match;

b. Video-based action liveness detection: when a person reads aloud, the mouth moves; optical flow represents the magnitude and direction of pixel changes between two consecutive frames of a moving object; when the mouth moves, its position shifts, so optical flow can represent the motion features between images.
CN202310206478.9A 2023-02-23 2023-02-23 Face liveness detection method based on reading Pending CN116092167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310206478.9A 2023-02-23 2023-02-23 Face liveness detection method based on reading


Publications (1)

Publication Number Publication Date
CN116092167A (en) 2023-05-09

Family

ID=86214181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310206478.9A Face liveness detection method based on reading (Pending) 2023-02-23 2023-02-23

Country Status (1)

Country Link
CN (1) CN116092167A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017113370A1 (en) * 2015-12-31 2017-07-06 华为技术有限公司 Voiceprint detection method and apparatus
CN108154111A (en) * 2017-12-22 2018-06-12 泰康保险集团股份有限公司 Liveness detection method, system, electronic device and computer readable medium
CN113505652A (en) * 2021-06-15 2021-10-15 腾讯科技(深圳)有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN114596609A (en) * 2022-01-19 2022-06-07 中国科学院自动化研究所 Audiovisual forgery detection method and device
CN115546874A (en) * 2022-11-03 2022-12-30 平安银行股份有限公司 Face living body detection method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2023-05-09)