WO2021073364A1 - Method, apparatus and device for face liveness detection, and storage medium
- Publication number: WO2021073364A1 (application PCT/CN2020/116507)
- Authority: WIPO (PCT)
- Prior art keywords: face, image, difference image, target detection, sample
- Prior art date
- Legal status: Ceased (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/40—Spoof detection, e.g. liveness detection
- G06V40/45—Detection of the body part being alive
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4014—Identity check for transactions
- G06Q20/40145—Biometric identity checks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
Description
- This application relates to the field of identity authentication technology, and in particular to face liveness detection.
- Face liveness detection is a key step in the face recognition process and is directly related to the security of user authentication. With the continuous development and application of face liveness detection technology, more and more liveness attacks are emerging. Defending only against planar attacks (screen attacks, paper attacks, etc.) is no longer sufficient to meet the demand for high security in face liveness detection.
- The face liveness detection technology currently common in the industry is a liveness detection algorithm based on the depth information of the user's face.
- The algorithm estimates a depth image from the input picture to judge whether the current user is a real person or a copy such as paper, a photo, or a certificate, and thereby defends against planar attacks.
- A major flaw of this algorithm is that it is theoretically unable to resist three-dimensional (3D) attacks (such as a real person wearing a mask, a 3D model, etc.), which is often unacceptable in current application scenarios with high security requirements such as payment and access control.
- The embodiments of the present application provide a face liveness detection method which, based on the principle of reflection, decouples texture information and depth information from face images captured under different lighting conditions and uses them for face liveness detection, so that both 3D attacks and planar attacks can be effectively defended against.
- The embodiments of the present application also provide a face liveness detection apparatus, a device, a computer-readable storage medium, and a computer program product.
- A first aspect of the present application provides a face liveness detection method, the method being executed by a processing device with image processing capabilities, and the method including:
- acquiring a first face image of a target detection object under a first lighting condition and a second face image of the target detection object under a second lighting condition, and determining a difference image according to the first face image and the second face image;
- extracting a feature map from the difference image, and decoupling from the feature map the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object, wherein the object reflectivity is used to represent texture information and the object normal vector is used to represent depth information;
- determining, according to the object reflectivity and the object normal vector, whether the target detection object is a living body.
- A second aspect of the present application provides a method for training a face liveness detection model, the method being executed by a processing device with image processing capabilities, and the method including:
- acquiring a training data set, in which each group of training data includes a sample difference image, a label of the sample difference image, and a depth map and a texture map corresponding to the sample difference image; the sample difference image is obtained by performing image difference on face images of a sample detection object collected under different lighting conditions, the label of the sample difference image is used to identify whether the sample detection object to which the sample difference image belongs is a living body, the depth map is used to identify the depth information of each pixel position in the sample difference image, and the texture map is used to identify the material type of each pixel position in the sample difference image, the material type being determined based on the texture information of the pixel position;
- training a pre-built first neural network model according to the training data set to obtain a first neural network model in a convergent state, the first neural network model including a convolutional layer, two deconvolutional layers, a global pooling layer, and a fully connected classification layer;
- cropping the first neural network model in the convergent state to obtain a face liveness detection model, the face liveness detection model including the convolutional layer, the global pooling layer, and the fully connected classification layer.
- A third aspect of the present application provides a face liveness detection apparatus, the apparatus being deployed on a processing device with image processing capabilities, and the apparatus including:
- a face image acquisition module configured to acquire a first face image of a target detection object under a first lighting condition and a second face image of the target detection object under a second lighting condition;
- a difference image determination module configured to determine a difference image according to the first face image and the second face image;
- a feature extraction module configured to extract a feature map from the difference image and decouple from the feature map the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object, wherein the object reflectivity is used to characterize texture information and the object normal vector is used to characterize depth information;
- a liveness detection module configured to determine, according to the object reflectivity and the object normal vector, whether the target detection object is a living body.
- A fourth aspect of the present application provides an apparatus for training a face liveness detection model, the apparatus being deployed on a processing device with image processing capabilities, and the apparatus including:
- a sample set acquisition module configured to acquire a training data set, in which each set of training data includes a sample difference image, a label of the sample difference image, and a depth map and a texture map corresponding to the sample difference image; the sample difference image is obtained by performing image difference on face images of a sample detection object collected under different lighting conditions, the label of the sample difference image is used to identify whether the sample detection object to which the sample difference image belongs is a living body, the depth map is used to identify the depth information of each pixel position in the sample difference image, and the texture map is used to identify the material type of each pixel position in the sample difference image, the material type being determined based on the texture information of the pixel position;
- a training module configured to train a pre-built first neural network model according to the training data set to obtain a first neural network model in a convergent state, the first neural network model including a convolutional layer, two deconvolutional layers, a global pooling layer, and a fully connected classification layer;
- a cropping module configured to crop the first neural network model in the convergent state to obtain a face liveness detection model, the face liveness detection model including the convolutional layer, the global pooling layer, and the fully connected classification layer.
- A fifth aspect of the present application provides a processing device, the device including:
- a memory configured to store a computer program;
- a processor configured to execute, according to the computer program, the steps of the face liveness detection method described in the first aspect or the method for training a face liveness detection model described in the second aspect.
- A sixth aspect of the present application provides a computer-readable storage medium configured to store a computer program, the computer program being used to execute the face liveness detection method described in the first aspect or the method for training a face liveness detection model described in the second aspect.
- A seventh aspect of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the face liveness detection method described in the first aspect or the method for training a face liveness detection model described in the second aspect.
- Face liveness detection can be realized on the hardware of mainstream terminal devices, without additional hardware overhead and without requiring users to complete additional specified actions, which improves detection efficiency and user experience.
- The lighting sequence of colors and intensities under different lighting conditions can be regarded as a kind of active encoding; when the encoding scheme is unknown, an attacker cannot provide targeted input under the corresponding light, which further reduces the possibility of being attacked and improves the reliability of detection.
- FIG. 1 is a system architecture diagram of a face liveness detection method in an embodiment of the application
- FIG. 2 is a flowchart of a face liveness detection method in an embodiment of the application
- FIG. 3 is a schematic diagram of face images formed under different lighting conditions in an embodiment of the application
- FIG. 4 is a schematic diagram of an image of the central area of a human face in an embodiment of the application
- FIG. 5 is a schematic structural diagram of a face liveness detection model in an embodiment of the application
- FIG. 6 is a flowchart of a method for training a face liveness detection model in an embodiment of the application
- FIG. 7 is a schematic structural diagram of a first neural network model in an embodiment of the application
- FIG. 8 is a schematic diagram of an application scenario of a face liveness detection method in an embodiment of the application
- FIG. 9 is a structural diagram of a face liveness detection apparatus in an embodiment of the application
- FIG. 10 is a schematic structural diagram of an apparatus for training a face liveness detection model in an embodiment of the application
- FIG. 11 is a schematic structural diagram of a server in an embodiment of the application
- Artificial Intelligence (AI) uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- Artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
- Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies.
- Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Computer Vision (CV) technology is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and to further perform graphics processing so that the processed image is more suitable for human observation or for transmission to instruments for inspection.
- Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as facial recognition and fingerprint recognition.
- This application mainly uses the image recognition technology of computer vision for face recognition, specifically to identify whether a face is a real person or an attack type.
- This application provides a face liveness detection method based on the principle of reflection.
- The light projected onto the user's face under different lighting conditions is reflected to form different face images; a feature map is extracted from the difference image formed from these face images, and texture information and depth information are decoupled from the feature map and used for face liveness detection. This can effectively identify both planar attacks and 3D attacks and improves the accuracy and security of face liveness detection.
- The face liveness detection method provided in this application can be applied to any processing device with image processing capabilities; the processing device may be a terminal or a server that includes a central processing unit (CPU) and/or a graphics processing unit (GPU).
- The terminal may be a fixed terminal device such as an access control system, a payment system, or a desktop computer; a portable terminal device such as a notebook computer, a tablet computer, or a smart phone; or an augmented reality (AR) or virtual reality (VR) terminal device.
- The server may specifically be a computing device that provides a face liveness detection service, and may be an independent server or a computing cluster formed by multiple servers.
- The following description mainly takes a server as the processing device for exemplary purposes.
- The above-mentioned face liveness detection method can be stored in the processing device in the form of a computer program, and the processing device implements the face liveness detection method of the present application by running the computer program.
- The computer program may be independent, or may be a functional module, plug-in, or applet integrated into another computer program, which is not limited in this embodiment.
- The face liveness detection method provided in this application includes, but is not limited to, being applied to the application environment shown in FIG. 1.
- The terminal 101 deploys a face liveness detection system, through which face images of the target detection object can be collected under different lighting conditions; for example, a first face image may be collected under a first lighting condition and a second face image under a second lighting condition.
- The server 102 obtains the first face image and the second face image from the terminal 101, determines a difference image, extracts a feature map from the difference image, decouples from the feature map the object reflectivity and the object normal vector corresponding to the target detection object, and determines whether the target detection object is a living body according to the object reflectivity and the object normal vector. The server 102 may also return the face liveness detection result to the terminal 101 to prompt the user.
- Referring to FIG. 2, the method includes:
- S201: Acquire a first face image of the target detection object under a first lighting condition and a second face image of the target detection object under a second lighting condition.
- The server can verify user identity based on face images. Considering that criminals may simulate real people by means such as paper with cut-out holes, masks with cut-out holes, silicone masks, or 3D human head models, the server can obtain face images under different lighting conditions based on the principle of reflection for face liveness detection, so as to resist planar attacks or 3D attacks.
- The principle of reflection means that, according to the Lambertian lighting model, for surfaces with complex three-dimensional structure and fine surface texture, the resulting diffuse reflected light changes greatly when the lighting conditions change, and the images formed by such reflected light also differ greatly. Based on this, different face images can be obtained by photographing the same face under different lighting conditions.
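- To make this concrete, the following is a simplified Lambertian sketch; the notation (ρ, n, l, E) is introduced here purely for illustration and does not appear in the original text:

```latex
% Lambertian diffuse intensity at a pixel under lighting condition k:
%   \rho : albedo (object reflectivity),  \mathbf{n} : surface normal,
%   \mathbf{l}_k, E_k : light direction and intensity of condition k.
I_k = \rho \, E_k \, \max\!\bigl(0,\ \mathbf{n}\cdot\mathbf{l}_k\bigr), \qquad
D = I_1 - I_2 = \rho \,\Bigl( E_1 \max\!\bigl(0,\ \mathbf{n}\cdot\mathbf{l}_1\bigr) - E_2 \max\!\bigl(0,\ \mathbf{n}\cdot\mathbf{l}_2\bigr) \Bigr)
```

- Because the difference image D depends on both the albedo ρ (reflectivity, i.e. texture) and the surface normal n (orientation, i.e. depth), a model that observes only D can in principle recover both cues, which is the basis of the decoupling described later.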
- The lighting conditions refer to light irradiation conditions, which may include at least one of lighting parameters such as light source color, light intensity, and light angle.
- Different lighting conditions can, for example, be light sources of different colors. Referring to FIG. 3, red light, green light, and blue light can be projected onto a human face to form different face images, and these face images constitute a reflective imaging picture sequence. It should be noted that different lighting conditions can also be a lit condition and an unlit condition.
- In a specific implementation, the terminal may collect the first face image of the target detection object under the first lighting condition and the second face image under the second lighting condition, and the server may obtain the first face image and the second face image from the terminal for face liveness detection.
- The first lighting condition and the second lighting condition may be formed naturally or created artificially.
- For example, the user may manually adjust at least one of the color, brightness, or tilt angle of a light-emitting element, that is, adjust the light source color, light intensity, and/or light angle, to form the first lighting condition and the second lighting condition.
- The light-emitting element may be located on the display screen or a camera accessory, and may be, for example, a light-emitting diode.
- The lighting conditions can also be created automatically by the device. For example, the server may generate a lighting instruction in response to a liveness detection request; the lighting instruction may include a first lighting parameter corresponding to the first lighting condition and a second lighting parameter corresponding to the second lighting condition, so that a light-emitting element can be controlled to emit light based on the lighting instruction and the reflected light from the face of the target detection object can be collected to form the first face image and the second face image in sequence.
- The lighting parameters may include the light source color, and different light source colors can be represented by color identifiers. The server can randomly select two different color identifiers in response to the liveness detection request and generate two lighting instructions carrying the different color identifiers; in this way, the light-emitting element can be controlled to emit light of different colors according to the lighting instructions, so as to form different lighting conditions.
- The above process can also be implemented independently by the terminal: when a user triggers a liveness detection operation through the terminal so that the terminal generates a liveness detection request, the terminal generates a lighting instruction in response to the liveness detection request and then emits light according to the lighting instruction to form the corresponding lighting conditions.
- When the server or terminal generates the lighting instruction, it can also randomly select two different light intensities, or two different light angles, or random combinations of light source color, light intensity, and/or light angle, and generate lighting instructions carrying different light intensities, different light angles, or other combinations of lighting parameters. In this way, the information entropy of the encoding is increased, and the security of the face liveness detection method is further improved; a sketch is given after this paragraph.
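- A minimal sketch of such random lighting-instruction generation is shown below; the parameter pools, the LightingInstruction structure, and the helper names are assumptions introduced here for illustration, not the patent's actual interface:

```python
import random
from dataclasses import dataclass

# Hypothetical parameter pools; a real system would use device-specific values.
COLOR_IDS = ["red", "green", "blue", "purple"]
INTENSITIES = [0.4, 0.7, 1.0]          # relative brightness levels
ANGLES_DEG = [0, 15, 30]               # tilt of the light-emitting element

@dataclass
class LightingInstruction:
    color_id: str
    intensity: float
    angle_deg: int

def generate_lighting_instructions(seed=None):
    """Randomly pick two *different* lighting conditions, acting as an active code."""
    rng = random.Random(seed)
    first = LightingInstruction(rng.choice(COLOR_IDS), rng.choice(INTENSITIES), rng.choice(ANGLES_DEG))
    while True:
        second = LightingInstruction(rng.choice(COLOR_IDS), rng.choice(INTENSITIES), rng.choice(ANGLES_DEG))
        if second != first:            # the two conditions must differ
            return first, second

# Example: the server would send these to the terminal, which drives the screen or LED.
cond_a, cond_b = generate_lighting_instructions()
```

- Because the pair is drawn at random per request, an attacker who does not know the current pair cannot prepare a matching forged input, which is the "active encoding" idea described above.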
- When the lighting conditions differ only in light source color or intensity, the reflected light from the face can be directly collected to form the first face image and the second face image.
- When the lighting parameters include the light angle, the light angle can be determined according to the position of the face relative to the image collection area (for example, the framing frame), and the light-emitting element is controlled to emit light at that angle in turn based on the first lighting parameter and the second lighting parameter, so that the reflected light from the face can be collected to form the first face image and the second face image.
- S202: Determine a difference image according to the first face image and the second face image.
- In a specific implementation, the server may perform image difference processing on the entire first face image and second face image to obtain the difference image.
- Specifically, the server may obtain the pixel sequences of the first face image and the second face image respectively, perform a difference operation channel by channel on the corresponding pixel sequences to obtain a difference pixel sequence, and obtain the difference image based on the difference pixel sequence; a sketch is given below.
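- The following is a minimal sketch of that per-channel difference, assuming 8-bit RGB arrays of identical size; the function name and the rescaling convention are illustrative assumptions:

```python
import numpy as np

def difference_image(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Channel-wise difference of two aligned H x W x 3 face images.

    Both inputs are uint8 RGB arrays of the same shape; the signed difference is
    rescaled back to uint8 so it can be fed to a CNN like an ordinary image.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("face images must have the same size and channel count")
    diff = img_a.astype(np.int16) - img_b.astype(np.int16)   # range [-255, 255]
    # Shift and scale into [0, 255]; other normalizations are equally possible.
    return ((diff + 255) / 2).astype(np.uint8)
```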
- In addition to obtaining the difference image from the entire image, the server may determine the difference image only for the central area of the face. For example, the server can crop the face central area of the first face image to obtain a first face central area image, crop the face central area of the second face image to obtain a second face central area image, and then perform image difference processing on the first face central area image and the second face central area image to obtain a difference image of the central area of the face, so that subsequent prediction can focus on the depth information and texture information of the central area of the face and improve prediction accuracy.
- In a specific implementation, the server can first identify the face area through a face recognition model and then crop it: after the server obtains the reflective face images, it performs data preprocessing based on the face recognition result and crops out the image of the central area of the face.
- Referring to FIG. 4, area a) represents the complete picture, area b) is the face area output by a high-precision face recognition model, and area c) is indented by N pixels on the basis of area b); the image obtained by cropping according to area c) is the face central area image.
- N can be set according to actual needs; for example, it can be set to 15. A crop sketch is given below.
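- A minimal sketch of this crop, assuming the face detector returns a bounding box as (x, y, w, h) in pixels; the helper name is an assumption introduced here:

```python
import numpy as np

def crop_face_center(image: np.ndarray, face_box, indent_px: int = 15) -> np.ndarray:
    """Crop the central face area: shrink the detected face box by `indent_px` on every side."""
    x, y, w, h = face_box                      # box from any face detector, in pixels
    x0 = max(x + indent_px, 0)
    y0 = max(y + indent_px, 0)
    x1 = min(x + w - indent_px, image.shape[1])
    y1 = min(y + h - indent_px, image.shape[0])
    if x1 <= x0 or y1 <= y0:
        raise ValueError("face box too small for the requested indent")
    return image[y0:y1, x0:x1]
```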
- By using the cropped central area of the face as the input of the face liveness detection model for training and testing, the model can be made to pay more attention to the depth information and texture information of the central area of the face, which improves the accuracy of the face liveness detection model.
- The server can also determine the difference image for a partial organ region of the face. For example, the server can crop the face partial organ area of the first face image to obtain a first face partial organ area image, crop the face partial organ area of the second face image to obtain a second face partial organ area image, and then perform image difference processing on the two to obtain a difference image of the face partial organ area. In this way, subsequent prediction focuses only on the most valuable local area, which guarantees prediction accuracy while improving prediction efficiency.
- For example, the server can crop the nose area from the first face image and the second face image to obtain the corresponding first face partial organ area image and second face partial organ area image, determine a difference image from them, and use that difference image for face liveness detection to improve detection accuracy.
- The server may also determine difference images in both global and local dimensions. For example, the central area of the face is cropped from the first face image and the second face image to obtain the corresponding first face central area image and second face central area image, and the face partial organ area is cropped from the first face image and the second face image to obtain the corresponding first face partial organ area image and second face partial organ area image. Image difference processing is then performed on the first and second face central area images to obtain a difference image of the face central area, used as a first difference image, and on the first and second face partial organ area images to obtain a difference image of the face partial organ area, used as a second difference image.
- Performing double detection based on the difference images of these two dimensions can further improve the reliability of the detection result.
- S203: Extract a feature map from the difference image, decouple from the feature map the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object, and determine whether the target detection object is a living body according to the object reflectivity and the object normal vector.
- According to the principle of reflection, the difference image of two images of the same object under different lighting conditions contains two major types of information: object reflectivity and object normal vector.
- Objects of different materials (or different textures) have different reflectivities, and different positions have different normal vector directions. Therefore, the object reflectivity can represent the texture information of the object, and the object normal vector can represent the depth information of the object.
- The difference image formed from face images under different lighting conditions thus contains the texture information and depth information of the face. The server can extract a feature map from the difference image and then decouple from the feature map the object reflectivity representing texture information and the object normal vector representing depth information for liveness detection, which prevents the entanglement of texture information and depth information from affecting the accuracy of face liveness detection.
- Extracting the feature map from the difference image and decoupling the object reflectivity and object normal vector corresponding to the target detection object can be achieved through a pre-trained face liveness detection model; whether the target detection object is a living body can likewise be determined from the object reflectivity and object normal vector through this model.
- The face liveness detection model takes the difference image of the target detection object as input and outputs a prediction of whether the target detection object is a living body.
- The face liveness detection model can include a convolutional layer, a global pooling layer, and a fully connected classification layer.
- The server can input the difference image into the pre-trained face liveness detection model, extract image features through the convolutional layer to obtain a feature map, and decouple from the feature map the object reflectivity and object normal vector corresponding to the target detection object; then, through the global pooling layer and the fully connected classification layer, determine whether the target detection object is a living body according to the object reflectivity and the object normal vector.
- The face liveness detection model can be obtained through neural network training. In one implementation, the neural network includes a convolutional layer, a global pooling layer, and a fully connected classification layer. The sample difference image in the training data is input into the model, which decouples the texture information and depth information through the convolutional layer, determines a face liveness detection score for the sample difference image through the global pooling layer and the fully connected classification layer based on the texture information and depth information, determines the prediction result from that score, and updates the parameters based on the prediction result and the label corresponding to the sample difference image. When the training stop condition is satisfied, training is stopped and the model that meets the condition is used for face liveness detection.
- The training process is described in detail below.
- The present application also describes the implementation of face liveness detection on the target detection object in combination with the structure of the face liveness detection model.
- Light of different colors is emitted toward the face through the display screen of the user terminal, and the reflected light forms the first face image (i.e., image A obtained under green light) and the second face image (i.e., image B obtained under purple light). A difference image of size 256 × 256 × 3 is obtained by performing difference processing on image A and image B and is used as the network input to the face liveness detection model.
- The face liveness detection model includes an input layer, a deep convolutional layer, a global pooling layer, and a fully connected classification layer.
- A feature map of size 8 × 8 × 512 is obtained by feature extraction through the deep convolutional layer; a 512-dimensional feature is then obtained through the global pooling layer, and the face liveness detection score is obtained by classification with a softmax function in the fully connected classification layer. If the face liveness detection score is higher than a preset judgment threshold t, the face liveness detection model judges the input as a real person, that is, the target detection object is a living body; otherwise, the input is judged as an attack, that is, the target detection object is not a living body but is forged by means such as photos or 3D models.
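- As a hedged illustration of this pipeline (not the patent's actual network definition), the following PyTorch sketch reproduces the stated shapes: a 256 × 256 × 3 difference image, an 8 × 8 × 512 feature map, global pooling to 512 dimensions, and a softmax liveness score compared against a threshold. The backbone layout and the threshold value are assumptions:

```python
import torch
import torch.nn as nn

class FaceLivenessNet(nn.Module):
    """Deployment-time model: convolutional backbone + global pooling + FC classifier."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Any backbone mapping 3 x 256 x 256 -> 512 x 8 x 8 would do; this one stacks
        # five stride-2 conv blocks (256 -> 128 -> 64 -> 32 -> 16 -> 8).
        chans = [3, 32, 64, 128, 256, 512]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
        self.backbone = nn.Sequential(*blocks)          # feature map: 512 x 8 x 8
        self.pool = nn.AdaptiveAvgPool2d(1)             # global pooling -> 512-d vector
        self.classifier = nn.Linear(512, num_classes)   # fully connected classification layer

    def forward(self, diff_image: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(diff_image)
        vec = self.pool(feat).flatten(1)
        return torch.softmax(self.classifier(vec), dim=1)[:, 1]   # liveness score in [0, 1]

model = FaceLivenessNet().eval()
score = model(torch.rand(1, 3, 256, 256))   # one difference image
is_live = bool(score.item() > 0.5)          # 0.5 stands in for the preset threshold t
```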
- In a specific implementation, after the server obtains the face liveness detection score, it can further normalize it and then compare the normalized score with the corresponding threshold to achieve face liveness detection.
- The preset judgment threshold can be set according to actual needs; by adjusting the preset judgment threshold, the server can maintain both the pass rate and the attack rejection rate at a relatively high level.
- In some cases, the face liveness detection model can also include two branches for depth map regression and texture map regression; when the target detection object is determined to be an attack, the server can further determine the type of attack based on the depth map regression result and the texture map regression result.
- The server can also infer the original lighting conditions from the acquired first face image and second face image. If the inferred lighting sequence does not match the issued encoding, it can be considered that the front-end device port has been hijacked by other techniques, and the input during this period is regarded as an attack.
- In summary, the embodiment of the present application provides a face liveness detection method in which a face liveness detection model is pre-trained to decouple depth information and texture information, and face liveness detection is then performed based on both, so that 2D and 3D attacks can be accurately identified.
- Compared with a liveness detection method that uses depth information alone, decoupling depth and material information from reflection imaging pictures under different lighting conditions is more robust; the method can greatly reduce the false pass rate of 3D attacks without affecting the recognition of other types of attacks.
- In addition, this method does not require any form of user interaction; the user only needs to hold a posture for a short period of time to complete the face recognition verification.
- The method does not require customized hardware, supports the mainstream mobile devices currently on the market, and is easy to promote.
- The face liveness detection method provided in this application is implemented by a face liveness detection model. The method for training the face liveness detection model will be described in detail below, from the perspective of the server, in conjunction with specific embodiments.
- Referring to FIG. 6, the method includes:
- S601: Acquire a training data set. The training data set includes multiple sets of training data; each set includes a sample difference image, a label of the sample difference image, and a depth map and a texture map corresponding to the sample difference image. The sample difference image is obtained by performing image difference processing on face images of a sample detection object collected under different lighting conditions; the label of the sample difference image is used to identify whether the sample detection object to which the sample difference image belongs is a living body; the texture map is used to identify the material type of each pixel position in the sample difference image, the material type being determined based on the texture information of the pixel position; and the depth map is used to identify the depth information of each pixel position in the sample difference image, the depth information being determined based on the distance from the spatial position of the pixel to the imaging plane.
- The sample detection objects include real people (living bodies) and attack samples (non-living bodies); training data whose sample detection object is a real person is called a positive sample, and training data whose sample detection object is an attack sample is called a negative sample.
- The positive samples and negative samples can be configured according to a first preset ratio to form the training data set; the first preset ratio can be set according to actual needs, for example, 8:2.
- Considering that negative samples include flat paper attacks, flat screen attacks, and 3D model attacks, the server may also configure the different types of negative samples according to a second preset ratio to form the training data set; for example, the second preset ratio may be 1:1:2, and the server configures the negative samples according to this ratio.
- When generating the training data, the server can first obtain the sample difference image and then assign different labels based on real person, paper, screen, 3D model, and environment, such as labels 1 to 5, so as to obtain the label corresponding to the sample difference image. Then, based on the texture information at each pixel position of the sample difference image, the material type of each pixel position can be determined; by assigning the material type label to the sample difference image pixel by pixel, the texture map of the sample difference image can be generated. In addition, a depth map is generated for the sample difference image based on a 3D modeling tool; the depth map of a planar attack is an all-zero grayscale image. Finally, the server can generate training data based on the sample difference image, the label, the texture map, and the depth map, where the label, the texture map, and the depth map serve as the supervision information of the training data; a sketch follows.
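- The following sketch shows one way such a training record could be assembled; the class-ID mapping, array shapes, and container type are illustrative assumptions, not the patent's data format:

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical material/class identifiers, mirroring "real person, paper, screen,
# 3D model, environment" -> labels 1..5 as described above.
MATERIAL_IDS = {"real_person": 1, "paper": 2, "screen": 3, "3d_model": 4, "environment": 5}

@dataclass
class TrainingSample:
    diff_image: np.ndarray   # H x W x 3 sample difference image
    label: int               # liveness/attack label of the whole image
    texture_map: np.ndarray  # H x W, per-pixel material id (supervision for the texture branch)
    depth_map: np.ndarray    # H x W, per-pixel depth (supervision for the depth branch)

def make_planar_attack_sample(diff_image: np.ndarray, material: str) -> TrainingSample:
    """Planar attacks (paper/screen) get an all-zero depth map, as described above."""
    h, w = diff_image.shape[:2]
    texture_map = np.full((h, w), MATERIAL_IDS[material], dtype=np.uint8)
    depth_map = np.zeros((h, w), dtype=np.float32)   # all-zero grayscale depth
    return TrainingSample(diff_image, MATERIAL_IDS[material], texture_map, depth_map)
```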
- S602: Train a pre-built first neural network model according to the training data set to obtain the first neural network model in a convergent state.
- The first neural network model may include a convolutional layer, two deconvolutional layers, a global pooling layer, and a fully connected classification layer.
- The convolutional layer is used to extract a feature map from the difference image and to decouple from the feature map the object reflectivity representing the material information of the sample detection object and the object normal vector representing the depth information of the sample detection object.
- The two deconvolutional layers are respectively used to restore images based on the object reflectivity and the object normal vector so as to achieve material regression and depth regression; that is, one deconvolutional layer regresses a material map based on the object reflectivity, and the other deconvolutional layer regresses a depth map based on the object normal vector.
- The global pooling layer is used to perform pooling processing, and the fully connected classification layer is used to classify the pooled features and predict whether the sample detection object is a living body according to the classification result; a sketch of this three-branch structure follows.
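- A hedged PyTorch sketch of such a three-branch network is shown below; the channel split used as the "decoupling", the decoder depth, and the layer sizes are assumptions chosen to match the shapes mentioned elsewhere in this description (256 × 256 input, 8 × 8 × 512 features), not the patent's exact architecture:

```python
import torch
import torch.nn as nn

def conv_backbone():
    """Stride-2 conv stack: 3 x 256 x 256 -> 512 x 8 x 8 (illustrative backbone)."""
    chans = [3, 32, 64, 128, 256, 512]
    layers = []
    for c_in, c_out in zip(chans[:-1], chans[1:]):
        layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                   nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def deconv_decoder(in_ch, out_ch):
    """Five transposed convs: in_ch x 8 x 8 -> out_ch x 256 x 256 regression map."""
    chans = [in_ch, 128, 64, 32, 16, out_ch]
    layers = []
    for c_in, c_out in zip(chans[:-1], chans[1:]):
        layers += [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers[:-1])   # drop the final ReLU so outputs are unconstrained

class FirstNeuralNetwork(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = conv_backbone()
        self.depth_decoder = deconv_decoder(256, 1)     # fed the "normal vector" half of the features
        self.texture_decoder = deconv_decoder(256, 6)   # e.g. 5 material classes + background
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, diff_image):
        feat = self.backbone(diff_image)                      # 512 x 8 x 8
        normal_feat, albedo_feat = feat.split(256, dim=1)     # channel split stands in for the decoupling
        depth_pred = self.depth_decoder(normal_feat)          # depth regression branch
        texture_pred = self.texture_decoder(albedo_feat)      # material regression branch
        logits = self.classifier(self.pool(feat).flatten(1))  # classification branch
        return logits, depth_pred, texture_pred
```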
- The convolutional layer used to extract depth information and material information can be an existing deep convolutional layer in the industry, for example the convolutional layers of network structures such as VGGNet, ResNet, or DenseNet; of course, the network structure can also be designed or modified as needed.
- The deconvolutional layer used to restore the picture can adopt an upsampling model with a cross-layer connection structure, such as the deconvolutional layers in network structures like UNet, i.e., deconvolution with skip connections.
- The global pooling layer, the fully connected classification layer, and so on can adopt structures that are common in the industry, which are not repeated here.
- In a specific implementation, the server can train the network parameters of the first neural network model according to the training data set in an end-to-end manner until the first neural network model in a convergent state is obtained; end-to-end training optimizes the whole network jointly and can therefore achieve better performance.
- The server may also train the first neural network model in a cross-training manner. Specifically, in the first stage the global pooling layer and the fully connected classification layer of the pre-built first neural network model are fixed, and the convolutional layer and the two deconvolutional layers are trained based on the training data set; in the second stage the convolutional layer and the two deconvolutional layers are fixed, and the global pooling layer and the fully connected classification layer are trained based on the training data set. The first neural network model in a convergent state is then obtained from the convolutional and deconvolutional layers trained in the first stage and the global pooling and fully connected classification layers trained in the second stage. Cross-training of the two stages can reduce the difficulty of training and improve training efficiency; a sketch of the alternating freezing is given after this paragraph.
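- A minimal sketch of that alternating freezing in PyTorch; the attribute names follow the illustrative FirstNeuralNetwork above and are assumptions (note the global pooling layer itself has no trainable parameters):

```python
def set_trainable(module, trainable: bool):
    for p in module.parameters():
        p.requires_grad = trainable

def cross_training_stages(model, train_stage_fn):
    """Stage 1: freeze the classifier, train backbone + decoders.
    Stage 2: freeze backbone + decoders, train the classifier."""
    # Stage 1
    set_trainable(model.backbone, True)
    set_trainable(model.depth_decoder, True)
    set_trainable(model.texture_decoder, True)
    set_trainable(model.classifier, False)
    train_stage_fn(model, stage=1)

    # Stage 2
    set_trainable(model.backbone, False)
    set_trainable(model.depth_decoder, False)
    set_trainable(model.texture_decoder, False)
    set_trainable(model.classifier, True)
    train_stage_fn(model, stage=2)
```

- Here train_stage_fn is any training loop that only steps parameters with requires_grad set to True.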
- In a specific implementation, the server inputs the sample difference image included in each set of training data into the pre-built first neural network model, performs feature extraction on the sample difference image through the convolutional layer to obtain a set of feature maps, and then decouples this set of feature maps into a first grouped feature map and a second grouped feature map, where the first grouped feature map represents the object normal vector of the sample detection object and the second grouped feature map represents the object reflectivity of the sample detection object.
- The object normal vector in the first grouped feature map can be used to regress a depth map representing depth information, and the object reflectivity in the second grouped feature map can be used to regress a material map representing material information. The first grouped feature map is input into the first deconvolutional layer, the second grouped feature map is input into the second deconvolutional layer, and the full set of feature maps is input into the global pooling layer of the first neural network model and, after pooling, into the fully connected classification layer.
- The predicted feature map output by the first deconvolutional layer is essentially the depth map regressed from the first grouped feature map, recorded as the predicted depth map; it is compared with the depth map corresponding to the pre-labeled sample image to determine the depth map loss. The predicted feature map output by the second deconvolutional layer is essentially the material map regressed from the second grouped feature map, recorded as the predicted texture map; it is compared with the texture map corresponding to the pre-labeled sample image to determine the texture map loss.
- The server also determines the classification loss according to the predicted label output by the fully connected classification layer and the label corresponding to the sample difference image, determines the model loss according to the depth map loss, texture map loss, and classification loss determined in each iterative update cycle, updates the parameters of the first neural network model through the model loss, and iterates until the first neural network model reaches a convergent state; a sketch of one training step follows.
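- The following training-step sketch combines the three losses described above; the loss weights, optimizer settings, and tensor shapes are illustrative assumptions, and it reuses the FirstNeuralNetwork sketch given earlier:

```python
import torch
import torch.nn as nn

cls_loss_fn = nn.CrossEntropyLoss()       # classification loss (real person vs. attack)
depth_loss_fn = nn.L1Loss()               # depth map regression loss
texture_loss_fn = nn.CrossEntropyLoss()   # per-pixel material classification loss

def train_step(model, optimizer, batch, w_depth=1.0, w_texture=1.0):
    # shapes: (B,3,256,256), (B,), (B,1,256,256), (B,256,256) with integer material ids
    diff_img, label, depth_gt, texture_gt = batch
    logits, depth_pred, texture_pred = model(diff_img)

    loss_cls = cls_loss_fn(logits, label)
    loss_depth = depth_loss_fn(depth_pred, depth_gt)
    loss_texture = texture_loss_fn(texture_pred, texture_gt)
    loss = loss_cls + w_depth * loss_depth + w_texture * loss_texture   # combined model loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example wiring with plain SGD, as mentioned in the description of FIG. 7:
# model = FirstNeuralNetwork()
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```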
- This application also provides an example, with reference to the first neural network model structure shown in FIG. 7, to illustrate the training process of the face liveness detection model. The first neural network model includes a deep convolutional layer, two deconvolutional layers, a global pooling layer, and a fully connected classification layer.
- The sample difference images formed from face images under different lighting conditions are input into the first neural network model, and a feature map is obtained by feature extraction through the deep convolutional layer. By decoupling the feature map, feature maps containing depth information and material information are obtained, and the model then splits into three branches: one branch restores an image from the feature map containing depth information through a deconvolutional layer and performs depth regression; one branch restores an image from the feature map containing material information through a deconvolutional layer and performs material regression; and the remaining branch classifies the feature maps containing the depth and material information through the global pooling layer and the fully connected classification layer, predicting whether the sample detection object is a real person or an attack type.
- Each branch of the first neural network model has a corresponding loss function, namely loss1 to loss3 in the figure. During training, the server performs forward computation for each set of training data (minibatch) to obtain the loss values, and then updates the model parameters according to the loss values using Stochastic Gradient Descent (SGD) or another optimization algorithm.
- The first neural network model is optimized through continuous iterative updates; when the updated first neural network model is in a convergent state, the server can stop training. It should be noted that during the training process the server can select models based on a validation set and prevent overfitting through other technical means.
- After training, the server can remove the two branches used for depth regression and material regression from the first neural network model to obtain the face liveness detection model, which includes the convolutional layer, the global pooling layer, and the fully connected classification layer.
- Of course, the server may also retain the two branches of depth regression and material regression, so as to determine the attack type based on the depth information and the material information.
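- A minimal sketch of the cropping step, reusing the illustrative FirstNeuralNetwork class from earlier (attribute names are assumptions):

```python
import torch.nn as nn

def crop_to_liveness_model(trained) -> nn.Module:
    """Drop the depth/texture decoders; keep backbone + global pooling + classifier for deployment."""
    deploy = nn.Sequential(
        trained.backbone,      # convolutional layer(s)
        trained.pool,          # global pooling layer
        nn.Flatten(1),
        trained.classifier,    # fully connected classification layer
    )
    return deploy.eval()
```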
- This embodiment of the application thus provides a method for training a face liveness detection model in which a training data set is obtained and each group of training data is labeled with a depth map and a texture map as supervision information for the sample difference image: the material map is used for supervised learning of the model's ability to extract material information, and the depth map is used for supervised learning of the model's ability to extract depth information. A model trained in this way can accurately extract depth information and material information, which improves the prediction accuracy of the face liveness detection model.
- When the server obtains the training data set, it may further process the face images under different lighting conditions to obtain the sample difference images and thereby obtain the training data.
- In one implementation, the server performs face recognition on the face images under different lighting conditions, crops the image of the central area of the face based on the face recognition result, and then performs image difference processing on the face central area images to obtain the difference image of the central area of the face, which is used as a sample difference image. In this way, subsequent prediction can pay more attention to the depth and material information of the central area of the face, improving prediction accuracy.
- In another implementation, the server performs face recognition on the face images under different lighting conditions, crops the face partial organ area based on the face recognition result to obtain face partial organ area images, and then performs image difference processing on the face partial organ area images to obtain the difference image of the face partial organ area, which is used as a sample difference image. In this way, subsequent prediction focuses only on the valuable local organ region of the face, which ensures prediction accuracy while improving prediction efficiency.
- Of course, the server can also perform double detection based on the difference image of the central area of the face and the difference image of the partial organ area of the face. In this case, the server can train two models: one model is used for prediction based on the central area of the face, and the other is used for prediction based on a local facial organ area, such as the nose area, so as to improve detection accuracy.
- Specifically, the server obtains a first training data set and a second training data set, where the sample difference image in each set of training data of the first training data set is obtained by image difference processing of the face central areas of the two images corresponding to the sample detection object under different lighting conditions, and the sample difference image in each set of training data of the second training data set is obtained by image difference processing of the face partial organ areas of the two images corresponding to the sample detection object under different lighting conditions. The pre-built first neural network model is then trained in parallel according to the first training data set and the second training data set to obtain two first neural network models in a convergent state.
- The server crops the two first neural network models in the convergent state and uses the two cropped models as the face liveness detection models.
- The servers in the embodiment shown in FIG. 2 and the embodiment shown in FIG. 6 may be the same or different; that is, the server used in the training process and the server used in the prediction process may be the same or different, and this can be set according to actual needs.
- The face liveness detection method provided by the present application will be introduced below in conjunction with the application scenario of identity verification in mobile payment.
- Referring to FIG. 8, the scenario includes a terminal 810, a training server 820, and a payment server 830. A payment application is installed on the terminal 810, and the user can initiate a payment through the payment application on the terminal 810. The terminal 810 responds to the payment operation by sending a payment request to the payment server 830; the payment server 830 first authenticates the user based on the face liveness detection model trained by the training server 820, deducts the amount from the account after the verification is passed, and returns a deduction notification message to the terminal 810 to remind the user whether the payment is successful.
- The process by which the payment server 830 performs identity verification based on the face liveness detection model includes the following steps.
- When the terminal 810 generates the payment request, it also triggers the generation of a face liveness detection request and sends the face liveness detection request to the payment server 830. In response, the payment server 830 randomly selects two different color identifiers, such as red and green, generates lighting instructions carrying the two different color identifiers, and sends the lighting instructions to the terminal 810.
- The terminal 810 controls the display screen to emit light of the corresponding colors according to the two different color identifiers, collects the reflected light from the user's face to form the first face image and the second face image in sequence, and then sends the first face image and the second face image to the payment server 830.
- The payment server 830 performs face recognition on the first face image and the second face image, crops the central area of the face according to the face recognition result to obtain the image of the central area of the first face and the image of the central area of the second face, and performs image difference processing on these two images to obtain the difference image of the central area of the face.
- the payment server 830 inputs the above-mentioned difference image into the face live detection model obtained from the training server 820, extracts a feature map from the difference image through the face live detection model, and decouples the object reflectivity representing texture information from the feature map And the object normal vector representing the depth information, and determine the face live detection score corresponding to the difference image according to the object reflectivity and the object normal vector, and compare the face live detection score with a preset judgment threshold.
- if the score exceeds the threshold, it is determined that the face liveness detection result is a living body, and the payment server 830 can proceed with the account deduction operation; after the deduction succeeds, it sends a deduction success notification message to the terminal 810 to prompt the user that the payment succeeded; otherwise, it is determined that the face liveness detection result is an attack, the account deduction operation is abandoned, and a deduction failure notification message is sent to the terminal 810 to prompt the user that the payment failed.
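A minimal sketch of this decision step at the payment server, assuming the deployed model exposes a `score(diff_image)` call and that `THRESHOLD` is tuned offline; both names, and the callback style, are hypothetical.

```python
THRESHOLD = 0.5  # assumed judgment threshold, tuned on validation data

def verify_liveness(model, diff_image) -> bool:
    """Return True (live) when the model's liveness score exceeds the threshold."""
    score = model.score(diff_image)  # hypothetical inference call
    return score > THRESHOLD

def handle_payment(model, diff_image, deduct_account, notify):
    if verify_liveness(model, diff_image):
        deduct_account()
        notify("deduction success")
    else:
        # Treated as an attack: abandon the deduction and report failure.
        notify("deduction failed")
```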
- the application also provides a corresponding device.
- the above-mentioned device provided by the embodiment of the present application will be introduced from the perspective of functional modularity.
- the device 900 includes:
- the face image acquisition module 910 is configured to acquire a first face image of the target detection object under a first lighting condition and a second face image of the target detection object under a second lighting condition;
- a difference image determination module 920 configured to determine a difference image according to the first face image and the second face image
- the feature extraction module 930 is configured to extract a feature map from the difference image, and decouple the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object from the feature map, where the object reflectivity is used to represent texture information and the object normal vector is used to represent depth information;
- the living body detection module 940 is configured to determine whether the target detection object is a living body according to the object reflectivity and the object normal vector.
- the difference image determination module 920 is specifically configured to:
- Image difference processing is performed on the image of the central region of the first human face and the image of the central region of the second human face to obtain a differential image of the central region of the human face.
- the difference image determination module 920 is specifically configured to:
- Image difference processing is performed on the first face partial area image and the second face partial area image to obtain a difference image of the face partial area.
- the feature extraction module 930 is specifically configured to:
- a feature map is extracted from the difference image through a pre-trained face liveness detection model, and the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object are decoupled from the feature map;
- the living body detection module 940 is specifically configured to:
- determine a face liveness detection score corresponding to the difference image according to the object reflectivity and the object normal vector, and when the face liveness detection score exceeds a preset judgment threshold, determine that the target detection object is a living object.
- the feature extraction module 930 is specifically configured to:
- extract the feature map from the difference image through the convolutional layer in the face liveness detection model, and decouple the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object from the feature map;
- the living body detection module 940 is specifically configured to:
- the global pooling layer and the fully connected classification layer in the face living detection model are used to determine whether the target detection object is a living object according to the object reflectivity and the object normal vector.
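As a rough sketch only (PyTorch is an assumed framework, not mandated by the text), the global pooling layer and fully connected classification layer could act on the decoupled feature groups as follows; channel counts and the concatenation strategy are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LivenessHead(nn.Module):
    """Global pooling + fully connected classification over the decoupled feature groups.

    `channels` is the total channel count after concatenating the reflectivity
    group and the normal-vector group (an assumed layout).
    """
    def __init__(self, channels: int = 256, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global pooling layer
        self.fc = nn.Linear(channels, num_classes)   # fully connected classification layer

    def forward(self, reflect_group: torch.Tensor, normal_group: torch.Tensor) -> torch.Tensor:
        feat = torch.cat([reflect_group, normal_group], dim=1)
        return self.fc(self.pool(feat).flatten(1))   # logits: [attack, live]

# Example: two 128-channel feature groups pooled and classified into live / attack.
head = LivenessHead(channels=256)
logits = head(torch.randn(1, 128, 7, 7), torch.randn(1, 128, 7, 7))
```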
- the face image acquisition module 910 is specifically configured to:
- in response to the living body detection request, generate a lighting instruction, the lighting instruction including a first lighting parameter corresponding to the first lighting condition and a second lighting parameter corresponding to the second lighting condition;
- the light emitting element is controlled to emit light and collect the reflected light from the face of the target detection object to sequentially form the first face image and the second face image.
- the face image acquisition module 910 is specifically configured to:
- two different color identifications are randomly selected, and a lighting instruction carrying the two different color identifications is generated.
- when the face image acquisition module 910 controls the light-emitting element to emit light and collects the light reflected from the face of the target detection object to sequentially form the first face image and the second face image, it is specifically configured to:
- control the light-emitting element to sequentially emit light at the corresponding angle based on the first illumination parameter and the second illumination parameter.
- the device 1000 includes:
- the sample set acquisition module 1010 is configured to acquire a training data set, where each set of training data in the training data set includes a sample difference image, a label of the sample difference image, and a depth map and a texture map corresponding to the sample difference image;
- the sample difference image is obtained by performing image difference on face images of the sample detection object collected under different lighting conditions, and the label of the sample difference image is used to identify whether the sample detection object to which the sample difference image belongs is a living body;
- the depth map is used to identify the depth information of each pixel position in the sample difference image;
- the texture map is used to identify the material type of each pixel position in the sample difference image, and the material type is determined based on the texture information of the pixel position;
- the training module 1020 is configured to train a pre-built first neural network model according to the training data set to obtain a first neural network model in a convergent state, the first neural network model including a convolutional layer, two deconvolution layers, a global pooling layer, and a fully connected classification layer;
- the cropping module 1030 is configured to crop the first neural network model in the convergent state to obtain a face liveness detection model, where the face liveness detection model includes the convolutional layer, the global pooling layer, and the fully connected classification layer.
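The description of device 1000 suggests a network with a shared convolutional encoder, two deconvolution decoders (one tied to depth supervision, one to texture supervision), and a pooling + fully connected classification branch, with the decoders cropped away after training. The PyTorch sketch below is an assumed realization with illustrative layer sizes, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class FirstNeuralNetwork(nn.Module):
    """Shared conv encoder + two deconv decoders + global pooling + FC classifier.

    Layer sizes are illustrative only; the patent text does not fix them.
    """
    def __init__(self, num_classes: int = 2, num_materials: int = 4):
        super().__init__()
        # Convolutional layer(s): feature extraction over the difference image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Deconvolution decoder for the normal-vector group: regresses a depth map.
        self.depth_decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Deconvolution decoder for the reflectivity group: per-pixel material (texture) map.
        self.texture_decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_materials, 4, stride=2, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)    # global pooling layer
        self.fc = nn.Linear(128, num_classes)  # fully connected classification layer

    def forward(self, diff_image: torch.Tensor):
        feat = self.encoder(diff_image)
        # Decouple the feature map into two channel groups (normal vector / reflectivity).
        normal_group, reflect_group = torch.chunk(feat, 2, dim=1)
        depth_pred = self.depth_decoder(normal_group)
        texture_pred = self.texture_decoder(reflect_group)
        logits = self.fc(self.pool(feat).flatten(1))
        return logits, depth_pred, texture_pred

class FaceLivenessModel(nn.Module):
    """The cropped deployment model: conv encoder + global pooling + FC only."""
    def __init__(self, trained: FirstNeuralNetwork):
        super().__init__()
        self.encoder, self.pool, self.fc = trained.encoder, trained.pool, trained.fc

    def forward(self, diff_image: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(self.encoder(diff_image)).flatten(1))
```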
- the training module 1020 is specifically configured to:
- process the feature map extracted by the convolutional layer through the two deconvolution layers to obtain a first grouped feature map and a second grouped feature map, where the first grouped feature map represents the object normal vector of the sample detection object and the second grouped feature map represents the object reflectivity of the sample detection object;
- determine a depth map loss according to the first grouped feature map and the depth map corresponding to the sample difference image, determine a texture map loss according to the second grouped feature map and the texture map corresponding to the sample difference image, and determine a classification loss according to the predicted label output by the fully connected classification layer and the label corresponding to the sample difference image;
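A sketch of how the three losses named above could be combined into one training objective, assuming the decoder outputs from the earlier architecture sketch (a regressed depth map and a per-pixel material map at the decoder resolution); the loss weights are assumptions, not values from the text.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, depth_pred, texture_pred, label, depth_gt, texture_gt,
               w_depth: float = 1.0, w_texture: float = 1.0, w_cls: float = 1.0) -> torch.Tensor:
    """Weighted sum of depth-map loss, texture-map loss and classification loss."""
    depth_loss = F.mse_loss(depth_pred, depth_gt)              # depth map supervision (regression)
    texture_loss = F.cross_entropy(texture_pred, texture_gt)   # per-pixel material classification
    cls_loss = F.cross_entropy(logits, label)                  # live / attack label supervision
    return w_depth * depth_loss + w_texture * texture_loss + w_cls * cls_loss
```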
- the sample set acquisition module 1010 is specifically configured to:
- acquire a first training data set and a second training data set;
- the training module 1020 is specifically used for:
- the pre-built first neural network model is trained in parallel according to the first training data set and the second training data set to obtain two first neural network models in a convergent state.
- the training module 1020 is specifically configured to:
- the network parameters of the first neural network model are trained according to the training data set based on an end-to-end training method until the first neural network model in a convergent state is obtained.
- the training module 1020 is specifically configured to:
- in the first stage, the global pooling layer and the fully connected classification layer in the pre-built first neural network model are fixed, and the convolutional layer and the two deconvolution layers are trained based on the training data set; in the second stage, the convolutional layer and the two deconvolution layers are fixed, and the global pooling layer and the fully connected classification layer are trained based on the training data set;
- a first neural network model in a convergent state is obtained after cross-training in the first stage and the second stage.
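One way to realize this two-stage cross-training is to alternately freeze the two parameter groups. The sketch below assumes the `FirstNeuralNetwork` layout from the earlier sketch and a data loader yielding (difference image, label, depth map, texture map) batches; the round count, epoch count, and optimizer choice are assumptions.

```python
import torch

def set_requires_grad(modules, flag: bool):
    for m in modules:
        for p in m.parameters():
            p.requires_grad = flag

def run_stage(model, loader, loss_fn, epochs: int, lr: float):
    # Optimize only the parameters left trainable by the current stage.
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for diff_img, label, depth_gt, texture_gt in loader:
            logits, depth_pred, texture_pred = model(diff_img)
            loss = loss_fn(logits, depth_pred, texture_pred, label, depth_gt, texture_gt)
            opt.zero_grad()
            loss.backward()
            opt.step()

def cross_train(model, loader, loss_fn, rounds: int = 3, epochs_per_stage: int = 1, lr: float = 1e-4):
    conv_and_deconv = [model.encoder, model.depth_decoder, model.texture_decoder]
    pool_and_fc = [model.pool, model.fc]
    for _ in range(rounds):
        # Stage 1: fix global pooling + FC classification layer, train conv + deconv layers.
        set_requires_grad(pool_and_fc, False)
        set_requires_grad(conv_and_deconv, True)
        run_stage(model, loader, loss_fn, epochs_per_stage, lr)
        # Stage 2: fix conv + deconv layers, train global pooling + FC classification layer.
        set_requires_grad(conv_and_deconv, False)
        set_requires_grad(pool_and_fc, True)
        run_stage(model, loader, loss_fn, epochs_per_stage, lr)
```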
- the network structure of the convolution layer adopts VGGNet, ResNet or DenseNet; and the network structure of the deconvolution layer adopts UNet or Deconvolution with skip-connection.
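For example, a ResNet-style backbone could serve as the convolutional layer in the sketches above; this assumes torchvision is available and uses ResNet-18, truncated before its own pooling and classification head.

```python
import torch.nn as nn
from torchvision import models

def resnet18_encoder() -> nn.Module:
    """ResNet-18 trunk usable as the shared convolutional feature extractor."""
    backbone = models.resnet18()  # randomly initialized; pretrained weights are optional
    # Drop the final average-pooling and fully connected layers, keep the conv stages.
    return nn.Sequential(*list(backbone.children())[:-2])  # output: (N, 512, H/32, W/32)
```

With this swap, the pooling/classification head would take 512 input channels instead of the illustrative 128 used earlier, and a UNet-style decoder with skip connections would additionally need access to the intermediate stage outputs.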
- the present application also provides a device for realizing face living detection and a device for realizing face living detection model training.
- the following describes the device provided in the embodiments of the present application from the perspective of hardware implementation.
- FIG. 11 is a schematic structural diagram of a device provided by an embodiment of the present application.
- the device may be a server.
- the server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), memory 1132, and one or more storage media 1130 (for example, one or more storage devices) storing application programs 1142 or data 1144.
- the memory 1132 and the storage medium 1130 may be short-term storage or persistent storage.
- the program stored in the storage medium 1130 may include one or more modules (not shown in the figure), and each module may include a series of command operations on the server.
- the central processing unit 1122 may be configured to communicate with the storage medium 1130, and execute a series of instruction operations in the storage medium 1130 on the server 1100.
- the server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input and output interfaces 1158, and/or one or more operating systems 1141, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
- the steps performed by the server in the above embodiment may be based on the server structure shown in FIG. 11.
- the CPU 1122 is used to perform the following steps:
- a feature map is extracted from the difference image, and the object reflectivity corresponding to the target detection object and the object normal vector corresponding to the target detection object are decoupled from the feature map, where the object reflectivity is used to represent texture information and the object normal vector is used to represent depth information;
- whether the target detection object is a living body is determined according to the object reflectivity and the object normal vector.
- the CPU 1122 is also used to execute the steps of any implementation manner of the face living detection method provided in the embodiments of the present application.
- the CPU 1122 is used to perform the following steps:
- Each group of training data in the training data set includes a sample difference image, a label of the sample difference image, and a depth map and a texture map corresponding to the sample difference image. The sample difference image is obtained by performing image difference on face images of the sample detection object collected under different lighting conditions, and the label of the sample difference image is used to identify whether the sample detection object to which the sample difference image belongs is a living body;
- the depth map is used to identify the depth information of each pixel position in the sample difference image;
- the texture map is used to identify the material type of each pixel position in the sample difference image, and the material type is determined based on the texture information of the pixel position;
- a pre-built first neural network model is trained according to the training data set to obtain a first neural network model in a convergent state, the first neural network model including a convolutional layer, two deconvolution layers, a global pooling layer, and a fully connected classification layer;
- the first neural network model in the convergent state is cropped to obtain a face live detection model, and the face live detection model includes the convolutional layer, the global pooling layer, and the fully connected classification layer.
- the CPU 1122 is also used to execute the steps of any implementation manner of the method for training the face living detection model provided in the embodiments of the present application.
- the embodiments of the present application also provide a computer-readable storage medium for storing a computer program, where the computer program is used to execute any implementation of the face liveness detection method or the face liveness detection model training method described in the foregoing embodiments.
- the embodiment of the present application also provides a computer program product including instructions, which when run on a computer, causes the computer to execute the above-mentioned face living detection method or face living detection model training method.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Accounting & Taxation (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Biodiversity & Conservation Biology (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Image Analysis (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
- Collating Specific Patterns (AREA)
Abstract
A face liveness detection method, comprising: obtaining a first face image of a target detection object under a first lighting condition and a second face image of the target detection object under a second lighting condition; determining a difference image according to the first face image and the second face image; extracting a feature map from the difference image; decoupling, from the feature map, an object reflectivity and an object normal vector corresponding to the target detection object; and then determining whether the target detection object is a living body according to the object reflectivity and the object normal vector. According to the method, facial texture and depth information are decoupled and liveness detection is performed by means of the decoupled information, so that the defense capability against 3D attacks is improved and both planar attacks and 3D attacks can be effectively countered; moreover, the method requires no form of user interaction, is convenient to use, is suitable for common electronic devices, is low-cost, and is easy to popularize. A corresponding apparatus and device, as well as a storage medium, are also provided.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022515568A JP7262884B2 (ja) | 2019-10-18 | 2020-09-21 | 生体顔検出方法、装置、設備及びコンピュータプログラム |
| EP20876546.1A EP3995989B1 (fr) | 2019-10-18 | 2020-09-21 | Procédé, appareil et dispositif de détection de vivacité de visage, et support de stockage |
| US17/513,731 US11972638B2 (en) | 2019-10-18 | 2021-10-28 | Face living body detection method and apparatus, device, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910996055.5A CN110765923B (zh) | 2019-10-18 | 2019-10-18 | 一种人脸活体检测方法、装置、设备及存储介质 |
| CN201910996055.5 | 2019-10-18 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/513,731 Continuation US11972638B2 (en) | 2019-10-18 | 2021-10-28 | Face living body detection method and apparatus, device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021073364A1 true WO2021073364A1 (fr) | 2021-04-22 |
Family
ID=69332390
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/116507 Ceased WO2021073364A1 (fr) | 2019-10-18 | 2020-09-21 | Procédé, appareil et dispositif de détection de vivacité de visage, et support de stockage |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11972638B2 (fr) |
| EP (1) | EP3995989B1 (fr) |
| JP (1) | JP7262884B2 (fr) |
| CN (1) | CN110765923B (fr) |
| WO (1) | WO2021073364A1 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113239743A (zh) * | 2021-04-23 | 2021-08-10 | 普联国际有限公司 | 一种人群密度检测方法、装置、设备及存储介质 |
| CN114187650A (zh) * | 2021-10-29 | 2022-03-15 | 深圳绿米联创科技有限公司 | 动作识别方法、装置、电子设备及存储介质 |
| CN114627525A (zh) * | 2022-01-29 | 2022-06-14 | 北京旷视科技有限公司 | 一种活体检测方法和人脸识别方法 |
| EP4024352A3 (fr) * | 2021-05-25 | 2022-09-21 | Beijing Baidu Netcom Science Technology Co., Ltd. | Procédé et appareil de détection de vivacité du visage, et support de stockage |
| WO2024255161A1 (fr) * | 2023-06-15 | 2024-12-19 | 广州朗国电子科技股份有限公司 | Procédé et appareil de lutte contre la contrefaçon de visage, et support de stockage |
Families Citing this family (46)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110765923B (zh) * | 2019-10-18 | 2024-05-24 | 腾讯科技(深圳)有限公司 | 一种人脸活体检测方法、装置、设备及存储介质 |
| KR20210108082A (ko) | 2020-02-25 | 2021-09-02 | 삼성전자주식회사 | 위상 차를 이용하는 라이브니스 검출 방법 및 장치 |
| CN111523438B (zh) * | 2020-04-20 | 2024-02-23 | 支付宝实验室(新加坡)有限公司 | 一种活体识别方法、终端设备和电子设备 |
| CN113553887A (zh) * | 2020-04-26 | 2021-10-26 | 华为技术有限公司 | 一种基于单目摄像头的活体检测方法、设备和可读存储介质 |
| CN113591517B (zh) * | 2020-04-30 | 2024-11-19 | 华为技术有限公司 | 一种活体检测方法及相关设备 |
| CN111597938B (zh) * | 2020-05-07 | 2022-02-22 | 马上消费金融股份有限公司 | 活体检测、模型训练方法及装置 |
| CN111582381B (zh) * | 2020-05-09 | 2024-03-26 | 北京市商汤科技开发有限公司 | 确定性能参数的方法及装置、电子设备和存储介质 |
| WO2021236175A1 (fr) * | 2020-05-20 | 2021-11-25 | Google Llc | Apprentissage de l'éclairement à partir de divers portraits |
| CN111683273A (zh) * | 2020-06-02 | 2020-09-18 | 中国联合网络通信集团有限公司 | 视频卡顿信息的确定方法及装置 |
| CN113761983B (zh) * | 2020-06-05 | 2023-08-22 | 杭州海康威视数字技术股份有限公司 | 更新人脸活体检测模型的方法、装置及图像采集设备 |
| CN112085701B (zh) * | 2020-08-05 | 2024-06-11 | 深圳市优必选科技股份有限公司 | 一种人脸模糊度检测方法、装置、终端设备及存储介质 |
| CN111914775B (zh) * | 2020-08-06 | 2023-07-28 | 平安科技(深圳)有限公司 | 活体检测方法、装置、电子设备及存储介质 |
| CN112016505B (zh) * | 2020-09-03 | 2024-05-28 | 平安科技(深圳)有限公司 | 基于人脸图像的活体检测方法、设备、存储介质及装置 |
| CN112149578B (zh) * | 2020-09-24 | 2024-05-24 | 四川川大智胜软件股份有限公司 | 基于人脸三维模型的人脸皮肤材质计算方法、装置及设备 |
| CN112232152B (zh) * | 2020-09-30 | 2021-12-03 | 墨奇科技(北京)有限公司 | 非接触式指纹识别方法、装置、终端和存储介质 |
| DE102020126291A1 (de) * | 2020-10-07 | 2022-04-07 | Fujitsu Technology Solutions Gmbh | Verfahren zum Analysieren eines Bauteils, Verfahren zum Trainieren eines Systems, Vorrichtung, Computerprogramm und computerlesbares Speichermedium |
| CN112036386A (zh) * | 2020-11-05 | 2020-12-04 | 中科创达软件股份有限公司 | Tee环境下使用相机相近帧进行活体检测的方法及装置 |
| CN112465717B (zh) * | 2020-11-25 | 2024-05-31 | 北京字跳网络技术有限公司 | 脸部图像处理模型训练方法、装置、电子设备和介质 |
| CN112580454B (zh) * | 2020-12-08 | 2024-03-26 | 上海明略人工智能(集团)有限公司 | 基于图片材质分割标记的人脸防伪方法及系统 |
| CN114627522A (zh) * | 2020-12-11 | 2022-06-14 | 深圳市光鉴科技有限公司 | 深度相机 |
| CN112966562A (zh) * | 2021-02-04 | 2021-06-15 | 深圳市街角电子商务有限公司 | 人脸活体检测方法、系统及存储介质 |
| US11663775B2 (en) * | 2021-04-19 | 2023-05-30 | Adobe, Inc. | Generating physically-based material maps |
| CN113221766B (zh) * | 2021-05-18 | 2024-07-19 | 南京西云信息技术有限公司 | 训练活体人脸识别模型、识别活体人脸的方法及相关装置 |
| CN113409056B (zh) * | 2021-06-30 | 2022-11-08 | 深圳市商汤科技有限公司 | 支付方法、装置、本地识别设备、人脸支付系统及设备 |
| CN115690918A (zh) * | 2021-07-22 | 2023-02-03 | 京东科技控股股份有限公司 | 构建活体识别模型和活体识别的方法、装置、设备及介质 |
| CN113642428B (zh) * | 2021-07-29 | 2022-09-27 | 北京百度网讯科技有限公司 | 人脸活体检测方法、装置、电子设备及存储介质 |
| CN113422982B (zh) * | 2021-08-23 | 2021-12-14 | 腾讯科技(深圳)有限公司 | 数据处理方法、装置、设备及存储介质 |
| US12020512B2 (en) * | 2021-09-17 | 2024-06-25 | Jumio Corporation | Spoof detection using eye boundary analysis |
| CN114120068A (zh) * | 2021-11-04 | 2022-03-01 | 腾讯科技(深圳)有限公司 | 图像处理方法、装置、电子设备、存储介质及计算机产品 |
| CN114333078B (zh) * | 2021-12-01 | 2024-07-23 | 马上消费金融股份有限公司 | 活体检测方法、装置、电子设备及存储介质 |
| CN114693551B (zh) * | 2022-03-25 | 2024-11-01 | 腾讯科技(深圳)有限公司 | 一种图像处理方法、装置、设备以及可读存储介质 |
| CN114764949A (zh) * | 2022-03-28 | 2022-07-19 | 联想(北京)有限公司 | 一种活体检测方法及装置 |
| CN114724255B (zh) * | 2022-04-08 | 2025-09-09 | 云从科技集团股份有限公司 | 活体检测方法、系统、装置和介质 |
| CN114885469A (zh) * | 2022-05-09 | 2022-08-09 | 深圳四博智联科技有限公司 | 一种灯光自适应显示方法、装置及存储介质 |
| WO2023221996A1 (fr) * | 2022-05-16 | 2023-11-23 | 北京旷视科技有限公司 | Procédé de détection de corps vivant, dispositif électronique, support de stockage et produit de programme |
| CN114999004B (zh) * | 2022-05-20 | 2025-06-27 | 阿里云计算有限公司 | 攻击识别方法 |
| CN115240255A (zh) * | 2022-07-22 | 2022-10-25 | 北京百度网讯科技有限公司 | 活体检测方法、装置、电子设备及存储介质 |
| CN115147705B (zh) * | 2022-09-06 | 2023-02-03 | 平安银行股份有限公司 | 人脸翻拍检测方法、装置、电子设备及存储介质 |
| CN115761839A (zh) * | 2022-10-21 | 2023-03-07 | 北京百度网讯科技有限公司 | 人脸活体检测模型的训练方法、人脸活体检测方法及装置 |
| CN115601818B (zh) * | 2022-11-29 | 2023-04-07 | 海豚乐智科技(成都)有限责任公司 | 一种轻量化可见光活体检测方法及装置 |
| CN115937992A (zh) * | 2022-12-01 | 2023-04-07 | 支付宝(杭州)信息技术有限公司 | 活体攻击检测方法和活体攻击检测模型的训练方法 |
| CN116524608A (zh) * | 2023-04-14 | 2023-08-01 | 支付宝(杭州)信息技术有限公司 | 活体检测方法、装置、设备与存储介质 |
| CN116682137A (zh) * | 2023-05-12 | 2023-09-01 | 深圳数联天下智能科技有限公司 | 训练静态人体检测模型的方法、滞留检测方法及存储介质 |
| US20240386623A1 (en) * | 2023-05-16 | 2024-11-21 | Salesforce, Inc. | Systems and methods for controllable image generation |
| CN116994343B (zh) * | 2023-09-27 | 2023-12-15 | 睿云联(厦门)网络通讯技术有限公司 | 基于标签平滑的扩散标签深度学习模型训练方法及介质 |
| CN117315759A (zh) * | 2023-10-27 | 2023-12-29 | 京东科技控股股份有限公司 | 人脸活体检测方法、人脸识别方法、装置和电子设备 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105320947A (zh) * | 2015-11-04 | 2016-02-10 | 博宏信息技术有限公司 | 一种基于光照成分的人脸活体检测方法 |
| CN105574509A (zh) * | 2015-12-16 | 2016-05-11 | 天津科技大学 | 一种基于光照的人脸识别系统回放攻击检测方法及应用 |
| US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
| CN110765923A (zh) * | 2019-10-18 | 2020-02-07 | 腾讯科技(深圳)有限公司 | 一种人脸活体检测方法、装置、设备及存储介质 |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10621454B2 (en) * | 2015-06-29 | 2020-04-14 | Beijing Kuangshi Technology Co., Ltd. | Living body detection method, living body detection system, and computer program product |
| US10360464B1 (en) * | 2016-03-04 | 2019-07-23 | Jpmorgan Chase Bank, N.A. | Systems and methods for biometric authentication with liveness detection |
| CN107169405B (zh) * | 2017-03-17 | 2020-07-03 | 上海云从企业发展有限公司 | 基于双目摄像机活体识别的方法及装置 |
| JP2018200640A (ja) * | 2017-05-29 | 2018-12-20 | キヤノン株式会社 | 画像処理装置および画像処理方法 |
| US11644834B2 (en) * | 2017-11-10 | 2023-05-09 | Nvidia Corporation | Systems and methods for safe and reliable autonomous vehicles |
- 2019
  - 2019-10-18 CN CN201910996055.5A patent/CN110765923B/zh active Active
- 2020
  - 2020-09-21 WO PCT/CN2020/116507 patent/WO2021073364A1/fr not_active Ceased
  - 2020-09-21 EP EP20876546.1A patent/EP3995989B1/fr active Active
  - 2020-09-21 JP JP2022515568A patent/JP7262884B2/ja active Active
- 2021
  - 2021-10-28 US US17/513,731 patent/US11972638B2/en active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
| CN105320947A (zh) * | 2015-11-04 | 2016-02-10 | 博宏信息技术有限公司 | 一种基于光照成分的人脸活体检测方法 |
| CN105574509A (zh) * | 2015-12-16 | 2016-05-11 | 天津科技大学 | 一种基于光照的人脸识别系统回放攻击检测方法及应用 |
| CN110765923A (zh) * | 2019-10-18 | 2020-02-07 | 腾讯科技(深圳)有限公司 | 一种人脸活体检测方法、装置、设备及存储介质 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3995989A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3995989A1 (fr) | 2022-05-11 |
| US11972638B2 (en) | 2024-04-30 |
| JP2022547183A (ja) | 2022-11-10 |
| EP3995989A4 (fr) | 2022-09-14 |
| US20220083795A1 (en) | 2022-03-17 |
| CN110765923B (zh) | 2024-05-24 |
| CN110765923A (zh) | 2020-02-07 |
| EP3995989B1 (fr) | 2024-12-25 |
| JP7262884B2 (ja) | 2023-04-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110765923B (zh) | 一种人脸活体检测方法、装置、设备及存储介质 | |
| EP4024352A2 (fr) | Procédé et appareil de détection de vivacité du visage, et support de stockage | |
| CN111274928B (zh) | 一种活体检测方法、装置、电子设备和存储介质 | |
| CN113205057B (zh) | 人脸活体检测方法、装置、设备及存储介质 | |
| CN103383723B (zh) | 用于生物特征验证的电子欺骗检测的方法和系统 | |
| CN110163899B (zh) | 图像匹配方法和图像匹配装置 | |
| WO2020134238A1 (fr) | Procédé et appareil de détection de corps vivant et support d'informations | |
| WO2021143216A1 (fr) | Procédé de détection de vivacité de visage et appareil associé | |
| CN110059579B (zh) | 用于活体检验的方法和装置,电子设备和存储介质 | |
| JP2017017431A (ja) | 画像処理装置、情報処理方法及びプログラム | |
| WO2022227765A1 (fr) | Procédé de génération d'un modèle de complétion d'image, et dispositif, support et produit programme | |
| CN112818722A (zh) | 模块化动态可配置的活体人脸识别系统 | |
| KR102257897B1 (ko) | 라이브니스 검사 방법과 장치,및 영상 처리 방법과 장치 | |
| CN111862030B (zh) | 一种人脸合成图检测方法、装置、电子设备及存储介质 | |
| CN112052832A (zh) | 人脸检测的方法、装置和计算机存储介质 | |
| CN114694265A (zh) | 活体检测方法、装置及系统 | |
| CN112464873A (zh) | 模型的训练方法、人脸活体识别方法、系统、设备及介质 | |
| CN115147936B (zh) | 一种活体检测方法、电子设备、存储介质及程序产品 | |
| CN116246356A (zh) | 一种活体检测方法和系统 | |
| CN114663929B (zh) | 基于人工智能的脸部识别方法、装置、设备和存储介质 | |
| CN114581978A (zh) | 人脸识别的方法和系统 | |
| CN114627518A (zh) | 数据处理方法、装置、计算机可读存储介质和处理器 | |
| CN117218398A (zh) | 一种数据处理的方法以及相关装置 | |
| CN114202806A (zh) | 活体检测方法、装置、电子设备和存储介质 | |
| HK40021912A (en) | Method and apparatus for detecting human face living body, device, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20876546; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020876546; Country of ref document: EP; Effective date: 20220207 |
| | ENP | Entry into the national phase | Ref document number: 2022515568; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |