EP4655767A1 - Automated facial detection with anti-spoofing - Google Patents
Automated facial detection with anti-spoofing
- Publication number
- EP4655767A1 (application EP24755770.5A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- face
- image
- user
- facial
- analysing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B3/00—Apparatus for testing the eyes; Instruments for examining the eyes
- A61B3/10—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
- A61B3/11—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions for measuring interpupillary distance or diameter of pupils
- A61B3/111—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions for measuring interpupillary distance or diameter of pupils for measuring interpupillary distance
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B3/00—Apparatus for testing the eyes; Instruments for examining the eyes
- A61B3/10—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
- A61B3/113—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions for determining or recording eye movement
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0059—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
- A61B5/0077—Devices for viewing the surface of the body, e.g. camera, magnifying lens
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/107—Measuring physical dimensions, e.g. size of the entire body or parts thereof
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/107—Measuring physical dimensions, e.g. size of the entire body or parts thereof
- A61B5/1079—Measuring physical dimensions, e.g. size of the entire body or parts thereof using optical or photographic means
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/117—Identification of persons
- A61B5/1171—Identification of persons based on the shapes or appearances of their bodies or parts thereof
- A61B5/1176—Recognition of faces
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/167—Detection; Localisation; Normalisation using comparisons between temporally consecutive images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/40—Spoof detection, e.g. liveness detection
Definitions
- This disclosure relates to automated facial authentication or identification systems, in particular to address potential vulnerabilities of these systems to spoofing attacks, whereby someone attempts to be authenticated by presenting an image of a face to the system.
- The use of biometrics in the authentication or identification of individuals has gained traction in recent years, in particular given advances in facial recognition and image processing techniques.
- An application for which such use can be readily adopted is the identification or registration of passengers, particularly in airports, where there are already self-serve kiosks where passengers can complete other functions such as checking into flights, printing boarding passes, or printing baggage tags.
- Facial biometric verification may also increasingly be used in other scenarios, such as building access control.
- In a facial biometric identification system, an image of a person's face is captured, analysed and compared with a database of registered faces to determine whether there is a match. Based on the result of this determination, the system ascertains the identity of the person.
- This process is potentially vulnerable to "spoofing" attempts by an imposter to disguise their true identity, by presenting a "spoof", i.e., the facial image of someone else, to the biometric identification system.
- the system needs to be able to determine whether it has captured an image of a live face, or an image of a "spoof".
- An imposter can present a display of a mobile device showing a facial picture of a different person to the biometric identification system, while using software to animate the picture to make it appear as if the person is interacting with the biometric identification system. This makes it more difficult for facial biometric identification systems to detect an imposter attempt by requesting a live interaction with the person it is trying to identify.
- the invention provides a method of estimating whether a presented face for a user is a real face by analysing image frames acquired of the presented face, comprising one or more of: (a) determining and analysing movements effected by the user to fit a facial image of the presented face to a randomised goal; (b) determining and analysing changes in facial feature metrics in the facial images in response to a change in the user's expression; and (c) analysing an effect of a change in illumination on the presented face on the image data.
- (a), (b), and (c) are performed sequentially.
- At least one of (a), (b), and (c) is performed at the same time as another one of (a), (b), and (c).
- determining and analysing movements effected by the user to fit a facial image of the presented face to a randomised goal comprises: displaying a randomised goal on a display screen, and directing the user to effect movement to move his or her facial image from a current position in relation to the display screen such that his or her facial image will fit the randomised goal; analysing image frames acquired while the user is directed to track the randomised goal; and estimating whether the presented face is a real face based on the analysis.
- the goal may be randomised in that it has a randomised location, size, or both.
- analysing image frames acquired while the user is directed to track the randomised goal comprises estimating movement effected by the user.
- estimating whether the presented face is a real face comprises comparing the determined movement or movements with a movement which a person is expected to make.
- the estimation is made by a machine learning model.
- the estimation is made by a reinforcement learning based model trained using data in relation to natural movements.
- determining movement comprises determining a path between the current position and the randomised goal.
- determining and analysing changes in facial feature metrics in the facial images in response to a change in the user’s expression comprises: directing the user to make an expression to cause a visually detectable facial feature warping; analysing image frames acquired while the user is directed to make the expression; and estimating whether the presented face is a real face based on the analysis.
- analysing the image frames acquired while the user is directed to make the expression comprises obtaining a time series of metric values from the series of analysed frames, by calculating, from each analysed frame, a metric based on the position of one or more facial features.
- estimating whether the presented face is a real face based on the analysis comprises: determining a momentum in the time series of metric values and comparing the momentum with an expected momentum profile for a real smiling face.
- the user is directed to smile.
- the metric calculated is or comprises a ratio of a distance between eyes of a detected face in the analysed frame to a width of a mouth of the detected face.
- the method comprises comparing the image frames acquired while the user is directed to make the expression with a reference image in which the user has a neutral expression, and selecting an image from the analysed image frames which is most similar to the reference image as an anchor image in which the user is deemed to have a neutral expression.
- analysing an effect of a change in illumination on the presented face on the image data comprises: capturing one or more first image frames of the presented face at a first illumination level, and capturing one or more second image frames of the presented face at a second illumination level which is different than the first illumination level; analysing the first image frame to determine a first contrast level between a detected face region in the first image frame and an adjacent region which is adjacent the detected face region; analysing the second image frame to determine a second contrast level between a detected face region in the second image frame and an adjacent region which is adjacent the detected face region of the second image frame, wherein a relationship between the adjacent region and the detected face region of the second image frame is the same as a relationship between the adjacent region and the detected face region of the first image frame; comparing the first and second contrast levels to estimate whether the presented face is likely to be a real face.
- comparing the first and second contrast levels comprises determining whether a change between the first and the second contrast levels is greater than a threshold.
- the method comprises: directing the user to fit a facial image of his or her presented face to a first, larger target area; and upon detecting a facial image within the first target area, setting an area generally taken by the facial image as a reference target area.
- the method comprises applying a tolerance range around the first, larger target area, whereby detection of a facial image within the tolerance range will trigger setting of the area of the facial image as the reference target area.
- the method comprises applying a tolerance range around the reference target area, so that the facial image of the user's presented face is considered to stay within the reference target area if it is within the tolerance range around the reference area.
- the invention provides an apparatus for estimating whether a presented face for a user is a real face by analysing image frames acquired of the presented face, comprising a processor configured to execute machine instructions which implement the method mentioned above.
- the apparatus is a local device used or accessed by the user.
- the apparatus is a kiosk at an airport.
- the kiosk may be a check-in kiosk, a bag drop kiosk, a security kiosk, or another kiosk.
- the local device is a mobile phone or tablet.
- the acquired image frames are sent over a communication network to a backend system, and are processed by a processor of the backend system configured to execute machine instructions at least partially implementing the method mentioned above.
- the presented face is estimated to be a real face, if processing results by the processor of the apparatus and processing results by the processor of the backend system both estimate that the presented face is a real face.
- the apparatus is configured to enable the user to interface with an automated biometric matching system to enrol or verify his or her identity.
- the user is an air-travel passenger.
- the backend system is a server system hosting a biometric matching service or is a server system connected to another server system hosting a biometric matching service.
- the invention provides a method of biometrically determining a subject’s identity, including: estimating whether a presented face of the subject is a real face, in accordance with the method mentioned above; providing a two-dimensional image acquired of the presented face for biometric identification of the subject, if it is estimated that the presented face is a real face; and outputting a result of the biometric identification.
- the method is performed during a check-in process by an air travel passenger.
- a biometric identification system including an image capture arrangement, a depth data capture arrangement, and a processor configured to execute machine readable instructions, which when executed are adapted to perform the method of biometrically determining a subject’s identity mentioned above.
- Figure 1 is a schematic for a liveness estimation method, according to one embodiment of the present invention.
- Figure 2 is a flow chart for a facial expression detection process according to one embodiment
- Figure 3 (1) is an image in which a person has a smiling expression
- Figure 3 (2) is an image in which a person has a neutral expression
- Figure 4 (1) shows the lip distance to eye separation ratio (LD/ED) through the frames, where the facial expression changes from a neutral to a smiling expression;
- Figure 4 (2) shows LD/ED through the frames, where the facial expression changes from a smiling to a neutral expression
- Figure 5 (1) depicts the LD/ED and momentum in the LD/ED as calculated from image frames acquired over a period of time, when the expression remains neutral;
- Figure 5 (2) depicts the LD/ED and momentum in the LD/ED as calculated from image frames acquired over a period of time, when the expression is a slow-paced smile made by a real face;
- Figure 5 (3) depicts the LD/ED and momentum in the LD/ED as calculated from image frames acquired over a period of time, when the expression is a medium-paced smile made by a real face
- Figure 5 (4) depicts the LD/ED and momentum in the LD/ED as calculated from image frames acquired over a period of time, when the expression is a fast-paced smile made by a real face;
- Figure 6 (1) depicts a first image of a face changing from a neutral expression to a smiling expression, in which the face is neutral, and a second image of the face in which the face is smiling;
- Figure 6 (2) is a schematic depicting of the times that image frames are expected to be acquired when the camera frame rate is 10 frames per second (fps);
- Figure 6 (3) depicts a time series of LD/ED data measured from 12 frames taken at 2 fps;
- Figure 6 (4) is a schematic depicting how the time series of Figure 6 (3) is used to populate, or interpolate values to populate, a data series for analysis at a target sampling rate which is higher than 2 fps;
- Figure 7 schematically depicts a user interface on a mobile device where the user is asked to fit his or her facial image to a randomised target;
- Figure 8 is a depiction of a reinforcement training model
- Figure 9 depicts an example face motion test, in accordance with an embodiment of the invention.
- Figure 10 depicts an example face dimensionality analysis, in accordance with an embodiment of the invention.
- Figure 11 depicts an example of a facial region and four adjacent, non-facial regions
- Figure 12 depicts an example facial target fitting process in accordance with an embodiment of the invention
- Figure 13 (1) schematically depicts a rough fitting step mentioned in Figure 12;
- Figure 13 (2) schematically depicts a concise fitting step mentioned in Figure 12;
- Figure 13 (3) schematically depicts a reference reset step mentioned in Figure 12;
- Figure 14 schematically depicts an example of an automated system for the purpose of authenticating a traveller or registering a traveller, in accordance with an embodiment of the invention.
- In a spoofing attempt, a facial image or model, rather than a live face, is presented to an automated system which uses facial biometrics for purposes such as the enrolment, registration or the verification of identities, to try to fool the automated system.
- the "spoof" which is presented to the automated system in a spoofing attempt may be a static two-dimensional (2D) spoof such as a print-out or cut-out of a picture, or a dynamic 2D spoof such as a video of a face presented on a screen.
- the spoof may be a static three-dimensional (3D) spoof such as a static 3D model or a 3D rendering.
- Another type of spoof is the dynamic 3D spoof, for example a 3D model with facial expression dynamics presented to a real or a virtual camera.
- the capture and biometric analysis of a passenger's face can occur at various points, such as flight check-in, baggage drop, security, boarding, etc.
- the identification system includes image analysis algorithms, which rely on colour images taken with cameras. Systems using these algorithms are therefore limited in their ability to detect when a pre-recorded or a synthesized image sequence (i.e., video sequence) is being shown to the camera of the biometric identification system, rather than the real face of a person. The challenges are even greater when a 3D spoof is presented.
- anti-spoofing for such systems may be done by configuring them to, or combining them with a system configured to, estimate whether the image being analysed is likely to have been taken of a spoof or a real face, i.e., estimating the liveness of the presented face.
- Embodiments of the present invention provide a method for estimating the liveness of a face presented to a facial biometric system, i.e., determining whether it is a real face or a spoof.
- the disclosed method is implementable as an anti-spoofing algorithm, step, or module, in the facial biometric system.
- the system may be configured to enrol or register passengers, or verify passenger identities, or both.
- the disclosure also covers facial biometric systems which are configured to implement the method.
- FIG. 1 is a high-level schematic for a liveness estimation method 100, according to one embodiment of the present invention.
- Image data is received or acquired by a system implementing the method at step 102.
- the image data is processed at step 104, and then a liveness estimation is made at step 106.
- the processing of the image data includes a facial expression analysis 108, a motion tracking analysis 110, and a facial lighting response analysis 112.
- the processing at step 104 may be an interactive process whereby the system will output instructions to direct the passenger attempting to register or authenticate their identity (e.g., at check-in) to take particular actions.
- Image data captured whilst the passenger is performing the actions can then be analysed.
- Arrow 105 represents the interactive process whereby during the processing (step 104), further image data are acquired or received to be analysed.
- processing which occurs at step 104 may be different than that depicted in Figure 1, by including only one or two of the three types of analyses shown in the figure.
Facial expression analysis
- the facial expression detection 108 is configured to analyze the facial images as detected in the input image data.
- the input image data is acquired whilst the user is directed to make a particular facial expression or a series of expressions, in order to determine if a facial expression likely to be made by a real face can be detected.
- the facial expression is of a type such that at least a partial set of the user’s facial features or muscles are expected to move whilst the user is making the expression.
- the movement or movements cause a “warping” in the facial features as compared with an expressionless or neutral face.
- the analysis for performing the facial expression detection 108 is configured to characterise this warping from the image data, to determine whether a real face is being captured making the expression, or whether the facial image is likely to have been captured from a "spoof".
- FIG. 2 is a high-level depiction of the facial expression detection 108 according to one embodiment.
- the facial expression detection 108 is conceptually depicted (as represented by the dashed rectangular box) as occurring after the step of detecting facial images in the image data (step 113).
- the facial image detection 113 may be included as part of the facial expression detection process 108.
- the facial expression detection 108 at step 114 detects one or more facial features in each processed image frame.
- one or more metrics may be calculated from the detected facial features, such as the width or height of a particular facial feature or the distance between facial features.
- the positions of the detected features, or the metrics calculated from step 116 (if step 116 is performed) are tracked over the period of time during which the user is asked to perform the facial expression.
- the time series of data are analysed, to determine if the time series of data are indicative of a real “live” face or a spoof being captured in the image data.
- Reference is made to Figures 3 (1) and 3 (2), which for illustrative purposes show examples where the expression the user is directed to make is a smile.
- the facial features being identified are the eyes and the mouth of the person.
- When a person changes from a neutral or non-smiling expression (Figure 3 (2)) to a smiling expression (Figure 3 (1)), the separation between the eyes is expected to remain the same, whilst the mouth is expected to widen. Therefore, the ratio of the mouth width to the distance between the eyes is expected to increase, as shown in Figure 4 (1). Conversely, when a person changes from a smiling expression to a neutral or non-smiling expression, this ratio is expected to decrease, as shown in Figure 4 (2).
- the metrics which are calculated from the facial features may include the distance between the eyes (ED) and the distance across the width of the lips (LD), and the ratio between the two distances (i.e., LD/ED).
- values of the ratio LD/ED are tracked over time and the time series of the values of the metric are analysed. It should be noted that in other implementations where the user is asked to smile or make other expressions, different metrics can be tracked within the same premise of tracking a movement or a warping in facial metrics over a series of images.
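- As a minimal illustrative sketch (not taken from the patent itself), the LD/ED metric can be computed per frame from landmark coordinates such as the eye centres and mouth corners; the plain (x, y) pixel-coordinate input format assumed below is an assumption about how a face detector would expose its landmarks.
```python
import math

def ld_ed_ratio(left_eye, right_eye, left_mouth_corner, right_mouth_corner):
    """Ratio of lip width (LD) to eye separation (ED) for a single frame.

    Each argument is an (x, y) pixel coordinate from a facial landmark
    detector (the choice of detector is left open here).
    """
    ed = math.dist(left_eye, right_eye)                     # eye separation
    ld = math.dist(left_mouth_corner, right_mouth_corner)   # lip width
    if ed == 0:
        raise ValueError("degenerate landmarks: eye separation is zero")
    return ld / ed

# The ratio is expected to rise as a neutral face starts to smile.
neutral = ld_ed_ratio((120, 160), (200, 160), (135, 240), (185, 240))
smiling = ld_ed_ratio((120, 160), (200, 160), (125, 240), (195, 240))
assert smiling > neutral
```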
- the aforementioned process may be generalised to include other embodiments.
- the generalisation may be in one or more different ways. For example, other expressions than a smile may be used, as long as the expressions can be expected to cause measurable changes or a “warping” of the facial muscles.
- the computation used for quantifying the changes may, for example, involve a different number of facial points for the purpose of analysing the different types of expressions.
- the computation may also be non-linear, for instance involving a model based on non-linear computation such as but not limited to a polynomial or a spline model, rather than a linear model.
- the analysis which is performed (step 120) on the time series of values may include determining a momentum in the time series of values, and ascertaining characteristics of the momentum to see if they indicate a “lively” expression has been made, i.e., by a real face.
- Different methods may be used to measure momentum.
- An example is the Moving Average Convergence/Divergence (MACD) analysis, but other tools for providing measures of momentum may be used instead.
- the momentum in the time series differs, depending on whether there remains no smile (Figure 5 (1)), or whether the smile is a slow-paced smile (Figure 5 (2)), a medium-paced smile (Figure 5 (3)), or a fast-paced smile (Figure 5 (4)).
- In Figures 5 (1) to 5 (4), the LD/ED metric (in % units) is shown in the top graphs.
- the MACD analyses are shown in the bottom graphs of Figures 5 (1) to 5 (4).
- the horizontal axis represents sample points.
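- The momentum measure can be sketched as follows, assuming a MACD computed from exponential moving averages of the LD/ED series; the 12/26/9 periods are the conventional MACD defaults and are illustrative assumptions rather than values specified in the disclosure.
```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(series, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a metric time series."""
    macd_line = [f - s for f, s in zip(ema(series, fast), ema(series, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram
```
- The sign changes and peak magnitude of the histogram can then be compared against the momentum profiles expected for no smile and for slow-, medium- and fast-paced smiles, as illustrated in Figures 5 (1) to 5 (4).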
- the momentum which is being analysed indicates the momentum in the facial metrics as the facial expression is expected to change to or from a “neutral expression”.
- the facial expression analysis algorithm needs to have access to an image which can be considered to provide a neutral or expressionless face. This image may be referred to as an “anchor” image. This may be done by asking the person interacting with the automated system to assume an expressionless face.
- the algorithm may set a threshold or threshold range for the metric or metrics being analysed, and assign an image in which the threshold or threshold range is met as the anchor image.
- the anchor image may be chosen based on a reference image for the person who is interacting with the automated system, in which the face is expected to be neutral.
- the reference image may be a prior existing image such as an identification photograph, an example being a driver’s license photograph or the photograph in a passport.
- each image in the series of input images is compared with the reference image.
- the input image which is considered “closest” to the reference image will be chosen as the “anchor” image.
- the image frames in the series of input images after the anchor image can then be used for the analysis to determine whether the "face" in the images is a real face or a spoof.
- the comparison between the reference image and the input images may be made using biometric methods.
- the comparison may be made on the basis of the specific facial feature metric(s) being used in the facial expression analysis, by comparing the metric(s) calculated from the reference image with the same metric(s) calculated from the input images, and identifying the input image from which the calculated metric(s) is or are closest to the metrics calculated from the prior existing image. The identified input image is then chosen as the “anchor” image.
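- A minimal sketch of this metric-based anchor selection, assuming one LD/ED value has been calculated per input frame and from the reference (e.g., passport) image; the function name is illustrative only.
```python
def choose_anchor_index(frame_metrics, reference_metric):
    """Index of the input frame whose LD/ED value is closest to the reference image's.

    frame_metrics:    list of LD/ED values, one per analysed input frame.
    reference_metric: LD/ED value calculated from the reference (neutral) image.
    """
    return min(range(len(frame_metrics)),
               key=lambda i: abs(frame_metrics[i] - reference_metric))
```
- Frames from the chosen anchor onwards can then be fed into the expression analysis described above.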
- Using a photograph such as a passport photograph has further benefits in that facial metrics measured on the basis of horizontal distances in the passport photograph are not expected to be significantly influenced by distortions in the camera used to acquire the passport photo. This is because, in relation to the face, a person's eyes and lips are expected to remain in the same vertical axis irrespective of the facial expression. Therefore, any vertical distortions can be expected to affect the eyes and the lips equally. Thus vertical distortions are not expected to have any real influence on ratios such as LD/ED which rely on measurements across a horizontal distance. On the other hand, horizontal distortion may impact the lip-to-eye ratio metric LD/ED.
- the facial expression detection algorithms may run on hardware having different technical specifications. For example, some older smartphones have lower frame rates than the newest smartphones. Therefore, in some embodiments, the algorithm is configured so that it will be able to perform the facial analysis when run on different hardware having different frame rates. In this way, such embodiments of the liveness estimation system which are intended to work on different types of devices as may be owned by users of the biometric system, will be “device” agnostic by being agnostic to frame rates, provided a minimum level of frame rate is available.
- the self check-in may be done on a mobile application installed on the mobile device, or via a web-based application which the user can access using a browser on the mobile device.
- the web application may be supported by a server providing the 1 to N biometric matching to verify passenger identities.
- the facial expression analysis algorithm is configured to perform the analysis whereby the samples used are at a “target sampling rate”.
- the algorithm may be configured to require a minimum or predefined number of samples (M samples) at the “target sampling rate” to be available.
- the number “M” may be determined as the number of samples which are expected over a predetermined period of time at the target sampling rate.
- the M samples are used as the data series for expression analysis.
- the initial time point for the M samples is set to temporally coincide with one of the input image frames, and the facial feature metric calculated from that input image frame will be used as the first of the M samples.
- the image frame providing the first sample in the M sample series may be the very first input image frame acquired.
- it may be an image frame which is taken at a predefined period of time after the initial image frame, or it may be the first image captured once the algorithm determines that the facial image fits to a “goal” area in the display view, or it may be the input image frame used as the anchor image.
- the sample value will be the facial feature metric calculated from the input image frame which temporally coincides with the sample, if available. If no input image frame which temporally coincides with a required sample is available, then the value for that sample will be determined from the facial metric values calculated from the input image frames which are the closest in time to the sample. For instance, the sample value may be determined by interpolating between the facial feature metrics calculated from the input image frames which temporally, immediately precede and immediately succeed the time for the sample.
- the target sampling rate may be one which is expected to be met or exceeded by the frame rates of most camera hardware included in the users’ devices (e.g., most of the available smart phone or tablet cameras).
- the facial expression analysis algorithm may be configured so that it requires the input images to be acquired at or above a minimum actual sampling rate which is required in order to generate useful input data for the facial expression analysis algorithm at the target sampling rate.
- Figure 6 depicts an example of how the facial expression analysis algorithm obtains the data series comprising data samples at M discrete temporal points over three seconds, as defined by the analysis sampling rate.
- the sampling rates provided below are examples only, serving to illustrate how the analysis algorithm works, and are not essential features of the invention.
- Figure 6 (1) depicts the real time continuous motion which occurs when a person changes the facial expression from a neutral expression to a smiling expression.
- the images in Figure 6 (1) are artificial-intelligence generated images, provided for illustrative purposes only, and are not actual images acquired of a user.
- the neutral expression is shown in the input image which is chosen to be the “anchor image” (illustrated by the left hand side image of Figure 6 (1)).
- the smiling expression may be assumed to be shown in an input image (illustrated by the right hand side image of Figure 6 (1)) which is taken by the camera a pre-set period of time, e.g., 3 seconds (s), after the anchor image.
- the target sampling rate for the data series to be analysed is 10 samples per second as depicted in Figure 6 (2).
- the actual sampling rate is only 2 samples per second, where there are input images at times T0, T0 + 0.5s, T0 + 1s, etc. Designating T0 as 0.0 seconds, there are facial metrics calculated from the actual images at times 0.0 seconds, 0.5 seconds, 1.0 seconds, etc., as shown in Figure 6 (3).
- the system thus needs to generate a data series where the sample points are at the target sampling rate, which in this case means samples are needed at times T0, T0 + 0.1s, T0 + 0.2s, T0 + 0.3s, T0 + 0.4s, ..., etc.
- the sampling rates mentioned in this paragraph are illustrative only and should not be taken as limiting how embodiments of the algorithm should be implemented.
- the facial feature metrics calculated from those input frames are used to provide corresponding sample points in the data series, as represented by the dashed arrows between Figures 6 (3) and 6 (4).
- the sample values at each of these time points are estimated from the sample values of the nearest neighbours.
- the estimation may be an interpolation.
- the interpolation may be a linear interpolation.
- the data series can then be used to analyse the characteristics of any observed facial feature motion or warping, to make an estimation of whether the face captured in the input images is likely to be a real face or a spoof.
- This method of building the data series for analysis has the benefit of being generally agnostic to the variation in the frame rates in the cameras, at least for cameras capable of operating at or above a target frame rate. Also, at least given current image sensor frame rates, the processing speed of the CPU or processor running the expression analysis algorithms is likely to be much higher. Using interpolation means that the processing algorithm does not necessarily need to wait for the camera to produce enough frames so that metrics can be calculated to fill the required number of data samples in the data series.
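- A sketch of the resampling step of Figure 6, assuming linear interpolation (for example via numpy.interp); the 2 fps capture rate, 3 second window, and 10 samples-per-second target below mirror the illustrative rates used above.
```python
import numpy as np

def resample_metrics(frame_times, frame_metrics, duration_s=3.0, target_rate_hz=10.0):
    """Linearly interpolate per-frame metrics onto a fixed-rate sample grid.

    frame_times:   capture times (seconds from T0) of the input frames.
    frame_metrics: LD/ED values calculated from those frames.
    Returns (sample_times, sample_values) at the target sampling rate.
    """
    sample_times = np.arange(0.0, duration_s + 1e-9, 1.0 / target_rate_hz)
    sample_values = np.interp(sample_times, frame_times, frame_metrics)
    return sample_times, sample_values

# Frames at 2 fps over 3 seconds, resampled to 10 samples per second.
times = np.arange(0.0, 3.5, 0.5)                 # 0.0, 0.5, ..., 3.0 s
metrics = np.linspace(0.55, 0.75, len(times))    # neutral -> smiling LD/ED
sample_times, sample_values = resample_metrics(times, metrics)
```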
- the liveness estimation 100 may further include a face motion analysis algorithm 110 (see Figure 1), which analyses how the user performs, as determined from the input images being captured, when asked to move his or her face as directed by the liveness estimation system.
- the user is asked to make movements such that his or her facial image matches to a “goal” area on the screen.
- the goal does not remain unchanged in the same position on the screen 700, and will instead move to or appear in at least one other position, or change in size, or both.
- the goal may thus be considered a dynamic goal.
- the positioning, sizing, or both, of the goal 704 may be randomised, to make the algorithm more robust against someone trying to use a dynamic spoof with software which tries to learn and anticipate the movement pattern.
- the person interacting with the automated system whilst the face motion analysis is performed is the “agent”.
- a user can be considered a trained “agent” interacting with the system, such that his or her facial image 702 is shown on the display area 700 and will move within the display area 700 to match the position, size, or both, of the goal 704. Therefore, the display area 700 may be considered an ‘environment’ in which the agent is doing an action at time t (“At”), namely, to position the facial image 702 to the goal 704.
- the position of the user’s face at time t may be considered the “State” at time t (“St”).
- the “reward” at time t (“Rt”) is therefore defined as the State St matching or substantially matching the position of the goal 704.
- the face motion analysis determines one or more various factors, such as whether the reward condition is met, or a characterisation of the relationship between the change in the State and the change in Reward, over a period of time in which the analysis is undertaken, in order to estimate whether the face is likely a real face or a spoof.
- Figure 9 outlines an example implementation of a “face motion” test 900 provided by the face motion analysis.
- the algorithm detects the face in the incoming images.
- a “goal” is displayed in a location of the screen which is away from the detected face.
- the goal will define an area on the screen.
- the position of the goal and thus the defined area on the screen may be randomly selected to result in a “randomised goal”.
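- A minimal sketch of how a randomised goal might be generated, assuming the detected face is available as a bounding box in screen pixels; the radius range and minimum gap are illustrative values, not parameters taken from the disclosure.
```python
import random

def randomised_goal(screen_w, screen_h, face_box,
                    min_r=60, max_r=120, min_gap=80, attempts=100):
    """Pick a random goal circle (centre, radius) away from the detected face.

    face_box: (x, y, w, h) bounding box of the currently detected face, in pixels.
    Returns ((cx, cy), r), or None if no suitable position was found.
    """
    fx = face_box[0] + face_box[2] / 2.0
    fy = face_box[1] + face_box[3] / 2.0
    for _ in range(attempts):
        r = random.randint(min_r, max_r)
        cx = random.randint(r, screen_w - r)
        cy = random.randint(r, screen_h - r)
        # Keep the goal clear of the face's current position on screen.
        if ((cx - fx) ** 2 + (cy - fy) ** 2) ** 0.5 >= min_gap + r:
            return (cx, cy), r
    return None

# Example: a 1080 x 1920 portrait display with a face detected near the centre.
goal = randomised_goal(1080, 1920, face_box=(390, 800, 300, 300))
```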
- the system provides direction to the user to make the required movement or movements so that his or her facial image will move to the area defined by the goal.
- the system will track the detected face over the image frames to determine the path it takes to move from its starting position (i.e., State “S”) to the position of the area defined by the goal (step 906).
- the movement or movements, i.e., motion, determined from the images will be analysed (arrow 912).
- the movement analysis may include an analysis of the “path” of the detected face takes through the image frames (step 914).
- the determined path is then analysed and liveness estimation made based upon the analysis. This path is represented by arrows 706 in Figure 7, for the facial image to be fitted to the goal 704.
- the path is expected to be smoother and shorter than the path which would be taken to move the "spoof".
- some dynamic spoofs use “brute force” attacks where the spoof presented to the automated system will be caused by software to move to random positions until it matches the position of the “goal”. Brute force attacks thus are likely to result in a path which is indirect and which may be erratic, even if they successfully fit the detected facial image to the “goal”.
- the analysis may be a comparison between the path which is determined or estimated to have been taken, with the path which is expected when someone is not attempting a spoof attack.
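- One possible path analysis can be sketched as a "path efficiency" score, assuming the face centre has been tracked per frame: the straight-line displacement from the starting position to the goal is compared with the length of the path actually travelled, so that erratic brute-force trajectories score low. The threshold is an illustrative assumption.
```python
import math

def path_efficiency(face_centres):
    """Ratio of straight-line displacement to total path length (1.0 = a direct path).

    face_centres: list of (x, y) face-centre positions, one per tracked frame,
    from the starting position to the frame in which the goal is reached.
    """
    if len(face_centres) < 2:
        return 1.0
    total = sum(math.dist(a, b) for a, b in zip(face_centres, face_centres[1:]))
    if total == 0:
        return 1.0
    return math.dist(face_centres[0], face_centres[-1]) / total

def path_looks_real(face_centres, min_efficiency=0.7):
    """Flag the tracked motion as plausibly real if the path is reasonably direct."""
    return path_efficiency(face_centres) >= min_efficiency
```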
- the analysis of the movement or movements may additionally or instead comprise a determination of the “naturality” of the movement or movements (step 916).
- the movement may comprise only the movement of the user’s face if the user is interacting with the automated system where the image sensor is in a fixed position.
- the movement may comprise the movement of the face, motion attributed to the movement of the camera as made by the user, or a combination of both.
- the order of the path analysis (step 914) and the naturality analysis (step 916) may be reversed from that shown in Figure 9.
- a compliance failure may also be made if a failure occurs at one or more of the other steps. For instance, a compliance failure may be determined, if the system does not detect that there is a successful tracking of the facial image to the randomised goal (failure at step 906), or if the system determines from the path analysis that the facial image is an image of a spoof (failure at step 914), or both.
- the relative movement may be determined from the final capture (i.e., the last frame or frames), from the camera motion if the camera is provided by or as a mobile device, or both.
- this relative movement impacts one or more of the position, angle, or depth of the facial features as captured by the camera.
- the relative movement can affect the lighting or shadows as can be observed in the image data. Moving a real face in a three-dimensional (3D) environment, i.e., a "natural movement", will cause different behaviours in the observed shadow and lighting, as opposed to moving a 2D spoof. Therefore, it is possible to analyse the above-mentioned parameters in the image data, in order to estimate the position or the movement of the user's face in relation to the camera, in the physical 3D environment, and then determine whether the movement is a "natural movement".
- the determination of whether a movement is a natural movement may be made by a trained model using machine learning.
- a reinforcement training model (Figure 8) can be used.
- the user or user’s face would be considered the “agent”, and its position in the 3D environment would be considered as “State”.
- the “reward” may be a determination that the movement is a natural movement.
- the algorithm makes an assessment, i.e., estimation, of whether the face is likely to be a spoof meaning there is a compliance failure (910), or likely to be real (918).
- the face motion test 900 may be performed a number of times. That is, the face motion analysis may include multiple iterations of the face motion test 900.
- the algorithm may require a "successful" outcome, in which the face is estimated to be likely real, to be achieved for all of the tests or for a threshold number or percentage of the tests, for the overall analysis to estimate that the detected face is likely to be the image of a real face.
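- A sketch of how repeated face motion tests could be combined into an overall estimate; the pass-fraction threshold is an assumption used for illustration.
```python
def face_motion_overall(test_outcomes, min_pass_fraction=0.8):
    """Overall estimate from repeated face motion tests.

    test_outcomes: booleans, True where an individual face motion test 900
    estimated the presented face to be likely real.
    """
    if not test_outcomes:
        return False
    return sum(test_outcomes) / len(test_outcomes) >= min_pass_fraction
```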
- the liveness determination 100 may further include a lighting response analysis 112 (see Figure 1) which analyses the input image frames to assess the response visible in the input image frames to changes in lighting.
- some key face spoofing scenarios may involve either using a mobile device screen or printed photo to match against a documented face image, e.g., a passport image or an enrolment image. It is expected that the responses of a real face which is in 3D will be different than a 2D spoof or a mask worn over someone’s face. Therefore, the lighting response analysis may also be referred to as the face dimensionality analysis.
- Figure 10 depicts an example process 1000 implemented to provide the face dimensionality analysis 112.
- the input images are checked to ensure that there is a detectable face correctly positioned in the field of view (1002).
- For the algorithm to work, it is important that there are no significant movements in the detected face, and that the presented face is sufficiently close to the screen or camera, so that the screen brightness, front flash, or both, is close enough to change the face illumination significantly.
- the light from the mobile screen provides the illumination. Once a correctly positioned face is detected in an input image, a plurality of images will be captured, at different illumination levels.
- One or more first images may be captured at a first illumination level (1004), and then one or more second images may be captured at a second illumination level which is different from the first illumination level (1006).
- the illumination statistics will then be calculated (1008) for the first and second images, to find occurrence of a transition between “dark” and “bright” regions.
- the calculation of the intensity statistics is done in respect of a face region in the image, and one or more non-face regions adjacent to the face region.
- five regions are defined, including face region R5 and four other regions R1, R2, R3, R4, respectively to the left of, to the right of, above, and below the face region R5.
- Regions R1, R2, R3 are chosen so that they capture the background behind a real person, if the person is a real person presenting a real face.
- Region R4 is a body region which is expected to be of similar "depth" from the screen or camera, but subjected to slightly less illumination than the face due to the expected positioning of the face in relation to the source of illumination. It should be noted that Figure 11 is an example only. Other embodiments may have different regions. For instance, another embodiment may not include any "body region", or may include a different number of "background regions".
- the statistics are calculated for the series of input frames captured at steps 1004 and 1006, to determine the variation in the intensity contrast between the face region and the adjacent region or regions (1010).
- When the presented face is a real face, the face will be closer to the screen or the camera than the background which is captured in the adjacent regions which are not of the user's person. These adjacent regions are expected to be at least a head's width behind the face. Accordingly, it is expected that the face region will be more illuminated than adjacent regions captured of the background. Therefore, when the illumination level is changed, the effect of this change is expected to cause the largest variation in intensity levels for the face region (e.g., R5 in Figure 11), compared with the regions in which the background behind the person is captured (R1, R2, R3 in Figure 11).
- the algorithm will calculate a statistic indicating the brightness contrast(s) between region R5 and the adjacent regions (one or more of R1, R2, R3) in the first images, also referred to as the "inter-region" contrast(s) for the first images.
- the algorithm also calculates a statistic indicating the brightness contrast between those same regions in the second images to obtain the “inter-region” contrasts for the second images.
- the two statistics are compared to determine the amount of variation in the inter-region contrast(s) (e.g., differences in the intensities of the regions), as caused by the change in lighting intensity.
- If the variation in the inter-region contrast(s) is greater than a threshold, the face is more likely to be a real face (1014) than a spoof (1012).
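- A sketch of the inter-region contrast comparison, assuming the regions of Figure 11 have been extracted as grayscale pixel arrays for frames captured at the two illumination levels; the decision threshold (in grey levels) and the region extraction itself are assumptions for illustration.
```python
import numpy as np

def inter_region_contrast(face_region, background_regions):
    """Mean brightness difference between the face region (R5) and background regions.

    face_region:        2D numpy array of grayscale pixels for the face region.
    background_regions: list of 2D arrays for adjacent background regions (e.g., R1-R3).
    """
    face_mean = float(np.mean(face_region))
    return float(np.mean([face_mean - float(np.mean(r)) for r in background_regions]))

def likely_real_face(first_frames, second_frames, threshold=12.0):
    """Compare the inter-region contrast at two illumination levels.

    first_frames / second_frames: lists of (face_region, background_regions)
    tuples captured at the first and second illumination levels respectively.
    A real, three-dimensional face close to the light source is expected to show
    a markedly larger contrast change than a flat spoof.
    """
    c1 = np.mean([inter_region_contrast(f, b) for f, b in first_frames])
    c2 = np.mean([inter_region_contrast(f, b) for f, b in second_frames])
    return abs(c2 - c1) > threshold
```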
- Liveness estimation in accordance with different embodiments of the invention may include one, two, or three of the facial expression analysis, facial dimensionality analysis, and facial motion analysis. Furthermore, while these analyses, where two or more are provided, can be performed sequentially, they may be performed simultaneously if allowed by the processing capabilities of the hardware used. For example, while a person is directed to perform an expression during the facial expression analysis, the illumination level can be changed so that the data acquired during that time can also be used to perform the face dimensionality analysis.
Face fit
- In one or more of the analyses mentioned above, the user is asked to position his or her face so that the facial image is within a certain target area on the screen.
- there is a tolerance range for the positioning so that the algorithm considers the face image to be “in position” even if there is a slight difference between the area taken by the face image and the target area. Irrespective of whether the tolerance is large or small, if the user places his or her face at the boundary of the tolerance range, there is a lesser degree of freedom for the user to do further required actions, such as smiling (e.g., in a facial expression analysis example).
- the user's facial image may more easily go out of the tolerance area, and the user may as a result need to repeat the process of positioning his or her face again.
- the liveness estimation system applies a novel process in fitting the user’s facial image to the target area.
- An example of the face fitting process is depicted in Figure 12.
- the process starts with a “rough fitting” step 1202 in which the user is asked to move so that an outline of his or her facial image is in a first area (1304 in Figure 13(1)).
- the first area 1304 will be set as a relatively large tolerance range bound by the dashed lines 1306, 1308 which respectively represent the lower and upper limit of the first area 1304.
- the first area 1304 may also be considered the rough fitting target.
- Circle 1302 represents the centre of the rough fitting target 1304.
- Dashed line 1306 may also be considered to define the “negative” tolerance from the target centre 1302 and dashed line 1308 may be considered to define the “positive” tolerance from the target’s centre 1302.
- the position of the boundary of the facial image 1310 is measured. This boundary 1310 is considered to define a reference area.
- a revised, smaller tolerance range is determined with the measured boundary 1310 as the centre of the smaller range. Referring to Figure 13 (2), the tolerance range 1312 around the measured facial boundary 1310, as defined by dashed lines 1314, 1316, is smaller than the rough tolerance range 1304.
- Steps 1204 and 1206 may together be considered the “concise fitting” steps.
- the resulting tolerance range 1312 may be considered to provide the “concise fitting target”.
- the targets and their bounding lines are defined by circles. However these may take other shapes such as an oval or a shape that resembles a facial boundary shape.
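- A minimal sketch of the two-stage fitting logic, representing the rough target and the detected facial boundary by their radii in screen pixels; the tolerance values are illustrative assumptions.
```python
def rough_fit(face_radius, rough_target_radius, rough_tolerance=0.25):
    """True if the facial boundary falls within the wide rough-fitting band."""
    lower = rough_target_radius * (1 - rough_tolerance)
    upper = rough_target_radius * (1 + rough_tolerance)
    return lower <= face_radius <= upper

def concise_target(measured_face_radius, concise_tolerance=0.08):
    """Centre a tighter tolerance band on the boundary measured after the rough fit."""
    return (measured_face_radius * (1 - concise_tolerance),
            measured_face_radius * (1 + concise_tolerance))

# After a successful rough fit, the measured boundary becomes the reference,
# and the face only needs to stay within the narrower band while the user
# performs further actions such as smiling.
if rough_fit(face_radius=180, rough_target_radius=200):
    lower, upper = concise_target(measured_face_radius=180)
```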
- FIG 14 schematically depicts an example of an automated system 1400 for the purpose of authenticating a traveller or registering a traveller.
- the system 1400 includes a device 1402 which includes a camera 1406 which is configured to acquire input image data, or the device 1402 has access to a camera feed.
- the device 1402 includes a processor 1403, which may be a central processing unit or another processor arrangement, configured to execute machine instructions to provide the liveness estimation method mentioned above, either in full or in part.
- the processor 1403 can be configured to only execute the method in part, if a backend system with more powerful processing is required to process any of the steps of the liveness estimation method.
- the machine instructions may be stored in a memory device 1407 collocated with the processor 1403 as shown, or they may partially or wholly reside in one or more remote memory locations accessible by the processor 1403.
- the processor 1403 may also have access to data storage 1405 adapted to contain the data to be processed, and possibly to at least temporarily store results from the processing.
- the device 1402 further includes an interface arrangement 1404 configured to provide audio and/or video interfacing capabilities to interact with the traveller.
- the interface arrangement 1404 includes the display screen and may further include other components such as a speaker, microphone, etc.
- the input image data are processed by a liveness estimator 1408 configured to implement the liveness estimation method.
- the liveness estimator 1408 is provided as a computer program or module, which may be part of an application executed by the processor 1403 of the device 1402.
- the liveness estimator 1408 is supported by a remote server or is a cloud-based application, and is accessible via a web-based application in a browser.
- the box denoting the device 1402 is represented in dashed lines to conceptually signify that the components therein may be provided in the same physical device or housing, or one or more of the components may instead be located separately.
- the device 1402 is a programmable personal device such as a mobile phone or tablet
- the mobile phone or tablet can provide a single piece of hardware containing the input/output (I/O) interface arrangement 1404, processor 1403, data storage 1405, communications module 1409, camera hardware 1406, and local memory 1407.
- the machine instructions for the liveness estimation can be stored locally or accessed from the cloud or a remote location, as mentioned previously.
- the automated system 1400 is used in the travel context.
- a passport image 1410 is provided as a reference image for the purpose of the expression analysis performed by the liveness estimator, in embodiments where the analysis is performed.
- the provision of the passport photo may be by the traveller taking a photograph or a scanned image of the passport page with the device 1402.
- the kiosk may include a scanning device configured to scan the relevant passport page.
- the device 1402 is a “local device” as it is in a wireless connection with a backend system 1412. Such local devices may be provided by mobile phones or tablets.
- the backend system 1412 is a remote server or server system where the 1:N biometric matching engine 1414 resides. Communication between the device 1402 and the backend system 1412 is represented by dashed double arrow 1411, and may be over a wireless network such as but not limited to a 3G, 4G, or 5G data network, or over a WiFi network.
- the backend system 1412 may instead be provided by another server or server system, such as an airport server separate to but in communication with the server performing the 1:N matching.
- the backend system 1412 may include a backend liveness estimator 1416 configured to implement the same method as that implemented by the liveness estimator 1408, either partially or in full.
- the camera feed data and the passport image 1410 are also sent to the backend liveness estimator 1416. That is, while the liveness estimator 1408 in the device is processing the live camera feed, the camera data is also being fed to the backend server 1412 for the same processing. This serves the purpose of performing a verification run of the processing to ensure there is no corruption in the result(s) returned by the liveness estimation, or for the purpose of performing step(s) in the liveness estimation method which might be too computationally intensive for the local device 1402 to handle, or both.
- the automated process of authenticating or enrolling the traveller will only proceed, for 1:N matching to occur, if the results from the local liveness estimator 1408 and the backend liveness estimator 1416 both indicate "liveness" of the facial image in the camera feed.
- liveness estimation is described as part of a check before biometric identification is performed.
- biometric identification does not affect the working of the liveness estimation and thus is not considered a part of the invention in any of the aspects disclosed.
- liveness estimation may be implemented in systems which do not perform biometric identification. For example, it may be implemented in systems to check whether anyone passing through or at a check point is using a spoofing device to conceal his or her identity or to pose as someone else, e.g., to join a video conference or to register themselves onto a particular user database, using a "spoof".
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Heart & Thoracic Surgery (AREA)
- Public Health (AREA)
- Biophysics (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Veterinary Medicine (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- Pathology (AREA)
- Dentistry (AREA)
- Computer Security & Cryptography (AREA)
- Ophthalmology & Optometry (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
A method is disclosed for estimating whether a face presented for a user is a real face by analysing image frames acquired of the presented face. The method comprises one or more of the following steps: determining and analysing movements effected by the user to fit the facial image of the presented face to a randomised goal; determining and analysing changes in facial feature metrics in the facial images following a change in the user's expression; and determining and analysing an effect of a change in illumination on contrast levels between a facial region in the acquired images and one or more regions adjacent to the facial region.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2023900391A AU2023900391A0 (en) | 2023-02-16 | Automated facial detection with anti-spoofing | |
| PCT/AU2024/050111 WO2024168396A1 (fr) | 2023-02-16 | 2024-02-16 | Détection faciale automatisée avec anti-mystification |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4655767A1 true EP4655767A1 (fr) | 2025-12-03 |
Family
ID=92421273
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP24755770.5A Pending EP4655767A1 (fr) | 2023-02-16 | 2024-02-16 | Détection faciale automatisée avec anti-mystification |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4655767A1 (fr) |
| WO (1) | WO2024168396A1 (fr) |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10169646B2 (en) * | 2007-12-31 | 2019-01-01 | Applied Recognition Inc. | Face authentication to mitigate spoofing |
| CA2902093C (fr) * | 2014-08-28 | 2023-03-07 | Kevin Alan Tussy | Procede d'authentification de reconnaissance faciale comprenant des parametres de chemin |
| CN106295287B (zh) * | 2015-06-10 | 2019-04-09 | 阿里巴巴集团控股有限公司 | 活体检测方法和装置以及身份认证方法和装置 |
| JP6866847B2 (ja) * | 2015-09-03 | 2021-04-28 | 日本電気株式会社 | 生体判別装置、生体判別方法及び生体判別プログラム |
| CN107886032B (zh) * | 2016-09-30 | 2021-12-14 | 阿里巴巴集团控股有限公司 | 终端设备、智能手机、基于脸部识别的认证方法和系统 |
| AU2019272041B2 (en) * | 2019-11-11 | 2024-12-12 | Icm Airport Technics Pty Ltd | Device with biometric system |
-
2024
- 2024-02-16 EP EP24755770.5A patent/EP4655767A1/fr active Pending
- 2024-02-16 WO PCT/AU2024/050111 patent/WO2024168396A1/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024168396A1 (fr) | 2024-08-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10853677B2 (en) | Verification method and system | |
| AU2022203880B2 (en) | Methods and systems for determining user liveness and verifying user identities | |
| US10546183B2 (en) | Liveness detection | |
| EP3241151B1 (fr) | Procédé et appareil de traitement de visage dans une image | |
| EP3332403B1 (fr) | Détection de caractère vivant | |
| EP2680191B1 (fr) | Reconnaissance faciale | |
| US8620066B2 (en) | Three-dimensional object determining apparatus, method, and computer program product | |
| US10592728B2 (en) | Methods and systems for enhancing user liveness detection | |
| Li et al. | Seeing your face is not enough: An inertial sensor-based liveness detection for face authentication | |
| US11115408B2 (en) | Methods and systems for determining user liveness and verifying user identities | |
| US12266215B2 (en) | Face liveness detection using background/foreground motion analysis | |
| JP7197485B2 (ja) | 検出システム、検出装置およびその方法 | |
| US12236717B2 (en) | Spoof detection based on challenge response analysis | |
| EP2842075A1 (fr) | Reconnaissance faciale tridimensionnelle pour dispositifs mobiles | |
| JP7264308B2 (ja) | 二次元顔画像の2つ以上の入力に基づいて三次元顔モデルを適応的に構築するためのシステムおよび方法 | |
| US10360441B2 (en) | Image processing method and apparatus | |
| WO2024168396A1 (fr) | Détection faciale automatisée avec anti-mystification | |
| US20240205239A1 (en) | Methods and systems for fraud detection using relative movement of facial features | |
| CN119094651B (zh) | 基于手机的虚拟摄像头识别方法及设备 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
| 17P | Request for examination filed |
Effective date: 20250825 |
|
| AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |